libgraph

Poor performance of Graph.delete_vertex/2 in large graphs

Open bartblast opened this issue 1 year ago • 9 comments

I'm getting ~18 ms per vertex delete in large graphs. Is this expected? And if so, is it theoretically possible to do anything about it?

bartblast avatar Apr 09 '24 19:04 bartblast

Large graphs in terms of number of vertices, number of edges, or both? Looking at the relevant code, I would think the number of vertices alone should not cause an issue, but the number of edges would be more impactful.

  def delete_vertex(
        %__MODULE__{out_edges: oe, in_edges: ie, edges: em, vertex_identifier: vertex_identifier} =
          g,
        v
      ) do
    vs = g.vertices
    ls = g.vertex_labels

    with v_id <- vertex_identifier.(v),
         true <- Map.has_key?(vs, v_id),
         oe <- Map.delete(oe, v_id),
         ie <- Map.delete(ie, v_id),
         vs <- Map.delete(vs, v_id),
         ls <- Map.delete(ls, v_id) do
      oe = for {id, ns} <- oe, do: {id, MapSet.delete(ns, v_id)}, into: %{}
      ie = for {id, ns} <- ie, do: {id, MapSet.delete(ns, v_id)}, into: %{}
      em = for {{id1, id2}, _} = e <- em, v_id != id1 && v_id != id2, do: e, into: %{}
      %__MODULE__{g | vertices: vs, vertex_labels: ls, out_edges: oe, in_edges: ie, edges: em}
    else
      _ -> g
    end
  end

Style-wise, and performance-wise (though only a tiny bit), I think pattern matching on the vertices and vertex_labels fields in the function head, the way all the other fields are matched, would be better. For each vertex deleted, this traverses 3 collections of edges (out edges, in edges, and edges), and at some point that gets pricey. You can probably do a single pass over em, reducing into {oe, ie, em}, so you only have to MapSet.delete(ns, v_id) for ids you know shared an edge with the deleted vertex. Something like:

  def delete_vertex(
        %__MODULE__{vertices: vs, vertex_labels: ls, out_edges: oe, in_edges: ie, edges: em, vertex_identifier: vertex_identifier} =
          g,
        v
      ) do
   
    with v_id <- vertex_identifier.(v),
         true <- Map.has_key?(vs, v_id),
         oe <- Map.delete(oe, v_id),
         ie <- Map.delete(ie, v_id),
         vs <- Map.delete(vs, v_id),
         ls <- Map.delete(ls, v_id) do
    {em, oe, ie} =
      Enum.reduce(em, {em, oe, ie}, fn
        # The ^ pins are required: an unpinned v_id in a fn clause pattern
        # rebinds rather than matches, so the first clause would match every edge.
        {{^v_id, id2} = k, _}, {e, o, i} ->
          {Map.delete(e, k), o, Map.update!(i, id2, &MapSet.delete(&1, v_id))}

        {{id1, ^v_id} = k, _}, {e, o, i} ->
          {Map.delete(e, k), Map.update!(o, id1, &MapSet.delete(&1, v_id)), i}

        _, acc ->
          acc
      end)

      %__MODULE__{g | vertices: vs, vertex_labels: ls, out_edges: oe, in_edges: ie, edges: em}
    else
      _ -> g
    end
  end

Alternatively, change the data structure to %__MODULE__{edges: %{v_id => %{out: %MapSet{}, in: %MapSet{}}}}. Retrieving all edges then has to combine the out and in fields, but querying the out and in edges independently should be equivalent to the current layout. Updating in one pass when deleting a vertex becomes simpler: Map.pop(edges, v_id) yields the sets of out and in neighbors to visit and remove v_id from. Not sure how changing that data structure would complicate the rest of the lib, though.
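The single-map idea can be sketched in isolation. `AdjGraph` and its field layout are hypothetical names, not part of libgraph; this is only a sketch of the alternative adjacency representation, ignoring labels, weights, and the rest of the struct:

```elixir
defmodule AdjGraph do
  # Hypothetical single-map layout: each vertex id maps to its out- and
  # in-neighbor sets, so one Map.pop yields everything a delete must touch.
  defstruct edges: %{}

  def add_edge(%__MODULE__{edges: em} = g, a, b) do
    em =
      em
      |> Map.update(a, %{out: MapSet.new([b]), in: MapSet.new()}, fn adj ->
        %{adj | out: MapSet.put(adj.out, b)}
      end)
      |> Map.update(b, %{out: MapSet.new(), in: MapSet.new([a])}, fn adj ->
        %{adj | in: MapSet.put(adj.in, a)}
      end)

    %{g | edges: em}
  end

  def delete_vertex(%__MODULE__{edges: em} = g, v) do
    case Map.pop(em, v) do
      {nil, _} ->
        # Unknown vertex: return the graph unchanged.
        g

      {%{out: outs, in: ins}, em} ->
        # Only the deleted vertex's neighbors are visited; MapSet.delete(_, v)
        # guards against a self-loop pointing back at the popped entry.
        em =
          Enum.reduce(MapSet.delete(outs, v), em, fn o, acc ->
            Map.update!(acc, o, fn adj -> %{adj | in: MapSet.delete(adj.in, v)} end)
          end)

        em =
          Enum.reduce(MapSet.delete(ins, v), em, fn i, acc ->
            Map.update!(acc, i, fn adj -> %{adj | out: MapSet.delete(adj.out, v)} end)
          end)

        %{g | edges: em}
    end
  end
end
```

With this layout the delete cost is proportional to the degree of v, and out/in neighbor lookups stay single map reads; the trade-off, as noted, is that enumerating all edges has to walk every vertex's out set.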

EDIT: Just did a quick-and-dirty test in iex with my fork implementing the above.

g = Graph.new()
vs = 1..1_000_000 |> Enum.to_list()
vs |> Enum.each(fn i -> Graph.add_vertex(g, i) end)
vs |> Enum.zip(Enum.reverse(vs)) |> Enum.each(fn {i,o} -> Graph.add_edge(g, i, o) end)
vs |> Enum.zip(Enum.reverse(vs)) |> Enum.each(fn {i,o} -> Graph.add_edge(g, o, i) end)
:timer.tc(fn -> 1..1_000_000//7 |> Enum.each(fn i -> Graph.delete_vertex(g, i) end) end, :millisecond)

Weirdly, it returns almost exactly the same time for the current version and for my change, so I think my test is flawed: 121-128 ms for the current version, 118-125 ms for mine, over ten runs of the delete call. Not sure if my example is too simplistic, there's not enough data, or I've just misdiagnosed the problem.
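Worth noting for anyone re-running this: the benchmark above never actually builds the graph. Elixir data is immutable, and Enum.each discards the return value of each Graph.add_vertex/Graph.add_edge call, so g stays empty and every delete is measured against an empty graph. A minimal demonstration of the pitfall, using plain maps so no libgraph is needed:

```elixir
# Enum.each discards return values, so the "updated" map is thrown away.
m = %{}
Enum.each(1..3, fn i -> Map.put(m, i, i) end)
# m is still %{}

# Accumulating an immutable structure needs Enum.reduce (and rebinding).
m2 = Enum.reduce(1..3, %{}, fn i, acc -> Map.put(acc, i, i) end)
# m2 is %{1 => 1, 2 => 2, 3 => 3}
```

The later benchmarks avoid this by building the graph in a single Graph.add_edges pipeline and rebinding the result.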

stevensonmt avatar Jan 10 '25 02:01 stevensonmt

So the issue was the test I was using. With the following I get 25-30x speedup using my version versus the previous version:

g = (Graph.new() |> Graph.add_edges(Enum.map(1..1_000_000, fn i -> {i, 1_000_000 - i} end)))
:timer.tc(fn -> g |> Graph.delete_vertex(Enum.random(1..1_000_000)) end, :millisecond)
# {45, #Graph<type: directed, num_vertices: 999896, num_edges: 999998>}
:timer.tc(fn -> g |> Graph.delete_vertex_old(Enum.random(1..1_000_000)) end, :millisecond)
# {1228, #Graph<type: directed, num_vertices: 999896, num_edges: 999998>}

For graphs of 1000 nodes it shows ~8-10x speedup over the current implementation. For graphs of 100 nodes it shows ~3-4x speedup. For graphs of 10 nodes there is almost no significant difference, maybe 1.1-1.2x improvement.

PR submitted #84

stevensonmt avatar Jan 11 '25 16:01 stevensonmt

I think there might be a similar performance issue with replace_vertex for the same reason. Will open new issue.

stevensonmt avatar Jan 11 '25 20:01 stevensonmt

Large graphs in terms of number of vertices, number of edges, or both?

@stevensonmt The performance issue (~18ms for vertex deletes) is occurring on Hologram's call graph when installed in an application - so it's dealing with a large number of nodes (MFAs) and edges (function calls) in total.

bartblast avatar Jan 12 '25 20:01 bartblast

@bitwalker Fixing this issue would speed up Hologram's incremental compilation and live reloading a lot.

bartblast avatar Jan 12 '25 20:01 bartblast

@bartblast I'll try and get to this today, sorry for the delay - unfortunately I have minimal time to maintain some of my open source projects for the time being.

bitwalker avatar Jan 13 '25 19:01 bitwalker

Thinking about this some more, I think there might be an even more efficient way to do this. Say there are 1000 vertices, and each vertex has 100 incident edges and 100 emanating edges; that's 200k edges for the whole graph. The old version visited each vertex twice (once for the incident-edge collection, once for the emanating-edge collection) and then all the edges in the graph once. The proposed version only has to visit the collection of all edges. I think there is a way to visit only the subset of edges involving the vertex of interest. I'm not somewhere I can test this out until this evening, but something like this:

def delete_vertex(
        %__MODULE__{vertices: vs, vertex_labels: ls, out_edges: oe, in_edges: ie, edges: em, vertex_identifier: vertex_identifier} =
          g,
        v
      ) do
   
    with v_id <- vertex_identifier.(v),
         true <- Map.has_key?(vs, v_id),
         {outs, oe} <- Map.pop(oe, v_id),
         {ins, ie} <- Map.pop(ie, v_id),
         vs <- Map.delete(vs, v_id),
         ls <- Map.delete(ls, v_id) do

    {ie, em} =
      Enum.reduce(outs, {ie, em}, fn id1, {i, e} ->
        {Map.update!(i, id1, &MapSet.delete(&1, v_id)), Map.delete(e, {v_id, id1})}
      end)

    {oe, em} =
      Enum.reduce(ins, {oe, em}, fn id1, {o, e} ->
        {Map.update!(o, id1, &MapSet.delete(&1, v_id)), Map.delete(e, {id1, v_id})}
      end)

      %__MODULE__{g | vertices: vs, vertex_labels: ls, out_edges: oe, in_edges: ie, edges: em}
    else
      _ -> g
    end
  end

Now, instead of visiting 200k edges, it visits only the 200 identified as involving v_id.

Does this make sense?

Need to test it. Intuitively, I think this might be faster for sparsely connected graphs (each vertex has relatively few edges), while the current proposal could be faster for densely connected graphs (each vertex is connected to nearly every other vertex).

stevensonmt avatar Jan 13 '25 22:01 stevensonmt

So my intuition was correct: for a sparsely connected graph, the version above is vastly superior. This test uses a sparse graph with many vertices, each with one in edge and one out edge.

g = (Graph.new() |> Graph.add_edges(Enum.map(1..1_000_000, fn i -> {i, 1_000_000 - i} end)))
i = Enum.random(1..1_000_000)
test = fn g, i, del_fun -> del_fun.(g, i) end
g1 = test.(g, i, fn g, i -> Graph.delete_vertex(g, i) end)
g2 = test.(g, i, fn g, i -> Graph.delete_vertex_alt(g, i) end)
g3 = test.(g, i, fn g, i -> Graph.delete_vertex_old(g, i) end)
# sanity check to make sure they all do the same thing.
g1 == g2 # true
g1 == g3 # true
g2 == g3 # true
i in Graph.vertices(g) # true
i in Graph.vertices(g1) # false
i in Graph.vertices(g2) # false
i in Graph.vertices(g3) # false
Graph.Edge.new(i, 1_000_000 - i) in Graph.edges(g) # true
Graph.Edge.new(i, 1_000_000 - i) in Graph.edges(g1) # false
Graph.Edge.new(i, 1_000_000 - i) in Graph.edges(g2) # false
Graph.Edge.new(i, 1_000_000 - i) in Graph.edges(g3) # false
# swapping order of vertices as {1_000_000 - i, i} was the same

funs = [fn g, i -> Graph.delete_vertex(g, i) end, fn g, i -> Graph.delete_vertex_alt(g, i) end, fn g, i -> Graph.delete_vertex_old(g, i) end]
funs |> Task.async_stream(fn fun -> :timer.tc(fn -> test.(g, i, fun) end, :microsecond) end) |> Enum.to_list()
#[
#  ok: {1148844, #Graph<type: directed, num_vertices: 999896, num_edges: 999998>}, # current proposal in PR
#  ok: {116, #Graph<type: directed, num_vertices: 999896, num_edges: 999998>}, # new approach as above
#  ok: {2397265, #Graph<type: directed, num_vertices: 999896, num_edges: 999998>} # version in current release
#]

Subsequent runs were similar.

Changing the graph to ~2k vertices but ~1.9M edges yields:

vs = Stream.repeatedly(fn -> Enum.random(1..10_000) end) |> Enum.take(1_000)
g = (Graph.new() |> Graph.add_edges(1..1_000 |> Enum.flat_map(fn i -> vs |> Enum.map(fn v -> [{v,i}, {i, v}] end) end) |> List.flatten()))
[g1, g2, g3] = funs |> Enum.map(fn fun -> test.(g, 2513, fun) end)
g == g1
#false
g == g2
#false
g == g3
#false
g1 == g2
#true
g2 == g3
#true
g1 == g3
#true
# won't continue to bore you with evidence that I checked that the edges were actually deleted appropriately but they were
funs |> Task.async_stream(fn fun -> :timer.tc(fn -> test.(g, 2513, fun) end) end) |> Enum.to_list()
#[
#  ok: {1546322, #Graph<type: directed, num_vertices: 1869, num_edges: 1898944>},
#  ok: {10594, #Graph<type: directed, num_vertices: 1869, num_edges: 1898944>},
#  ok: {1578386, #Graph<type: directed, num_vertices: 1869, num_edges: 1898944>}
#]

So still significantly better. You can also see the old version and the current PR version converging on a graph with fewer vertices but more edges. This is because the old version is more heavily impacted by the number of vertices that have any edges, while both are roughly equally affected by the total number of edges in the graph. The new version is affected only by the number of edges connected to the vertex being deleted. A more extreme graph with 1_000_000 vertices and 999_999 edges would likely show an even bigger improvement, but my poor 14-year-old i5 with 16 GB of slow RAM just can't.

EDIT: Caught a bug in the above implementation. When the vertex being deleted has no in edges or no out edges, Map.pop/2 returns nil for the missing key, and piping nil into Enum.reduce blows up. Fixing that now and will update the PR.
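For reference, one way to guard that (a sketch; not necessarily what the updated PR does) is Map.pop/3 with an empty MapSet as the default, since reducing over an empty set is a no-op:

```elixir
# Map.pop/3 takes a default, so a vertex absent from the out-edges map
# yields an empty MapSet instead of nil.
oe = %{1 => MapSet.new([2])}

{outs, _oe} = Map.pop(oe, 99, MapSet.new())
result = Enum.reduce(outs, :untouched, fn _id, acc -> acc end)
# outs is empty, so the reduce is a no-op and result is still :untouched
```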

stevensonmt avatar Jan 14 '25 03:01 stevensonmt

@bartblast I'll try and get to this today, sorry for the delay - unfortunately I have minimal time to maintain some of my open source projects for the time being.

@bitwalker Got ya! Totally understandable!

bartblast avatar Jan 15 '25 23:01 bartblast