cugraph [QST]: Why does cuGraph BFS Return All -1 in Predecessors When Launched with All Vertices as Sources?

Why does cuGraph BFS Return All -1 in Predecessors When Launched with All Vertices as Sources?

I'm using cuGraph's BFS algorithm to explore a graph, and I've set up the BFS to start from all vertices in the graph. However, the output in the predecessors array consists entirely of -1s, which suggests that no vertices are reachable, even though the graph is connected. Here's the code snippet for the BFS call:

rmm::device_uvector<int64_t> d_sources(sources.size(), handle.get_stream());
raft::update_device(d_sources.data(), sources.data(), sources.size(), handle.get_stream());


cugraph::bfs(
    handle,
    *GraphX::graphView,
    distances->data(),
    predecessors->data(),
    d_sources.data(),
    d_sources.size(),
    false,
    std::numeric_limits<int64_t>::max(),
    false);

Code of Conduct

[X] I agree to follow cuGraph's Code of Conduct
[X] I have searched the open issues and have found no duplicates for this question

Apr 15 '24 17:04 Ibrahim25-ai

The current BFS implementation is not able to support multiple sources unless they are on different weakly connected components. This line of documentation mentions the limitation.

The algorithm is designed to find a shortest path from a seed vertex to every vertex in the graph. If your graph has multiple disconnected components then you can specify a seed on each component. The fundamental reason for this limitation is that the predecessors array contains a single vertex id which is the predecessor for arriving at the vertex. There's no good way to identify multiple paths in this data structure.

The algorithm does the following (logically):

Seed a frontier with the source vertices that you specify
Mark vertices in the frontier as visited
Traverse from each vertex in the frontier to its neighbors. If the neighbor has not been visited then we mark the neighbor as having come from the frontier vertex we are currently processing, and we add the neighbor to the next frontier. If the neighbor has already been visited then we drop that neighbor and do nothing.

Some observations:

If you send a single vertex or multiple vertices that are all on separate components you will get the desired traversal.
If you send all of the vertices in as seeds then step 2 will mark all of the vertices as visited and BFS will essentially do nothing.
If you send multiple vertices that are on the same component you will get results that are probably not particularly useful. Suppose you send vertex u and vertex v as seeds and they are on the same component. The predecessor array identifies a single vertex that led us to a particular neighbor. With multiple vertices on the same component you end up with a some vertices having been reached in a path from vertex u and other vertices having been reached in a path from vertex v.

What is it you are trying to accomplish by running a BFS from all vertices?

Apr 15 '24 20:04 ChuckHastings

Closing issue. Please post a new question if something else arises here.

May 22 '24 23:05 ChuckHastings