rigraph bfs might have a bug

From @tintinthong on September 20, 2017 17:54

Hi I do not understand why the output is as such. There is a possible bug in bfs (or I might not understand correctly)

tree<-graph_from_literal(3-+5:6,6-+2,3-+4)
obj<-graph.bfs(tree,root=3, neimode="out",father=TRUE,order=TRUE,unreachable = FALSE)

Observe that the output is obj$order = 6 2 <NA> <NA> <NA> and obj$father= 3 5 6 2 4. But I expect it to be obj$order = 3 5 6 4 2 and obj$father= NA 3 3 3 2

Copied from original issue: igraph/igraph#1032

Sep 21 '17 08:09 ntamas

From @tintinthong on September 21, 2017 4:15

I was looking at the graph.bfs code and the naming issue of father becomes a problem after .Call(C_R_igraph_bfs, graph, root, roots, neimode, unreachable, restricted, as.logical(order), as.logical(rank), as.logical(father), as.logical(pred), as.logical(succ), as.logical(dist), callback, extra, rho) is called in the graph.bfs function.

Sep 21 '17 08:09 ntamas

From @tintinthong on September 21, 2017 5:35

So although I see that the problem may be due to internals. It usually has to do with the naming of vertices. If the vertices are randomly generated by ID's, for example, using the make_tree function (not symbolic names as in graph_from_literal), the bfs function works perfectly.

Sep 21 '17 08:09 ntamas

igraph vertices always have numeric IDs from 1 to the number of vertices, inclusive (at least in R; in C or Python, the vertex IDs are zero-based). When you use graph.from.literal, the numbers that you use end up being used as vertex names, not IDs, and by convention, names are always strings in igraph. Try graph.bfs(tree, root="3").

Sep 21 '17 08:09 ntamas

From @tintinthong on September 21, 2017 8:10

This is what I obtain. I am unsure if the output is the named output or the some index(perhaps index of the vertex sequence , ie V(tree))

$root
[1] 1 #this is obviously an index

$neimode
[1] "out"

$order
+ 5/5 vertices, named, from 6fee7e1:
[1] 3 5 6 4 2 #this should be the names

$rank
NULL

$father
+ 5/5 vertices, named, from 6fee7e1:
[1] 3 5 6 2 4  #this should be a names

$order makes sense but $father does not. , it still makes no sense. The documentation says that The father of each vertex, i.e. the vertex it was discovered from.. Does this mean corresponding to $order or the index of the vertex sequence? Still don't I expect father to be NA 3 3 3 6

Sep 21 '17 08:09 ntamas

I think this is a bug in the printing routine corresponding to obj; the raw content of obj$father before the vertex names are resolved is correct (irrelevant parts omitted):

> dput(obj$father)
structure(c(NA, 1L, 1L, 3L, 1L), [...]

Resolving c(NA, 1, 1, 3, 1) to the corresponding vertex names yields the correct result:

> V(tree)$name[c(NA, 1, 1, 3, 1)]
[1] NA  "3" "3" "6" "3"

As I am not familiar with the internals of the R interface but it's clearly an R-specific issue, I will move this to igraph/rigraph

Sep 21 '17 08:09 ntamas

@gaborcsardi can probably take care of this.

Sep 21 '17 08:09 ntamas

There is a similar bug in dfs(), and the workaround in https://github.com/igraph/rigraph/issues/227 works.

Apr 28 '22 20:04 nacnudus

The current behavior still seems odd to me, in particular the difference between root = 3 and root = "3" .

Can we construct a smaller example?

library(igraph, warn.conflicts = FALSE)
tree <- graph_from_literal(3 -+ 5:6, 6 -+ 2, 3 -+ 4)

obj <- graph.bfs(tree, root = "3", mode = "out", father = TRUE, order = TRUE, unreachable = FALSE)
obj
#> $root
#> [1] 1
#> 
#> $mode
#> [1] "out"
#> 
#> $order
#> + 5/5 vertices, named, from e17c17b:
#> [1] 3 5 6 4 2
#> 
#> $rank
#> NULL
#> 
#> $father
#> + 5/5 vertices, named, from e17c17b:
#> [1] 3 5 6 2 4
#> 
#> $pred
#> NULL
#> 
#> $succ
#> NULL
#> 
#> $dist
#> NULL
#> 
#> $neimode
#> [1] "out"
obj$father
#> + 5/5 vertices, named, from e17c17b:
#> [1] 3 5 6 2 4
dput(obj$father)
#> structure(c(`3` = NA, `5` = 1L, `6` = 1L, `2` = 3L, `4` = 1L), class = "igraph.vs", env = <weak reference>, graph = "e17c17bd-15eb-411b-85d6-d3460710b26d")

obj <- graph.bfs(tree, root = 3, mode = "out", father = TRUE, order = TRUE, unreachable = FALSE)
obj
#> $root
#> [1] 3
#> 
#> $mode
#> [1] "out"
#> 
#> $order
#> + 5/5 vertices, named, from e17c17b:
#> [1] 6    2    <NA> <NA> <NA>
#> 
#> $rank
#> NULL
#> 
#> $father
#> + 5/5 vertices, named, from e17c17b:
#> [1] 3 5 6 2 4
#> 
#> $pred
#> NULL
#> 
#> $succ
#> NULL
#> 
#> $dist
#> NULL
#> 
#> $neimode
#> [1] "out"
obj$father
#> + 5/5 vertices, named, from e17c17b:
#> [1] 3 5 6 2 4
dput(obj$father)
#> structure(c(`3` = NA, `5` = NA, `6` = NA, `2` = 3L, `4` = NA), class = "igraph.vs", env = <weak reference>, graph = "e17c17bd-15eb-411b-85d6-d3460710b26d")

^{Created on 2023-03-31 with reprex v2.0.2}

Mar 31 '23 04:03 krlmlr

I now agree that bfs() and dfs() are correct, but the printing is off. It's unclear why we're using NaN instead of NA or Inf, but that's a different issue.

library(igraph, warn.conflicts = FALSE)
tree <- graph_from_literal(a -+ b -+ c)
obj <- bfs(
  tree,
  root = 2,
  mode = "out",
  unreachable = FALSE,
  order = TRUE,
  rank = TRUE,
  father = TRUE,
  pred = TRUE,
  succ = TRUE,
  dist = TRUE
)
obj
#> $root
#> [1] 2
#> 
#> $mode
#> [1] "out"
#> 
#> $order
#> + 3/3 vertices, named, from ce1de61:
#> [1] b    c    <NA>
#> 
#> $rank
#>   a   b   c 
#> NaN   1   2 
#> 
#> $father
#> + 3/3 vertices, named, from ce1de61:
#> [1] a b c
#> 
#> $pred
#> + 3/3 vertices, named, from ce1de61:
#> [1] a b c
#> 
#> $succ
#> + 3/3 vertices, named, from ce1de61:
#> [1] a b c
#> 
#> $dist
#>   a   b   c 
#> NaN   0   1 
#> 
#> $neimode
#> [1] "out"
V(tree)[obj$father, na_ok = TRUE]
#> + 3/3 vertices, named, from ce1de61:
#> [1] <NA> <NA> b
as.integer(obj$father)
#> [1] NA NA  2

^{Created on 2023-03-31 with reprex v2.0.2}

Mar 31 '23 16:03 krlmlr

Those vectors contain NaN because the C core fills them with NaN and I think we don't make any conversion to NA when we wrap the C vector in an R vector.

Note that this will change in igraph 0.10 where the rank and dist vectors are converted to igraph_vector_int_t so we use -1 to indicate these slots, not NA.

Apr 01 '23 15:04 ntamas

To be precise, in 0.10 we use negative values for non-mapped vertices, and the specific value shows the reason for why the vertex is not mapped.

-1 indicates that the vertex is a root. Note that in a forest there will be more than one root.
-2 indicates that the vertex was not processed, typically because the algorithm did not reach it.

More details: https://github.com/igraph/igraph/issues/1880

I wrote about this here just now: https://github.com/igraph/rigraph/issues/522#issuecomment-1493081825

@krlmlr The following probably won't work, but thinking aloud for a bit. Perhaps you have a good idea.

Parent vectors are essentially a vertex to vertex mapping. What's the best way to represent such a mapping in R? Would named vectors work? I guess not since names are strings, but in igraph some graphs do not have string vertex names?

In C, we use a vector p of size vcount. A different way to interpret this vector is as follows: p[i] == -1 when vertex i is not mapped to anything, but some other vertices are mapped to i. p[i] == -2 when i is not mapped to anything, and no other vertices map to i. Well, this way of thinking about it is a good approximation, but not entirely true: there may be roots that form their own component, i.e. no one maps to them. This is why in https://github.com/igraph/rigraph/issues/522#issuecomment-1493081825 I wrote that the information encoded by -1 vs -2 is not recoverable otherwise. But going ahead with this picture anyway, in R one could use a named vector which at at most of vcount size. -1 would be translated to NA and vertices with -2 would simply be omitted.

Except that I think this is not workable because now the vector can only be indexed by name, not by integer vertex ID ...

Apr 01 '23 19:04 szhorvat

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

Apr 01 '24 00:04 github-actions[bot]