InMemoryDatasets.jl byrow (sum) on a column containing vectors of numbers

I can't explain the reason for the following differences:

julia> modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Array…?
─────┼────────────────────────────────────────────
   1 │     true      true     false  [1, 1, 0]
   2 │     true     false      true  [1, 0, 1]
   3 │     true      true      true  [1, 1, 1]
   4 │    false      true     false  [0, 1, 0]
   5 │    false     false     false  [0, 0, 0]
   6 │     true     false     false  [1, 0, 0]

julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>byrow(x->sum(x)))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Int64?
─────┼────────────────────────────────────────────
   1 │     true      true     false             2
   2 │     true     false      true             2
   3 │     true      true      true             3
   4 │    false      true     false             1
   5 │    false     false     false             0
   6 │     true     false     false             1

julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>x->sum.(x))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Int64?
─────┼────────────────────────────────────────────
   1 │     true      true     false             2
   2 │     true     false      true             2
   3 │     true      true      true             3
   4 │    false      true     false             1
   5 │    false     false     false             0
   6 │     true     false     false             1

julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>byrow(sum))
6×4 Dataset
 Row │ a_lim=>a  b_lim=>b  c_lim=>c  row_function 
     │ identity  identity  identity  identity
     │ Bool?     Bool?     Bool?     Array…?
─────┼────────────────────────────────────────────
   1 │     true      true     false  [1, 1, 0]
   2 │     true     false      true  [1, 0, 1]
   3 │     true      true      true  [1, 1, 1]
   4 │    false      true     false  [0, 1, 0]
   5 │    false     false     false  [0, 0, 0]
   6 │     true     false     false  [1, 0, 0]

Apr 03 '22 09:04 sprmnt21

byrow is fine-tuned for a set of functions and operations (see its docstring for more details). For generic functions, byrow assumes that the passed function accepts the row as a vector of values, and x->x .* 1 falls into this category; see ?byrow(x->x .* 1)
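As a rough illustration of "the function accepts the row as a vector of values", here is a sketch in plain Julia that does not use InMemoryDatasets at all. The names `rows` and `row_function` are made up for this example, and the Bool values are copied from the first three columns of the tables above; it only mimics the described behaviour, not how byrow is implemented.

```julia
# Each inner vector stands for one row of the three Bool columns shown above.
rows = [
    [true,  true,  false],
    [true,  false, true],
    [true,  true,  true],
    [false, true,  false],
    [false, false, false],
    [true,  false, false],
]

# The generic function from the question: it receives a whole row as a vector.
f = x -> x .* 1

# Applying f to every row gives one vector per row,
# which is why the result column holds arrays.
row_function = [f(r) for r in rows]

println(row_function)
```

Multiplying a Bool vector by 1 promotes it to integers, so every cell of the new column is an integer vector such as [1, 1, 0].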

Apr 03 '22 10:04 sl-solution

Thanks.

In the documentation (https://docs.juliahub.com/InMemoryDatasets/cS87e/0.6.10/man/byrow/#User-defined-operations) I read that:

For user defined functions which return a single value, byrow treats each row as a vector of values, thus the user defined function must accept a vector and returns a single value.

So in the case of byrow(x->x .* 1), I understand that the single value is a vector; that is, the vector returned by the function is treated as a single value. This explains the result of applying byrow(sum): in fact, sum([[1,2,3]]) == [1,2,3]. But so, I can't explain to myself why byrow (x-> sum(x)) seems to work instead.
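The identity used in this argument can be checked in plain Julia. This is only a sketch of the reasoning, under the assumption (made in this post, not confirmed by the package source) that the optimized byrow(sum) adds up the cells of the selected columns, so a row with a single cell is a one-element collection. The names `cell` and `row` are made up for the example.

```julia
# One cell of the row_function column: a vector of integers.
cell = [1, 1, 0]

# A "row" made of that single cell is a one-element collection of vectors.
row = [cell]

# Summing a one-element collection returns its only element,
# so the cell comes back unchanged.
println(sum(row))

# Summing the cell itself is a different operation and gives a number.
println(sum(cell))
```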

The case of x-> sum.(x), on the other hand, is really different.

Apr 03 '22 12:04 sprmnt21

But so, I can't explain to myself why byrow (x-> sum(x)) seems to work instead.

This is something that I should add to the documentation: "byrow with a generic function and a single column acts like fun.(col)." The docstrings are fixed in master.
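A plain-Julia sketch of that rule, without InMemoryDatasets: `col` is a made-up stand-in for the row_function column from the question, and `fun` is the generic function passed to byrow. It only mirrors the stated rule fun.(col); it is not the package's implementation.

```julia
# Stand-in for the row_function column: one integer vector per row.
col = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]

# byrow with a generic function on a single column acts like fun.(col):
fun = x -> sum(x)
via_byrow_rule = fun.(col)

# Without byrow, the function in the question receives the whole column
# and broadcasts sum over it itself:
via_whole_column = (x -> sum.(x))(col)

println(via_byrow_rule)
println(via_whole_column)
```

Both give [2, 2, 3, 1, 0, 1], which matches the Int64 columns in the second and third outputs of the question.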

Apr 03 '22 12:04 sl-solution