byrow (sum) on a column containing vectors of numbers
I can't explain the reason for the following differences:
julia> modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1))
6×4 Dataset
Row │ a_lim=>a b_lim=>b c_lim=>c row_function
│ identity identity identity identity
│ Bool? Bool? Bool? Array…?
─────┼────────────────────────────────────────────
1 │ true true false [1, 1, 0]
2 │ true false true [1, 0, 1]
3 │ true true true [1, 1, 1]
4 │ false true false [0, 1, 0]
5 │ false false false [0, 0, 0]
6 │ true false false [1, 0, 0]
julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>byrow(x->sum(x)))
6×4 Dataset
Row │ a_lim=>a b_lim=>b c_lim=>c row_function
│ identity identity identity identity
│ Bool? Bool? Bool? Int64?
─────┼────────────────────────────────────────────
1 │ true true false 2
2 │ true false true 2
3 │ true true true 3
4 │ false true false 1
5 │ false false false 0
6 │ true false false 1
julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>x->sum.(x))
6×4 Dataset
Row │ a_lim=>a b_lim=>b c_lim=>c row_function
│ identity identity identity identity
│ Bool? Bool? Bool? Int64?
─────┼────────────────────────────────────────────
1 │ true true false 2
2 │ true false true 2
3 │ true true true 3
4 │ false true false 1
5 │ false false false 0
6 │ true false false 1
julia> modify(modify(compare(ds[!, r"lim"], ds[!, Not(r"lim")], on = 1:3 .=> 1:3, eq = !isless), 1:3=>byrow(x->x.*1)),4=>byrow(sum))
6×4 Dataset
Row │ a_lim=>a b_lim=>b c_lim=>c row_function
│ identity identity identity identity
│ Bool? Bool? Bool? Array…?
─────┼────────────────────────────────────────────
1 │ true true false [1, 1, 0]
2 │ true false true [1, 0, 1]
3 │ true true true [1, 1, 1]
4 │ false true false [0, 1, 0]
5 │ false false false [0, 0, 0]
6 │ true false false [1, 0, 0]
byrow is fine-tuned for a set of functions and operations (see its docstring for more details). For generic functions, byrow assumes the passed function accepts the row as a vector of values, and x -> x .* 1 falls into this category; see ?byrow(x->x .* 1)
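To make the "row as a vector of values" idea concrete, here is a minimal plain-Julia sketch. It does not call InMemoryDatasets at all: `apply_rowwise` is a hypothetical helper written only for illustration, and the three Bool columns are copied from the first three rows of the output above. It only mimics what the generic-function path is described as doing, not how the package implements it.

```julia
# Plain-Julia emulation; `apply_rowwise` is a hypothetical helper,
# not part of InMemoryDatasets.
a = [true, true, true]
b = [true, false, true]
c = [false, true, true]

# Build each row as a vector of values and pass it to `f`.
apply_rowwise(f, cols...) = [f(collect(row)) for row in zip(cols...)]

# x -> x .* 1 receives a Vector{Bool} per row and returns a Vector{Int}.
as_ints = apply_rowwise(x -> x .* 1, a, b, c)
println(as_ints)
```

Each row yields one vector, so the result is a column whose elements are themselves vectors, which is consistent with the Array…? column shown in the first output.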
Thanks.
Here https://docs.juliahub.com/InMemoryDatasets/cS87e/0.6.10/man/byrow/#User-defined-operations I read that:
For user defined functions which return a single value, byrow treats each row as a vector of values, thus the user defined function must accept a vector and returns a single value.
So for byrow(x -> x .* 1), I understand that the "single value" is a vector. That is, the vector returned by the function is treated as a single value.
This explains the result of applying the sum function: in fact, sum([[1,2,3]]) == [1,2,3].
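The identity above can be checked in plain Julia. This small sketch only uses Base; the variable names are illustrative.

```julia
v = [1, 2, 3]
wrapped = [v]                    # a one-element vector whose element is a vector

# Summing a collection with a single element gives back that element.
sum_of_wrapped = sum(wrapped)
# Summing the inner vector itself adds its numbers.
sum_of_inner = sum(v)

println(sum_of_wrapped)
println(sum_of_inner)
```

So a sum taken "across" a single vector-valued cell leaves the vector unchanged, while a sum applied to the cell's contents gives a number.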
But then I can't explain why byrow(x -> sum(x)) seems to work.
The case of x -> sum.(x), on the other hand, is really different.
This is something that I should add to the documentation: "byrow with a generic function and a single column acts like fun.(col)." The docstrings are fixed in master.
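As a rough illustration of the fun.(col) rule, the following plain-Julia sketch applies both a generic function and broadcast sum to a column of vectors. It uses only Base; `col` is copied from the row_function column shown earlier, and nothing here calls InMemoryDatasets itself.

```julia
# The vector-valued column from the earlier output.
col = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]

# A generic function on a single column, applied as fun.(col).
generic = (x -> sum(x)).(col)

# Broadcasting sum over the whole column, as in 4 => x -> sum.(x).
broadcasted = sum.(col)

println(generic)
println(broadcasted)
```

Both give one integer per row, which matches the Int64? columns in the second and third outputs. byrow(sum) behaves differently presumably because sum is one of the functions byrow is fine-tuned for, so it is not treated as a generic function.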