get_group returns unexpected results after sorting
When group_by is applied to a sorted DataFrame, get_group returns the wrong rows.
df=Daru::DataFrame.new([
10.times.collect{|i| i},
10.times.collect{|i| "b"},
10.times.collect{|i| i%2 == 0 ? "c" : "d"},
],
order: [:a,:b,:c]
)
# Works properly
grouped=df.group_by([:b,:c])
grouped.get_group(["b","c"])
=> #<Daru::DataFrame(5x3)>
a b c
0 0 b c
2 2 b c
4 4 b c
6 6 b c
8 8 b c
# Returns wrong rows after sort is applied to the DataFrame
df.sort!([:c])
grouped=df.group_by([:b,:c])
grouped.get_group(["b","c"])
=> #<Daru::DataFrame(5x3)>
a b c
0 0 b c
2 4 b c
4 8 b c
6 3 b d
8 7 b d
As I understand it, reindexing after sorting may help: df.index = Daru::Index.new(Array.new(df.size) { |i| i })
I'm running into a similar issue that occurs when you remove rows from a dataset using filter before calling group_by. It looks like get_group does not respect non-default row indices, so grouping operations only work if your rows are indexed the default way (zero-based, consecutive integers). I don't know the Daru internals well, but the issue appears to be here: https://github.com/SciRuby/daru/blob/v0.2.2/lib/daru/core/group_by.rb#L258-L267
The conversion of @context to elements throws away @context's original indices, and the lookups into elements.transpose assume that the indices are the defaults (i.e. 0, 1, 2, 3, ...).
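To make the suspected mechanism concrete, here is a minimal plain-Ruby sketch. It is not Daru's actual code: the names labels, rows, by_position, and by_label are hypothetical, and the row order after the sort is an assumption taken from the output in the report. It models the sorted frame as an array of rows plus their original index labels, then compares a lookup that treats labels as array positions with one that resolves labels first.

```ruby
# Plain-Ruby model of the rows after df.sort!([:c]); this is NOT Daru code.
# Assumption: the sort keeps rows with equal :c in their original order,
# so the index labels end up in this order.
labels = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

# Each row is [a, b, c]; column a equals the original index label.
rows = labels.map { |i| [i, "b", i.even? ? "c" : "d"] }

# Index labels of the rows that belong to group ["b", "c"].
group_labels = labels.select(&:even?)

# Positional lookup: uses each label as an array offset (the suspected bug).
by_position = group_labels.map { |label| rows[label] }

# Label lookup: maps each label to its current position before fetching.
position_of = labels.each_with_index.to_h
by_label = group_labels.map { |label| rows[position_of[label]] }

puts "by position: #{by_position.map(&:first).inspect}"
puts "by label:    #{by_label.map(&:first).inspect}"
```

In this model the positional lookup yields the a values 0, 4, 8, 3, 7 (the same rows as the bad output above), while the label lookup yields 0, 2, 4, 6, 8. Resetting the index to 0...n after sorting or filtering makes labels and positions coincide again, which would explain why the reindexing workaround helps.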