Feature: Iterable Groupby Class

Open andria-dev opened this issue 2 years ago • 0 comments

What is the feature?

I'd love to be able to iterate over the groups in a Groupby. The only way I currently know to iterate over a Groupby is GroupBy#apply(), which is not documented, does not give the grouped value, and does not offer the same loop control as an iterable. I would prefer a more natural and readable approach to iteration (preferably a `for ... of` loop).

Solution

I'd like the Groupby class to implement the iterable protocol with `Symbol.iterator` so that a `for ... of` loop (or even the spread syntax `...`) can iterate over the groups, yielding for each one the value it was grouped by and the data in that group as a DataFrame. For example:

let dataframe = new dfd.DataFrame([
  {category: 'a', valueA: 123, valueB: 456},
  {category: 'a', valueA: 22, valueB: 56},
  {category: 'b', valueA: 11, valueB: 314},
  {category: 'b', valueA: 155, valueB: 2222},
])
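// The loop variable is named `group` rather than `dataframe`: redeclaring
// `dataframe` in the loop head would shadow the outer variable and throw a ReferenceError.
for (const [category, group] of dataframe.groupby(['category'])) {
  console.log(category) // a (on the first iteration)
  group.print()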
  /**
    ╔═══╤══════════╤════════╤════════╗
    ║   │ category │ valueA │ valueB ║
    ╟───┼──────────┼────────┼────────╢
    ║ 0 │ a        │ 123    │ 456    ║
    ╟───┼──────────┼────────┼────────╢
    ║ 1 │ a        │ 22     │ 56     ║
    ╚═══╧══════════╧════════╧════════╝
   */
}

The documentation would also need to be updated to inform people that Groupby is iterable, preferably with an example of such.
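
To illustrate the idea, here is a minimal sketch of the iterable protocol on a grouping class. `IterableGroupby` is a hypothetical stand-in that groups plain row objects by one column; it is not danfojs' actual Groupby, and a real implementation would presumably yield a DataFrame per group instead of an array of rows.

```javascript
// Hypothetical stand-in for danfojs' Groupby: it groups plain row objects
// by one column and is NOT the library's actual implementation.
class IterableGroupby {
  constructor(rows, column) {
    // A Map preserves insertion order, so groups come out in first-seen order.
    this.groups = new Map()
    for (const row of rows) {
      const key = row[column]
      if (!this.groups.has(key)) this.groups.set(key, [])
      this.groups.get(key).push(row)
    }
  }

  // Defining Symbol.iterator as a generator makes instances usable
  // with for...of, the spread syntax, and destructuring.
  *[Symbol.iterator]() {
    for (const [key, rows] of this.groups) {
      yield [key, rows]
    }
  }
}

const rows = [
  {category: 'a', valueA: 123, valueB: 456},
  {category: 'a', valueA: 22, valueB: 56},
  {category: 'b', valueA: 11, valueB: 314},
  {category: 'b', valueA: 155, valueB: 2222},
]

const grouped = new IterableGroupby(rows, 'category')

for (const [category, group] of grouped) {
  console.log(category, group.length) // a 2, then b 2
}

// The spread syntax works through the same protocol.
const keys = [...grouped].map(([category]) => category)
console.log(keys) // [ 'a', 'b' ]
```

Because the generator is the only addition, existing methods on the class would be unaffected.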

Alternatives

You could also provide only each group's DataFrame (i.e. no value that it was grouped by), but that would be a bit annoying since I can't think of any situation where you'd want to group the data and then ignore the value that was used for grouping.

Another alternative would be a forEach method, but it would be odd to implement only forEach without the iterable protocol: implementing either one is almost the same work, and the iterable protocol gives greater loop control.
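
The difference in loop control can be sketched as follows. `forEachGroup` is a hypothetical helper, and a plain `Map` stands in for an iterable Groupby; neither is part of danfojs.

```javascript
// Hypothetical helper: a forEach built on top of any iterable of
// [key, group] pairs. A Map is used below as a stand-in for an iterable Groupby.
function forEachGroup(iterable, callback) {
  for (const [key, group] of iterable) {
    callback(group, key)
  }
}

const groups = new Map([
  ['a', [123, 22]],
  ['b', [11, 155]],
  ['c', [7]],
])

// forEach style: every group is visited; the callback cannot stop the loop.
const visited = []
forEachGroup(groups, (group, key) => visited.push(key))
console.log(visited) // [ 'a', 'b', 'c' ]

// Iterable style: `break` ends the loop as soon as a match is found.
let firstMatch = null
let inspected = 0
for (const [key, group] of groups) {
  inspected += 1
  if (group.some((value) => value > 150)) {
    firstMatch = key
    break
  }
}
console.log(firstMatch, inspected) // b 2
```

Once the iterable protocol exists, a forEach method is a thin wrapper around it, whereas the reverse does not hold.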

andria-dev • Aug 27 '23 06:08