Oleksiy Kononenko comments

Results: 133 comments by Oleksiy Kononenko

Implement cumulative functions

@samukweku we need to address https://github.com/h2oai/datatable/issues/3081 to improve performance in the case when there is no group-by context. For grouped frames we're fully parallel now. For cumulative functions we actually...
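Without a group-by context, a cumulative function is one long running scan, so every row depends on the rows before it. A common way to parallelize such a scan is a two-pass chunked prefix sum. The plain-Python sketch below shows that structure for a cumulative sum; it is only an illustration of the general technique, not datatable's C++ implementation, and `chunked_cumsum` / `nchunks` are hypothetical names. Python threads will not give a real speedup here because of the GIL; the point is which steps are independent per chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def chunked_cumsum(values, nchunks=4):
    """Cumulative sum computed in two passes over independent chunks.

    Pass 1: running sum inside each chunk (chunks are independent).
    Short sequential scan over chunk totals gives each chunk's offset.
    Pass 2: add the offset to every element of its chunk (independent again).
    """
    n = len(values)
    if n == 0:
        return []
    nchunks = max(1, min(nchunks, n))
    size = -(-n // nchunks)  # ceiling division, so every chunk is non-empty
    bounds = [(i, min(i + size, n)) for i in range(0, n, size)]

    with ThreadPoolExecutor() as pool:
        # Pass 1: local cumulative sums, one task per chunk.
        local = list(pool.map(lambda b: list(accumulate(values[b[0]:b[1]])), bounds))
        # Offset of chunk k = sum of the totals of all chunks before it.
        offsets = [0] + list(accumulate(chunk[-1] for chunk in local))[:-1]
        # Pass 2: shift each chunk by its offset.
        shifted = pool.map(lambda pair: [x + pair[1] for x in pair[0]], zip(local, offsets))
        return [x for chunk in shifted for x in chunk]


print(chunked_cumsum([1, 2, 3, 4, 5, 6, 7], nchunks=3))
```

Only the scan over chunk totals is sequential, and it touches one value per chunk rather than one per row.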

Implement cumulative functions

@samukweku Regarding "maybe you can explain more what you mean by parallelisation in terms of the actual data": it means that we parallelize the loops that go over the frame rows, currently...
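For a grouped frame, the rows of each group can be handled by an independent loop, so no second pass is needed. The sketch below assumes a hypothetical layout where rows are already sorted so each group is contiguous and `group_offsets` marks the group boundaries; it illustrates the idea in plain Python and is not datatable's internal API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def grouped_cumsum(values, group_offsets):
    """Cumulative sum restarted at every group boundary.

    Assumed layout (hypothetical): group g spans rows
    group_offsets[g] .. group_offsets[g + 1] - 1, with groups contiguous.
    Each group depends only on its own rows, so groups run in parallel.
    """
    spans = list(zip(group_offsets[:-1], group_offsets[1:]))
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of `spans`, so concatenation is safe.
        parts = pool.map(lambda s: list(accumulate(values[s[0]:s[1]])), spans)
        return [x for part in parts for x in part]


# Three groups: rows 0-2, rows 3-4, row 5.
print(grouped_cumsum([1, 2, 3, 10, 20, 5], [0, 3, 5, 6]))
```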

Implement cumulative functions

@samukweku What functionality do you want to achieve with this function?

Implement cumulative functions

Actually, in datatable there is already a function called `count()` that is used to "calculate the number of non-missing values for each column"; see https://datatable.readthedocs.io/en/latest/api/dt/count.html for more details. So the...
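To make the quoted semantics concrete, here is a plain-Python sketch of "the number of non-missing values for each column", contrasted with the plain row count. The frame is modelled as a dict of lists with `None` standing in for a missing value; both the representation and the function name are assumptions for illustration, not datatable code.

```python
def count_non_missing(frame):
    """Number of non-missing values per column (None = missing, by assumption)."""
    return {name: sum(v is not None for v in col) for name, col in frame.items()}


frame = {"A": [1, None, 3], "B": [None, None, "x"]}
nrows = len(next(iter(frame.values())))

print(count_non_missing(frame))  # counts skip the None entries
print(nrows)                     # the row count does not
```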

Implement cumulative functions

@vopani Well, I'm not sure why you think it is an unnatural and unintuitive name. The same name and behavior are used in at least [pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html) and [pyarrow](https://arrow.apache.org/docs/python/generated/pyarrow.compute.count.html). datatable just sticks to...

Implement cumulative functions

I guess it all depends on the definition. If we define `count()` as a function to count values, then obviously it should skip missing values, and then this is...
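The two definitions diverge as soon as a column has missing values. As a hypothetical illustration for a cumulative variant, the sketch below contrasts a running count of values (skipping missing ones) with a running count of rows. Both function names are made up for this example, and `None` is assumed to mark a missing value.

```python
def cumcount_values(column):
    """Running count of non-missing values seen so far (None = missing)."""
    out, n = [], 0
    for v in column:
        if v is not None:
            n += 1
        out.append(n)
    return out


def cumcount_rows(column):
    """Running count of rows seen so far, whether missing or not."""
    return list(range(1, len(column) + 1))


col = [5, None, 7]
print(cumcount_values(col))
print(cumcount_rows(col))
```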

Implement cumulative functions

@samukweku I guess

```python
DT[:, dt.cummax(f[:]), by('D')]
```

and

```python
df.groupby('D')[['A','C']].cummax()
```

are doing different things. Just compare the results:

```python
   | D      A      B     C
   | str32  int32  void...
```
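One difference that can show up when comparing such results is row order: pandas' `groupby(...).cummax()` returns rows in the original order, while a `by()`-grouped datatable result is assumed here to have each group contiguous and sorted by key. The plain-Python sketch below produces both layouts from the same computation. It is a hypothetical illustration (the function name and `keep_original_order` flag are made up), it assumes no missing values, and it does not reproduce either library's handling of other columns.

```python
def grouped_cummax(keys, values, keep_original_order=True):
    """Cumulative max of `values`, restarted within each group of `keys`."""
    running = {}
    results = []
    for k, v in zip(keys, values):
        running[k] = v if k not in running else max(running[k], v)
        results.append(running[k])
    if keep_original_order:
        return results  # one value per input row, input order
    # Stable sort by key: groups become contiguous, within-group order is kept.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [(keys[i], results[i]) for i in order]


keys, values = ["b", "a", "b", "a"], [1, 5, 3, 2]
print(grouped_cummax(keys, values))
print(grouped_cummax(keys, values, keep_original_order=False))
```

The cumulative values per group are identical in both layouts; only their placement in the output differs.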

Implement cumulative functions

@samukweku It all depends on how you build the code: you could either do `make build` or `make debug`, see https://datatable.readthedocs.io/en/latest/start/install.html#install-datatable-in-editable-mode. To test performance, you need to build it in the...

Implement cumulative functions

@samukweku I guess there is already a ticket for this: https://github.com/h2oai/datatable/issues/1070

Error when `f[:]` is used in the sort context

Actually, even

```python
DT[:, :, sort(f[:], reverse=[True, False])]
```

will error as

```python
ValueError: Mismatch between the number of columns (ncols=1) to be sorted and number of elements (nflags=2) in...
```
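The reported `ncols=1` suggests that `f[:]` is being counted as a single sort column before it is expanded. A minimal sketch of the order of operations that would avoid this, assuming that is the cause: expand every column set first, then compare the number of columns with the number of `reverse` flags. The helper name and the use of the string `":"` as a stand-in for `f[:]` are hypothetical; this is not datatable's actual code.

```python
def resolve_sort_flags(frame_names, selectors, reverse):
    """Pair every sort column with its `reverse` flag (hypothetical helper).

    Each item of `selectors` is a column name, or ":" standing in for f[:].
    Column sets are expanded *before* the length check.
    """
    columns = []
    for sel in selectors:
        columns.extend(frame_names if sel == ":" else [sel])
    if isinstance(reverse, bool):
        reverse = [reverse] * len(columns)  # a single flag applies to all columns
    if len(reverse) != len(columns):
        raise ValueError(
            f"Mismatch between the number of columns (ncols={len(columns)}) "
            f"to be sorted and number of elements (nflags={len(reverse)})"
        )
    return list(zip(columns, reverse))


print(resolve_sort_flags(["A", "B"], [":"], [True, False]))
```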