Oleksiy Kononenko

Results: 133 comments by Oleksiy Kononenko

@samukweku we need to address https://github.com/h2oai/datatable/issues/3081 to improve performance in the case when there is no group-by context. For grouped frames we're fully parallel now. For cumulative functions we actually...

@samukweku
> maybe you can explain more what you mean by parallelisation in terms of the actual data.

It means that we parallelize the loops that go over the frame rows, currently...
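To make "parallelizing a loop over the frame rows" concrete, here is a minimal plain-Python sketch of the general pattern: split the row range into chunks, process each chunk independently, then combine the partial results. It is not datatable's implementation (which is not shown in this comment); `chunk_ranges` and `parallel_sum` are hypothetical names, and Python threads are used only to show the row-splitting shape, not to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(nrows, nchunks):
    """Split [0, nrows) into at most `nchunks` contiguous (start, stop) ranges."""
    if nrows == 0:
        return []
    step = -(-nrows // nchunks)  # ceiling division
    return [(start, min(start + step, nrows)) for start in range(0, nrows, step)]


def parallel_sum(column, nthreads=4):
    """Sum a column by handing each row range to a separate worker."""
    ranges = chunk_ranges(len(column), nthreads)
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Each worker only touches its own slice of rows.
        partials = pool.map(lambda r: sum(column[r[0]:r[1]]), ranges)
        return sum(partials)


column = list(range(10))
print(chunk_ranges(len(column), 4))
print(parallel_sum(column))
```

A plain sum splits cleanly because the chunks are independent. Cumulative functions are harder to split this way, since each output row depends on all the rows before it.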

@samukweku What functionality do you want to achieve with this function?

Actually, in datatable there is already a function called `count()` that is used to
> Calculate the number of non-missing values for each column

see https://datatable.readthedocs.io/en/latest/api/dt/count.html for more details. So the...

@vopani Well, I'm not sure why you think it is an unnatural and unintuitive name. The same name/behavior is used in, at least, [pandas](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html) and [pyarrow](https://arrow.apache.org/docs/python/generated/pyarrow.compute.count.html). datatable just sticks to...

I guess it all depends on the definition. If we define `count()` as a function to count values, then obviously it should skip missing values — and then this is...
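The difference between the two definitions can be sketched in plain Python. This is only an illustration, not datatable code: `is_missing`, `count_non_missing` and `count_rows` are hypothetical helpers, and treating `None` and NaN as "missing" is an assumption made for the example.

```python
import math


def is_missing(value):
    """Assumption for this sketch: None and float NaN count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_non_missing(column):
    """'Count values': missing entries are skipped."""
    return sum(1 for v in column if not is_missing(v))


def count_rows(column):
    """'Count rows': every entry is counted, missing or not."""
    return len(column)


column = [3, None, 7, float("nan"), 1]
print(count_non_missing(column))  # 3
print(count_rows(column))         # 5
```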

@samukweku I guess
```python
DT[:, dt.cummax(f[:]), by('D')]
```
and
```python
df.groupby('D')[['A','C']].cummax()
```
are doing different things. Just compare the results ```python | D A B C | str32 int32 void...
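For reference, the underlying operation (a cumulative maximum restarted within each group) can be sketched in plain Python. This does not try to reproduce the output layout of either datatable or pandas, and it ignores missing values; `grouped_cummax` is a hypothetical helper that returns results in the original row order.

```python
def grouped_cummax(keys, values):
    """Running maximum of `values`, tracked separately for each key."""
    running = {}  # key -> largest value seen so far in that group
    out = []
    for k, v in zip(keys, values):
        if k not in running or v > running[k]:
            running[k] = v
        out.append(running[k])
    return out


keys = ["a", "b", "a", "b", "a"]
values = [1, 5, 3, 2, 2]
print(grouped_cummax(keys, values))  # [1, 5, 3, 5, 3]
```

When comparing two libraries on this kind of operation, it helps to check which columns each call includes in the result and in what order the rows come back, not only the computed values.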

@samukweku It all depends on how you build the code: you can do either `make build` or `make debug`, see https://datatable.readthedocs.io/en/latest/start/install.html#install-datatable-in-editable-mode To test performance, you need to build it in the...

@samukweku I guess there is a ticket https://github.com/h2oai/datatable/issues/1070

Actually, even
```python
DT[:, :, sort(f[:], reverse=[True, False])]
```
will error as ```python ValueError: Mismatch between the number of columns (ncols=1) to be sorted and number of elements (nflags=2) in...