[REVIEW] NULL support for hash-based group by
This PR adds support for NULL values to the hash-based groupby implementation.
The build_aggregation_table kernel was updated such that if a row is invalid, then it is not inserted into the hash table.
Are there any tests that should be added here? Or is everything covered by existing tests?
Code LGTM otherwise. Thanks for putting this in @jrhemstad!
This requires changes on the PyGDF side in making sure to allocate validity buffers for all columns so please coordinate before merging.
This requires changes on the PyGDF side in making sure to allocate validity buffers for all columns so please coordinate before merging.
@kkraus14 is this handled in pygdf now? can we merge?
@nsakharnykh I believe there's still issues with binaryops if we always always allocate a validity mask, please hold off for the time being.
@kkraus14 ping, are there still issues with binaryops?
@nsakharnykh just pulled in latest changes and running tests now. I know sort based groupbys don't handle validity masks properly, but I believe binaryops were actually okay. I apologize for being mistaken earlier.
@nsakharnykh This PR needs to be merged first: https://github.com/gpuopenanalytics/pygdf/pull/246 Which will always allocate validity masks for you so you won't encounter a situation where validity masks aren't allocated and you need to create one.