Pavithra Eswaramoorthy
@ba05 Thanks for reporting! I can reproduce this, and it does look like a bug. Note that the exact code above will NOT fail in previous versions of Dask because...
The core issue is in how the categories are represented in Dask. The `make` column always retains metadata about *all* the categories, even if we wish to work only on...
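The same behaviour can be seen in plain pandas, whose categorical dtype Dask's metadata mirrors (an assumption for this sketch). The `make`/`price` values below are made up. Filtering rows keeps every declared category. A grouping with `observed=False` then reports all of them. `cat.remove_unused_categories()` trims the metadata back to the values actually present.

```python
import pandas as pd

# Toy data (made up): "make" is a categorical column with three categories.
df = pd.DataFrame({
    "make": pd.Categorical(["ford", "honda", "toyota", "ford"]),
    "price": [1, 2, 3, 4],
})

# Filtering rows keeps the dtype, so all three categories are still declared.
subset = df[df["make"] == "ford"]
print(list(subset["make"].cat.categories))  # ['ford', 'honda', 'toyota']

# Grouping with observed=False produces a row for every declared category,
# including the ones that no longer appear in the filtered data.
totals = subset.groupby("make", observed=False)["price"].sum()
print(totals)

# Dropping unused categories restricts the metadata to the values present.
trimmed = subset["make"].cat.remove_unused_categories()
print(list(trimmed.cat.categories))  # ['ford']
```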
@rjzamora Thank you for opening this PR, and apologies for not looking at this sooner! Are you still looking for a review? @ian-r-rose and I went through this briefly. What...
@pp-mo Thanks for opening this! > It's possible that this behaviour is essential, e.g. because a memory-copied object contains O.S. resources (like file handles) which cannot function in a forked...
@ian-r-rose Do you have thoughts on this issue and how we can improve the docs?
@andrewbarisser Welcome! I think this might be intentional because [the docstring for `bind` says](https://docs.dask.org/en/stable/graph_manipulation.html#dask.graph_manipulation.bind): > All keys of children will be regenerated, up to and excluding the keys of omit....
@andrewbarisser Ah, I see what you mean. I think you're absolutely right that the root cause is `key_split`. However, `key_split` seems to be intentionally trimming any numbers after hyphens. I...
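The effect of trimming numbers after hyphens can be sketched with a hypothetical, simplified stand-in for `key_split`. `simplified_key_split` is illustrative only and is not Dask's implementation. The key names are made up. The point it shows is that distinct keys differing only in a numeric suffix collapse to the same prefix.

```python
def simplified_key_split(key: str) -> str:
    """Hypothetical, simplified stand-in for dask's ``key_split``.

    Keeps the leading hyphen-separated words of ``key`` and stops at the first
    purely numeric word. This is NOT the real implementation.
    """
    words = key.split("-")
    prefix = [words[0]]
    for word in words[1:]:
        if word.isdigit():
            break  # a numeric part ends the human-readable prefix
        prefix.append(word)
    return "-".join(prefix)


# Two distinct (made-up) names collapse to the same prefix once numbers are trimmed.
print(simplified_key_split("my-layer-1"))  # my-layer
print(simplified_key_split("my-layer-2"))  # my-layer
print(simplified_key_split("x-1-2-3"))     # x
```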
Thanks for reporting! This only seems to affect the column we're grouping on (e.g., `ddf.groupby('xx').y.cumsum()` works fine). I'll keep looking into it.
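As a reference point, plain pandas gives the expected results for both cases. This assumes Dask should match pandas here. The frame is made up, and `xx` plays the role of the grouping column.

```python
import pandas as pd

# Made-up frame: "xx" is the grouping column, "y" a regular value column.
df = pd.DataFrame({"xx": [1, 1, 2, 2], "y": [10, 20, 30, 40]})

# Cumulative sum of a non-grouping column within each group (the case that works).
print(df.groupby("xx").y.cumsum().tolist())  # [10, 30, 30, 70]

# Cumulative sum of the grouping column itself (the affected case).
print(df.groupby("xx").xx.cumsum().tolist())  # [1, 2, 2, 4]
```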
@jrbourbeau and I looked into this. The issue seems to be in: https://github.com/dask/dask/blob/8b95f983c232c1bd628e9cba0695d3ef229d290b/dask/dataframe/groupby.py#L1274-L1281 Consider the following example (same example as OP, only the values are hardcoded for understanding): ```python import...
Thanks, @ncclementi! I agree the API section can benefit from a better structure. I think @scharlottej13 and @jsignell might have thoughts on this as well. :)