Philippe Moussalli
- PR that implements a notebook demonstrating the use of the Beam DataFrame API as a preprocessing tool for ML training. WIP: - [ ] **Find a method to...
### What would you like to happen? We would like to use the DataFrame API to perform one-hot encoding on categorical columns. Currently, this can be done with the `pd.get_dummies()` method...
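For reference, a minimal pandas-only sketch of the one-hot encoding this issue refers to. The column names and values are made up for illustration, and the snippet does not show how the Beam DataFrame API would handle the same operation.

```python
import pandas as pd

# Hypothetical example data: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})

# One-hot encode only the categorical column; other columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["color"])

# Each distinct category becomes its own indicator column, named <column>_<category>.
print(list(encoded.columns))
```

Note that `pd.get_dummies()` derives the output columns from the values present in the data, so the resulting schema is only known after the data has been seen.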
This issue seems to be Dask-related. More info in [this ticket](https://github.com/dask/dask/issues/11021).
Related to https://github.com/ml6team/fondant/pull/832#pullrequestreview-1855968909
More info [here](https://github.com/ml6team/fondant/pull/802)
To propagate errors that occur in Docker back to Python, we need to enable the `abort-on-container-exit` flag. However, because we're running containers in sequence, this means that the whole...
TDD can make the development of components faster (no need to run within a pipeline) and more robust (unit tests). We can have a command that generates a generic boilerplate...
Most pandas transform components seem to be implemented differently (how columns are passed, how the transform is applied, where the transform functions are defined, ...). It would be nice to revisit the...