Philippe Moussalli

Results 15 issues of Philippe Moussalli

- PR that implements a notebook demonstrating the use of the Beam DataFrame API as a preprocessing tool for ML training. WIP: - [ ] **Find a method to...

examples

### What would you like to happen? We would like to use the DataFrame API to perform one-hot encoding on categorical columns. Currently, this can be done with the `pd.get_dummies()` method...

new feature
P2
dsl
dataframe
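For readers unfamiliar with the term, one-hot encoding replaces a categorical column with one 0/1 indicator column per distinct category. The following is a minimal pure-Python sketch of that idea, not the Beam DataFrame API and not the approach proposed in the issue; the helper name `one_hot_encode` and the sample data are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def one_hot_encode(values: Sequence[str], prefix: str) -> Dict[str, List[int]]:
    """Return one 0/1 indicator column per distinct category in `values`.

    Hypothetical helper for illustration only.
    """
    # First pass: discover the categories (sorted for a stable column order).
    categories = sorted(set(values))
    # Second pass: build an indicator column for each category.
    return {
        f"{prefix}_{category}": [int(value == category) for value in values]
        for category in categories
    }


colors = ["red", "green", "red", "blue"]
encoded = one_hot_encode(colors, prefix="color")
for name, column in encoded.items():
    print(name, column)
```

Note that the set of output columns is only known after the data has been scanned once, which is why the sketch makes two passes over `values`.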

This issue seems to be Dask-related. More info in [this](https://github.com/dask/dask/issues/11021) ticket.

Related to https://github.com/ml6team/fondant/pull/832#pullrequestreview-1855968909

Testing

More info [here](https://github.com/ml6team/fondant/pull/802)

In order to propagate errors that occur in Docker back to Python, we need to enable the `abort-on-container-exit` flag. However, because we're running containers in sequence, this means that the whole...
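As a rough illustration of what "propagating errors back to Python" can look like, the sketch below runs a command in a subprocess and raises an exception when it exits with a non-zero status. It is not the project's implementation: `run_and_propagate` and `ContainerRunError` are hypothetical names, and a small Python child process stands in for the container run so the example works without Docker. In practice the command might be something like `docker compose up --abort-on-container-exit`, which is an assumption about the setup rather than something stated in the issue.

```python
import subprocess
import sys
from typing import Sequence


class ContainerRunError(RuntimeError):
    """Hypothetical error raised when the wrapped command fails."""


def run_and_propagate(command: Sequence[str]) -> None:
    """Run `command` and raise if it exits with a non-zero status."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        raise ContainerRunError(
            f"command {list(command)} exited with code {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )


# Stand-in for a failing container run, so the sketch does not need Docker.
failing_command = [sys.executable, "-c", "import sys; sys.exit(3)"]

caught = None
try:
    run_and_propagate(failing_command)
except ContainerRunError as error:
    caught = error
    print("propagated:", error)
```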

TDD can make the development of components faster (no need to run within a pipeline) and more robust (unit tests). We could have a command that generates a generic boilerplate...

Ease of use
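To make the idea concrete, here is a small sketch of testing a component's transform logic directly, without running it inside a pipeline. The component class, its `transform` signature, and the row format are all hypothetical and chosen for illustration; they are not the project's actual component interface or the boilerplate the issue refers to.

```python
from typing import Dict, List

Row = Dict[str, str]


class LowercaseTextComponent:
    """Hypothetical component that lowercases the `text` field of each row."""

    def transform(self, rows: List[Row]) -> List[Row]:
        # Copy each row so the input is left untouched.
        return [{**row, "text": row["text"].lower()} for row in rows]


def test_transform_lowercases_text() -> None:
    component = LowercaseTextComponent()
    rows = [{"id": "1", "text": "Hello World"}]
    result = component.transform(rows)
    assert result == [{"id": "1", "text": "hello world"}]
    # The original input should not have been modified.
    assert rows == [{"id": "1", "text": "Hello World"}]


# A test runner such as pytest would normally discover and call this function.
test_transform_lowercases_text()
```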

Most pandas transform components seem to be implemented differently (the way columns are passed, how the transform is applied, where transform functions are defined, ...). It would be nice to revisit the...

Components