fondant
Production-ready data processing made easy and shareable
#853 introduced a dataset-first interface. Some ideas aren't implemented yet. We should pack more functionality into the Dataset class: - view data preview (html formatted like pandas) - view data...
With the new interface we can initialise a dataset from a manifest. The manifests are currently located in the working directory. If we want to share the dataset/manifest we have...
The new dataset interface allows reading a dataset using a manifest from cloud storage buckets. We also support initialising a dataset from a manifest file on your local machine...
This issue seems to be Dask-related. More info in [this](https://github.com/dask/dask/issues/11021) ticket.
The name of components when run using the local and remote runners is a bit unreadable, since it follows the class name when using lightweight components (e.g. LoadDocumentFromJson) and does not...
Currently it is not possible to apply custom configuration to a component's local Dask cluster. However, this might be useful for some use cases and components. Mentioned in [#15](https://github.com/ml6team/fondant-usecase-controlnet/pull/15).
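As a purely illustrative sketch of the naming problem above: the excerpt is truncated and does not say which naming scheme is intended, so the helper `readable_name` below is hypothetical and not part of Fondant. It only shows one common way to turn a class-derived name such as `LoadDocumentFromJson` into a more readable form.

```python
import re


def readable_name(class_name: str) -> str:
    """Hypothetical helper: convert a CamelCase class name to snake_case.

    Inserts an underscore before every uppercase letter except the first
    character, then lowercases the result.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


# Example using the class name mentioned in the issue.
name = readable_name("LoadDocumentFromJson")
print(name)
```

Names containing acronyms in all caps would need extra handling, which this sketch does not attempt.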
When using the `load_from_parquet` component, it is not possible to keep the original index. If the `id_column` argument is not set, Fondant will automatically generate a new unique index. But...
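The behaviour described for `id_column` can be pictured with a small plain-Python sketch. This is not Fondant's implementation: the function `assign_index`, the sample rows, and the use of `uuid4` for generated ids are all assumptions made for illustration only.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def assign_index(rows: List[Row], id_column: Optional[str] = None) -> List[Tuple[str, Row]]:
    """Hypothetical illustration of the described indexing behaviour.

    If `id_column` is given, its values become the index.
    Otherwise a new unique id is generated for every row.
    """
    indexed = []
    for row in rows:
        if id_column is not None:
            key = str(row[id_column])  # reuse the value from the chosen column
        else:
            key = uuid.uuid4().hex  # assumption: any unique id scheme would do
        indexed.append((key, row))
    return indexed


rows = [
    {"doc_id": "a1", "text": "hello"},
    {"doc_id": "b2", "text": "world"},
]

kept = assign_index(rows, id_column="doc_id")  # index taken from "doc_id"
generated = assign_index(rows)  # index generated, original ids not used
```

In the second call the original identifiers survive only as an ordinary column, which is the situation the issue describes.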
With the introduction of the lightweight components, our `component` and `pipeline` SDKs are no longer cleanly split. We should either split them again, or include the optional installs for both...
In order to further optimize the development cycle, eager execution will be a big feature. The idea is that you can run partial pipelines / single components easily and get instant...