podium
Podium: a framework-agnostic Python NLP library for data loading and preprocessing
We need a way to automatically test examples to ensure they keep working after changes to the framework core. One solution would be to mark these tests with `slow` to avoid...
Closes #273. Draft: function calls and names are subject to change, but the gist is here.
Discussed this change with @mttk on Slack. Features: * lazy module loading * makes it possible to import DiskBackedDataset, HFDatasetConverter and YAKE from the top level `__init__.py` but only if...
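As a rough illustration of what lazy module loading can look like in general, the sketch below defers an import until the first attribute access. `LazyModule` is a hypothetical helper written for this example and is not taken from Podium; the PR's actual mechanism may differ.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        super().__init__(name)
        self._target_name = name
        self._target = None  # the real module, loaded on demand

    def __getattr__(self, attr):
        # __getattr__ is only called when normal attribute lookup fails,
        # i.e. for names that live on the real module rather than the proxy.
        if self._target is None:
            self._target = importlib.import_module(self._target_name)
        return getattr(self._target, attr)


# Usage: nothing is resolved until an attribute is actually needed.
lazy_json = LazyModule("json")
encoded = lazy_json.dumps([1, 2])
```

With this pattern, an optional dependency only has to be installed if the code path that touches it actually runs.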
Currently, `get_dataset_splits()` in our datasets is a static method (`@staticmethod`), but it would be more appropriate to have it marked as a class method (`@classmethod`). The following example shows the...
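The issue's own example is cut off here, so the following is only an illustrative sketch of one common reason to prefer `@classmethod`: a class method receives the calling class, so subclasses get instances of their own type. `BaseDataset`, `ChildDataset`, and both method names are hypothetical and are not Podium's actual classes.

```python
class BaseDataset:
    def __init__(self, split):
        self.split = split

    @staticmethod
    def get_splits_static():
        # A static method cannot see the calling class,
        # so the class to construct has to be hard-coded.
        return BaseDataset("train"), BaseDataset("test")

    @classmethod
    def get_splits_cls(cls):
        # A class method receives the calling class as `cls`,
        # so subclasses automatically build instances of themselves.
        return cls("train"), cls("test")


class ChildDataset(BaseDataset):
    pass


static_train, _ = ChildDataset.get_splits_static()
cls_train, _ = ChildDataset.get_splits_cls()

print(type(static_train).__name__)  # BaseDataset
print(type(cls_train).__name__)     # ChildDataset
```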
## 🐛 Bug This is a serious bug. If `ExampleFactory` is instantiated with fields in the dict format, calling `from_list` will throw an error. This line in `ExampleFactory.from_list` is the...
At some point, we could implement this (and the creation of `data`) using views. Now is not the time though. Some performance metrics would be interesting to compare between this...
Refactors `sort`, `shuffle` and `filter` in DatasetBase/Dataset.
`ArrowDataset.from_tabular_file` is similar to TabularDataset's `__init__`. I think it's safe to remove this function. The same effect can be achieved with:

```python
ArrowDataset.from_dataset(TabularDataset(...), ...)
```

E.g. #267 introduced some changes...