bio-datasets
Free collection of Bio datasets and embeddings
The main idea (still to be confirmed) is to offer the user the following process: - The user adds raw data files such as (csv + npy for embeddings...
In order to be able to load our data with `to_npy_array` in memory
- Test coverage can be computed by a pytest job. - It is always useful for users to know the test coverage of the library they use.
An issue template is a good way to define the structure of the issue based on the type: bug, feature request, documentation, ...
I've successfully uploaded a dataset (subset of PDB) but it has unusual labels in that they are matrices. Storing matrices/ndarrays/sparse arrays as a column in a `.csv` is not ideal....
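One possible way around the matrix-in-CSV problem, sketched under assumptions: keep only sample identifiers in the `.csv` and store each label matrix in a sibling `.npz` archive keyed by that identifier. The file names `dataset.csv` and `labels.npz` and both helper functions are hypothetical and are not part of the bio-datasets API.

```python
import csv
import os

import numpy as np


def save_matrix_labels(directory, ids, matrices):
    """Write ids to a CSV and the per-sample matrices to a sibling .npz archive.

    Hypothetical helper: file names are illustrative assumptions.
    """
    csv_path = os.path.join(directory, "dataset.csv")
    npz_path = os.path.join(directory, "labels.npz")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id"])
        writer.writerows([sample_id] for sample_id in ids)
    # Each matrix is stored as its own array, so shapes may differ per sample.
    np.savez_compressed(npz_path, **dict(zip(ids, matrices)))
    return csv_path, npz_path


def load_matrix_labels(csv_path, npz_path):
    """Return a dict mapping each id listed in the CSV to its label matrix."""
    with open(csv_path, newline="") as handle:
        ids = [row["id"] for row in csv.DictReader(handle)]
    with np.load(npz_path) as archive:
        return {sample_id: archive[sample_id] for sample_id in ids}
```

Because every matrix is a separate entry in the archive, labels of different shapes (for example, contact maps of proteins with different lengths) can coexist without padding.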
We should clarify the structure of the `description.md` file for a dataset. Given the structure, we would have different functions (i.e. `display_description()`, `display_summary()`, etc.) that would display different parts of...
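As a rough illustration only, here is how such functions could work if `description.md` were split by Markdown headings. The heading names `Description` and `Summary` and the helper `split_sections` are assumptions made for this sketch; the actual structure of the file is still to be decided.

```python
import re


def split_sections(markdown_text):
    """Map each lower-cased heading title to the text that follows it."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def display_description(markdown_text):
    """Print the assumed 'Description' section, if present."""
    print(split_sections(markdown_text).get("description", ""))


def display_summary(markdown_text):
    """Print the assumed 'Summary' section, if present."""
    print(split_sections(markdown_text).get("summary", ""))
```

Once the file structure is agreed on, each display function only needs to know the heading it is responsible for.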
Configuration file to define the `dataset` and `embeddings` files as well as the inputs/targets variable names (add them as attributes). - Also add an attribute when there is only one input...
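A minimal sketch of what such a configuration could look like, assuming a JSON file. The keys `dataset`, `embeddings`, `inputs` and `targets` follow the wording of the issue, but the JSON layout, the `DatasetConfig` class and the example file and column names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetConfig:
    """Hypothetical container exposing the configuration values as attributes."""

    dataset: str
    embeddings: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)


def parse_config(text: str) -> DatasetConfig:
    """Build a DatasetConfig from JSON text; only `dataset` is required."""
    raw = json.loads(text)
    return DatasetConfig(
        dataset=raw["dataset"],
        embeddings=list(raw.get("embeddings", [])),
        inputs=list(raw.get("inputs", [])),
        targets=list(raw.get("targets", [])),
    )


# Illustrative configuration; file and column names are made up.
EXAMPLE = """
{
  "dataset": "dataset.csv",
  "embeddings": ["embeddings.npy"],
  "inputs": ["sequence"],
  "targets": ["label"]
}
"""

config = parse_config(EXAMPLE)
```

Keeping inputs and targets as lists leaves room for datasets with several input or target columns.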