easydata
A flexible template for doing reproducible data science in Python.
Multiple local forks of the same repo lead to multiple `src` modules, with only one of them installed with the "correct" paths. How do we avoid this, or at least make it robust?
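One way to see which fork "wins" is to check where Python actually resolves `src` from. This is only a diagnostic sketch, not an easydata feature:

```python
# Diagnostic only: report which of several checked-out `src` packages
# is the one Python actually imports in this environment.
import src

print(src.__file__)  # path to the src package that is currently installed/active
```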
I should be able to download a LICENSE or README from a URL and add them to a datasource
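A rough sketch of what that could look like. `add_metadata_from_url` is a hypothetical helper, not an existing easydata API, and it assumes the datasource exposes an `add_metadata(contents=..., kind=...)`-style call:

```python
import requests

def add_metadata_from_url(dsrc, kind, url):
    """Hypothetical helper: download LICENSE/README text and attach it to a datasource.

    `dsrc` is assumed to expose an `add_metadata(contents=..., kind=...)`-style method;
    the exact easydata API may differ.
    """
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    dsrc.add_metadata(contents=response.text, kind=kind)

# Usage (names are illustrative):
# add_metadata_from_url(dsrc, kind="LICENSE", url="https://example.com/LICENSE")
```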
With `hash_value=None` it should compute the hash and store it. It's expecting a value anyway...
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in
----> 1 dsrc.fetch()

.../src/data/datasets.py in fetch(self,...
```
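A sketch of the fallback behaviour being asked for: when no hash is supplied, compute one from the fetched file and record it. How easydata stores the result (and which `hash_type` it defaults to) is an assumption here:

```python
import hashlib

def hash_file(path, hash_type="sha1", blocksize=2 ** 20):
    """Illustrative helper: compute a hex digest of a file in chunks."""
    hasher = hashlib.new(hash_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Sketch of the desired fetch() fallback (not the actual easydata code):
# if hash_value is None:
#     hash_value = hash_file(downloaded_path, hash_type)
#     # ...store hash_value back in the datasource's file list...
```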
`workflow.add_dataset` should wrap the appropriate dag calls. For example: `workflow.add_dataset(dataset_name='wine_reviews_130k', datasource_name='wine_reviews')` should simply wrap `dag.add_source(output_dataset='wine_reviews_130k', datasource_name='wine_reviews')`
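A minimal sketch of such a wrapper, assuming `dag.add_source` takes the keyword arguments shown above; the import path and the extra `**kwargs` pass-through are assumptions:

```python
def add_dataset(dataset_name, datasource_name, **kwargs):
    """Thin convenience wrapper around the underlying DAG call (illustrative)."""
    from src.data import dag  # assumed location of the dag module
    return dag.add_source(
        output_dataset=dataset_name,
        datasource_name=datasource_name,
        **kwargs,
    )

# Intended usage:
# workflow.add_dataset(dataset_name='wine_reviews_130k', datasource_name='wine_reviews')
```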
In the docstring it says:
> If a cached copy of the dataset is present on disk, (and its hashes match those in the dataset catalog),
> the cached copy...
When running `make create_environment` it seems to be using the lock file:
```
/bin/conda env update -n covid_nlp -f environment.i386.lock.yml
Collecting package metadata (repodata.json): done
Solving environment: done
```
is...
In Makefile.include there is a hard-coded CONDA_EXE path. Is there a way to at least issue a warning if you try to make it using someone else's path? (like when...
`make data` and `make sources` both end in an error if there is no process function:
```
python3 -m src.data.make_dataset process
2020-03-21 12:35:54,219 - datasets - INFO - Running process...
```
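One possible fix, sketched here under the assumption that process functions are looked up by name on a module: fall back to a no-op and log a warning instead of raising.

```python
import logging

logger = logging.getLogger("datasets")

def get_process_function(module, name="process"):
    """Illustrative lookup: return the dataset's process function, or a
    warning no-op when the module doesn't define one."""
    func = getattr(module, name, None)
    if func is None:
        logger.warning("No %r function found in %s; skipping processing step.",
                       name, module.__name__)
        return lambda *args, **kwargs: None
    return func
```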
`make clean` currently runs `rm -rf` commands. Bad things can happen if your paths aren't set right or if you share your data directory. Clean based on file names instead.
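A sketch of name-based cleanup, assuming the catalog can tell us which files each dataset owns (the `catalog` mapping used here is illustrative, not easydata's actual layout):

```python
from pathlib import Path

def clean_dataset_files(data_dir, catalog):
    """Remove only files explicitly listed in the catalog, rather than
    `rm -rf`-ing whole directories. `catalog` maps dataset names to lists
    of file names (an assumption about how the catalog is organised)."""
    data_dir = Path(data_dir)
    for dataset_name, filenames in catalog.items():
        for filename in filenames:
            target = data_dir / filename
            if target.is_file():
                target.unlink()
```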
Make data on demand. That is, if the instructions are there in the catalog, `Dataset.load("datasetname")` should just work, even if no fetching, unpacking, or processing has happened yet.
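A sketch of the intended behaviour; the cache check and the way build steps are stored on the catalog entry are assumptions, not easydata's internals:

```python
from pathlib import Path

def load(dataset_name, catalog, data_dir="data/processed"):
    """Illustrative on-demand loader: if no cached copy exists yet, run the
    catalog's build steps first. The catalog entry is assumed to carry its
    own fetch/unpack/process/read callables (an assumption for this sketch)."""
    entry = catalog[dataset_name]
    cached = Path(data_dir) / f"{dataset_name}.dataset"
    if not cached.exists():
        entry["fetch"]()     # download raw files
        entry["unpack"]()    # extract archives
        entry["process"]()   # build the processed dataset
    return entry["read"](cached)
```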