easydata

A flexible template for doing reproducible data science in Python.

72 easydata issues

Multiple local forks of the same repo lead to multiple `src` modules, with only one of them installed with the "correct" paths. How can this be avoided, or made more robust? (See the diagnostic sketch below.)

bug
question
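One way to surface the problem, as a minimal diagnostic sketch (not part of easydata), is to check which checkout the installed `src` module actually resolves to:

```
# Minimal diagnostic sketch (not the easydata API): report which
# checkout the installed `src` module actually resolves to.
import importlib
from pathlib import Path

src = importlib.import_module("src")
installed_at = Path(src.__file__).resolve().parent
print(f"`src` resolves to: {installed_at}")

# Compare against the checkout you think you are working in:
expected = (Path.cwd() / "src").resolve()
if installed_at != expected:
    print("Warning: the installed `src` belongs to a different checkout.")
```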

I should be able to download a LICENSE or README from a URL and add them to a datasource
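A minimal sketch of the desired behavior using `requests`; the helper name and destination layout are assumptions, not the actual easydata datasource API:

```
# Sketch only: helper name and destination layout are assumptions.
import requests
from pathlib import Path

def fetch_metadata_file(url, dest_dir, filename, timeout=30):
    """Download a metadata file (e.g. LICENSE or README) into dest_dir."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    dest = Path(dest_dir) / filename
    dest.write_text(response.text)
    return dest

# e.g. fetch_metadata_file("https://example.com/LICENSE",
#                          "data/raw/wine_reviews", "LICENSE")
```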

With `hash_value=None` it should compute the hash and store it. It's expecting a value anyway...

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
----> 1 dsrc.fetch()

.../src/data/datasets.py in fetch(self,...
```
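A sketch of the requested behavior; the helper names are illustrative, not the actual `datasets.py` internals:

```
# Sketch: when no hash is supplied, compute one and return it for the
# catalog instead of raising. Helper names are illustrative.
import hashlib

def file_hash(path, algorithm="sha1", blocksize=2**20):
    """Hash a file in chunks so large files don't exhaust memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def check_or_store_hash(path, hash_value=None, algorithm="sha1"):
    computed = file_hash(path, algorithm)
    if hash_value is None:
        return computed  # new hash: caller stores it in the catalog
    if computed != hash_value:
        raise ValueError(f"Hash mismatch for {path}")
    return hash_value
```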

`workflow.add_dataset` should wrap the appropriate dag calls. For example: `workflow.add_dataset(dataset_name='wine_reviews_130k', datasource_name='wine_reviews')` should simply wrap `dag.add_source(output_dataset='wine_reviews_130k', datasource_name='wine_reviews')`
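Based on the mapping above, the wrapper could be as thin as the following sketch; the `dag` import location is an assumption:

```
# Sketch of the proposed thin wrapper. The import location is an
# assumption; the call mapping comes from the issue text above.
from src.data import dag  # hypothetical location

def add_dataset(dataset_name, datasource_name, **kwargs):
    """Wrap dag.add_source for the common one-source, one-dataset case."""
    return dag.add_source(
        output_dataset=dataset_name,
        datasource_name=datasource_name,
        **kwargs,
    )
```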

In the docstring it says:

> If a cached copy of the dataset is present on disk, (and its hashes match those in the dataset catalog), the cached copy...

When running `make create_environment` it seems to be using the lock file:

```
/bin/conda env update -n covid_nlp -f environment.i386.lock.yml
Collecting package metadata (repodata.json): done
Solving environment: done
```

is...

bug

In `Makefile.include` there is a hard-coded `CONDA_EXE` path. Is there a way to at least issue a warning if you try to run make using someone else's path? (like when...

question

`make data` and `make sources` both end in an error if there is no process function (see the sketch below):

```
python3 -m src.data.make_dataset process
2020-03-21 12:35:54,219 - datasets - INFO - Running process...
```

bug
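A sketch of one possible guard; the function and attribute names are illustrative, not the actual `make_dataset` code:

```
# Sketch: warn and skip instead of erroring when a datasource defines
# no process function. Names are illustrative.
import logging

logger = logging.getLogger("datasets")

def run_process(datasource):
    process = getattr(datasource, "process_function", None)
    if process is None:
        logger.warning("No process function for %s; skipping.", datasource.name)
        return None
    return process()
```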

`make clean` currently runs `rm -rf` commands. Bad things can happen if your paths aren't set right or you share your data directory. Clean based on file names instead (see the sketch below).

bug
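A minimal sketch of filename-based cleaning; the patterns and directory are assumptions:

```
# Sketch: delete only files matching known generated-file patterns,
# rather than rm -rf'ing whole directories. Patterns are assumptions.
from pathlib import Path

GENERATED_PATTERNS = ("*.dataset", "*.metadata")

def clean_generated(data_dir="data/processed"):
    for pattern in GENERATED_PATTERNS:
        for path in Path(data_dir).glob(pattern):
            print(f"removing {path}")
            path.unlink()
```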

Make data on demand. That is, if the instructions are there in the catalog, `Dataset.load("datasetname")` should just work, even if no fetching, unpacking, or processing has happened yet (see the sketch below).

enhancement
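A sketch of the on-demand behavior; everything except `Dataset.load()` is an assumption about the internal API:

```
# Sketch of on-demand loading. build_from_catalog is a hypothetical
# helper standing in for the fetch/unpack/process pipeline.
from pathlib import Path

def dataset_path(name, data_dir="data/processed"):
    return Path(data_dir) / f"{name}.dataset"

def load(dataset_name):
    """Return the dataset, building it from catalog instructions if needed."""
    path = dataset_path(dataset_name)
    if not path.exists():
        # No cached copy yet: fetch, unpack, and process on demand.
        build_from_catalog(dataset_name)  # hypothetical helper
    return path.read_bytes()  # stand-in for the real deserialization
```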