mlvtools
Public repository for versioning machine learning data
I'm following the use-case 1 tutorial and got to this step: `gen_dvc -w . -i ./poc/pipeline/steps/mlvtools_01_extract_dataset.py`. The generated command does not appear to populate anything for the `-n` argument, which...
It would be nice to have the tools available as a conda-forge package.
I have multiple string parameters in my notebooks. At first I thought that all the parameters I declare as `:param` in docstrings would be automatically passed to the DVC bash script...
When I have multiple notebooks, it is very annoying to run `ipynb_to_dvc` once per notebook. What about adding a `-R` flag and providing a notebooks folder (explicitly or via configuration) and...
If a `.mlvtools` configuration file is present, the current behaviour is that every command first checks that the configuration file is correctly formatted and validates that all the paths...
For pipelines producing metrics as their last step and including a train-test-split, we want to get cross-validated metrics.
Building on feature https://github.com/peopledoc/ml-versioning-tools/issues/48, if the experiment contains some stochasticity (random train/validation split, random initialization, stochastic gradient descent), we want to run the experiment several times...
Building on feature https://github.com/peopledoc/ml-versioning-tools/issues/48, we want to compare metrics between experiments (git branches). For example, MLV-tools could produce a table of hyperparameters / input data and output metrics in...
The DVC `metrics` command can save evaluation metrics to a file; however, we would like capabilities similar to MLflow's to store side-by-side hyperparameters...
See, for example, [these guidelines](https://hynek.me/talks/python-foss/) for making our repository more contributor-friendly.