
[WIP] 0.3.8 development

Open thammegowda opened this issue 3 years ago • 0 comments

Change log

  • New CLI argument --log-level; the default is WARNING
  • The progress bar can be disabled from the CLI with --no-pbar; it is enabled by default (--pbar)
python -m mtdata -h    
usage: __main__.py [-h] [-vv] [-v] [-ll {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [-ri] [-pb | -no-pb] {list,get,report,list-recipe,get-recipe,stats} ...

positional arguments:
  {list,get,report,list-recipe,get-recipe,stats}
                        
 
optional arguments:
  -h, --help            show this help message and exit
  -vv, --verbose        verbose mode (default: False)
  -v, --version         show program's version number and exit
  -ll {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set log level (default: WARNING)
  -ri, --reindex        Invalidate index of entries and recreate it. This deletes /Users/tg/.mtdata/mtdata.index.0.3.8-dev.pkl only and not the downloaded files. Use this if you're using in
                        developer mode and modifying mtdata index. (default: False)
  -pb, --pbar           Show progressbar (default: True)
  -no-pb, --no-pbar     Do not show progressbar (default: False)
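
The two new options could be wired up with argparse roughly as follows. This is a minimal sketch and not mtdata's actual code: the `build_parser` name, the `demo` program name, and the choice to let both progress bar flags write to one shared `pbar` destination are assumptions made for illustration.

```python
import argparse

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser():
    # Hypothetical parser; it only mirrors the flags shown in the help text above.
    parser = argparse.ArgumentParser(
        prog="demo", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("-ll", "--log-level", choices=LOG_LEVELS,
                        default="WARNING", help="Set log level")
    # -pb and -no-pb exclude each other, as in the usage line "[-pb | -no-pb]".
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-pb", "--pbar", dest="pbar", action="store_true",
                       default=True, help="Show progressbar")
    group.add_argument("-no-pb", "--no-pbar", dest="pbar", action="store_false",
                       help="Do not show progressbar")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.log_level, args.pbar)
```

Passing both `-pb` and `-no-pb` in one invocation makes argparse exit with an error, which is the behaviour a mutually exclusive group provides.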
 
  • mtdata stats --quick sends an HTTP HEAD request for each URL and reports the content length; e.g. mtdata stats --quick Statmt-commoncrawl-wmt19-fra-deu
{"id": "Statmt-commoncrawl-wmt19-fra-deu", "total_bytes": 65880180, "total_size": "65.88 MB",
"urls": {"https://data.statmt.org/wmt19/translation-task/fr-de/bitexts/commoncrawl.fr.gz": 32607032, "https://data.statmt.org/wmt19/translation-task/fr-de/bitexts/commoncrawl.de.gz": 33273148}}
  • Without --quick, stats shows segs, toks, chars, bytes, and total_size; e.g. mtdata -no-pb stats Statmt-commoncrawl-wmt19-fra-deu
{"id": "Statmt-commoncrawl-wmt19-fra-deu", "segs": 622288, "segs_err": 0, "segs_noise": 0, "deu_toks": 12217694, "fra_toks": 13992149, "deu_chars": 85747775, "fra_chars": 87337364, "deu_bytes": 87746644, "fra_bytes": 90752783, "total_bytes": 178499427, "total_size": "178.5 MB"}
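
The two stats modes above can be approximated with the standard library. The sketch below is illustrative and is not taken from mtdata: the function names, the whitespace tokenisation, and the size format (decimal megabytes rounded to two places, chosen because it reproduces "65.88 MB" and "178.5 MB") are all assumptions.

```python
import urllib.request


def head_content_length(url, timeout=30):
    """Send an HTTP HEAD request; return Content-Length as int, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None


def human_size(n_bytes):
    # Assumption: decimal megabytes, rounded to two places.
    return f"{round(n_bytes / 1e6, 2)} MB"


def quick_stats(did, urls, get_length=head_content_length):
    """Quick mode: sizes come from response headers only."""
    sizes = {url: get_length(url) for url in urls}
    total = sum(size for size in sizes.values() if size is not None)
    return {"id": did, "total_bytes": total,
            "total_size": human_size(total), "urls": sizes}


def full_stats(did, pairs, lang1, lang2):
    """Full mode: count segments, whitespace tokens, characters and UTF-8 bytes."""
    stats = {"id": did, "segs": 0}
    for lang in (lang1, lang2):
        stats.update({f"{lang}_toks": 0, f"{lang}_chars": 0, f"{lang}_bytes": 0})
    for seg1, seg2 in pairs:
        stats["segs"] += 1
        for lang, text in ((lang1, seg1), (lang2, seg2)):
            stats[f"{lang}_toks"] += len(text.split())
            stats[f"{lang}_chars"] += len(text)
            stats[f"{lang}_bytes"] += len(text.encode("utf-8"))
    total = stats[f"{lang1}_bytes"] + stats[f"{lang2}_bytes"]
    stats["total_bytes"] = total
    stats["total_size"] = human_size(total)
    return stats
```

The `get_length` parameter exists only so the quick mode can be exercised without network access. The `segs_err` and `segs_noise` fields are left out because the text above does not define them.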
  • New script mtdata.scripts.recipe_stats reads stats from an output directory
python -m mtdata.scripts.recipe_stats -h                       
usage: recipe_stats.py [-h] [-d] [-t TOK_EXT] [-lc] DIR [DIR ...]

positional arguments:
  DIR                   Recipe or experiment directory

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           Enable debug logs (default: False)
  -t TOK_EXT, --tok-ext TOK_EXT
                        Tokenized files extension. Example .tok (default: None)
  -lc, --caseless       Case insesitive when counting number of types (i.e. unique words) (default: False)
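
To illustrate what the --caseless option changes, here is a small sketch of counting types (unique words) with and without case folding. It is an assumption about the general technique, not the script's implementation; `count_types` is a hypothetical helper and tokens are split on whitespace.

```python
from collections import Counter


def count_types(lines, caseless=False):
    """Return the number of unique whitespace-separated tokens in lines."""
    vocab = Counter()
    for line in lines:
        if caseless:
            # Fold case so that "The" and "the" count as one type.
            line = line.lower()
        vocab.update(line.split())
    return len(vocab)


if __name__ == "__main__":
    sample = ["The cat", "the CAT"]
    print(count_types(sample), count_types(sample, caseless=True))
```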

thammegowda, Jul 11 '22 20:07