Weasel serializes `commands` field as string

Open caiorcferreira opened this issue 1 year ago • 4 comments

Description

It looks like Weasel is reading `commands` as a string rather than a list. As a result, `cmd["name"]` is evaluated on single characters of that string and raises `TypeError: string indices must be integers`.
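
The failure can be reproduced outside Weasel with a minimal sketch. It assumes the loaded config holds `commands` as a JSON-encoded string, as the locals in the traceback suggest; the `config` dict and `index_commands` function here are hypothetical stand-ins, not Weasel's actual loader.

```python
# Hypothetical stand-in for the loaded project config: "commands" is a
# JSON-encoded string instead of a list of dicts.
config = {"commands": '[{"name": "download", "script": ["echo hi"]}]'}


def index_commands(config):
    # Same dict comprehension as weasel/cli/run.py line 81 in the traceback.
    return {cmd["name"]: cmd for cmd in config.get("commands", [])}


try:
    index_commands(config)
except TypeError as err:
    # Iterating a str yields single characters ("[" first), and indexing a
    # character with "name" raises TypeError.
    error_message = str(err)
    print(f"TypeError: {error_message}")
```

The exact wording of the message varies slightly between Python versions, but it starts with "string indices must be integers".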

Environment

Name    Version  Build           Channel
weasel  0.3.4    py39hca03da5_0

Error

Traceback (most recent call last):

/opt/homebrew/Caskroom/miniconda/base/envs/spacy_dev_pt_core_chat_lg/lib/python3.9/site-packages/weasel/cli/run.py:42 in project_run_cli

   39         print_run_help(project_dir, subcommand, parent_command)
   40     else:
   41         overrides = parse_config_overrides(ctx.args)
 ❱ 42         project_run(
   43             project_dir,
   44             subcommand,
   45             overrides=overrides,

  locals:
               ctx = <click.core.Context object at 0x11b83e9a0>
               dry = False
             force = False
         overrides = {
                         'vars.experiment': 29,
                         'vars.enabled_gazetteers': 'person,address',
                         'vars.input_data': 'experiments/028/data/oversampled_merged_dataset.json',
                         'vars.address_gazetteer': 'assets/datasets/addresses/pt_br_address-gazetter-2.jsonl'
                     }
    parent_command = 'python -m weasel'
       project_dir = PosixPath('.')
         show_help = False
        subcommand = 'experiment'

/opt/homebrew/Caskroom/miniconda/base/envs/spacy_dev_pt_core_chat_lg/lib/python3.9/site-packages/weasel/cli/run.py:81 in project_run

   78     skip_requirements_check (bool): No longer used, deprecated.
   79     """
   80     config = load_project_config(project_dir, overrides=overrides)
 ❱ 81     commands = {cmd["name"]: cmd for cmd in config.get("commands", [])}
   82     workflows = config.get("workflows", {})
   83     validate_subcommand(list(commands.keys()), list(workflows.keys()), subcommand)
   84

  locals:
                    capture = False
                     config = {
                                  'title': 'NER portuguese chat',
                                  'description': 'Project tunning NER component in portuguese model using chat corpus',
                                  'directories': [
                                      'assets',
                                      'scripts',
                                      'experiments',
                                      'baseline',
                                      'packages'
                                  ],
                                  'assets': [
                                      {
                                          'dest': 'assets/train.json',
                                          'description': 'Training data'
                                      },
                                      {
                                          'dest': 'assets/dev.json',
                                          'description': 'Development data'
                                      }
                                  ],
                                  'commands': '[{"name":"download","help":"Download the pretrained pipeline","script":["python '+5046,
                                  'env': {},
                                  'vars': {
                                      'name': 'core_chat_lg',
                                      'lang': 'pt',
                                      'pipeline': 'pt_core_news_lg',
                                      'version': '0.0.0',
                                      'dataset': 'raw.json',
                                      'train': 'train.json',
                                      'dev': 'dev.json',
                                      'test': 'test.json',
                                      'test_data': 'assets/datasets/chats/sample-chats-manual-labeled-test.json',
                                      'input_data': 'assets/datasets/chats/sample-chats-manual-labeled-train.json',
                                      ... +9
                                  },
                                  'workflows': {
                                      'experiment': [
                                          'fetch-data',
                                          'split-data',
                                          'create-gazetteer',
                                          'convert',
                                          'train',
                                          'evaluate'
                                      ],
                                      'experiment_search': [
                                          'fetch-data',
                                          'split-data',
                                          'create-gazetteer',
                                          'convert',
                                          'train-search',
                                          'evaluate'
                                      ],
                                      'experiment_new': ['setup_experiment', 'create-config']
                                  }
                              }
                        dry = False
                      force = False
                  overrides = {
                                  'vars.experiment': 29,
                                  'vars.enabled_gazetteers': 'person,address',
                                  'vars.input_data': 'experiments/028/data/oversampled_merged_dataset.json',
                                  'vars.address_gazetteer': 'assets/datasets/addresses/pt_br_address-gazetter-2.jsonl'
                              }
             parent_command = 'python -m weasel'
                project_dir = PosixPath('.')
    skip_requirements_check = False
                 subcommand = 'experiment'

/opt/homebrew/Caskroom/miniconda/base/envs/spacy_dev_pt_core_chat_lg/lib/python3.9/site-packages/weasel/cli/run.py:81 in <dictcomp>

   78     skip_requirements_check (bool): No longer used, deprecated.
   79     """
   80     config = load_project_config(project_dir, overrides=overrides)
 ❱ 81     commands = {cmd["name"]: cmd for cmd in config.get("commands", [])}
   82     workflows = config.get("workflows", {})
   83     validate_subcommand(list(commands.keys()), list(workflows.keys()), subcommand)
   84

  locals:
     .0 = <str_iterator object at 0x107dc13a0>
    cmd = '['

TypeError: string indices must be integers
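
The `cmd = '['` local in the last frame shows the comprehension iterating over the characters of a string. As a diagnostic only, the sketch below checks the type of `commands` and decodes it when it is a JSON string. `coerce_commands` is a hypothetical helper, not part of Weasel, and the sample config is made up for illustration.

```python
import json


def coerce_commands(config):
    """Return config["commands"] as a list, decoding a JSON string if needed."""
    commands = config.get("commands", [])
    if isinstance(commands, str):
        # Assumption: the string is valid JSON, as the truncated value in the
        # traceback locals appears to be.
        commands = json.loads(commands)
    if not isinstance(commands, list):
        raise TypeError(f"expected a list of commands, got {type(commands).__name__}")
    return commands


# Made-up sample mirroring the shape seen in the traceback locals.
sample = {"commands": '[{"name": "download", "help": "Download the pretrained pipeline"}]'}
by_name = {cmd["name"]: cmd for cmd in coerce_commands(sample)}
print(sorted(by_name))
```

This only helps confirm what type the config holds at that point; it does not explain why the value was serialized in the first place.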

caiorcferreira, Dec 12 '24 11:12

Thanks.

Could you paste your workflow file? This can't be the right error behaviour no matter what, but I'm trying to figure out whether it's doing this on a workflow that should work, or whether it's just heading down the wrong error path.

honnibal, Dec 12 '24 12:12

My workflow is a bit customized in how it uses variables, but it worked for a while. I wonder whether a package version changed in my conda environment and led to this new behavior.

Workflow file:

title: "NER portuguese chat"
description: "Project tunning NER component in portuguese model using chat corpus"

# Variables can be referenced across the project.yml using ${vars.var_name}
vars:
  name: "core_chat_lg"
  lang: "pt"
  pipeline: "pt_core_news_lg"
  version: "0.0.0"

  dataset: "raw.json"
  train: "train.json"
  dev: "dev.json"
  test: "test.json"

  test_data: "assets/datasets/chats/sample-chats-manual-labeled-test.json"
  input_data: "assets/datasets/chats/sample-chats-manual-labeled-train.json"

  experiment: "01"
  train_size: 0.8

  enabled_gazetteers: "null"
  person_gazetteer: "assets/datasets/names_surnames/pt_br_names-gazetteer.jsonl"
  address_gazetteer: "assets/datasets/addresses/pt_br_address-gazetter.jsonl"
  person_entity_ruler_patterns: "null"
  loc_entity_ruler_patterns: "null"

  gazetteers_pattern: "gazetteers_patterns.jsonl"

  # Set your GPU ID, -1 is CPU
  gpu_id: -1

# These are the directories that the project needs. The project CLI will make
# sure that they always exist.
directories: ["assets", "scripts", "experiments", "baseline", "packages"]

# Assets that should be downloaded or available in the directory. We're shipping
# them with the project, so they won't have to be downloaded.
assets:
  - dest: "assets/train.json"
    description: "Training data"
  - dest: "assets/dev.json"
    description: "Development data"

# Workflows are sequences of commands (see below) executed in order. You can
# run them via "spacy project run [workflow]". If a command's inputs/outputs
# haven't changed, it won't be re-run.
workflows:
  experiment:
    - fetch-data
    - split-data
    - create-gazetteer
    - convert
    - train
    - evaluate

  experiment_search:
    - fetch-data
    - split-data
    - create-gazetteer
    - convert
    - train-search
    - evaluate

  experiment_new:
    - setup_experiment
    - create-config

# Project commands, specified in a style similar to CI config files (e.g. Azure
# pipelines). The name is the command name that lets you trigger the command
# via "spacy project run [command] [path]". The help message is optional and
# shown when executing "spacy project run [optional command] [path] --help".
commands:
- name: "download"
  help: "Download the pretrained pipeline"
  script:
    - "python -m spacy download ${vars.pipeline}"

- name: "setup_experiment"
  help: "Setup experiment directory structure"
  script:
    - "mkdir -p experiments/0${vars.experiment}/data experiments/0${vars.experiment}/configs experiments/0${vars.experiment}/training experiments/0${vars.experiment}/corpus experiments/0${vars.experiment}/scripts"
    - "touch experiments/0${vars.experiment}/README.md"

- name: "create-config"
  help: "Create a config for updating only NER from an existing pipeline"
  script:
    - "python scripts/create_config.py ${vars.pipeline} ner experiments/0${vars.experiment}/data/${vars.gazetteers_pattern} ${vars.enabled_gazetteers} experiments/0${vars.experiment}/configs/config.cfg"
  deps:
    - "scripts/create_config.py"
  outputs:
    - "experiments/0${vars.experiment}/configs/config.cfg"

- name: "fetch-data"
  help: "Fetch the training and test data"
  script:
    - "cp ${vars.input_data} experiments/0${vars.experiment}/data/${vars.dataset}"
    - "cp ${vars.test_data} experiments/0${vars.experiment}/data/${vars.test}"
  deps:
    - "${vars.input_data}"
    - "${vars.test_data}"
  outputs:
    - "experiments/0${vars.experiment}/data/${vars.dataset}"
    - "experiments/0${vars.experiment}/data/${vars.test}"

- name: "split-data"
  help: "Split the data into training and eval sets, and copy the test data"
  script:
    - "python scripts/split_train_test.py experiments/0${vars.experiment}/data/${vars.dataset} ${vars.train_size} experiments/0${vars.experiment}/data/${vars.train} experiments/0${vars.experiment}/data/${vars.dev}"
  deps:
    - "experiments/0${vars.experiment}/data/${vars.dataset}"
    - "scripts/split_train_test.py"
  outputs:
    - "experiments/0${vars.experiment}/data/${vars.train}"
    - "experiments/0${vars.experiment}/data/${vars.dev}"

- name: "create-gazetteer"
  help: "Merge gazetter into single pattern file"
  script:
    - "python scripts/merge_gazetters.py ${vars.enabled_gazetteers} ${vars.person_gazetteer} ${vars.address_gazetteer} experiments/0${vars.experiment}/data/${vars.gazetteers_pattern}"
  deps:
    - "${vars.person_gazetteer}"
    - "${vars.address_gazetteer}"
    - "scripts/merge_gazetters.py"
  outputs:
    - "experiments/0${vars.experiment}/data/${vars.gazetteers_pattern}"

- name: "convert"
  help: "Convert the data to spaCy's binary format"
  script:
    - "mkdir -p experiments/0${vars.experiment}/corpus"
    - "python scripts/convert.py ${vars.lang} experiments/0${vars.experiment}/data/${vars.train} experiments/0${vars.experiment}/corpus/train.spacy"
    - "python scripts/convert.py ${vars.lang} experiments/0${vars.experiment}/data/${vars.dev} experiments/0${vars.experiment}/corpus/dev.spacy"
    - "python scripts/convert.py ${vars.lang} experiments/0${vars.experiment}/data/${vars.test} experiments/0${vars.experiment}/corpus/test.spacy"
  deps:
    - "experiments/0${vars.experiment}/data/${vars.train}"
    - "experiments/0${vars.experiment}/data/${vars.dev}"
    - "experiments/0${vars.experiment}/data/${vars.test}"
    - "scripts/convert.py"
  outputs:
    - "experiments/0${vars.experiment}/corpus/train.spacy"
    - "experiments/0${vars.experiment}/corpus/dev.spacy"
    - "experiments/0${vars.experiment}/corpus/test.spacy"

- name: "train"
  help: "Update the NER model"
  script:
    - "mkdir -p experiments/0${vars.experiment}/training"
    - "python -m spacy train experiments/0${vars.experiment}/configs/config.cfg --output experiments/0${vars.experiment}/training/ --paths.entity_ruler_patterns experiments/0${vars.experiment}/data/${vars.gazetteers_pattern} --paths.person_entity_ruler_patterns ${vars.person_entity_ruler_patterns} --paths.loc_entity_ruler_patterns ${vars.loc_entity_ruler_patterns} --paths.train experiments/0${vars.experiment}/corpus/train.spacy --paths.dev experiments/0${vars.experiment}/corpus/dev.spacy --gpu-id ${vars.gpu_id}"
  deps:
    - "experiments/0${vars.experiment}/configs/config.cfg"
    - "experiments/0${vars.experiment}/corpus/train.spacy"
    - "experiments/0${vars.experiment}/corpus/dev.spacy"
  outputs:
    - "experiments/0${vars.experiment}/training/model-best"

- name: "train-search"
  help: "Run customized training runs for hyperparameter search using [Weights & Biases Sweeps](https://docs.wandb.ai/guides/sweeps)"
  script:
    - "mkdir -p experiments/0${vars.experiment}/training"
    - "python scripts/train/wandb_sweeps.py experiments/0${vars.experiment}/configs/config.cfg experiments/0${vars.experiment}/training/ experiments/0${vars.experiment}/corpus/train.spacy experiments/0${vars.experiment}/corpus/dev.spacy experiments/0${vars.experiment}/corpus/train.spacy --gazetteer-path experiments/0${vars.experiment}/data/${vars.gazetteers_pattern}"
  deps:
    - "scripts/train/wandb_sweeps.py"
    - "experiments/0${vars.experiment}/configs/config.cfg"
    - "experiments/0${vars.experiment}/corpus/train.spacy"
    - "experiments/0${vars.experiment}/corpus/dev.spacy"
  outputs:
    - "experiments/0${vars.experiment}/training/model-best"

- name: "evaluate"
  help: "Evaluate the model and export metrics"
  script:
    - "python -m spacy evaluate experiments/0${vars.experiment}/training/model-best experiments/0${vars.experiment}/corpus/test.spacy --output experiments/0${vars.experiment}/metrics.json"
  deps:
    - "experiments/0${vars.experiment}/corpus/test.spacy"
    - "experiments/0${vars.experiment}/training/model-best"
  outputs:
    - "experiments/0${vars.experiment}/metrics.json"

- name: package
  help: "Package the trained model as a pip package"
  script:
    - "python -m spacy package experiments/0${vars.experiment}/training/model-best packages --name ${vars.name} --version ${vars.version} --force"
  deps:
    - "experiments/0${vars.experiment}/training/model-best"
  outputs_no_cache:
    - "packages/${vars.lang}_${vars.name}-${vars.version}/dist/${vars.lang}_${vars.name}-${vars.version}.tar.gz"

- name: visualize-model
  help: Visualize the model's output interactively using Streamlit
  # https://github.com/explosion/spacy-streamlit/issues/55
  script:
    - 'python -m streamlit run scripts/visualize_model.py experiments/0${vars.experiment}/training/model-best "AUTOMATION: Não aceite cobrança na entrega se o pedido foi pago pelo app e nunca compartilhe dados pessoais em conversas de chat ou telefone.'
  deps:
    - "scripts/visualize_model.py"
    - "experiments/0${vars.experiment}/training/model-best"

caiorcferreira, Dec 12 '24 12:12

Is the indentation right in 'commands' (maybe it's just a paste thing)? I'd have a quick look at how the file parses in a yaml-to-json converter, just to see if there's some stupid yaml whitespace thing.

honnibal, Dec 12 '24 12:12

I've tried adding indentation, but the error persists. Per the YAML spec, a list under a mapping key can be declared with or without indenting its items relative to the key.

Try the following YAML at https://onlineyamltools.com/convert-yaml-to-json

list:
- one
- two

And the output will be:

{
  "list": [
    "one",
    "two"
  ]
}
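
The same check can be run locally. This sketch assumes PyYAML is installed (`pip install pyyaml`); it is not necessarily the parser Weasel uses. It parses both list styles and compares the results.

```python
import json

import yaml  # PyYAML; assumed to be installed

unindented = """
list:
- one
- two
"""

indented = """
list:
  - one
  - two
"""

parsed_unindented = yaml.safe_load(unindented)
parsed_indented = yaml.safe_load(indented)

# Print the unindented variant as JSON, then whether both styles parse to
# the same Python object.
print(json.dumps(parsed_unindented, indent=2))
print(parsed_unindented == parsed_indented)
```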

caiorcferreira, Dec 12 '24 12:12