spaCy icon indicating copy to clipboard operation
spaCy copied to clipboard

TypeError raised for project.yml with vars containing int inside string

Open BLannoo opened this issue 4 years ago • 4 comments

Hi spaCy developers,

This is my first open-source contribution, so please let me know if there are things I could have done better.

How to reproduce the behavior

Given project.yml:

vars:
  my_var: "1"
commands:
  - name: "example"
    script:
      - "echo ${vars.my_var}"

When running:

python -m spacy project run example

Then I expect:

Running command: echo 1
1

However I get:

.venv/lib/python3.9/site-packages/spacy/cli/project/run.py", line 69, in <dictcomp>
commands = {cmd["name"]: cmd for cmd in config.get("commands", [])} 
TypeError: string indices must be integers 

This seems to be related to the my_var: "1", notice that my_var: 1 does work as expected.

I dug a little into your code and it seems the issue is coming from a conversion to string that only gets partially reverted in _utip.py:substitute_project_variables on line 194:

cfg = Config().from_str(cfg.to_str(), overrides=overrides)

before this line cfg contains a commands attribute that is a list as expected. However, after this line, the commands attribute becomes a string (the string does contain the list) which seems like the issue. At that point, it became a bit too advanced for me 😅


There is a workaround (using integers directly), but it did take me a couple of hours to figure out this was the issue 😅 , so a fix or a better exception might save others quite a bit of time.

Your Environment

  • spaCy version: 3.1.2
  • Platform: macOS-10.15.6-x86_64-i386-64bit
  • Python version: 3.9.6

Final remark

Great library, great documentation, loved the "free interactive course" hopefully you find some time to update it to 3.x some time.

BLannoo avatar Aug 29 '21 07:08 BLannoo

Thanks for reporting this, we'll take a look at fixing it!

polm avatar Aug 30 '21 05:08 polm

Unfortunately this is a known limitation, tied to how the config files are parsed & interpreted. cf https://github.com/explosion/thinc/pull/488:

one additional issue couldn't be resolved: if you have an integer value in quotes like "42", internally this will be parsed as an integer first, then converted into a backslashed version to ensure the value remains a string, and all this does not play well when that string is again inserted in a larger string like "hello ${vars.a}". Long story short - this can be resolved by just putting 42 instead of "42"

That PR also has some unit tests for this behaviour that are currently xfailing, and I simply haven't found a way to deal with this yet. Considering the fact that the underlying mechanisms are quite complicated, and that there is an easy workaround (write your numbers as numbers, not as strings), it's not a high-priority to fix this right now. But we should definitely try to produce a more meaningful error/warning.

svlandeg avatar Aug 30 '21 16:08 svlandeg

I ran into this today, perhaps it could be detected and an error thrown? It could be checked here: https://github.com/explosion/spaCy/blob/master/spacy/cli/_util.py#L135

e.g just checking that post interpolation, none of the major config sections are strings or something?

DeNeutoy avatar Apr 09 '22 00:04 DeNeutoy