Tagging inconsistent based on order
Really quick, huge thanks for creating/maintaining this library. You're amazing.
I don't know if this is even something that's possible to address, but it appears that the tagging of the tokens is dependent on the order. This makes sense, but it means that the same string can yield different results based on the order of the tokens.
Example
Also, as an aside, it may be worth adding "whole" as a unit.
Again, thank you so much!
Hi @ImBaedin
You're right that the tagging is dependent on the order of the tokens. The local context for a token is very important in determining the right tag.
The difference between your first two examples and the third is deliberate. The order of words in the sentence can change the result significantly. In your example, there isn't really a difference because either way you end up with the same amount of chicken. But consider a sentence like:
1 cup sifted flour
1 cup flour, sifted
Same words, different order, and you would end up with different amounts (mass) of flour, which can be important. The model isn't sophisticated enough (or there aren't enough training examples) to know that "whole chicken, shredded" and "whole shredded chicken" are equivalent.
Also, as an aside, it may be worth adding "whole" as a unit.
I'll have a think about this. It's not technically a unit; could it be a size?
I think "whole" operates better as a unit than a size.
Something I see a lot is:
1 whole bay leaf
or
1 whole onion, diced
It's almost like a filler word.
Regardless, I believe it shouldn't be part of the name.
As for the other bit, I don't disagree. It was just something I ran into that I was curious about, as it'll affect my implementation a bit.
In addition, it may be worth including 'half' and 'quarter' as well (e.g. 1 half onion, diced). It's unlikely that these would ever be used with an amount other than 1, so it's something I could check for after parsing if it's not something you're interested in.
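For reference, here is a rough sketch of the kind of check after parsing I have in mind. It does not use the library's API at all: `split_filler_unit` and `FILLER_WORDS` are hypothetical names of my own, and it assumes I already have the parsed quantity as a number and the parsed name as a plain string with the filler word left at the front.

```python
# Hypothetical post-processing step, not part of the library.
# Assumes the parser left "whole"/"half"/"quarter" at the start of the name.
FILLER_WORDS = {"whole", "half", "quarter"}


def split_filler_unit(quantity, name):
    """Return (unit, name), moving a leading filler word out of the name.

    The word is only treated as a unit when the quantity is exactly 1,
    since these words are unlikely to appear with any other amount.
    """
    words = name.split()
    if quantity == 1 and words and words[0].lower() in FILLER_WORDS:
        return words[0].lower(), " ".join(words[1:])
    # Anything else is left untouched.
    return None, name


print(split_filler_unit(1, "whole bay leaf"))
print(split_filler_unit(2, "whole cloves"))
```

With a quantity of 1 the leading word is split off as the unit; with any other quantity the name is returned unchanged, which keeps the check conservative.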