ingredient-parser

Tagging inconsistent based on order

Open ImBaedin opened this issue 1 year ago • 3 comments

Really quick, huge thanks for creating/maintaining this library. You're amazing.

I don't know if this is even something that's possible to address, but it appears that the tagging of the tokens is dependent on the order. This makes sense, but it means that the same string can yield different results based on the order of the tokens.

Example

[three images]

Also, as an aside, it may be worth adding "whole" as a unit.

Again, thank you so much!

ImBaedin avatar Jul 19 '24 06:07 ImBaedin

Hi @ImBaedin

You're right that the tagging is dependent on the order of the tokens. The local context for a token is very important in determining the right tag.

The difference between your first two examples and the third is deliberate. The order of words in a sentence can change the result significantly. In your example there isn't really a difference, because either way you end up with the same amount of chicken. But consider a sentence like:

1 cup sifted flour

1 cup flour, sifted

Same words, different order, and you would end up with different amounts (mass) of flour, which can be important. The model isn't sophisticated enough (or there aren't enough training examples) to know that "whole chicken, shredded" and "whole shredded chicken" are equivalent.
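To make the order dependence concrete, here is a toy sketch of context-based features for a single token. It is not the library's actual feature set; the `context_features` helper and the `<START>`/`<END>` markers are made up for illustration. It only shows that the same word sees different neighbours depending on where it sits in the sentence.

```python
def context_features(tokens, i):
    """Toy features for token i: the token plus its immediate neighbours.

    Hypothetical helper for illustration only, not ingredient-parser's API.
    """
    return {
        "token": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<START>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<END>",
    }


# The two flour sentences, tokenized by hand (the comma is its own token).
a = ["1", "cup", "sifted", "flour"]
b = ["1", "cup", "flour", ",", "sifted"]

feat_a = context_features(a, a.index("sifted"))
feat_b = context_features(b, b.index("sifted"))

print(feat_a)  # "sifted" sits between "cup" and "flour"
print(feat_b)  # "sifted" follows a comma and ends the sentence
```

A model that tags each token from features like these has different evidence for "sifted" in the two sentences, so it can reasonably assign different tags.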

> Also, as an aside, it may be worth adding "whole" as a unit.

I'll have a think about this. It's not technically a unit. Could it be a size?

strangetom avatar Jul 19 '24 18:07 strangetom

I think "whole" operates better as a unit than a size.

Something I see a lot is:

1 whole bay leaf

or

1 whole onion, diced

It's almost like a filler word.

Regardless, I believe it shouldn't be part of the name.

As for the other bit, I don't disagree. It was just something I ran into that I was curious about, as it'll affect my implementation a bit.

ImBaedin avatar Jul 19 '24 19:07 ImBaedin

In addition, it may be worth including 'half' and 'quarter' as well (e.g. 1 half onion, diced). It's unlikely that these would ever be used with an amount other than 1, so it's probably something that I could check for after parsing if it's not something you're interested in.
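A rough sketch of what that post-parse check could look like. It assumes the parsed amount and name are already available as a number and a plain string, and that the portion word ended up at the start of the name. `PORTION_WORDS` and `split_portion_word` are hypothetical names, not part of ingredient-parser.

```python
from fractions import Fraction

# Hypothetical mapping of portion words to multipliers (an assumption).
PORTION_WORDS = {
    "whole": Fraction(1),
    "half": Fraction(1, 2),
    "quarter": Fraction(1, 4),
}


def split_portion_word(amount, name):
    """If name starts with a portion word, drop it and scale the amount.

    Returns (amount, name) unchanged when no portion word leads the name,
    or when the name is nothing but the portion word.
    """
    first, _, rest = name.partition(" ")
    factor = PORTION_WORDS.get(first.lower())
    if factor is None or not rest:
        return amount, name
    return amount * factor, rest


print(split_portion_word(Fraction(1), "half onion"))
print(split_portion_word(Fraction(1), "whole bay leaf"))
```

Using `Fraction` keeps "half" and "quarter" exact. Whether "1 half onion" should become an amount of 1/2 or keep "half" as a separate field is a design choice this sketch does not settle.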

ImBaedin avatar Jul 20 '24 03:07 ImBaedin