English forms inside head_templates not properly parsed
The forms array for the term "big, fat, hairy deal" should be:
[
  { form: 'big, fat, hairy deals', tags: [ 'plural' ] }
]
Instead it looks like this:
[
  { form: 'big', tags: [ 'plural' ] },
  { form: 'fat', tags: [ 'plural' ] }
]
The term "speak to" has the forms:
[
  {
    form: 'speaks to',
    tags: [ 'present', 'singular', 'third-person' ]
  },
  { form: 'speaking to', tags: [ 'participle', 'present' ] },
  { form: 'spoke to', tags: [ 'past' ] },
  { form: 'to', tags: [ 'colloquial', 'participle', 'past' ] }
]
It should be:
[
  {
    form: 'speaks to',
    tags: [ 'present', 'singular', 'third-person' ]
  },
  { form: 'speaking to', tags: [ 'participle', 'present' ] },
  { form: 'spoke to', tags: [ 'past' ] },
  { form: 'spoken to', tags: [ 'participle', 'past' ] }
]
I also don't know where it's getting the colloquial tag since I don't see it on Wiktionary.
At first guess, the first one breaks because of the commas inside the term itself, which is a bit obvious but bears saying anyhow. For the second, the word "spoken" gets parsed as a tag ("spoken" -> ["colloquial"]) instead of as part of the form, which would explain the colloquial tag. Both are probably going to be really annoying edge cases that involve delving into some spectacularly tricky bits of code, so unless someone else wants to take a look, I'm leaving this on the back burner for a bit.
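To make the two suspected failure modes concrete, here is a minimal sketch in Python. It is not the actual head-line parsing code: `naive_split`, `naive_tags_and_form`, and `TAG_WORDS` are hypothetical names, and only the "spoken" -> ["colloquial"] mapping comes from the discussion above. It will not reproduce the exact output in the report, only the kind of mistake being described.

```python
import re

# Hypothetical tag map; only the "spoken" entry is taken from the discussion.
TAG_WORDS = {"spoken": ["colloquial"]}


def naive_split(text):
    """Split a list of forms on commas and semicolons, ignoring context."""
    return [part.strip() for part in re.split(r"[,;]", text) if part.strip()]


def naive_tags_and_form(text):
    """Consume leading words that look like tags; the rest is the form."""
    words = text.split()
    tags = []
    while words and words[0] in TAG_WORDS:
        tags.extend(TAG_WORDS[words.pop(0)])
    return " ".join(words), sorted(tags)


# The commas that belong to the term are treated as separators.
print(naive_split("big, fat, hairy deals"))  # ['big', 'fat', 'hairy deals']

# "spoken" is eaten as a tag word, leaving only "to" as the form.
print(naive_tags_and_form("spoken to"))  # ('to', ['colloquial'])
```

In both cases the text is interpreted without knowing which words belong to the term itself, which is why a fix likely needs the page title (or some other list of protected phrases) as extra context.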
No problem.
It seems the "forms" data is added by the parse_word_head() function. That complex function processes the expanded plain text; it might be easier to work with if it processed HTML nodes instead.
This has been a long time coming, but with the addition of a kludge that lets split_at_semi_comma() skip given words and phrases, I've made a commit that should fix this.
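For readers who want a rough idea of what "skipping given words and phrases" while splitting can look like, here is a small Python sketch. It is an illustration under assumptions, not the code from the commit: `split_skipping` and its `skipped` parameter are hypothetical names, and passing the page title as the protected phrase is my guess at how such a helper would be used.

```python
import re


def split_skipping(text, skipped=()):
    """Split text at commas/semicolons, but never inside a skipped phrase."""
    phrases = [s for s in skipped if s]
    if phrases:
        # Try protected phrases first, longest first, then plain separators.
        phrases.sort(key=len, reverse=True)
        protected = "|".join(re.escape(s) for s in phrases)
        pattern = rf"({protected})|[,;]"
    else:
        pattern = r"[,;]"

    parts = []
    current = ""
    pos = 0
    for m in re.finditer(pattern, text):
        current += text[pos:m.start()]
        if m.lastindex:
            # A protected phrase matched: keep it whole, commas included.
            current += m.group(1)
        else:
            # A real separator: close the current part.
            parts.append(current.strip())
            current = ""
        pos = m.end()
    current += text[pos:]
    parts.append(current.strip())
    return [p for p in parts if p]


# Assumed usage: the page title is passed as a phrase that must not be split.
print(split_skipping("big, fat, hairy deals", skipped=["big, fat, hairy deal"]))
# ['big, fat, hairy deals']
```

Without a `skipped` argument the function behaves like a plain comma/semicolon split, so existing callers in a design like this would be unaffected.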