Documentation of EntityContext (v2)

Open jwmach1 opened this issue 6 years ago • 1 comment

Are EntityContext Start and End zero based?

Since Go indexes the contents of a string from zero, I am assuming that the Start and End of the Spans in an EntityContext are zero-based as well. Could you confirm this and perhaps update the godoc? I traced into the UsingEntities method and found the adjustPos method, which seems to compensate for off-by-one problems. I'd like to be sure my training data is correct.

(To be clear, this is on the v2 branch.)

jwmach1 • Feb 23 '19 14:02

At the core of this question is really: "How can I verify that my prose.ModelFromData("name", prose.UsingEntities(data)) is accurate?" I'm building a model that includes Unicode, which I realize might tie into the question of supporting other languages. Is twitter-english really English, with all the Unicode emoji?
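To make the offset question concrete, here is a minimal, stdlib-only sketch of the kind of sanity check I have in mind. It does not call prose at all: `Span`, `byteSpan`, and `runeSpan` are hypothetical names for illustration, and it assumes zero-based, half-open `[Start, End)` ranges. Whether prose expects byte offsets or rune offsets is exactly what I am unsure about, so the sketch computes both. Note that plain Go string indexing and slicing work on bytes, so the two only agree for ASCII-only text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Span is a hypothetical zero-based, half-open [Start, End) range.
// It is NOT a prose type; it only illustrates the two offset conventions.
type Span struct {
	Start, End int
}

// byteSpan returns the byte offsets of the first occurrence of entity in text.
func byteSpan(text, entity string) (Span, bool) {
	i := strings.Index(text, entity)
	if i < 0 {
		return Span{}, false
	}
	return Span{Start: i, End: i + len(entity)}, true
}

// runeSpan converts a byte span over text into the equivalent rune offsets.
func runeSpan(text string, b Span) Span {
	start := utf8.RuneCountInString(text[:b.Start])
	length := utf8.RuneCountInString(text[b.Start:b.End])
	return Span{Start: start, End: start + length}
}

func main() {
	// U+1F5FD (Statue of Liberty emoji) occupies 4 bytes but is 1 rune.
	text := "\U0001F5FD New York"

	b, ok := byteSpan(text, "New York")
	if !ok {
		fmt.Println("entity not found")
		return
	}
	r := runeSpan(text, b)

	// Round-trip each span back to text to check it covers the entity.
	runes := []rune(text)
	fmt.Printf("byte span %d..%d -> %q\n", b.Start, b.End, text[b.Start:b.End])
	fmt.Printf("rune span %d..%d -> %q\n", r.Start, r.End, string(runes[r.Start:r.End]))
}
```

For this input the byte span is 5..13 and the rune span is 2..10, and both slice back to "New York". If a training example's Start and End do not round-trip to the labeled text under either interpretation, the offsets are wrong for that interpretation.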

jwmach1 • Feb 24 '19 15:02