Basile Dura
Great idea @aricohen93. Do you have time to work on it and propose an actual pipeline component? I figure `eds.tables`, within `edsnlp/pipelines/misc`?
@aricohen93 have you put more thought into this? I suppose we could add this quite easily, perhaps with an "experimental" status?
@percevalw, @Thomzoy, @Aremaki, I'd love to get your thoughts on this!
Sounds good! :tada:
Thanks for the heads up! A few thoughts on this, for future reference: 1. spaCy's `like_num` attribute could be helpful there 2. We could draw inspiration from the `eds.measures` pipeline...
Hello @schudoku, the issue comes from the fact that spaCy tokens and the output of HuggingFace's tokenizer do not align exactly. In particular, `\r\n...\r\n` and `contactcontact...contact` are treated as single...
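To illustrate the kind of mismatch described above, here is a minimal sketch of aligning two tokenizations by character offsets. It is not the library's actual code: `align_by_offsets` and the example spans are hypothetical, and the offsets are written by hand rather than produced by spaCy or a HuggingFace tokenizer.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_char, end_char), end exclusive


def align_by_offsets(coarse: List[Span], fine: List[Span]) -> List[List[int]]:
    """For each coarse span, list the indices of the fine spans that overlap it."""
    alignment: List[List[int]] = []
    for c_start, c_end in coarse:
        overlapping = [
            i
            for i, (f_start, f_end) in enumerate(fine)
            if f_start < c_end and c_start < f_end  # non-empty intersection
        ]
        alignment.append(overlapping)
    return alignment


# Hypothetical offsets for the text "contactcontact ok":
# one tokenizer keeps the repeated word as a single token,
# the other splits it into two pieces.
word_spans = [(0, 14), (15, 17)]
piece_spans = [(0, 7), (7, 14), (15, 17)]
alignment = align_by_offsets(word_spans, piece_spans)
```

With these made-up spans, the first word maps to two pieces and the second to one, which is why a one-to-one mapping between the two token sequences cannot be assumed.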
> When the chunk is too long for the model, is only the rest of the chunk dropped or the rest of the complete document? Not exactly: this is done...
I missed your comment, sorry about that! I merged the upstream changes.
Linked to #368: > This could very likely be a misunderstanding on my end. But it may be related to the reversal of UPLO::Upper and UPLO::Lower.