Basile Dura

9 comments by Basile Dura

Great idea @aricohen93. Do you have time to work on it and propose an actual pipeline component? I figure `eds.tables`, within `edsnlp/pipelines/misc`?

@aricohen93 have you put more thought into this? I suppose we could add this quite easily, perhaps with an "experimental" status?

@percevalw, @Thomzoy, @Aremaki, I'd love to get your thoughts on this!

Thanks for the heads up! A few thoughts on this, for future reference:

1. spaCy's `is_num` attribute could be helpful there
2. We could draw inspiration from the `eds.measures` pipeline...

Hello @schudoku, the issue comes from the fact that spaCy tokens and the output of HuggingFace's tokenizer do not align exactly. In particular, `\r\n...\r\n` and `contactcontact...contact` are treated as single...

> When the chunk is too long for the model, is only the rest of the chunk dropped or the rest of the complete document?

Not exactly: this is done...

I missed your comment, sorry about that! I merged the upstream changes.

Linked to #368:

> This could very likely be a misunderstanding on my end. But it may be related to the reversal of UPLO::Upper and UPLO::Lower.