Basile Dura comments

Results: 9 comments by Basile Dura

identify tables

Great idea @aricohen93. Do you have time to work on it and propose an actual pipeline component? I figure `eds.tables`, within `edsnlp/pipelines/misc`?

identify tables

@aricohen93 have you put more thought into this? I suppose we could add this quite easily, perhaps with an "experimental" status?

Architecture choice on custom extensions

@percevalw, @Thomzoy, @Aremaki, I'd love to get your thoughts on this!

Architecture choice on custom extensions

Sounds good! :tada:

Feature request: Score

Thanks for the heads up! A few thoughts on this, for future reference:

1. spaCy's `like_num` attribute could be helpful here
2. We could draw inspiration from the `eds.measures` pipeline...

NER fails on warning "Token indices sequence length is longer than the specified maximum"

Hello @schudoku, the issue comes from the fact that spaCy tokens and the output of HuggingFace's tokenizer do not align exactly. In particular, `\r\n...\r\n` and `contactcontact...contact` are treated as single...
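
To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use spaCy or HuggingFace: `toy_subword_split` is a hypothetical stand-in for a subword tokenizer (it simply cuts a token into fixed-size pieces), and `chunk_by_subword_budget` is an illustrative helper, not part of any library. It only shows why counting word-level tokens can underestimate what the model sees, and one possible way to chunk on the subword count instead.

```python
from typing import List, Tuple


def toy_subword_split(token: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer.

    Cuts a token into fixed-size pieces. Real tokenizers behave differently;
    this only mimics "one long token becomes many pieces".
    """
    pieces = [token[i:i + piece_len] for i in range(0, len(token), piece_len)]
    return pieces or [token]


def chunk_by_subword_budget(
    tokens: List[str], max_pieces: int, piece_len: int = 4
) -> List[List[Tuple[str, int]]]:
    """Group word-level tokens so each chunk stays within `max_pieces` subwords.

    Each chunk is a list of (token, number_of_pieces_kept) pairs. A single
    token that is larger than the budget is truncated to the budget
    (an assumption made for this sketch).
    """
    chunks: List[List[Tuple[str, int]]] = []
    current: List[Tuple[str, int]] = []
    used = 0
    for token in tokens:
        # Truncate an oversized token so it can fit in a chunk on its own.
        pieces = toy_subword_split(token, piece_len)[:max_pieces]
        # Start a new chunk when this token would exceed the budget.
        if current and used + len(pieces) > max_pieces:
            chunks.append(current)
            current, used = [], 0
        current.append((token, len(pieces)))
        used += len(pieces)
    if current:
        chunks.append(current)
    return chunks


words = ["Hello", "contact" * 10, "bye"]
# Three word-level tokens, but many more subword pieces.
n_pieces = sum(len(toy_subword_split(w)) for w in words)
chunks = chunk_by_subword_budget(words, max_pieces=8)
```

With this toy splitter, the three word-level tokens expand to far more than three pieces, so a limit expressed in words says little about the length the model actually receives.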

NER fails on warning "Token indices sequence length is longer than the specified maximum"

> When the chunk is too long for the model, is only the rest of the chunk dropped or the rest of the complete document?

Not exactly: this is done...

docs: fix trailing ``` in mod.rs example

I missed your comment, sorry about that! I merged the upstream changes.

`lax::UPLO` references column-major layout

Linked to #368:

> This could very likely be a misunderstanding on my end. But it may be related to the reversal of UPLO::Upper and UPLO::Lower.