Nathan Fradet

63 comments by Nathan Fradet

Hi, thanks for the report! This error happened when decoding sequences of tokens with invalid values. It has been resolved in 956738765147d7935088eb9e3e55bc8a4ab37271. I'll release the update today.

Update released; it should be fixed in version 1.2.6.

My pleasure! Yes, for this one you just have to add the `TimeSignature` entry in the `additional_tokens` dictionary, as: ```Python additional_tokens = { 'Chord': True, 'Rest': True, 'Tempo': True, 'TimeSignature':...

After creating the tokenizer, what does this return?

```Python
for i, vocab in enumerate(tokenizer.vocab):
    print(f'vocab {i}: {len(vocab)} tokens')
```

OK, the error occurs because you are trying to decode a token whose first index (token family) has value 3, whereas the vocabulary only contains 3 tokens...
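
To illustrate the kind of out-of-range error described above, here is a minimal sketch in plain Python. It assumes the truncated comment refers to an id equal to the vocabulary size, so that with 3 entries only ids 0 to 2 are valid. The helper `find_invalid_tokens` and the sample data are hypothetical and not part of MidiTok.

```python
def find_invalid_tokens(tokens, vocab_sizes):
    """Return (position, family, value) for every id outside its vocabulary.

    Hypothetical helper: `tokens` is a list of tokens, each holding one
    integer id per token family; `vocab_sizes` gives the number of entries
    in each family's vocabulary.
    """
    invalid = []
    for pos, token in enumerate(tokens):
        for family, (value, size) in enumerate(zip(token, vocab_sizes)):
            # Valid ids run from 0 to size - 1, so an id equal to size is out of range.
            if not 0 <= value < size:
                invalid.append((pos, family, value))
    return invalid


# Made-up data: the first family has 3 entries, the second has 10.
vocab_sizes = [3, 10]
tokens = [[0, 4], [3, 9], [2, 10]]
print(find_invalid_tokens(tokens, vocab_sizes))
# → [(1, 0, 3), (2, 1, 10)]
```

Running a check like this before decoding shows which positions hold ids that the vocabulary cannot resolve.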

You are welcome, thank you for the bug reports! I am still not sure about the purpose of `token2vocab` and `vocab2token`, as the tokens given by miditok should be well...

I am not quite sure this is what is done. 🤷‍♂️ If you want to use SOS / EOS tokens you would have to specify it when creating the tokenizer:...

Bumping this, as it would make the lib easier and more straightforward to use for modalities other than text, e.g. molecules, DNA, music. In [MidiTok](https://github.com/Natooz/MidiTok) we basically map each...

@Narsil @ArthurZucker how difficult do you estimate this would be?

The examples are broken here due to `Seq2SeqTrainingArguments.generation_max_length` and `Seq2SeqTrainingArguments.generation_num_beams` being removed. From here, what do you suggest between putting them back (and sending a warning?) or / and...