Nathan Fradet comments

Results: 63 comments by Nathan Fradet

ValueError: invalid literal for int() with base 10: 'Ignore'

Hi, thanks for the report! This error occurred when decoding sequences of tokens with invalid values. It has been resolved in commit 956738765147d7935088eb9e3e55bc8a4ab37271. I'll release the update today.
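
For readers unfamiliar with this class of error, here is a minimal sketch of how a non-numeric token value can trigger `ValueError: invalid literal for int()` and how a decoder might skip it. The `Type_value` token format, the token names, and the helper `parse_token_value` are assumptions made for illustration; this is not the fix applied in the commit above.

```python
from typing import List, Optional


def parse_token_value(token: str) -> Optional[int]:
    """Return the integer after the last underscore, or None if it is not numeric.

    Hypothetical helper: assumes tokens look like "Type_value".
    """
    _, _, raw_value = token.rpartition("_")
    try:
        return int(raw_value)
    except ValueError:
        # e.g. int("Ignore") raises ValueError, so the token is treated as invalid
        return None


# Hypothetical token sequence containing one invalid value
tokens: List[str] = ["Pitch_60", "Velocity_Ignore", "Pitch_64"]
values = [parse_token_value(t) for t in tokens]
valid_values = [v for v in values if v is not None]
```

Returning `None` rather than raising lets the caller decide whether to drop the token or stop decoding.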

ValueError: invalid literal for int() with base 10: 'Ignore'

Update released; it should be fixed in version 1.2.6.

ValueError: invalid literal for int() with base 10: 'Ignore'

My pleasure! Yes, for this one you just have to add the `TimeSignature` entry in the `additional_tokens` dictionary, as: ```Python additional_tokens = { 'Chord': True, 'Rest': True, 'Tempo': True, 'TimeSignature':...

ValueError: invalid literal for int() with base 10: 'Ignore'

After creating the tokenizer, what does this return?

```Python
# Print the size of each vocabulary held by the tokenizer
for i, vocab in enumerate(tokenizer.vocab):
    print(f'vocab {i}: {len(vocab)} tokens')
```

ValueError: invalid literal for int() with base 10: 'Ignore'

OK, the error occurs because you are trying to decode a token whose first index (token family) has the value 3, whereas the vocabulary only contains 3 tokens...
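
The out-of-range condition described here can be checked before decoding. The sketch below assumes each token is a list of indices, one per vocabulary, and that the vocabulary sizes are known; the function name `out_of_range_positions` and the sizes are hypothetical, not part of MidiTok.

```python
from typing import List, Sequence


def out_of_range_positions(token: Sequence[int], vocab_sizes: Sequence[int]) -> List[int]:
    """Return the positions whose index does not fit the matching vocabulary.

    A vocabulary of size n only accepts indices 0 .. n - 1.
    """
    return [
        pos
        for pos, (idx, size) in enumerate(zip(token, vocab_sizes))
        if not 0 <= idx < size
    ]


vocab_sizes = [3, 90, 32]  # hypothetical sizes; the first vocabulary has 3 tokens
token = [3, 45, 10]        # first index is 3, but only 0, 1, 2 are valid

bad = out_of_range_positions(token, vocab_sizes)
ok = out_of_range_positions([2, 45, 10], vocab_sizes)
```

Here `bad` flags position 0, because a vocabulary of three tokens has no entry at index 3.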

ValueError: invalid literal for int() with base 10: 'Ignore'

You are welcome, thank you for the bug reports! I am still not sure about the purpose of `token2vocab` and `vocab2token`, as the tokens given by miditok should be well...

ValueError: invalid literal for int() with base 10: 'Ignore'

I am not quite sure this is what is done. 🤷‍♂️ If you want to use SOS / EOS tokens, you would have to specify them when creating the tokenizer:...

Train tokenizer on integer lists, not strings

Bumping this, as it would make the library easier and more straightforward to use for modalities other than text, e.g. molecules, DNA, music. In [MidiTok](https://github.com/Natooz/MidiTok) we basically map each...
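
As an illustration of the kind of workaround a string-only trainer forces on non-text data, the sketch below maps each integer id to a single Unicode character so that a sequence of ids becomes a string, and back. This is an assumed approach for illustration only, not necessarily what the truncated comment describes; `OFFSET`, `ids_to_text`, and `text_to_ids` are hypothetical names.

```python
from typing import List

# Hypothetical offset that skips control characters and the space character.
# Ids must stay small enough that id + OFFSET remains below the surrogate
# range starting at 0xD800.
OFFSET = 0x21


def ids_to_text(ids: List[int]) -> str:
    """Encode each integer id as one Unicode character."""
    return "".join(chr(i + OFFSET) for i in ids)


def text_to_ids(text: str) -> List[int]:
    """Invert ids_to_text, recovering the original integer ids."""
    return [ord(c) - OFFSET for c in text]


sequence = [0, 5, 12, 300]
encoded = ids_to_text(sequence)
decoded = text_to_ids(encoded)
```

Training directly on integer lists would make this round trip unnecessary, which is the point of the request.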

Train tokenizer on integer lists, not strings

@Narsil @ArthurZucker how difficult do you estimate this would be?

Seq2seq trainer generation config arg

The examples are broken here due to `Seq2SeqTrainingArguments.generation_max_length` and `Seq2SeqTrainingArguments.generation_num_beams` being removed. From here, what do you suggest: putting them back (and sending a warning?) or / and...