MidiTok: after tokenizing with a trained tokenizer, the "tokens" array contains the original tokens

After tokenizing a song with a trained tokenizer, the "tokens" array contains only the base tokens, while the "ids" array is fine and contains the newly generated vocabulary. I was wondering whether this is a design choice or a bug.

May 09 '24 10:05 theglassofwater

Hi, this is a design choice: only the ids are altered, because the main purpose of encoding the sequence is to feed the ids to a model. If you really need to explore what the encoded ids are made of, you can always use the vocabulary dictionaries to convert them: https://github.com/Natooz/MidiTok/blob/main/miditok/midi_tokenizer.py#L111
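To illustrate the idea of converting encoded ids back through vocabulary dictionaries, here is a minimal sketch. It does not use MidiTok's actual attributes: `base_vocab`, `learned_vocab`, `expand_id` and `decode_ids` are hypothetical names, and the token strings and ids are made up for the example. It only assumes that each learned id stands for a sequence of other ids that eventually resolves to base tokens.

```python
# Hypothetical stand-ins for a tokenizer's vocabulary dictionaries.
# These are NOT MidiTok attributes; names and values are illustrative only.
base_vocab = {
    "Pitch_60": 0,
    "Velocity_100": 1,
    "Duration_1.0.8": 2,
    "Pitch_64": 3,
}

# Each learned id maps to the sequence of ids it was merged from.
# A learned id may itself refer to an earlier learned id.
learned_vocab = {
    4: [0, 1],  # Pitch_60 + Velocity_100
    5: [4, 2],  # (Pitch_60 + Velocity_100) + Duration_1.0.8
}

# Inverse lookup: base id -> base token string.
id_to_token = {token_id: token for token, token_id in base_vocab.items()}


def expand_id(token_id):
    """Return the list of base tokens that a single id stands for."""
    if token_id in id_to_token:
        return [id_to_token[token_id]]
    tokens = []
    for sub_id in learned_vocab[token_id]:
        tokens.extend(expand_id(sub_id))
    return tokens


def decode_ids(ids):
    """Expand every id of an encoded sequence into its base tokens."""
    return [expand_id(token_id) for token_id in ids]


encoded_ids = [5, 3]
for token_id, tokens in zip(encoded_ids, decode_ids(encoded_ids)):
    print(token_id, "->", tokens)
```

With a real tokenizer, the two dictionaries would come from the tokenizer's own vocabulary attributes rather than being written by hand.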

May 09 '24 10:05 Natooz

This issue is stale because it has been open for 30 days with no activity.

May 31 '24 02:05 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jul 23 '24 02:07 github-actions[bot]