
after tokenizing with trained tokenizer, the "tokens" array contains original tokens

Open theglassofwater opened this issue 1 year ago • 3 comments

After tokenizing a song with a trained tokenizer, the "tokens" array contains only the base tokens, while the "ids" array is fine and contains the newly learned vocabulary. I was wondering whether this is a design choice or a bug.

theglassofwater avatar May 09 '24 10:05 theglassofwater

Hi, this is a design choice (i.e. only the ids are altered), as the main purpose of encoding the sequence is to feed the ids to a model. If you really need to explore what the encoded ids are made of, you can always use the vocabulary dictionaries to convert the encoded ids back: https://github.com/Natooz/MidiTok/blob/main/miditok/midi_tokenizer.py#L111
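
To illustrate the kind of lookup described above, here is a minimal, self-contained sketch. It does not use MidiTok itself: the dictionaries `base_vocab` and `learned_id_to_base_ids`, the helper `expand_ids`, and all token names and ids are hypothetical toy data standing in for a trained tokenizer's vocabulary dictionaries, whose real attribute names and layout may differ.

```python
# Toy stand-ins for a trained tokenizer's vocabulary dictionaries.
# All names and values here are hypothetical, not MidiTok's actual attributes.

# Base vocabulary: token string -> base id
base_vocab = {
    "Pitch_60": 0,
    "Velocity_100": 1,
    "Duration_1.0.8": 2,
    "Pitch_64": 3,
}
# Reverse mapping: base id -> token string
base_id_to_token = {idx: tok for tok, idx in base_vocab.items()}

# Learned vocabulary: each new id stands for a run of base ids merged in training
learned_id_to_base_ids = {
    4: [0, 1],     # Pitch_60 + Velocity_100
    5: [0, 1, 2],  # Pitch_60 + Velocity_100 + Duration_1.0.8
}


def expand_ids(ids):
    """Expand encoded ids into the base token strings they are made of."""
    tokens = []
    for idx in ids:
        # Ids absent from the learned vocabulary are already base ids.
        for base_id in learned_id_to_base_ids.get(idx, [idx]):
            tokens.append(base_id_to_token[base_id])
    return tokens


encoded_ids = [5, 3]
base_tokens = expand_ids(encoded_ids)
print(base_tokens)
```

With a real tokenizer, the same idea would apply to whatever dictionaries it exposes: look up each encoded id, recover the base ids or symbols it was built from, then map those back to token strings.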

Natooz avatar May 09 '24 10:05 Natooz

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar May 31 '24 02:05 github-actions[bot]

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Jun 23 '24 02:06 github-actions[bot]

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Jul 15 '24 02:07 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Jul 23 '24 02:07 github-actions[bot]