dyyoungg
According to the paper, the first step should be to let the language model understand the semantics of speech, so a SpeechTokenizer is used to obtain a discretized token representation of the audio. Logically, the SpeechTokenizer codebook representations should be projected (a projection is needed because the hidden dims may not match) and added to the language model's embedding layer, with the speech tokens then predicted autoregressively. But I don't see any SpeechTokenizer-related code in the repo, and the tokenize part looks more like vocabulary expansion, training on the text transcripts of the speech, which I really can't understand.

```python
# line 230 ~ 250
text_column_name = "text" if "text" in column_names else column_names[0]

def tokenize_function(examples):
    output = tokenizer(examples[text_column_name])
    return output

tokenized_cache_file_names = {...
```
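For concreteness, here is a minimal sketch of the projection approach described above: look up discrete speech token ids in a frozen codebook, then apply a learned linear projection to match the LM hidden size. This is purely illustrative, not code from the repo; it uses NumPy in place of the actual training framework, and all sizes (`CODEBOOK_SIZE`, `CODEBOOK_DIM`, `LM_HIDDEN`) are placeholder assumptions.

```python
import numpy as np

# Placeholder sizes -- the real SpeechTokenizer codebook size and the LM
# hidden dim depend on the checkpoints actually used (assumptions).
CODEBOOK_SIZE = 1024   # number of discrete speech tokens
CODEBOOK_DIM = 768     # dim of each codebook vector
LM_HIDDEN = 2048       # language model hidden dim

rng = np.random.default_rng(0)
codebook = rng.normal(size=(CODEBOOK_SIZE, CODEBOOK_DIM))      # frozen codebook
W = rng.normal(size=(CODEBOOK_DIM, LM_HIDDEN)) * 0.02          # learned projection

def embed_speech_tokens(token_ids):
    # Look up the discrete speech tokens in the codebook, then project them
    # into the LM embedding space so they can sit alongside text embeddings.
    return codebook[token_ids] @ W

ids = rng.integers(0, CODEBOOK_SIZE, size=(2, 10))  # (batch, seq)
print(embed_speech_tokens(ids).shape)               # (2, 10, 2048)
```

By contrast, the vocabulary-expansion route seen in the quoted code skips the codebook lookup entirely and simply adds new token ids to the text tokenizer, learning their embeddings from scratch.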
I compared two experimental data setups. Setting 1: WenetSpeech (Chinese) only. Setting 2: WenetSpeech + GigaSpeech (about 1:1, Chinese + English). Interestingly, the loss under setting 1 doesn't decrease normally (blue...
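For clarity, a 1:1 mixture as in setting 2 typically means interleaving the two corpora so each step sees both languages at roughly equal frequency. A hypothetical sketch (the lists here are stand-ins, not the real datasets):

```python
# Placeholder samples standing in for WenetSpeech (zh) and GigaSpeech (en).
wenet = [f"zh_{i}" for i in range(4)]
giga = [f"en_{i}" for i in range(4)]

def mix_one_to_one(a, b):
    # Alternate examples from the two corpora, giving an exact 1:1 ratio
    # up to the length of the shorter corpus.
    for x, y in zip(a, b):
        yield x
        yield y

mixed = list(mix_one_to_one(wenet, giga))
print(mixed)  # ['zh_0', 'en_0', 'zh_1', 'en_1', 'zh_2', 'en_2', 'zh_3', 'en_3']
```

In practice a shuffled sampler with per-source weights is more common than strict alternation, but the effective ratio is the same.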
Hello, the download link seems to be unavailable: https://github.com/fishaudio/chinese-hubert-soft/releases/download/v1/chinese-hubert-soft-v1.ckpt. Could you please check or provide an updated link?
## Summary
This PR introduces a comprehensive performance overhaul of the multimodal resource allocation pipeline. It refactors both the `httpserver.manager` and the server (`CacheServer`) to replace sequential, "chatty" operations with...
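The summary is truncated, but eliminating "chatty" sequential operations usually means coalescing many per-item round trips into a single batched request. A toy sketch of that pattern, with a stand-in class (`FakeCacheServer`, `get_many`, and the simulated latency are all assumptions, not the PR's actual API):

```python
import time

class FakeCacheServer:
    """Stand-in for a cache server; names and latency are assumptions."""
    LATENCY = 0.001  # simulated cost of one network round trip

    def __init__(self):
        self.store = {f"res{i}": i for i in range(100)}

    def get(self, key):
        time.sleep(self.LATENCY)       # one round trip per key
        return self.store[key]

    def get_many(self, keys):
        time.sleep(self.LATENCY)       # one round trip for the whole batch
        return [self.store[k] for k in keys]

server = FakeCacheServer()
keys = [f"res{i}" for i in range(50)]

# "Chatty": one round trip per resource.
t0 = time.perf_counter()
sequential = [server.get(k) for k in keys]
chatty_time = time.perf_counter() - t0

# Batched: one round trip total, same results.
t0 = time.perf_counter()
batched = server.get_many(keys)
batched_time = time.perf_counter() - t0

print(f"chatty={chatty_time:.3f}s batched={batched_time:.3f}s")
```

The win scales with the number of items times the per-call latency, which is why this kind of refactor tends to dominate other micro-optimizations in RPC-heavy pipelines.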
Hello VILA team! First, thank you for open-sourcing this incredible family of Vision Language Models! The work on VILA and NVILA is truly impressive, and the focus on efficiency and...