ThangLD201
I have a speaker diarization dataset in Vietnamese in which the speaker segments of every audio file are already annotated. How should I prepare and process the data to be able to...
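If the goal is to feed the annotations into a standard diarization toolkit, one common first step is to export each file's annotated segments as RTTM reference files. Below is a minimal sketch, assuming the segments are already available in memory as (audio_id, start, duration, speaker) tuples; the tuples, file name, and field values are placeholders, not part of the original dataset.

```
# Hypothetical in-memory annotations: (audio_id, start_sec, duration_sec, speaker)
segments = [
    ("file_001", 0.00, 3.25, "spk_A"),
    ("file_001", 3.25, 2.10, "spk_B"),
]

# One RTTM "SPEAKER" record per segment (start time and duration in seconds),
# which most diarization toolkits accept as reference labels.
with open("reference.rttm", "w", encoding="utf-8") as f:
    for audio_id, start, dur, speaker in segments:
        f.write(
            f"SPEAKER {audio_id} 1 {start:.3f} {dur:.3f} <NA> <NA> {speaker} <NA> <NA>\n"
        )
```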
Hello, I've read and really appreciated your team's excellent work on SRU++. I want to use this architecture in other tasks, but I'm having trouble finding documentation on SRU++,...
### 🚀 The feature
As far as I know, there are no examples or documentation on serving Speech2Text models from Huggingface, such as Wav2Vec2. How could I enable serving with...
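For context, the core inference that a custom serving handler would have to wrap is fairly small. Here is a rough sketch using `Wav2Vec2ForCTC` and `Wav2Vec2Processor` from `transformers`; the checkpoint name and the silent dummy waveform are stand-ins, and this is not tied to any particular serving framework.

```
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Hypothetical checkpoint; any CTC-style Wav2Vec2 ASR model would do.
model_name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name).eval()

# One second of silence at 16 kHz stands in for the decoded audio that a
# serving handler would receive in the request body.
waveform = np.zeros(16000, dtype=np.float32)

inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
pred_ids = torch.argmax(logits, dim=-1)
transcript = processor.batch_decode(pred_ids)[0]
print(transcript)
```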
I notice that inference with a language model on a large amount of text can be quite slow. In particular, it took me 11 minutes to decode around 4600 lines of...
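Without knowing the exact decoding setup, one thing that usually helps at this scale is batching inputs instead of decoding one line at a time. Below is a minimal sketch, assuming a Hugging Face causal LM as a stand-in for the actual model; the checkpoint name, batch size, and `max_new_tokens` are illustrative only.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
tokenizer.padding_side = "left"            # needed for batched generation
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

lines = ["first input line", "second input line", "third input line"]
batch_size = 32

outputs = []
with torch.no_grad():
    for i in range(0, len(lines), batch_size):
        # Tokenize and pad a whole batch of lines, then decode them together,
        # amortizing the per-call overhead across the batch.
        batch = tokenizer(
            lines[i:i + batch_size],
            return_tensors="pt",
            padding=True,
            truncation=True,
        ).to(device)
        generated = model.generate(**batch, max_new_tokens=32)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))

print(len(outputs))
```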
```
./build/bin/main -m /tmp/ReluLLaMA-7B-PowerInfer-GGUF/llama-7b-relu.powerinfer.gguf -n 128 -t 8 --vram-budget 40 -p "Hi. How are you ?"
Log start
main: build = 1560 (2217e7f)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0...
```
Hi @frederick0329, for sequence tagging (e.g. NER) one would need to predict a label for each token in the sequence for each test sample. In this case, the loss is averaged...
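To make the averaging concrete: for tagging, each non-padded token contributes one cross-entropy term, and those terms are averaged rather than summed per example. Here is a minimal PyTorch sketch with made-up shapes, not tied to any particular codebase.

```
import torch
import torch.nn.functional as F

# Hypothetical shapes: tagger logits over a padded batch.
batch, seq_len, num_labels = 2, 6, 5
logits = torch.randn(batch, seq_len, num_labels)
labels = torch.randint(0, num_labels, (batch, seq_len))
labels[1, 4:] = -100  # padding positions marked with the ignore index

# Per-token cross-entropy, averaged over non-padded tokens only, so every
# real token contributes equally to the loss.
loss = F.cross_entropy(
    logits.view(-1, num_labels),  # (batch * seq_len, num_labels)
    labels.view(-1),              # (batch * seq_len,)
    ignore_index=-100,
    reduction="mean",
)
print(loss.item())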
Hi @yixinL7, I was training a BRIO model on a different dataset (Reddit TIFU) and observed conflicting trends between the MLE and ranking losses. I start from a converged MLE checkpoint....
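For readers following along, the two terms being traded off have roughly this shape: a token-level MLE loss plus a pairwise margin loss over candidate scores ordered by quality. The sketch below is illustrative only; the margin, weight, and loop-based implementation are placeholders, not taken from the BRIO code.

```
import torch

def ranking_loss(cand_scores: torch.Tensor, margin: float = 0.001) -> torch.Tensor:
    """Pairwise margin loss over candidate scores sorted from best to worst.

    cand_scores: (num_candidates,) model scores, index 0 = highest-quality candidate.
    """
    loss = cand_scores.new_zeros(())
    n = cand_scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # A lower-ranked candidate should score below a higher-ranked one
            # by at least (j - i) * margin.
            loss = loss + torch.clamp(cand_scores[j] - cand_scores[i] + (j - i) * margin, min=0)
    return loss

scores = torch.tensor([0.3, 0.1, 0.25], requires_grad=True)
mle_loss = torch.tensor(2.0)                      # stand-in for the token-level NLL
total = mle_loss + 100.0 * ranking_loss(scores)   # hypothetical weighting
print(total.item())
```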
Hi, thanks for the great work! There are a few details regarding the correlation of LaSE in the paper that I did not quite understand. For each target language,...
Hi, I'm using a GPU with 40 GB of VRAM but get an out-of-memory error. The error seems to come from `src/model_bert.py`, Line 339 (alpha_f):
```
_, attention_probs, value_layer = self_outputs
output_head_weights =...
```
Hi @afshinrahimi, @yuan-li, do you still have the raw (un-tokenized) data? Also, which tokenizer did you use for this dataset? I need to work with the raw...