ThangLD201

19 issues by ThangLD201

I have a speaker diarization dataset in Vietnamese, where speaker segments are already annotated in every audio file. How should I prepare and process the data to be able to...
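A typical first preprocessing step for such annotated data is to group the labeled spans by speaker before cutting per-speaker training clips. A minimal sketch, assuming annotations come as `(start_sec, end_sec, speaker_id)` tuples (the function name and tuple layout are hypothetical, not from the dataset in question):

```python
from collections import defaultdict

def group_segments(annotations):
    """Group annotated (start_sec, end_sec, speaker_id) spans by speaker.

    Returns {speaker_id: [(start_sec, end_sec), ...]}, preserving the
    order in which each speaker's segments appear in the annotation file.
    """
    by_speaker = defaultdict(list)
    for start, end, speaker in annotations:
        by_speaker[speaker].append((start, end))
    return dict(by_speaker)
```

From this mapping one can slice the audio into per-speaker clips with any audio library and feed them to a diarization or speaker-embedding pipeline.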


Hello, I've read and really appreciated your team's wonderful work on SRU++. I want to implement this architecture in other tasks, but I'm having trouble finding the documentation on SRU++,...

### 🚀 The feature As far as I know, there are no examples or documentation on serving speech-to-text models from Hugging Face, such as Wav2Vec2. How could I enable serving with...


I notice that doing inference with a language model on large amounts of text can be quite slow. In particular, it took me 11 minutes to decode around 4600 lines of...

```
./build/bin/main -m /tmp/ReluLLaMA-7B-PowerInfer-GGUF/llama-7b-relu.powerinfer.gguf -n 128 -t 8 --vram-budget 40 -p "Hi. How are you ?"
Log start
main: build = 1560 (2217e7f)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0...
```

Hi @frederick0329, for sequence tagging (e.g. NER) one needs to predict a label for each token in the sequence for every test sample. In this case, the loss is averaged...
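The per-token averaging described above is usually done over non-padding tokens only. A minimal plain-Python sketch (function name and list-based representation are hypothetical, for illustration):

```python
import math

def masked_token_loss(logits, labels, mask):
    """Average cross-entropy over the real (non-padding) tokens only.

    logits: per-token list of class scores, shape [seq_len][num_classes]
    labels: gold class index for each token
    mask:   1 for real tokens, 0 for padding
    """
    total, count = 0.0, 0
    for scores, y, m in zip(logits, labels, mask):
        if not m:
            continue  # padding tokens contribute nothing to the loss
        # numerically stable log-sum-exp for the softmax normalizer
        z = max(scores)
        log_norm = z + math.log(sum(math.exp(s - z) for s in scores))
        total += log_norm - scores[y]  # -log p(y | scores)
        count += 1
    return total / max(count, 1)
```

Averaging only over masked-in tokens keeps the loss scale comparable across samples of different lengths.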

Hi @yixinL7, I was training a BRIO model on a different dataset (RedditTIFU) and observed conflicting trends between the MLE and ranking losses. I start from a converged MLE checkpoint....

Hi, thanks for the great work! There are a few details regarding the correlation of LaSE in the paper that I did not quite understand. For each target language,...

Hi, I'm using a GPU with 40GB of VRAM but get an out-of-memory error. The error seems to come from `src/model_bert.py`, line 339 (alpha_f):

```
_, attention_probs, value_layer = self_outputs
output_head_weights =...
```

Hi @afshinrahimi, @yuan-li, do you still keep the raw data (the un-tokenized version)? Also, which tokenizer did you use for this dataset? I need to work with the raw...