Eric Lam

Results 20 comments of Eric Lam

Hi everyone, I've uploaded all of the Google-released models to Hugging Face: https://huggingface.co/models?search=albert_chinese. The tokenizer needs to be switched to BertTokenizer. They give very good results on the QA and NER tasks. NER colab: https://colab.research.google.com/drive/1r9gLof8P3Gy81bolIYX5uH2RktdD2Vfk QA Colab: https://colab.research.google.com/drive/1hqaTKxd3VtX2XkvjiO0FMtY-rTZX30MJ

Sure. I've written a short example on each model card. The results differ slightly from model to model: the larger the model, the better the predictions. Because albert_chinese_base was not trained with sentencepiece, AlbertTokenizer cannot load the vocabulary, so you need to use BertTokenizer instead!!! We can run a MaskedLM prediction to verify that this approach is correct.

## Justify (verify validity)

If the model is loaded correctly, predicting the masked character directly should give a reasonable result. [colab trial](https://colab.research.google.com/drive/1Wjz48Uws6-VuSHv_-DcWLilv77-AaYgj)

```python
from transformers import *
import torch
from torch.nn.functional import softmax

pretrained = 'voidful/albert_chinese_base'...
```
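The snippet above is cut off before the prediction step. The post-processing idea of the check, take the logits at the `[MASK]` position, softmax them, and see whether the top-scoring token is a plausible character, can be sketched stdlib-only (the toy vocabulary and logits below are invented for illustration, not real model outputs):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits over a toy vocabulary at the masked position.
# With a correctly loaded model, the plausible character should dominate.
vocab = ["天", "里", "去"]
mask_logits = [3.2, 0.1, -1.0]
probs = softmax(mask_logits)
best = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
print(best)  # the model's top prediction for the masked position
```

If the tokenizer/model pairing were wrong, the distribution at the mask position would be close to uniform noise instead of concentrating on a sensible character.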

I agree that we should fix what is actually causing the error. To handle this in a more general way: when one of the refs is empty or the hyp is empty...
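One way to make that edge case concrete is a minimal sketch (this `word_error_rate` is an illustrative stdlib implementation under one common convention, not the library's actual API): define the metric explicitly when either side is empty before running the usual edit-distance computation.

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """Word error rate with explicit handling of empty ref/hyp."""
    ref_words, hyp_words = ref.split(), hyp.split()
    # Edge cases: avoid division by zero when the reference is empty.
    if not ref_words:
        # One convention: count every hypothesis word as an insertion.
        return 0.0 if not hyp_words else float(len(hyp_words))
    if not hyp_words:
        return 1.0  # every reference word is a deletion
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / len(ref_words)
```

Without the guards, an empty reference would divide by zero, which is exactly the kind of failure the fix should handle in general rather than case by case.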

You can change the threshold here: https://github.com/oliverguhr/wav2vec2-live/blob/a22f099010820776cf179a3d142411243b392d94/live_asr.py#L39. The VAD value is

> an integer between 0 and 3. 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive.

Hi Rongao, sorry for the late reply. It seems there is a bug in https://github.com/voidful/TextRL/blob/30109590ec2fe395a2e04cca19a46f6895792b88/textrl/actor.py#L21. Maybe you can change it to `pfrl.policies.SoftmaxCategoricalHead` (https://pfrl.readthedocs.io/en/latest/policies.html). I may need more time to further...

The same problem also occurs in `cz init`. When I run `cz init` on my existing project, it detects the version from `git describe --abbrev=0 --tags`. However, this is not the version...
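To see what that detection actually returns, here is a minimal sketch (the throwaway-repo setup and the `latest_tag` helper are invented for illustration) of reading the latest tag the same way:

```python
import os
import subprocess
import tempfile

def latest_tag(repo: str) -> str:
    """Return the most recent tag reachable from HEAD, as read via git describe."""
    return subprocess.check_output(
        ["git", "describe", "--abbrev=0", "--tags"], cwd=repo, text=True
    ).strip()

# Demo in a throwaway repository: the detected "version" is simply whatever
# tag exists, which need not match the project's real version.
repo = tempfile.mkdtemp()
env = {**os.environ,
       "GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@example.com",
       "GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@example.com"}
subprocess.run(["git", "init", "-q", repo], check=True)
subprocess.run(["git", "commit", "-q", "--allow-empty", "-m", "init"],
               cwd=repo, env=env, check=True)
subprocess.run(["git", "tag", "not-my-version"], cwd=repo, check=True)
print(latest_tag(repo))
```

Any existing tag (release or not) wins here, which is why the detected value can diverge from the version the project actually uses.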

Hi all, the issue is probably caused by https://github.com/huggingface/transformers/blob/bffac926ca6bc6c965a92bfbfd00c567a2c0fb90/src/transformers/models/t5/modeling_t5.py#L1147C8-L1147C8: it adds a position_bias to each layer's output, so a freshly initialized model will perform badly.

It seems that `map` will also cause this issue.

### Steps to reproduce the bug

```python
from datasets import load_dataset

original_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)
print(original_dataset.features.keys())

def test(data):
    return...
```

Can you explain how reinforcement learning can be used for token classification? I would appreciate more information on this topic.