Dalton Harper
@michael-conrad have you trained a new model for your language? I'm facing the same problem and hope you can share your experience.
Yes, here is the distribution. For each word (token) in a given sample:

```
85%: keep original
15%:
  - 80%: whole word masking (e.g. nice -> MMMM)
  ...
```
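To make that split concrete, here is a minimal sketch of the per-word sampling in Python. The mask symbol `M`, the function name `mask_words`, and the random-replacement fallback for the part the quote cuts off are my own assumptions, not the actual PL-BERT implementation.

```
import random

MASK_TOKEN = "M"  # mask symbol as in the quoted example ("nice -> MMMM")

def mask_words(phonemized_words, phoneme_vocab, seed=None):
    """Apply the per-word masking distribution quoted above.

    phonemized_words: list of words, each a list of phoneme tokens.
    phoneme_vocab:    phoneme tokens to sample from in the fallback branch
                      (the remaining 20% is truncated in the quote above,
                      so random replacement here is only an assumption).
    """
    rng = random.Random(seed)
    out = []
    for word in phonemized_words:
        if rng.random() < 0.85:
            # 85%: keep the original phonemes
            out.append(list(word))
        elif rng.random() < 0.80:
            # 15% * 80% = 12%: whole-word masking, e.g. nice -> MMMM
            out.append([MASK_TOKEN] * len(word))
        else:
            # 15% * 20% = 3%: the quote cuts off here; random phoneme
            # replacement is only a placeholder assumption
            out.append([rng.choice(phoneme_vocab) for _ in word])
    return out

# example: two "words" given as phoneme lists
print(mask_words([["n", "aɪ", "s"], ["k", "æ", "t"]], ["a", "b", "c"], seed=0))
```

Note the decisions are drawn per word, so on average only about 12% of words end up fully masked.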
Thanks for the clarification. I have trained PL-BERT for my language and tried to evaluate it by asking it to predict masked/unmasked tokens and phonemes. In most cases its prediction...
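For anyone running a similar check, below is a rough sketch of how masked-position predictions can be scored, assuming the model returns per-position logits over the phoneme vocabulary; the names (`logits`, `targets`, `mask_positions`) are placeholders, not the actual PL-BERT API.

```
import torch

def masked_prediction_accuracy(logits, targets, mask_positions):
    """Fraction of masked positions where the argmax prediction matches the target.

    logits:         (seq_len, vocab_size) float tensor of per-position scores
    targets:        (seq_len,) long tensor of ground-truth token/phoneme ids
    mask_positions: (seq_len,) bool tensor, True where the input was masked
    """
    preds = logits.argmax(dim=-1)
    correct = (preds == targets) & mask_positions
    return correct.sum().item() / max(mask_positions.sum().item(), 1)

# toy example with a vocabulary of 5 phoneme ids
logits = torch.randn(4, 5)
targets = torch.tensor([1, 3, 0, 2])
masked = torch.tensor([True, False, True, False])
print(masked_prediction_accuracy(logits, targets, masked))
```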
@jav-ed your calculation looks correct. However, as @yl4579 clarified, we don't need to change the distribution. Still, I suggest you add the following line right after the last...