Results: 2 issues of Jinchen Ge
## Describe the bug For most tokenizers I have tested (e.g. the RoBERTa tokenizer), the data preprocessing cache is not fully reused in the first few runs, although its `.arrow`...
bug
Currently, many LayerNorms' eps values are smaller than 6.1e-5 (the smallest normal fp16 value), which might cause underflow when running in fp16.
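The underflow concern can be illustrated with a quick NumPy check (NumPy is used here only for demonstration; the issue itself concerns fp16 model weights, not any particular library). The smallest positive *normal* fp16 value is 2^-14 ≈ 6.1e-5, and values far below the subnormal range collapse to zero when cast:

```python
import numpy as np

# Smallest positive normal fp16 value: 2**-14, roughly 6.1e-5.
tiny = np.finfo(np.float16).tiny
assert float(tiny) == 2.0 ** -14

# A common LayerNorm eps like 1e-12 is far below even the smallest fp16
# subnormal (~5.96e-8), so casting it to fp16 underflows to exactly zero,
# leaving the normalization denominator unprotected near zero variance.
eps = np.float16(1e-12)
assert float(eps) == 0.0

# An eps around the normal threshold survives the cast with a nonzero value.
safe_eps = np.float16(6.1e-5)
assert float(safe_eps) > 0.0
```

Note that eps values between ~5.96e-8 and 6.1e-5 land in the fp16 subnormal range: they stay nonzero here, but hardware running in flush-to-zero mode may still treat them as zero, which is why the issue uses 6.1e-5 as the safety threshold.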