Results: 2 issues by Jinchen Ge

## Describe the bug For most tokenizers I have tested (e.g. the RoBERTa tokenizer), the data preprocessing caches are not fully reused in the first few runs, although their `.arrow`...

Label: bug
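As a rough illustration of the reported behavior, the sketch below is a hypothetical reproduction (not taken from the issue itself), assuming the `datasets` and `transformers` libraries and the public `glue`/`mrpc` dataset: running the script twice should let `.map()` reuse the cached `.arrow` files, which is the reuse the issue says does not happen in the first few runs.

```python
# Hypothetical reproduction sketch: re-running this script should hit the
# preprocessing cache produced by .map() instead of re-tokenizing.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
dataset = load_dataset("glue", "mrpc", split="train")

def tokenize(batch):
    # Tokenize the sentence pair; truncation keeps sequences within the model limit.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

# On a second run, .map() should detect an identical fingerprint and load the
# cached .arrow file rather than recomputing the tokenization.
tokenized = dataset.map(tokenize, batched=True)
```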

Currently, many LayerNorms' eps values are smaller than 6.1e-5 (the smallest normal fp16 value), which might cause underflow.
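To make the fp16 concern concrete, here is a minimal sketch (assuming PyTorch; not part of the original issue) that casts a few common eps values to float16 and shows which ones lose precision or underflow entirely.

```python
# Minimal sketch: check what happens to typical LayerNorm eps values in fp16.
import torch

# Smallest normal float16 value, ~6.1e-5.
print(torch.finfo(torch.float16).tiny)  # 6.103515625e-05

for eps in (1e-5, 1e-6, 1e-12):
    half_eps = torch.tensor(eps, dtype=torch.float16)
    # Values below ~6.1e-5 become subnormal (reduced precision); values below
    # the subnormal minimum (~6e-8) round to exactly zero, so eps no longer
    # guards the normalization against division by a tiny variance.
    print(eps, "->", half_eps.item())
```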