
Pre-training Language Models for Japanese

Issues (1)

Thank you for releasing [bert-small-japanese-fin](https://huggingface.co/izumi-lab/bert-small-japanese-fin) and the other ELECTRA models for FinTech. However, I've found that they tokenize "四半期連結会計期間末日満期手形" badly:

```
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("izumi-lab/bert-small-japanese-fin")
>>> tokenizer.tokenize("四半期連結会計期間末日満期手形")...
```
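The splintering reported above is characteristic of WordPiece-style greedy longest-match-first subword tokenization: when a long compound is absent from the vocabulary, the tokenizer falls back to whatever shorter pieces it can match, left to right. Below is a minimal, self-contained sketch of that algorithm with a toy vocabulary (an assumption for illustration only, not the actual izumi-lab vocabulary or the `transformers` implementation):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Toy WordPiece-style tokenizer: greedy longest-match-first,
    with non-initial pieces prefixed by '##'."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it is found in the vocab.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            # No piece matches at all: the whole word becomes [UNK].
            return [unk]
        tokens.append(match)
        start = end
    return tokens

# Hypothetical vocabulary: '会計' is missing as a unit, so the compound
# fragments into single characters at that point.
vocab = {"四半期", "##連結", "##会", "##計", "##期間", "##末", "##日", "##満期", "##手形"}
print(wordpiece_tokenize("四半期連結会計期間末日満期手形", vocab))
# → ['四半期', '##連結', '##会', '##計', '##期間', '##末', '##日', '##満期', '##手形']
```

With a richer vocabulary (or pre-segmentation that keeps domain compounds intact), the same word would split into fewer, more meaningful pieces.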

Labels: bug, enhancement