spm decoding doesn't handle byte fallback
🐛 Bug
When an spm model is trained with byte fallback, fairseq's decoded output does not replace the byte-fallback pieces (`<0x..>`) with the characters they encode.
To Reproduce
Train an spm model with --byte_fallback enabled, train a fairseq model on the encoded text, then run fairseq inference and observe raw `<0x..>` pieces in the outputs.
Code sample
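A minimal sketch of the problem and a possible workaround. This assumes the fairseq output string still contains raw byte-fallback pieces like `<0xC3>`; `decode_byte_fallback` is a hypothetical post-processing helper, not an existing fairseq or sentencepiece function.

```python
import re

# Matches a single SentencePiece byte-fallback piece, e.g. <0xC3>.
_BYTE_PIECE = re.compile(r"<0x([0-9A-Fa-f]{2})>")

def decode_byte_fallback(text: str) -> str:
    """Replace byte-fallback pieces with the bytes they stand for,
    then decode the resulting byte string as UTF-8 (hypothetical helper)."""
    out = bytearray()
    # Split on byte pieces; the capture group keeps them in the result.
    for tok in re.split(r"(<0x[0-9A-Fa-f]{2}>)", text):
        m = _BYTE_PIECE.fullmatch(tok)
        if m:
            out.append(int(m.group(1), 16))   # raw byte from the piece
        else:
            out.extend(tok.encode("utf-8"))   # ordinary text passes through
    return out.decode("utf-8", errors="replace")

# "é" is UTF-8 bytes 0xC3 0xA9, so spm with byte fallback emits two pieces:
print(decode_byte_fallback("caf<0xC3><0xA9>"))  # café
```

spm_decode performs this merging internally; the fairseq decoding path should do the same instead of emitting the pieces verbatim.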
Expected behavior
The decoded output should mirror the behavior of spm_decode, merging byte-fallback pieces back into the original characters.
Environment
- fairseq Version (e.g., 1.0 or main): main
- PyTorch Version (e.g., 1.0): 1.12
- OS (e.g., Linux): all
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information: