fairseq


facebookresearch


spm decoding doesn't handle byte fallback

Open erip opened this issue 3 years ago • 0 comments

🐛 Bug

When an spm (SentencePiece) model is trained with byte fallback, fairseq's decoded output leaves the byte-fallback pieces (<0x..>) in place instead of merging them back into the characters they encode.

To Reproduce

Train an spm model with --byte_fallback enabled and encode the training text with it. Train a fairseq model on the encoded text, run fairseq inference, and observe <0x..> pieces in the outputs.

Code sample

Expected behavior

fairseq's decoded output should match what spm_decode produces: byte-fallback pieces are merged back into the characters they encode, so no <0x..> pieces remain in the text.
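
A minimal sketch of the difference, in plain Python with no fairseq or sentencepiece dependency. Both functions are hypothetical illustrations: `naive_decode` assumes the decoder only strips spaces and maps the `▁` marker to a space (an assumption about the cause, not something confirmed in this report), and `decode_pieces` shows one way byte pieces could be merged back into UTF-8 text. It is not the spm_decode implementation.

```python
import re

# Byte-fallback pieces look like <0xE2>: two hex digits inside <0x...>.
BYTE_PIECE = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def naive_decode(x: str) -> str:
    """Hypothetical decoder that ignores byte pieces (assumed cause of the bug)."""
    return x.replace(" ", "").replace("\u2581", " ").strip()


def decode_pieces(pieces):
    """Hypothetical byte-aware decoder: merge runs of byte pieces into UTF-8 text."""
    out = []
    pending = bytearray()  # bytes collected from consecutive <0x..> pieces

    def flush():
        # Turn the collected bytes into text; invalid sequences become U+FFFD.
        if pending:
            out.append(pending.decode("utf-8", errors="replace"))
            pending.clear()

    for piece in pieces:
        match = BYTE_PIECE.match(piece)
        if match:
            pending.append(int(match.group(1), 16))
        else:
            flush()
            out.append(piece)
    flush()
    return "".join(out).replace("\u2581", " ").strip()


hypothesis = "\u2581hello \u2581 <0xE2> <0x82> <0xAC>"  # bytes E2 82 AC encode "€"
print(naive_decode(hypothesis))           # hello <0xE2><0x82><0xAC>
print(decode_pieces(hypothesis.split()))  # hello €
```

In practice, decoding through the trained spm model itself would likely be the more robust route, since it also covers normalization details this sketch ignores.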

Environment

fairseq Version (e.g., 1.0 or main): main
PyTorch Version (e.g., 1.0): 1.12
OS (e.g., Linux): all
How you installed fairseq (pip, source): source
Build command you used (if compiling from source):
Python version:
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:

Additional context

Jul 24 '22 23:07 erip