FasterTransformer
GPT-NeoX gives poor results using FP16
Branch/Tag/Commit
main
Docker Image Version
none
GPU name
T4
CUDA Driver
525.60.13
Reproduced Steps
## Steps
1. Download the public GPT-NeoX model https://huggingface.co/EleutherAI/pythia-70m (a minimal sketch of this step follows the list)
2. Convert the checkpoint using `huggingface_gptneox_convert.py`
3. Run the example file https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/gptneox/gptneox_example.py
with an input file that contains: "What is the boiling point of water?"
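For reference, step 1 can be reproduced with plain `transformers`; this is a minimal sketch, where the local save directory `pythia-70m` is an arbitrary choice (the FT converter flags vary by version, so step 2 is not shown):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Step 1: download the public GPT-NeoX checkpoint (Pythia-70m).
model_name = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Save locally so the FasterTransformer converter can read it.
# The target directory "pythia-70m" is an arbitrary choice.
tokenizer.save_pretrained("pythia-70m")
model.save_pretrained("pythia-70m")
```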
## Environment
* torch version '2.0.1+cu117'
* transformers version '4.29.0'
## Inference Settings
* `fp16` for `inference_data_type`
* beam_width = 1
* output_len = 60
* repetition_penalty = 1.1
All other parameters used the default values from HuggingFace's [GenerationConfig](https://huggingface.co/docs/transformers/v4.29.0/en/main_classes/text_generation#transformers.GenerationConfig).
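In HF terms, the settings above correspond to a `GenerationConfig` like the sketch below (`num_beams` and `max_new_tokens` are the standard `transformers` names for `beam_width` and `output_len`; everything not listed stays at its default):

```python
from transformers import GenerationConfig

# Mirrors the settings used for the FasterTransformer run:
# beam_width=1 -> num_beams=1, output_len=60 -> max_new_tokens=60.
gen_config = GenerationConfig(
    num_beams=1,
    max_new_tokens=60,
    repetition_penalty=1.1,
)
```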
## Result
With `fp16`, the generated continuation contains many nonsense tokens. The same setup works fine with `fp32`.
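As a cross-check (a hedged sketch, not part of the original repro), running the unconverted checkpoint in half precision through plain `transformers` can show whether the weights themselves are fp16-safe or whether the degradation is specific to FasterTransformer's fp16 path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the same checkpoint in fp16 directly through HF (no FT conversion).
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).cuda()

inputs = tokenizer(
    "What is the boiling point of water?", return_tensors="pt"
).to("cuda")
outputs = model.generate(
    **inputs, num_beams=1, max_new_tokens=60, repetition_penalty=1.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this HF-side fp16 run produces sensible text, that points at the FasterTransformer fp16 kernels or the converted weights rather than the checkpoint itself.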