Support Gemma 1.1 model
System Info
Model: https://huggingface.co/google/gemma-1.1-2b-it
Who can help?
@byshiue
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Use the `GemmaForCausalLM.from_hugging_face().save_checkpoint()` API with the https://huggingface.co/google/gemma-1.1-2b-it model; this fails for the 1.1 model but succeeds for the 1.0 model (https://huggingface.co/google/gemma-2b-it). A minimal sketch follows this list.
- Use the TRT-LLM build tool to build an engine; this fails for the 1.0 model.
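For reference, a minimal sketch of the failing flow (paths are placeholders, and any arguments beyond the model path are assumptions rather than a confirmed signature):

```python
# Minimal repro sketch; paths are placeholders and the exact
# from_hugging_face arguments are assumptions.
from tensorrt_llm.models import GemmaForCausalLM

# Step 1: convert the HF checkpoint
# (fails for gemma-1.1-2b-it, succeeds for gemma-2b-it)
model = GemmaForCausalLM.from_hugging_face("./gemma-1.1-2b-it")
model.save_checkpoint("./gemma_ckpt")

# Step 2: build the engine from the converted checkpoint
# (fails for gemma-2b-it), e.g.:
#   trtllm-build --checkpoint_dir ./gemma_ckpt --output_dir ./gemma_engine
```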
Expected behavior
A successfully built, working TRT-LLM engine
Actual behavior
Either the checkpoint build (for the 1.1 version) or the engine build (for the 1.0 version) fails.
Additional notes
I believe the issue for 1.1 comes from the `gelu_pytorch_tanh` activation function; I'm not sure what breaks the build for 1.0.
Hi @ttim, if my understanding is correct, `gelu_pytorch_tanh` should be equivalent to the `gelu` activation function; they are just different implementations. Could you please share the error log from building Gemma-1.1?
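To illustrate the equivalence, here is a small standalone check (plain Python, not TRT-LLM code) comparing the exact GELU with the tanh approximation that HF configs call `gelu_pytorch_tanh`; the two agree closely over typical activation ranges:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation (what HF configs call "gelu_pytorch_tanh").
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

for x in (-3.0, -1.0, -0.1, 0.1, 1.0, 3.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```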
@QiJune it fails at this line https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/layers/mlp.py#L49, presumably because the HF configuration of the model specifies `gelu_pytorch_tanh`. I believe the fix is to add this alias here: https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/functional.py#L5347
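For concreteness, the fix would be a one-line alias along these lines; the dict name `ACT2FN` and the stub below are illustrative stand-ins, not the actual `functional.py` source:

```python
# Illustrative stand-in for the activation-name lookup in
# tensorrt_llm/functional.py; the dict name and stub are assumptions.
def gelu(x):
    ...  # TRT-LLM's existing gelu op stands in here

ACT2FN = {'gelu': gelu}
# proposed fix: alias the HF name to the existing entry so the lookup
# in mlp.py no longer fails for Gemma 1.1 configs
ACT2FN['gelu_pytorch_tanh'] = ACT2FN['gelu']
```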
@ttim, yes, I think so. Could you please submit an MR to fix it? Or would you prefer to wait for us to fix it?
@QiJune there are two issues here. The activation function issue I can fix myself. But apart from that, `from_hugging_face` is broken for Gemma models in another code path that I can't really debug myself. It happens for both Gemma 1.0 and 1.1 (after the activation function fix). Here's the error on the most recent dev version:
```
AssertionError: Gemma only supports share_embedding_table
```
Even if this is fixed, it fails with an error from TensorRT about incompatible types.
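For context, the assertion suggests the Gemma conversion requires embedding sharing to be enabled; a hypothetical sketch of forcing that is below. The `share_embedding_table` kwarg is inferred from the assertion message only and is not a confirmed part of the `from_hugging_face` signature:

```python
# Hypothetical sketch only: the share_embedding_table kwarg is inferred
# from the assertion text, NOT confirmed against the real API.
from tensorrt_llm.models import GemmaForCausalLM

model = GemmaForCausalLM.from_hugging_face(
    "./gemma-1.1-2b-it",
    share_embedding_table=True,  # Gemma ties its input and output embeddings
)
model.save_checkpoint("./gemma_ckpt")
```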
@QiJune I've created a PR for the activation function: https://github.com/NVIDIA/TensorRT-LLM/pull/1897
As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting to new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iterations.
Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.