Support Gemma 1.1 model
System Info
Model: https://huggingface.co/google/gemma-1.1-2b-it
Who can help?
@byshiue
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Use the `GemmaForCausalLM.from_hugging_face().save_checkpoint()` API with the https://huggingface.co/google/gemma-1.1-2b-it model; this fails for the 1.1 model but succeeds for the 1.0 model (https://huggingface.co/google/gemma-2b-it). A minimal sketch follows this list.
- Use the TRT-LLM build tool to build an engine; this fails for the 1.0 model.
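For reference, a minimal sketch of the failing flow (paths are placeholders, and any arguments beyond the model path are assumptions rather than a confirmed signature):

```python
# Minimal repro sketch; paths are placeholders and the exact
# from_hugging_face arguments are assumptions.
from tensorrt_llm.models import GemmaForCausalLM

# Step 1: convert the HF checkpoint
# (fails for gemma-1.1-2b-it, succeeds for gemma-2b-it)
model = GemmaForCausalLM.from_hugging_face("./gemma-1.1-2b-it")
model.save_checkpoint("./gemma_ckpt")

# Step 2: build the engine from the converted checkpoint
# (fails for gemma-2b-it), e.g.:
#   trtllm-build --checkpoint_dir ./gemma_ckpt --output_dir ./gemma_engine
```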
Expected behavior
A successfully built, working TRT-LLM engine
Actual behavior
Either the checkpoint build (for the 1.1 version) or the engine build (for the 1.0 version) fails.
Additional notes
I believe the issue for 1.1 comes from the `gelu_pytorch_tanh` activation function; I'm not sure what breaks the build for 1.0.
Hi @ttim, if my understanding is correct, `gelu_pytorch_tanh` should be equivalent to the `gelu` activation function; they are just different implementations. Could you please share the error log from building Gemma-1.1?
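To illustrate the equivalence, here is a small standalone check (plain Python, not TRT-LLM code) comparing the exact GELU with the tanh approximation that HF configs call `gelu_pytorch_tanh`; the two agree closely over typical activation ranges:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation (what HF configs call "gelu_pytorch_tanh").
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

for x in (-3.0, -1.0, -0.1, 0.1, 1.0, 3.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```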
@QiJune it fails at this line https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/layers/mlp.py#L49, presumably because the HF configuration of the model specifies `gelu_pytorch_tanh`. I believe the fix is to add this alias here: https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/functional.py#L5347
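For concreteness, the fix would be a one-line alias along these lines; the dict name `ACT2FN` and the stub below are illustrative stand-ins, not the actual `functional.py` source:

```python
# Illustrative stand-in for the activation-name lookup in
# tensorrt_llm/functional.py; the dict name and stub are assumptions.
def gelu(x):
    ...  # TRT-LLM's existing gelu op stands in here

ACT2FN = {'gelu': gelu}
# proposed fix: alias the HF name to the existing entry so the lookup
# in mlp.py no longer fails for Gemma 1.1 configs
ACT2FN['gelu_pytorch_tanh'] = ACT2FN['gelu']
```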
@ttim, yes, I think so. Could you please submit an MR to fix it? Or would you prefer to wait for us to fix it?
@QiJune there are two issues here. The activation function issue I can fix myself. But apart from that, `from_hugging_face` is broken for Gemma models in another code path that I can't really debug myself. It happens for both Gemma 1.0 and 1.1 (after the activation function fix). Here's the error on the most recent dev version:
```
AssertionError: Gemma only supports share_embedding_table
```
Even if this is fixed, it fails with an error from TensorRT about incompatible types.
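For context, the assertion suggests the Gemma conversion requires embedding sharing to be enabled; a hypothetical sketch of forcing that is below. The `share_embedding_table` kwarg is inferred from the assertion message only and is not a confirmed part of the `from_hugging_face` signature:

```python
# Hypothetical sketch only: the share_embedding_table kwarg is inferred
# from the assertion text, NOT confirmed against the real API.
from tensorrt_llm.models import GemmaForCausalLM

model = GemmaForCausalLM.from_hugging_face(
    "./gemma-1.1-2b-it",
    share_embedding_table=True,  # Gemma ties its input and output embeddings
)
model.save_checkpoint("./gemma_ckpt")
```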
@QiJune I've created a PR for the activation function: https://github.com/NVIDIA/TensorRT-LLM/pull/1897
As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting to new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iterations.
Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.