Failure when converting Google Gemma 2B models to TFLite
I tried converting the Google Gemma 2B model to TFLite, but the conversion ends in failure.
1. System information
- Ubuntu 22.04
- TensorFlow installation (installed with keras-nlp):
- TensorFlow library (installed with keras-nlp):
2. Code
import os
import time

import numpy as np
import keras
import keras_nlp
import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow.lite.python import interpreter

os.environ["KAGGLE_USERNAME"] = "rag"
os.environ["KAGGLE_KEY"] = 'e7c'
os.environ["KERAS_BACKEND"] = "tensorflow"  # Or "jax" or "torch".

preprocessor = keras_nlp.models.GemmaCausalLMPreprocessor.from_preset(
    'gemma_2b_en', sequence_length=4096, add_end_token=True
)
generator = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

def run_inference(input, generate_tflite):
    interp = interpreter.InterpreterWithCustomOps(
        model_content=generate_tflite,
        custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS)
    interp.get_signature_list()
    preprocessor_output = preprocessor.generate_preprocess(
        input, sequence_length=preprocessor.sequence_length
    )
    generator = interp.get_signature_runner('serving_default')
    output = generator(preprocessor_output)
    output = preprocessor.generate_postprocess(output["output_0"])
    print("\nGenerated with TFLite:\n", output)

generate_function = generator.make_generate_function()
concrete_func = generate_function.get_concrete_function({
    "token_ids": tf.TensorSpec([None, 4096]),
    "padding_mask": tf.TensorSpec([None, 4096])
})
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func],
                                                            generator)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS     # enable TensorFlow ops.
]
converter.allow_custom_ops = True
converter.target_spec.experimental_select_user_tf_ops = ["UnsortedSegmentJoin", "UpperBound"]
converter._experimental_guarantee_all_funcs_one_use = True
generate_tflite = converter.convert()
run_inference("I'm enjoying a", generate_tflite)

with open('unquantized_mistral.tflite', 'wb') as f:
    f.write(generate_tflite)
3. Failure after conversion
I am getting this error:
tensorflow/core.py":65:1))))))))))))))))))))))))))]): error: missing attribute 'value' LLVM ERROR: Failed to infer result type(s). Aborted (core dumped)
4. (optional) Any other info / logs
2024-02-22 06:34:41.094712: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-02-22 06:34:41.094742: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2024-02-22 06:34:41.095691: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmp58p378bn
2024-02-22 06:34:41.140303: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-02-22 06:34:41.140329: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /tmp/tmp58p378bn
2024-02-22 06:34:41.233389: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-02-22 06:34:41.264724: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-02-22 06:34:43.697440: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /tmp/tmp58p378bn
2024-02-22 06:34:44.189111: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 3093423 microseconds.
2024-02-22 06:34:45.009212: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
loc(fused["ReadVariableOp:", callsite("decoder_block_0_1/attention_1/attention_output_1/Cast/ReadVariableOp@__inference_generate_step_12229"("/workspace/gem.py":38:1) at callsite("/usr/local/lib/python3.10/dist-packages/keras_nlp/models/gemma/gemma_causal_lm.py":258:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras_nlp/models/gemma/gemma_causal_lm.py":235:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras_nlp/models/gemma/gemma_causal_lm.py":212:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras_nlp/models/gemma/gemma_causal_lm.py":214:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":118:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/layers/layer.py":816:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":118:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/ops/operation.py":42:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":157:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras_nlp/models/gemma/gemma_decoder_block.py":147:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":118:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/layers/layer.py":816:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":118:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/ops/operation.py":42:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":157:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras_nlp/models/gemma/gemma_attention.py":193:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":118:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/layers/layer.py":816:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":118:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/ops/operation.py":42:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py":157:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/einsum_dense.py":218:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/ops/numpy.py":2414:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/numpy.py":90:1 at callsite("/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/numpy.py":91:1 at "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/core.py":65:1))))))))))))))))))))))))))]): error: missing attribute 'value'
LLVM ERROR: Failed to infer result type(s).
Aborted (core dumped)
Hi @RageshAntonyHM,
I am trying to reproduce the issue, but I ran into another error: ModuleNotFoundError: No module named 'keras_nlp.backend'. Could you please confirm the version you are using?
Thank You
@LakshmiKalaKadali
It is Keras 3.0.5, and I installed keras-nlp via pip install git+https://github.com/keras-team/keras-nlp (0.8.1).
@LakshmiKalaKadali
First install keras-nlp with pip install git+https://github.com/keras-team/keras-nlp and then update Keras (pip install -U keras).
Then also install tensorflow-datasets, @LakshmiKalaKadali.
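Putting those steps together, the install sequence looks roughly like this (Colab-style; versions as of this thread, adjust as needed):
!pip install git+https://github.com/keras-team/keras-nlp
!pip install -U keras
!pip install tensorflow-datasets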
Also crashing in Colab with or without quantization.
@farmaker47
This conversion pipeline needs a lot of VRAM. At least 24 GB.
@LakshmiKalaKadali any updates on this please?
@RageshAntonyHM I got the same crash in Colab on an A100 (40 GB GPU RAM).
@urim85
Yeah. Actually, it is still crashing for me on a 48 GB RTX 6000.
(What I told @farmaker47 was that it will crash prematurely if VRAM is low. But it also crashes in the final step even if you have enough VRAM.)
I saw that training works OK after first installing the TensorFlow nightly version (2.17.0-dev20240223). @RageshAntonyHM can you try with the nightly version and check the conversion again?
@farmaker47
How do I install the TensorFlow nightly version? I tried pip install tf-nightly, but I am getting an error:
File "/usr/local/lib/python3.10/dist-packages/keras/src/backend/tensorflow/core.py", line 5, in
Name: tf-nightly
Version: 2.17.0.dev20240223
I work with Colab. So it is:
!pip install tf-nightly
!pip install -q --upgrade keras-nlp
!pip install -q -U keras>=3
@farmaker47
Now I am again getting the error I first mentioned.
Could you please share your notebook link?
The Colab is from this example:
https://ai.google.dev/gemma/docs/lora_tuning
I have changed nothing.
So the idea is that if you install tf-nightly the conversion error disappears? I don't understand from your previous answer whether the error happens during tf-nightly installation or during conversion.
@farmaker47
I suspect some package conflicts, like some packages reinstalling the 'stable' version of TensorFlow. Let me check.
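A quick way to check which TensorFlow actually ended up active (just a sanity check):
import tensorflow as tf
print(tf.__version__)  # should report the 2.17.0-dev... build if tf-nightly is the active install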
@farmaker47
I am able to run inference already. My problem is that I need to create a TFLite model for Gemma 2B. I think there is still some problem in the conversion.
I am very new to AI and even to Python.
Then we have to wait a little bit until the TF team solves this and provides a tf-nightly version we can use to convert it.
@LakshmiKalaKadali
from keras_nlp.backend import ops
is not needed. Sorry
But when using all the nightly versions, I got a "GraphDef" issue.
A minimal script to reproduce the issue:
import os
import keras
import keras_nlp
import tensorflow as tf
os.environ["KAGGLE_USERNAME"] = '....'
os.environ["KAGGLE_KEY"] = '...'
os.environ["KERAS_BACKEND"] = "tensorflow"
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
tf.saved_model.save(gemma_lm.backbone, '/tmp/gemma_saved_model/')
f = tf.lite.TFLiteConverter.from_saved_model('/tmp/gemma_saved_model/').convert()
I tested with tf-2.15, 2.16, and 2.17 nightly and their corresponding packages. None of them works.
Hi @pkgoogle,
I have reproduced the issue in Colab with TF 2.15; the session crashed at the step generator = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en"). Please take a look.
Thank You
Adding @advaitjain and @paulinesho for visibility.
I believe Colab is running out of memory in @LakshmiKalaKadali's case.
In attempting to replicate the script below, I am running into tensorflow-text installation issues (apparently Gemma uses it for its tokenizer); this may be because of the new 2.16 release.
import os
import keras
import keras_nlp
import tensorflow as tf
os.environ["KAGGLE_USERNAME"] = '....'
os.environ["KAGGLE_KEY"] = '...'
os.environ["KERAS_BACKEND"] = "tensorflow"
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
tf.saved_model.save(gemma_lm.backbone, '/tmp/gemma_saved_model/')
f = tf.lite.TFLiteConverter.from_saved_model('/tmp/gemma_saved_model/').convert()
my error:
TypeError: <class 'keras_nlp.src.models.gemma.gemma_tokenizer.GemmaTokenizer'> could not be deserialized properly. Please ensure that components that are Python object instances (layers, models, etc.) returned by `get_config()` are explicitly deserialized in the model's `from_config()` method.
config={'module': 'keras_nlp.src.models.gemma.gemma_tokenizer', 'class_name': 'GemmaTokenizer', 'config': {'name': 'gemma_tokenizer', 'trainable': True, 'dtype': 'int32', 'proto': None, 'sequence_length': None}, 'registered_name': 'keras_nlp>GemmaTokenizer', 'assets': ['assets/tokenizer/vocabulary.spm'], 'weights': None}.
Exception encountered: Error when deserializing class 'GemmaTokenizer' using config={'name': 'gemma_tokenizer', 'trainable': True, 'dtype': 'int32', 'proto': None, 'sequence_length': None}.
I think there is an answer here showing that it works: https://github.com/keras-team/keras/issues/19108. It is based on this comment: https://github.com/keras-team/keras/issues/19108#issuecomment-1913421572
So my code now is:
model.export("test", "tf_saved_model")
converter = tf.lite.TFLiteConverter.from_saved_model("test")
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
f.write(tflite_model)
With the above, the conversion finishes and the .tflite model runs on Android. I have not used quantization since it fails on Android.
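For a quick desktop sanity check of the converted file (just a sketch, assuming the model.tflite written above; not something from the original comment), the standard TFLite Python interpreter should at least load it and list its signatures:
import tensorflow as tf

# Sketch: load the converted model and confirm it parses and exposes a signature.
interp = tf.lite.Interpreter(model_path="model.tflite")
print(interp.get_signature_list())  # the exported serving signature(s)
interp.allocate_tensors()           # note: needs several GB of RAM for an unquantized 2B model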
@farmaker47
I ran like this:
import os
import time

import numpy as np
import keras
import keras_nlp
import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow.lite.python import interpreter

os.environ["KAGGLE_USERNAME"] = "rag"
os.environ["KAGGLE_KEY"] = 'e7c'
os.environ["KERAS_BACKEND"] = "tensorflow"  # Or "jax" or "torch".

preprocessor = keras_nlp.models.GemmaCausalLMPreprocessor.from_preset(
    'gemma_2b_en', sequence_length=4096, add_end_token=True
)
model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

print("exporting")
model.export("test", "tf_saved_model")

print("converting")
converter = tf.lite.TFLiteConverter.from_saved_model("test")
tflite_model = converter.convert()

print("writing")
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
It fails at "converting" with this error: "GPU:0 in order to run Identity: Dst tensor is not initialized. [Op:Identity] name:"
2024-02-28 17:40:21.522813: I external/local_tsl/tsl/framework/bfc_allocator.cc:1114] Stats:
Limit: 23553966080
InUse: 23553959680
MaxInUse: 23553959680
NumAllocs: 1629
MaxAllocSize: 2097152000
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2024-02-28 17:40:21.522846: W external/local_tsl/tsl/framework/bfc_allocator.cc:497] ****************************************************************************************************
Traceback (most recent call last):
File "/workspace/gem.py", line 22, in <module>
converter = tf.lite.TFLiteConverter.from_saved_model("test")
File "/usr/local/lib/python3.10/dist-packages/tensorflow/lite/python/lite.py", line 2087, in from_saved_model
saved_model = _load(saved_model_dir, tags)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 912, in load
result = load_partial(export_dir, None, tags, options)["root"]
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 1043, in load_partial
loader = Loader(object_graph_proto, saved_model_proto, export_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 226, in __init__
self._restore_checkpoint()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 561, in _restore_checkpoint
load_status = saver.restore(variables_path, self._checkpoint_options)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/checkpoint.py", line 1479, in restore
checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/restore.py", line 62, in restore
restore_ops = self._restore_descendants(reader)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/restore.py", line 463, in _restore_descendants
current_position.checkpoint.restore_saveables(
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/checkpoint.py", line 379, in restore_saveables
registered_savers).restore(self.save_path_tensor, self.options)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/functional_saver.py", line 499, in restore
restore_ops = restore_fn()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/functional_saver.py", line 467, in restore_fn
ret = restore_fn(restored_tensors)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saving/saveable_object_util.py", line 747, in _restore_from_tensors
return saveable_object_to_restore_fn(self.saveables)(restored_tensors)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saving/saveable_object_util.py", line 784, in _restore_from_tensors
restore_ops[saveable.name] = saveable.restore(
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saving/saveable_object_util.py", line 602, in restore
ret = restore_fn(restored_tensor_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 779, in _restore_from_tensors
restored_tensor = array_ops.identity(
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/weak_tensor_ops.py", line 88, in wrapper
return op(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py", line 5883, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from
/job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run Identity: Dst tensor is not initialized. [Op:Identity] name:
Am I doing something wrong?
You can skip the Kaggle_key...😀
I think it's a memory error
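One thing that might be worth trying (untested here, just a sketch): hide the GPU during export/conversion so the SavedModel restore and the converter use host RAM instead of VRAM.
# Sketch of a possible workaround: force the export/conversion to run on CPU + host RAM.
# Must be set before TensorFlow is imported / the model is loaded.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"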
@farmaker47
I rented a 48 GB GPU; now I get another error:
converting
2024-02-28 17:51:39.439053: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-02-28 17:51:39.439136: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2024-02-28 17:51:39.440216: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: test
2024-02-28 17:51:39.459869: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-02-28 17:51:39.459902: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: test
2024-02-28 17:51:39.760067: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-02-28 17:51:39.808678: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-02-28 17:51:44.034223: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: test
2024-02-28 17:51:44.392812: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 4952596 microseconds.
2024-02-28 17:51:44.754857: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
Summary on the non-converted ops:
---------------------------------
* Accepted dialects: tfl, builtin, func
* Non-Converted Ops: 220, Total Ops 2416, % non-converted = 9.11 %
* 184 ARITH ops, 36 TF ops
- arith.constant: 184 occurrences (f32: 153, i32: 31)
- tf.StridedSlice: 36 occurrences (i1: 18, i32: 18)
(f32: 127)
(f32: 36)
(i1: 18)
(f32: 36, i32: 20)
(i32: 72)
(f32: 18)
(f32: 37)
(i1: 18)
(f32: 19, i1: 18)
(f32: 127)
(f32: 1, i32: 90)
(f32: 18)
(i1: 18)
(i32: 1)
(f32: 37)
(i32: 19)
(f32: 272)
(f32: 36, i32: 90)
(f32: 18)
(i1: 18)
(f32: 252)
(f32: 18)
(i32: 180)
(f32: 18)
(f32: 18)
(f32: 36)
(f32: 37)
(f32: 37)
(f32: 36, i32: 180)
(f32: 54)
(f32: 90)
(i32: 54)
(f32: 18)
Killed
The process terminates with a "Killed" message. It never reached the "writing" step!
@RageshAntonyHM this looks like the OS terminated the process, maybe due to memory consumption/cpu time limitation?
@RageshAntonyHM looks like it is still a memory and compute issue.
I think there is an answer here that it is working: keras-team/keras#19108 It is based on this comment: keras-team/keras#19108 (comment)
So my code now is:
model.export("test", "tf_saved_model") converter = tf.lite.TFLiteConverter.from_saved_model("test") tflite_model = converter.convert() with open("model.tflite", "wb") as f: f.write(tflite_model)With the above the conversion finishes and the .tflite model is running into android. I have not used quantization since it is failing into android.
I can confirm that I could get a tflite model by using:
gemma_lm.backbone.export('/tmp/gemma_saved_model')
instead of
tf.saved_model.save(gemma_lm.backbone, '/tmp/gemma_saved_model/')
Note that the converter does not seem memory efficient; I observed that more than 90 GiB of virtual memory was needed on my desktop machine.
@freedomtan can you share your working Colab here?
nope, because I tested it with a simple script on my local machine; didn't try to deal with memory issues in Colab :-)
import os
import keras
import keras_nlp
import tensorflow as tf
os.environ["KAGGLE_USERNAME"] = '....'
os.environ["KAGGLE_KEY"] = '...'
os.environ["KERAS_BACKEND"] = "tensorflow"
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.backbone.export('/tmp/gemma_saved_model')
tflite_model = tf.lite.TFLiteConverter.from_saved_model('/tmp/gemma_saved_model/').convert()
with open("model.tflite", "wb") as f:
f.write(tflite_model)
For the test, I modified keras_nlp to use fixed tensor dimensions. That's it.
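A rough way to double-check what the export actually produced (a sketch, not from the original comment, assuming the paths used in the script above):
import tensorflow as tf

# Sketch: inspect the exported SavedModel and the converted file.
saved = tf.saved_model.load('/tmp/gemma_saved_model/')
print(saved.signatures)  # signatures captured by backbone.export()

interp = tf.lite.Interpreter(model_path="model.tflite")
for detail in interp.get_input_details():
    print(detail["name"], detail["shape"])  # shows whether the input dimensions are fixed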