[RLlib] AlgorithmConfig.to_dict() fails when a tensorflow model is in the environment
What happened + What you expected to happen
I have a TensorFlow model in my environment, and calling to_dict on the algorithm config causes a Keras serialization error. The error can be boiled down to Keras' deserialize_keras_object.
I use tensorflow 2.15 because 2.16 raises more severe errors.
Versions / Dependencies
- ray 2.24.0 [edited: wrote 1.24.0 by mistake]
- tensorflow 2.15.1
- keras 2.15.0
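A quick way to confirm which versions are actually installed in the active environment is sketched here. `installed_versions` is a hypothetical helper name, and the sketch relies only on the standard-library `importlib.metadata`, so it does not import ray or tensorflow itself.

```python
from importlib import metadata


def installed_versions(packages=("ray", "tensorflow", "keras")):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```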
Reproduction script
This script works perfectly:
from copy import deepcopy
import tensorflow as tf
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='linear')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
test = deepcopy(model)
Whereas this one fails:
from copy import deepcopy
from ray.rllib.utils.framework import try_import_tf
_, tf, _ = try_import_tf()
tf.compat.v1.enable_eager_execution()
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='linear')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
test = deepcopy(model)
Error:
  File "<venv>/lib/python3.10/site-packages/keras/src/saving/legacy/serialization.py", line 365, in class_and_config_for_serialized_keras_object
    raise ValueError(
ValueError: Unknown object: 'Sequential'. Please ensure you are using a `keras.utils.custom_object_scope` and that this object is included in the scope. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.
Replacing the Sequential model with MySequential, defined as:
@tf.keras.saving.register_keras_serializable(package="MyLayers")
class MySequential(tf.keras.models.Sequential):
    pass
makes it work. But this is not acceptable, as it requires retraining the saved models, which I unfortunately cannot do in every case.
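To illustrate why registering the class changes the outcome, here is a toy sketch of a copy that goes through a by-name lookup. It is not Keras code: `REGISTRY`, `register`, `rebuild`, and the two toy classes are all hypothetical names. It only mimics the general pattern the traceback suggests, where an object is rebuilt from its class name and the lookup fails if that name is unknown.

```python
import copy

# Toy stand-in for a name -> class registry; this is NOT Keras code.
REGISTRY = {}


def register(cls):
    """Make a class resolvable by name (hypothetical decorator)."""
    REGISTRY[cls.__name__] = cls
    return cls


def rebuild(class_name, config):
    """Recreate an object from its class name, as a by-name deserializer would."""
    try:
        cls = REGISTRY[class_name]
    except KeyError:
        raise ValueError(f"Unknown object: {class_name!r}") from None
    return cls(**config)


class ToyModel:
    def __init__(self, units):
        self.units = units

    def __reduce__(self):
        # copy.deepcopy() and pickle both use this and later call rebuild(...)
        return (rebuild, (type(self).__name__, {"units": self.units}))


@register
class RegisteredToyModel(ToyModel):
    pass


# Works: the class name is present in the registry.
clone = copy.deepcopy(RegisteredToyModel(8))

# Fails: the class name cannot be resolved when the copy is rebuilt.
try:
    copy.deepcopy(ToyModel(8))
except ValueError as err:
    message = str(err)
```

In this toy version the copy itself is fine; what breaks is the name lookup performed while rebuilding the object, which is consistent with the error surfacing in the deserialization code rather than in deepcopy.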
Issue Severity
High: It blocks me from completing my task.
For the first case you could use
...
test = tf.keras.models.clone_model(model)
Thanks for your reply. I only used deepcopy here for simplicity; my actual issue is with the method AlgorithmConfig.to_dict, within which the copy takes place.
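One possible way to sidestep a copy that cannot be controlled directly is to keep the model object out of the copied structure and carry only a path, loading the model on first use. The sketch below is an untested idea with respect to AlgorithmConfig.to_dict: `LazyModel` is a hypothetical wrapper, and the loader passed to it (for example `tf.keras.models.load_model`) is an assumption about how the saved model would be restored.

```python
import copy


class LazyModel:
    """Holds only a path; loads the model on first use and never copies it."""

    def __init__(self, path, loader):
        self.path = path
        self._loader = loader  # hypothetical callable, e.g. a Keras load function
        self._model = None

    def get(self):
        """Return the model, loading it the first time it is requested."""
        if self._model is None:
            self._model = self._loader(self.path)
        return self._model

    def __deepcopy__(self, memo):
        # Copy the reference, not the loaded model; the copy reloads on demand.
        return LazyModel(self.path, self._loader)

    def __getstate__(self):
        # When pickled, drop the loaded model so only path and loader travel.
        state = self.__dict__.copy()
        state["_model"] = None
        return state
```

The environment would then receive something like `LazyModel("<path to saved model>", tf.keras.models.load_model)` and call `.get()` when it needs the model. For pickling to work, the loader has to be an importable, module-level function.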
TensorFlow version: 2.16.2
Ray version: 2.31.0
Keras version: 3.4.1
Both scripts work fine for me on these versions; I have tested them multiple times. This is possibly a dependency issue, as you are using a very outdated version of Keras.
OR
Register Sequential as a class explicitly.
tf.keras.utils.get_custom_objects().update({'Sequential': tf.keras.Sequential})
Worst case: this may be a problem with AlgorithmConfig.to_dict() not properly handling the Keras Sequential model.
Hi, thank you for your help. Unfortunately none of these solutions work.
- if I update tensorflow to 2.16, I cannot even train a DQN:
ValueError: A KerasTensor cannot be used as input to a TensorFlow function. (see #44676)
- if I register Sequential as suggested, the worker fails when trying to load the pickled model
Hey, thank you for letting me know. I suggest working with Keras 3 or above; if that does not work, try setting it to legacy mode. If that does not work either, the last option is to use PyTorch.
Okay, thanks. I wouldn't mind the idea of changing framework, but I'm trying to save time 😅
And yes, I've seen in the different linked issues that one workaround is to set Keras to legacy mode. However, I couldn't find out how to do that. Would you know how?
Thanks. Changing the framework might take some time, that is true, but I would say that is the worst-case option.
To set Keras to legacy mode, you can set the environment variable in the shell:
export TF_USE_LEGACY_KERAS=1
or set it in the Python script, before tensorflow is imported:
import os
os.environ['TF_USE_LEGACY_KERAS'] = '1'
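As far as I know, the variable TensorFlow documents for switching tf.keras back to Keras 2 is TF_USE_LEGACY_KERAS, and it needs the tf_keras package installed. Since the variable has to be set before tensorflow is first imported, a small guard can make the ordering explicit. This is only a sketch, and `use_legacy_keras` is a hypothetical helper name.

```python
import os
import sys


def use_legacy_keras():
    """Request Keras 2 behaviour for tf.keras.

    Returns True if the variable was set before tensorflow was imported,
    False if tensorflow was already loaded (the setting may then be ignored).
    """
    set_in_time = "tensorflow" not in sys.modules
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    return set_in_time


in_time = use_legacy_keras()
if not in_time:
    print("Warning: tensorflow was imported before TF_USE_LEGACY_KERAS was set")

# import tensorflow as tf  # import only after the variable has been set
```

With RLlib, worker processes may not inherit a variable set inside the driver script, so exporting it in the shell before launching is probably the safer of the two options.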