[RLlib] AlgorithmConfig.to_dict() fails when a tensorflow model is in the environment

Open lo-zed opened this issue 1 year ago • 2 comments

What happened + What you expected to happen

I have a TensorFlow model in my environment, and calling to_dict on the algorithm config causes a Keras serialization error. The error boils down to Keras' deserialize_keras_object. I use TensorFlow 2.15 because 2.16 raises more severe errors.

Versions / Dependencies

  • ray 2.24.0 [edited: wrote 1.24.0 by mistake]
  • tensorflow 2.15.1
  • keras 2.15.0
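Since the thread ends up comparing several version combinations, a small sketch for recording what is actually installed may help. It uses only the standard library. The distribution names "ray", "tensorflow" and "keras" are assumptions; some platforms install TensorFlow under a different distribution name, in which case it would show up as not installed here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("ray", "tensorflow", "keras")):
    """Return a dict mapping each package name to its version string,
    or to 'not installed' if no such distribution is found."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name} {ver}")
```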

Reproduction script

This script works perfectly:

from copy import deepcopy

import tensorflow as tf


model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='linear')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

test = deepcopy(model)

Whereas this one fails:

from copy import deepcopy

from ray.rllib.utils.framework import try_import_tf
_, tf, _ = try_import_tf()

tf.compat.v1.enable_eager_execution()


model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='linear')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

test = deepcopy(model)

Error:

  File "<venv>/lib/python3.10/site-packages/keras/src/saving/legacy/serialization.py", line 365, in class_and_config_for_serialized_keras_object
    raise ValueError(
ValueError: Unknown object: 'Sequential'. Please ensure you are using a `keras.utils.custom_object_scope` and that this object is included in the scope. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.

Replacing the Sequential model by MySequential defined by:

@tf.keras.saving.register_keras_serializable(package="MyLayers")
class MySequential(tf.keras.models.Sequential):
    pass

makes it work. But this is not acceptable, as it requires retraining the saved models, which I unfortunately cannot do in every case.

Issue Severity

High: It blocks me from completing my task.

lo-zed avatar Jun 13 '24 09:06 lo-zed

For the first case you could use

...
test = tf.keras.models.clone_model(model)

simonsays1980 avatar Jun 28 '24 08:06 simonsays1980

Thanks for your reply. I only used deepcopy here for simplicity. My actual issue is with the method AlgorithmConfig.to_dict, within which the copy takes place.
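Because the copy happens inside to_dict and cannot be replaced from the outside, one possible way around it is to keep the model object itself out of the config and pass only what is needed to load it. The following is a minimal sketch in plain Python, not RLlib API. LazyModel, load_model, FailingModel and the path "saved_model_path" are all hypothetical names, and FailingModel only imitates an object whose deepcopy raises. It assumes the environment is able to load the model from a path on first use.

```python
import copy


class FailingModel:
    """Hypothetical stand-in for an object that cannot be deep-copied."""

    def __deepcopy__(self, memo):
        raise ValueError("Unknown object: 'Sequential'.")


def load_model(path):
    """Hypothetical loader; a real one might call tf.keras.models.load_model(path)."""
    return FailingModel()


class LazyModel:
    """Carries only a path and a loader; the model is loaded on first access."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader
        self._model = None

    def get(self):
        if self._model is None:
            self._model = self.loader(self.path)
        return self._model

    def __getstate__(self):
        # Drop the loaded model so that copy.deepcopy (and pickle, which also
        # honors __getstate__) only ever sees the path and the loader.
        state = self.__dict__.copy()
        state["_model"] = None
        return state


env_config = {"model": LazyModel("saved_model_path", load_model)}
env_config["model"].get()  # the model is now loaded in this process

# Deep-copying the config succeeds because the loaded model is skipped.
config_copy = copy.deepcopy(env_config)

# Deep-copying the model object directly still fails, as in the report.
direct_copy_failed = False
try:
    copy.deepcopy({"model": FailingModel()})
except ValueError:
    direct_copy_failed = True
```

The copy reloads the model the first time get() is called on it. Whether this fits depends on the environment being able to reach the saved model from every worker, which is an assumption of the sketch.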

lo-zed avatar Jun 29 '24 19:06 lo-zed

TensorFlow version: 2.16.2, Ray version: 2.31.0, Keras version: 3.4.1. Both scripts work fine for me on these versions; I have tested them multiple times. This is possibly a dependency issue, as you are using a very outdated version of Keras.

OR

Register Sequential as a class explicitly: tf.keras.utils.get_custom_objects().update({'Sequential': tf.keras.Sequential})

Worst case: this could be a problem with AlgorithmConfig.to_dict() not properly handling the Keras Sequential model.

PranitKatwe avatar Jul 13 '24 15:07 PranitKatwe

Hi, thank you for your help. Unfortunately none of these solutions work.

  • if I update tensorflow to 2.16, I cannot even train a DQN: ValueError: A KerasTensor cannot be used as input to a TensorFlow function. (see #44676)
  • if I register Sequential as suggested, the worker fails when trying to load the pickled model

lo-zed avatar Jul 15 '24 10:07 lo-zed

Hey, thank you for letting me know. I suggest working with Keras 3 or above; if that does not work, try setting it to legacy mode. If that does not work either, the last option is to use PyTorch.

PranitKatwe avatar Jul 15 '24 15:07 PranitKatwe

Okay, thanks. I wouldn't mind the idea of changing framework, but I'm trying to save time 😅

And yes, I've seen in the different linked issues that one workaround is to set keras to legacy mode. However I couldn't find out how to do that. Would you know how?

lo-zed avatar Jul 15 '24 15:07 lo-zed

Thanks. Changing the framework might take some time, that is true, but I would say that is the worst-case option.

To set Keras to legacy mode, you can set the environment variable: export TF_USE_LEGACY_KERAS=1

or set it in the Python script, before TensorFlow is imported: os.environ['TF_USE_LEGACY_KERAS'] = '1'
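A minimal sketch of the Python variant follows. It uses TF_USE_LEGACY_KERAS, which is the variable name given in the TensorFlow 2.16 release notes, and it assumes the variable has to be in place before TensorFlow is imported for the first time. On TensorFlow 2.16 and later, legacy mode also expects the separate tf-keras package to be installed. On 2.15, tf.keras is already Keras 2, so the switch should make no difference there.

```python
import os

# Assumption: TF_USE_LEGACY_KERAS is the switch described in the
# TensorFlow 2.16 release notes. Set it before importing TensorFlow.
os.environ["TF_USE_LEGACY_KERAS"] = "1"

try:
    import tensorflow as tf  # imported only after the variable is set
except ImportError:
    tf = None  # TensorFlow is not installed in this environment

if tf is not None:
    print("TensorFlow", tf.__version__)
```

With Ray, the worker processes would presumably need the same variable, for example through the env_vars entry of runtime_env in ray.init. That part is untested here.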

PranitKatwe avatar Jul 15 '24 15:07 PranitKatwe