Establish a consistent memory allocation strategy for TensorFlow memory
Currently, memory allocation for TF in our examples is inconsistent and causes issues.
- [ ] Update NVTabular TF examples
- [ ] Update Merlin models config_tensorflow() function
Let's figure out best practices and make it consistent.
I think we discussed that TensorFlow 2.8 will set cuda_malloc_async by default.
@rnyak shared that, if we remove os.environ["TF_GPU_ALLOCATOR"]="cuda_malloc_async", TensorFlow 2.8 will consume the full GPU memory.
I reproduced the behavior in our merlin-tensorflow-training:22.04 container.
- Set nothing: 30/40GB (I think it should be 38 GB out of 40 GB, need to double check)
- Set cuda_malloc_async: 0.5/40GB
- Set TF_MEMORY_ALLOCATION=0.5: 38/40GB
- Use configure_tensorflow: 21/40GB (default behavior is 50%)
@jperez999 I think you mentioned that TF2.8 will set it by default. Do you have any reference? Do you observe the same behavior?
What should we use for memory allocation in our examples? I think the best user experience is cuda_malloc_async.
Should we add it explicitly to all examples, with a note that it is only available for TF 2.8, and a reference to Troubleshooting?
In the Troubleshooting guide, we could add a section for older TF versions.
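If we go that route, a setup cell in the examples could look roughly like this (a sketch; the version check via importlib.metadata/packaging is my own assumption, the key point is that the environment variable has to be set before TensorFlow is imported):

```python
import os
from importlib.metadata import version

from packaging.version import Version

# TF_GPU_ALLOCATOR is only read at import time, so set it before
# `import tensorflow`. cuda_malloc_async is only expected for TF >= 2.8;
# for older versions, see the Troubleshooting section.
if Version(version("tensorflow")) >= Version("2.8.0"):
    os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf
```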
Nothing:

```python
import tensorflow as tf

print(tf.__version__)
tf.constant([0, 1, 2])
```
cuda_malloc_async:

```python
import os

os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf

print(tf.__version__)
tf.constant([0, 1, 2])
```
TF_MEMORY_ALLOCATION:

```python
import os

os.environ["TF_MEMORY_ALLOCATION"] = "0.5"

import tensorflow as tf

print(tf.__version__)
tf.constant([0, 1, 2])
```
configure_tensorflow:

```python
from merlin.models.loader.tf_utils import configure_tensorflow

configure_tensorflow()

import tensorflow as tf

print(tf.__version__)
tf.constant([0, 1, 2])
```
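As a side note on the used/total numbers above: one way to read them from inside a notebook is via nvidia-smi (a sketch; assumes nvidia-smi is available on the PATH in the container):

```python
import subprocess

# Query per-GPU used/total memory, i.e. the same "x/40GB" style numbers
# quoted above, without leaving the notebook.
print(
    subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"],
        text=True,
    )
)
```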
Let's test whether cuda_malloc_async works with TF 2.7.
- rename to allocate_tensorflow_memory
- add a kwarg type=dynamic | fixed | None
- if None (the default), use the best option based on the TF version
- if fixed, force use of TF_MEMORY_ALLOCATION
- if dynamic, try to use cuda_malloc_async if the TF version is >= 2.8.0 (see the sketch below)
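A rough sketch of how that could look (the function name and the `type` keyword follow the proposal above; `memory_fraction` and the version check via importlib.metadata/packaging are assumptions on my side):

```python
import os
from importlib.metadata import version as pkg_version

from packaging.version import Version


def allocate_tensorflow_memory(type=None, memory_fraction=0.5):
    """Sketch of the proposed replacement for configure_tensorflow().

    type=None      -> pick the best strategy for the installed TF version
    type="dynamic" -> request cuda_malloc_async (intended for TF >= 2.8.0)
    type="fixed"   -> fall back to the existing TF_MEMORY_ALLOCATION path

    Must be called before `import tensorflow`, because TF_GPU_ALLOCATOR
    is only read at import time.
    """
    tf_version = Version(pkg_version("tensorflow"))
    if type is None:
        type = "dynamic" if tf_version >= Version("2.8.0") else "fixed"

    if type == "dynamic":
        os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"
    elif type == "fixed":
        # Reuse the existing fixed-fraction allocation, which reads
        # TF_MEMORY_ALLOCATION (a fraction of total GPU memory).
        os.environ["TF_MEMORY_ALLOCATION"] = str(memory_fraction)
        from merlin.models.loader.tf_utils import configure_tensorflow

        configure_tensorflow()
    else:
        raise ValueError("type must be None, 'dynamic' or 'fixed'")
```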
TensorFlow 2.6 behavior (21.12 container):
- Set nothing: 31/32GB
- Set cuda_malloc_async: 0.5/32GB
- Set TF_MEMORY_ALLOCATION=0.5: 31/32GB
- Use configure_tensorflow: not available in the Docker container
TensorFlow 2.7 behavior (22.02 container):
- Set nothing: 31/32GB
- Set cuda_malloc_async: 31/32GB
- Set TF_MEMORY_ALLOCATION=0.5: 31/32GB
- Use configure_tensorflow: not available in the Docker container
@rnyak @jperez999 I am not sure how we should continue with the TensorFlow allocation logic :) . I do not understand why it works for 2.6 but does not work for 2.7.
@bschifferer can we add the details about TF memory allocation behavior for different TF versions to a README? In the example notebooks we would just say something like "this notebook was developed with TF 2.8 ... for TF 2.6 and 2.7, please visit the README for tips about TF memory allocation". What do you think?
Yes, we can do that. But I wonder why cuda_malloc_async works for TF 2.6 but does not work for TF 2.7.
I reran the tests with natively installed TensorFlow:
TF2.6 (pip)
- Nothing: 31/32 GB
- Set cuda_malloc_async: kernel dies
- Set TF_MEMORY_ALLOCATION=0.5: 31/32GB
TF2.7 (pip)
- Nothing: 31/32 GB
- Set cuda_malloc_async: 0.5/32GB
- Set TF_MEMORY_ALLOCATION=0.5: 31/32GB
TF2.8 (pip)
- Nothing: 31/32 GB
- Set cuda_malloc_async: 0.5/32GB
- Set TF_MEMORY_ALLOCATION=0.5: 31/32GB
> rename to allocate_tensorflow_memory, add kwarg type=dynamic | fixed | None; if None (default) it will use the best based on the TF version; if fixed, force use of TF_MEMORY_ALLOCATION; if dynamic, try to use cuda_malloc_async if the TF version is >= 2.8.0
@jperez999 I think that behavior is correct. In theory, cuda_malloc_async can work with TF 2.7.0, but it depends on the environment. It did not work with our own container, but it worked when installing TensorFlow from pip.
@bschifferer one note for this thread:
run_ensemble_on_tritonserver gives an error if we set cuda_malloc_async.
@jperez999 have you had a chance to update configure_tensorflow to allocate_tensorflow_memory?
@EvenOldridge, should this be added to the 22.08 scope? It's not clear how this maps to the roadmap.
@viswa-nvidia I closed the ticket as there was no progress for a long time. Please reopen it if we should work on it.