again: "TypeError: Expected `trainable` argument to be a boolean, but got: None"

Open storrisi42 opened this issue 3 years ago • 11 comments

Hello there ANTsPyNet gurus,

I'm trying to get the DL super-resolution to work. I'm hitting an error that we've unfortunately already seen in #50 and #57, but after reading through those threads I still cannot get the suggested workarounds to work. Can you advise whether there's a new workaround or whether I'm doing something incorrectly? As for build versions, I installed antspynet via pip last Wednesday July 15.

INFO OF MY SYSTEM:
Ubuntu 20.04.4 LTS
Python 3.8.13
tensorflow 2.9.1
nvcc --version: Cuda compilation tools, release 10.1, V10.1.243
ls -l /home/sam/.keras/ANTsXNet
-rw-rw-r-- 1 sam sam 145251944 Jul 13 16:46 dbpn4x.h5
-rw-rw-r-- 1 sam sam 34878800 Jul 13 17:17 mriSuperResolution.h5

MY COMMANDS:
import ants
import antspynet
import tensorflow as tf
os_ds=ants.image_read("/home/sam/Desktop/DLdata_antspynet/sub-masked-ds.nii")
ants.plot(os_ds)
ptn = antspynet.get_pretrained_network("dbpn4x")
print(ptn)
srtest = antspynet.utilities.apply_super_resolution_model_to_image(os_ds, ptn, verbose=True)

the plot and the print work but then it breaks at the final line:

MY ERROR:
Load model.
Traceback (most recent call last):
  File "sr_antspynet.py", line 20, in <module>
    srtest = antspynet.utilities.apply_super_resolution_model_to_image(os_ds, ptn, verbose=True)
  File "/home/sam/miniconda3/lib/python3.8/site-packages/antspynet/utilities/super_resolution_utilities.py", line 279, in apply_super_resolution_model_to_image
    model = load_model(model)
  File "/home/sam/miniconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/sam/miniconda3/lib/python3.8/site-packages/keras/engine/base_layer.py", line 340, in __init__
    raise TypeError(
TypeError: Expected `trainable` argument to be a boolean, but got: None

Finally, if it's useful to know, a co-worker on a similar system also ran through those commands and had them break identically. Any clues would be so incredibly appreciated and thanks again!

-Sam

storrisi42 avatar Jul 18 '22 21:07 storrisi42

Please use this functionality.

ntustison avatar Jul 18 '22 22:07 ntustison

thanks for such a quick reply! ok i switched out my final command to:

sr_test = antspynet.utilities.mri_super_resolution(os_ds, verbose=True)

but still get:

Traceback (most recent call last):
  File "sr_antspynet.py", line 17, in <module>
    image_sr = antspynet.utilities.mri_super_resolution(os_ds, verbose=True)
  File "/home/sam/miniconda3/lib/python3.8/site-packages/antspynet/utilities/mri_super_resolution.py", line 45, in mri_super_resolution
    model_sr = tf.keras.models.load_model(model_and_weights_file_name, compile=False)
  File "/home/sam/miniconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/sam/miniconda3/lib/python3.8/site-packages/keras/engine/base_layer.py", line 340, in __init__
    raise TypeError(
TypeError: Expected `trainable` argument to be a boolean, but got: None

storrisi42 avatar Jul 18 '22 22:07 storrisi42

I just ran the following without issues on my machine (up-to-date repo and models):

% python3
Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ants
>>> import antspynet
INFO:tensorflow:Enabling eager execution
INFO:tensorflow:Enabling v2 tensorshape
INFO:tensorflow:Enabling resource variables
INFO:tensorflow:Enabling tensor equality
INFO:tensorflow:Enabling control flow v2
>>> t1 = ants.image_read(antspynet.get_antsxnet_data("kirby"))
>>> t1_ds = ants.resample_image(t1, (8, 8, 8), use_voxels=False)
>>> t1_ds.shape
(26, 32, 32)
>>> t1_us = antspynet.mri_super_resolution(t1_ds)
2022-07-18 15:32:55.970718: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-18 15:32:56.596669: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
>>> t1_us.shape
(52, 64, 64)
>>> 

Because of the documented issues noted above, you most likely have outdated models that you need to delete (~/.keras/ANTsXNet/*.h5) so that they're updated with the corrected versions.
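The cleanup described above can also be scripted. The following is a minimal sketch, assuming the cache lives in `~/.keras/ANTsXNet` as shown earlier in this thread; `clear_cached_models` is a hypothetical helper name, not part of ANTsPyNet.

```python
from pathlib import Path


def clear_cached_models(cache_dir=None, pattern="*.h5"):
    """Delete cached weight files so they are downloaded again on next use.

    Returns the names of the files that were removed, sorted.
    """
    # Default location taken from this thread; pass a path to override it.
    if cache_dir is not None:
        cache = Path(cache_dir)
    else:
        cache = Path.home() / ".keras" / "ANTsXNet"

    removed = []
    if cache.is_dir():
        for weights_file in sorted(cache.glob(pattern)):
            weights_file.unlink()
            removed.append(weights_file.name)
    return removed
```

Calling `clear_cached_models()` with no arguments targets the default cache. Note that the files are re-fetched from whatever URLs the installed ANTsPyNet version knows about, so this only helps if the installation itself is current.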

ntustison avatar Jul 18 '22 22:07 ntustison

i'm so sorry if this is frustrating. i deleted the models: rm ~/.keras/ANTsXNet/*.h5

and then re-ran my commands which re-downloaded the models. then still same error. then i tried to exactly reproduce your commands but when i got to this line, same error:

t1_us = antspynet.mri_super_resolution(t1_ds)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sam/miniconda3/lib/python3.8/site-packages/antspynet/utilities/mri_super_resolution.py", line 45, in mri_super_resolution
    model_sr = tf.keras.models.load_model(model_and_weights_file_name, compile=False)
  File "/home/sam/miniconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/sam/miniconda3/lib/python3.8/site-packages/keras/engine/base_layer.py", line 340, in __init__
    raise TypeError(
TypeError: Expected `trainable` argument to be a boolean, but got: None

could it be a specific keras or tensorflow issue? what are you using? thank you!

storrisi42 avatar Jul 18 '22 22:07 storrisi42

>>> print(tf.version.VERSION)
2.5.0-rc0

Is your ANTsPyNet repo up-to-date?

ntustison avatar Jul 18 '22 23:07 ntustison

print(tf.version.VERSION)
2.9.1

$ pip show antspynet
Name: antspynet
Version: 0.1.8

so according to the antspynet github that's the latest version, but you're using an older version of tf. should i install an older tensorflow?
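When comparing environments like this, it can help to collect all the relevant versions in one go. A minimal sketch using only the standard library follows; `installed_versions` is a hypothetical helper, and the default package list (including `antspyx`, the pip distribution name for ANTsPy) is an assumption to adjust as needed.

```python
from importlib import metadata


def installed_versions(packages=("antspynet", "antspyx", "tensorflow", "keras")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A release number only identifies the packaged release; it does not show whether the installation matches the current state of the GitHub repository.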

storrisi42 avatar Jul 18 '22 23:07 storrisi42

For installation, did you do a git pull and install from source? If not, I would try that. If that doesn't work, I'm out of ideas.

That error you're getting is exactly the same as what we were getting before I rewrote the layers of all the models to conform to the tensorflow/keras changes. Those changes should be reflected in any model that you download with the current repository. Specifically, get_pretrained_network has been updated to reflect the location of the new models.

ntustison avatar Jul 18 '22 23:07 ntustison

for install i had installed only with: "pip install antspynet". i could try git pulling and installing from source but first an update on my progress, because i did some things and it went farther (but broke again). let me explain:

i told you that after deleting my .h5 models it re-downloaded the models. here's the output of that:

(base) sam@sam-Alienware-m15-R4:~/Desktop/DLdata_antspynet$ python sr_antspynet.py
Downloading data from https://ndownloader.figshare.com/files/13347617
145251944/145251944 [==============================] - 23s 0us/step
Downloading data from https://ndownloader.figshare.com/files/24128618
34878800/34878800 [==============================] - 16s 0us/step

but if i look at the new cookpa commit here: https://github.com/ANTsX/ANTsPyNet/commit/1ec8951eea24b0035ae59d397544b6e183d2a2ab line 123 shows mriSuperResolution has a different URL associated with it: "mriSuperResolution": "https://figshare.com/ndownloader/files/35290684"

so i just manually downloaded that mriSuperResolution.h5 and replaced the automatically-downloaded one in /home/sam/.keras/ANTsXNet/ and now it runs farther! so it seems a bug in what gets downloaded still remains? or do i have an old download script and should i completely reinstall antspynet with pip? or completely reinstall from a git pull and from source?
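The manual replacement described above could be scripted roughly as follows. This is a sketch only: `cached_model_path` and `replace_cached_model` are hypothetical helpers, the cache location is the one shown earlier in this thread, and the URL in the commented-out call is the one quoted from the commit.

```python
from pathlib import Path
from urllib.request import urlretrieve


def cached_model_path(name, cache_dir=None):
    """Return where the weights file for `name` is expected in the cache."""
    if cache_dir is not None:
        cache = Path(cache_dir)
    else:
        cache = Path.home() / ".keras" / "ANTsXNet"
    return cache / f"{name}.h5"


def replace_cached_model(name, url, cache_dir=None):
    """Download `url` over the cached weights file for `name`."""
    target = cached_model_path(name, cache_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(target))
    return target


# Example call (performs a network download, so it is left commented out):
# replace_cached_model(
#     "mriSuperResolution",
#     "https://figshare.com/ndownloader/files/35290684",
# )
```

This is a stopgap for a single model. Other models fetched by an outdated installation would still come from their old URLs.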

finally, my new error is OOM, which i realize is a different issue. nonetheless any advice or thoughts would be super appreciated:

(base) sam@sam-Alienware-m15-R4:~/Desktop/DLdata_antspynet$ python sr_antspynet.py
2022-07-18 17:03:02.033372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-07-18 17:03:02.061603: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-07-18 17:03:02.061622: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-07-18 17:03:02.061963: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-18 17:03:02.987802: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 749979648 exceeds 10% of free system memory.
2022-07-18 17:03:03.132798: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 749979648 exceeds 10% of free system memory.
2022-07-18 17:03:03.132849: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 749979648 exceeds 10% of free system memory.
2022-07-18 17:03:04.113783: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 749979648 exceeds 10% of free system memory.
2022-07-18 17:03:04.319348: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 1499959296 exceeds 10% of free system memory.
2022-07-18 17:03:16.749496: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at conv_grad_ops_3d.cc:507 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[1,732402,13824] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu

storrisi42 avatar Jul 19 '22 00:07 storrisi42

Because the ANTsXNet toolkits are rather dynamic currently, I would recommend installing directly from source. Although there is quite a bit of effort in making regular releases, it's often not prioritized on my part. So, yes, it looks like your repository was outdated and it was pulling the old model. The error you're currently getting is expected given how much of a footprint is required to run super resolution. Try downsampling as in the example I wrote above.
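For a sense of scale, the tensor shape reported in the OOM message above can be turned into a byte count. The sketch below is plain arithmetic, assuming 4 bytes per element for TensorFlow's "float" type; `tensor_bytes` is a hypothetical helper.

```python
from math import prod

# Bytes per element; TensorFlow's "float" is a 32-bit float.
BYTES_PER_ELEMENT = {"float": 4, "double": 8}


def tensor_bytes(shape, dtype="float"):
    """Return the memory needed to hold one dense tensor of this shape."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


oom_shape = (1, 732402, 13824)  # copied from the RESOURCE_EXHAUSTED message
size = tensor_bytes(oom_shape)
print(f"{size} bytes = {size / 2**30:.1f} GiB")
```

This prints `40498900992 bytes = 37.7 GiB` for that single tensor, which is why downsampling or cropping the input first makes such a difference.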

ntustison avatar Jul 19 '22 00:07 ntustison

ok i'll do that moving forward! no prob about not making regular releases a first priority; totally understand. i've now downsampled my toy data and it worked really well thank you soo much.

storrisi42 avatar Jul 19 '22 01:07 storrisi42

Recent versions of pip can install directly from source:

pip install git+https://github.com/ANTsX/ANTsPyNet.git

The super resolution code is quite memory intensive. I've run it on brain MRI but I have to extract the brain first, then crop the image to remove most of the background.
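The extract-then-crop workflow described above might look something like the following. This is a sketch under assumptions, not the commenter's actual script: the 0.5 probability threshold and the function name `crop_to_brain_and_upsample` are illustrative choices, and the input is assumed to be a T1-weighted image.

```python
def crop_to_brain_and_upsample(t1_path):
    """Brain-extract a T1 image, crop to the brain, then run super resolution."""
    # Imported lazily so that defining this function does not require
    # ANTsPy or TensorFlow to be installed.
    import ants
    import antspynet

    t1 = ants.image_read(t1_path)

    # Brain extraction returns a probability image; binarize it (assumed cutoff).
    probability = antspynet.brain_extraction(t1, modality="t1")
    mask = ants.threshold_image(probability, 0.5, 1.0, 1, 0)

    # Zero out non-brain voxels, then crop to the mask's bounding box so the
    # network sees far fewer voxels.
    brain = t1 * mask
    cropped = ants.crop_image(brain, mask, 1)

    return antspynet.mri_super_resolution(cropped)
```

Cropping before upsampling shrinks the volume that the network has to process, which is where the memory savings come from.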

cookpa avatar Aug 05 '22 14:08 cookpa