DeepLabCut-live Deeplabcut-Live with AMD GPU

Hi DeepLabCut-live developers, I have run DeepLabCut-live smoothly on an NVIDIA GPU, and I now have a computer that uses an AMD GPU. According to the classic DeepLabCut installation guidance https://deeplabcut.github.io/DeepLabCut/docs/recipes/installTips.html I should be able to use the AMD GPU by installing tensorflow-directml in the conda environment. Is that right? I have not seen a benchmark report for AMD GPUs, so I am curious whether anyone has experience with this. Best Regards

Chi-Yu

Sep 02 '22 15:09 chiyu1203

Hi everyone, I have done my experiments and unfortunately I could not get the AMD GPU to run deeplabcut-live with either of two versions of TensorFlow-DirectML. One comes from the discussion thread on the DeepLabCut GitHub: https://github.com/DeepLabCut/DeepLabCut/issues/1687 The other comes from here: https://docs.microsoft.com/en-us/windows/ai/directml/gpu-tensorflow-plugin The installation guidance and the initial discussion thread on the DeepLabCut GitHub sound promising, so I am not sure what might prevent the AMD GPU from being recognised in my setup. Below I will first focus on the DirectML build tested in the DeepLabCut GitHub thread, namely the one designed for TensorFlow 1.15.5. This is my setup info:

OS: Windows 11, version 21H2 (OS Build 22000.856), 64-bit
CPU: AMD Ryzen 9 5950X 16-Core Processor
GPU: Radeon RX 580 Series
Python: 3.7.12

These are my installation steps:

conda create --name dlc_dml python=3.7
conda activate dlc_dml
pip install deeplabcut-live
pip install tensorflow-directml==1.15.5
pip install imageio==2.9.0

Below is my package list:

absl-py=1.2.0=pypi_0 astor=0.8.1=pypi_0 astunparse=1.6.3=pypi_0 ca-certificates=2022.6.15=h5b45459_0 cachetools=5.2.0=pypi_0 certifi=2022.6.15.1=pypi_0 charset-normalizer=2.1.1=pypi_0 colorama=0.4.5=pypi_0 colorcet=3.0.0=pypi_0 deeplabcut-live=1.0.2=pypi_0 ffmpeg=4.2.2=ha925a31_0 flatbuffers=1.12=pypi_0 gast=0.2.2=pypi_0 google-auth=2.11.0=pypi_0 google-auth-oauthlib=0.4.6=pypi_0 google-pasta=0.2.0=pypi_0 grpcio=1.48.1=pypi_0 h5py=2.10.0=pypi_0 idna=3.3=pypi_0 imageio=2.9.0=pypi_0 importlib-metadata=4.12.0=pypi_0 keras=2.9.0=pypi_0 keras-applications=1.0.8=pypi_0 keras-preprocessing=1.1.2=pypi_0 libclang=14.0.6=pypi_0 libsqlite=3.39.3=hcfcfb64_0 markdown=3.4.1=pypi_0 markupsafe=2.1.1=pypi_0 numexpr=2.8.3=pypi_0 numpy=1.18.5=pypi_0 oauthlib=3.2.0=pypi_0 opencv-python-headless=4.6.0.66=pypi_0 openssl=3.0.5=hcfcfb64_2 opt-einsum=3.3.0=pypi_0 packaging=21.3=pypi_0 pandas=1.3.5=pypi_0 param=1.12.2=pypi_0 pillow=9.2.0=pypi_0 pip=22.2.2=pyhd8ed1ab_0 protobuf=3.19.4=pypi_0 py-cpuinfo=8.0.0=pypi_0 pyasn1=0.4.8=pypi_0 pyasn1-modules=0.2.8=pypi_0 pyct=0.4.8=pypi_0 pyparsing=3.0.9=pypi_0 python=3.7.12=h900ac77_100_cpython python-dateutil=2.8.2=pypi_0 python_abi=3.7=2_cp37m pytz=2022.2.1=pypi_0 requests=2.28.1=pypi_0 requests-oauthlib=1.3.1=pypi_0 rsa=4.9=pypi_0 ruamel-yaml=0.17.21=pypi_0 ruamel-yaml-clib=0.2.6=pypi_0 setuptools=65.3.0=py37h03978a9_0 six=1.16.0=pypi_0 sqlite=3.39.3=hcfcfb64_0 tables=3.7.0=pypi_0 tensorboard=1.15.0=pypi_0 tensorboard-data-server=0.6.1=pypi_0 tensorboard-plugin-wit=1.8.1=pypi_0 tensorflow=2.9.0=pypi_0 tensorflow-directml=1.15.5=pypi_0 tensorflow-estimator=1.15.1=pypi_0 tensorflow-io-gcs-filesystem=0.27.0=pypi_0 termcolor=1.1.0=pypi_0 tqdm=4.64.1=pypi_0 typing-extensions=4.3.0=pypi_0 ucrt=10.0.20348.0=h57928b3_0 urllib3=1.26.12=pypi_0 vc=14.2=hb210afc_7 vs2015_runtime=14.29.30139=h890b9b1_7 werkzeug=2.2.2=pypi_0 wheel=0.37.1=pyhd8ed1ab_0 wrapt=1.14.1=pypi_0 zipp=3.8.1=pypi_0
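The list above contains both tensorflow=2.9.0 and tensorflow-directml=1.15.5. For anyone comparing environments, here is a minimal sketch that pulls the TensorFlow-related entries out of a list in this `name=version=build` layout. The function name `tensorflow_packages` and the shortened `sample` string are only illustrative, and whether the two packages coexisting plays any role in the errors below is not established in this thread.

```python
from typing import Dict


def tensorflow_packages(listing: str) -> Dict[str, str]:
    """Map each package whose name starts with 'tensorflow' to its version.

    Expects whitespace-separated `name=version=build` entries, which is
    the layout of the package list above (an assumption about the input).
    """
    found = {}
    for entry in listing.split():
        parts = entry.split("=")
        if len(parts) < 2:
            continue  # not a name=version[=build] entry, skip it
        name, version = parts[0], parts[1]
        if name.startswith("tensorflow"):
            found[name] = version
    return found


# Shortened excerpt of the list above, used only as sample input.
sample = (
    "numpy=1.18.5=pypi_0 tensorboard=1.15.0=pypi_0 "
    "tensorflow=2.9.0=pypi_0 tensorflow-directml=1.15.5=pypi_0 "
    "tensorflow-estimator=1.15.1=pypi_0"
)

for name, version in sorted(tensorflow_packages(sample).items()):
    print(f"{name}: {version}")
```

Note that tensorboard is deliberately not matched, since the filter only looks at names beginning with "tensorflow".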

I ran the following command to test inference:

dlc-live-benchmark /path/to/exported/model /path/to/video1 /path/to/video2 -o /path/to/output -r 1.0 0.75 0.5

and got this error:

Traceback (most recent call last):
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\tensorflow_core\python\framework\importer.py", line 501, in _import_graph_def_internal
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node DLC/MobilenetV2/Conv/BatchNorm/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\Scripts\dlc-live-benchmark.exe\__main__.py", line 7, in <module>
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\dlclive\benchmark.py", line 722, in main
    save_video=args.save_video,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\dlclive\benchmark.py", line 645, in benchmark_videos
    output=output,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\dlclive\benchmark.py", line 305, in benchmark
    poses.append(live.init_inference(frame))
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\dlclive\dlclive.py", line 280, in init_inference
    graph = finalize_graph(graph_def)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\dlclive\graph.py", line 58, in finalize_graph
    tf.import_graph_def(graph_def, name="DLC")
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_live\lib\site-packages\tensorflow_core\python\framework\importer.py", line 505, in _import_graph_def_internal
    raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node DLC/MobilenetV2/Conv/BatchNorm/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
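
The error names an attribute that the op definition in the loading TensorFlow does not know, and the message itself points at a possible mismatch between the GraphDef-generating and GraphDef-interpreting binaries. When triaging several such logs, the three useful pieces (attribute, op, node) can be extracted with a regular expression. This is only a rough sketch: the function name `parse_unknown_attr_error` is made up for illustration, the pattern is written against the wording of the message above and may not match other TensorFlow versions, and `message` is a shortened copy of the real error.

```python
import re
from typing import Dict, Optional

# Pattern based on the wording of the error above (an assumption that
# other TensorFlow builds phrase the message the same way).
PATTERN = re.compile(
    r"NodeDef mentions attr '(?P<attr>[^']+)' not in Op<name=(?P<op>[^;]+);"
    r".*?NodeDef: \{\{node (?P<node>[^}]+)\}\}",
    re.DOTALL,
)


def parse_unknown_attr_error(message: str) -> Optional[Dict[str, str]]:
    """Return the attr, op and node named in the error, or None if no match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


# Shortened version of the message above, used only as sample input.
message = (
    "NodeDef mentions attr 'exponential_avg_factor' not in "
    "Op<name=FusedBatchNormV3; signature=x:T, scale:U -> y:T; "
    "attr=is_training:bool,default=true>; "
    "NodeDef: {{node DLC/MobilenetV2/Conv/BatchNorm/FusedBatchNormV3}}."
)

print(parse_unknown_attr_error(message))
```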

The second DirectML build looked promising because it was designed for TensorFlow 2. However, there might be limitations for particular AMD GPUs, and this might be the main reason for the failure. To test this environment, I used Python 3.9 instead, as they recommend. It could open some dynamic libraries but not all. Below is the error I received:

2022-09-09 19:40:23.172339: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-09-09 19:40:23.172412: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-09-09 19:40:24.430512: I tensorflow/c/logging.cc:34] Successfully opened dynamic library C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow-plugins/directml/directml.0de2b4431c6572ee74152a7ee0cd3fb1534e4a95.dll
2022-09-09 19:40:24.430948: I tensorflow/c/logging.cc:34] Successfully opened dynamic library dxgi.dll
2022-09-09 19:40:24.433075: I tensorflow/c/logging.cc:34] Successfully opened dynamic library d3d12.dll
2022-09-09 19:40:24.558598: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
2022-09-09 19:40:25.562117: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-09 19:40:25.562952: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (Radeon RX 580 Series)
2022-09-09 19:40:25.712132: I tensorflow/c/logging.cc:34] Successfully opened dynamic library Kernel32.dll
2022-09-09 19:40:25.713498: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-09-09 19:40:25.713570: W tensorflow/core/common_runtime/pluggable_device/pluggable_device_bfc_allocator.cc:28] Overriding allow_growth setting because force_memory_growth was requested by the device.
2022-09-09 19:40:25.713896: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6959 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: )
2022-09-09 19:40:25.736081: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
Traceback (most recent call last):
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 1377, in _do_call
    return fn(*args)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 1359, in _run_fn
    self._extend_graph()
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 1400, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Multiple OpKernel registrations match NodeDef at the same priority '{{node DLC/sub}}': 'op: "Sub" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }' and 'op: "Sub" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }'
  [[DLC/sub]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\Scripts\dlc-live-benchmark.exe\__main__.py", line 7, in <module>
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\benchmark.py", line 707, in main
    benchmark_videos(
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\benchmark.py", line 629, in benchmark_videos
    this_inf_times, this_im_size, TFGPUinference, meta = benchmark(
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\benchmark.py", line 305, in benchmark
    poses.append(live.init_inference(frame))
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\dlclive.py", line 371, in init_inference
    pose = self.get_pose(frame, **kwargs)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\dlclive.py", line 401, in get_pose
    pose_output = self.sess.run(
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\tensorflow\python\client\session.py", line 1396, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'DLC/sub' defined at (most recent call last):
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\Scripts\dlc-live-benchmark.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\benchmark.py", line 707, in main
    benchmark_videos(
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\benchmark.py", line 629, in benchmark_videos
    this_inf_times, this_im_size, TFGPUinference, meta = benchmark(
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\benchmark.py", line 305, in benchmark
    poses.append(live.init_inference(frame))
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\dlclive.py", line 280, in init_inference
    graph = finalize_graph(graph_def)
  File "C:\Users\ag-bahl\anaconda3\envs\dlc_dmlplugin\lib\site-packages\dlclive\graph.py", line 58, in finalize_graph
    tf.import_graph_def(graph_def, name="DLC")
Node: 'DLC/sub'
Multiple OpKernel registrations match NodeDef at the same priority '{{node DLC/sub}}': 'op: "Sub" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }' and 'op: "Sub" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }'
  [[DLC/sub]]

Please feel free to let me know if you have any concerns or comments (do I need to install the NVIDIA cuDNN driver etc. even if I want to use DirectML?). Best Regards, Chi-Yu

Sep 09 '22 18:09 chiyu1203

Hi, I have tested exactly the same installation procedure as in this issue https://github.com/DeepLabCut/DeepLabCut/issues/1687 and it looks like DirectML can initialise the AMD GPU in our setup if the TensorFlow version is 1.15.5 (so exactly the same version as tensorflow-directml). Hence the easiest fix would be to train my data on TensorFlow 1.15.5. However, training the dataset in DeepLabCut using TensorFlow 1.15.5 is not straightforward. I managed to solve all the compatibility issues, but for some reason the training results with MobileNet_V2_1.0 are always poor (so poor that I cannot run the default extract outliers step), so I am kind of stuck now...

Sep 14 '22 08:09 chiyu1203

In general, yes, you need to use the same version of TF for export and loading.
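
A quick way to apply this before running a benchmark is to compare the two version strings. Below is a minimal sketch, not part of DLC or DLC-live: the function names `parse_release` and `check_versions` are hypothetical, and the version strings in the usage lines are example values only, since the thread does not state which TensorFlow version the model was exported with. In a real environment the runtime value could come from `tf.__version__`.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a version string.

    Any suffix such as 'rc0' is ignored, which is a simplification.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_versions(export_version: str, runtime_version: str) -> str:
    """Classify how the export and loading TensorFlow versions relate."""
    exported = parse_release(export_version)
    runtime = parse_release(runtime_version)
    if exported == runtime:
        return "match"
    if exported[:2] == runtime[:2]:
        return "patch differs"
    return "mismatch"


# Example values only; the export version is not given in this thread.
print(check_versions("1.15.5", "1.15.5"))
print(check_versions("2.9.0", "1.15.5"))
```

Treating a patch-level difference as its own category is a design choice of this sketch; the advice above is simply to use the same version for both steps.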

We don't support AMD chips (the whole code is of course open source and without any warranties), but we put that recipe there from a user who successfully tried it within DLC, though not with DLC-live.

Sep 14 '22 18:09 MMathisLab