`tensorflow-metal` is not supported: errors when using TF with the GPU on a Mac (some examples in the docs do not work)
Description
Hi, I'm trying to execute some of the examples from your documentation, but I'm running into device-related issues. I don't have an NVIDIA GPU (it's an M1 Mac), yet the device is automatically set to GPU, even though that device appears to be flagged as False:
print(sampler.devices)
# {'/device:GPU:0': False}
Is it possible to switch to the CPU instead? I couldn't find a set-device function in Vegasflow. Should it be done through TensorFlow?
Thanks for your help. Cheers!
Code example
from vegasflow import VegasFlow, run_eager
import tensorflow as tf
run_eager(True)
def my_complicated_fun(xarr, **kwargs):
return tf.reduce_sum(xarr, axis=1)
n_dim = 10
n_events = int(1e5)
sampler = VegasFlow(n_dim, n_events, verbose=False)
sampler.compile(my_complicated_fun)
# Now let's train the integrator for 10 iterations
_ = sampler.run_integration(10)
# Now we can use sampler to generate random numbers
rnds, _, px = sampler.generate_random_array(100)
Additional information
import tensorflow as tf
import vegasflow
print(f"Vegasflow: {vegasflow.__version__}")
# Vegasflow: 1.3.0
print(f"Tensorflow: {tf.__version__}")
# Tensorflow: 2.14.0
print(f"tf-mkl: {tf.python.framework.test_util.IsMklEnabled()}")
# AttributeError: module 'tensorflow' has no attribute 'python'
print(f"tf-cuda: {tf.test.is_built_with_cuda()}")
# tf-cuda: False
print(f"GPU available: {tf.test.is_gpu_available()}")
# GPU available: True
Traceback
[INFO] (vegasflow.monte_carlo) Checking whether the integrand outputs the correct shape (note, this will run a very small number of events and potentially trigger a retrace)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
Cell In[103], line 15
12 sampler.compile(my_complicated_fun)
14 # Now let's train the integrator for 10 iterations
---> 15 _ = sampler.run_integration(10)
17 # Now we can use sampler to generate random numbers
18 rnds, _, px = sampler.generate_random_array(100)
File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/monte_carlo.py:682, in MonteCarloFlow.run_integration(self, n_iter, log_time, histograms)
679 start = time.time()
681 # Run one single iteration and append results
--> 682 res, error = self._run_iteration()
683 all_results.append((res, error))
685 # If there is a histogram variable, store it and empty it
File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/vflow.py:447, in VegasFlow._run_iteration(self)
445 def _run_iteration(self):
446 """Runs one iteration of the Vegas integrator"""
--> 447 return self._iteration_content()
File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/vflow.py:442, in VegasFlow._iteration_content(self)
440 # If training is active, act post integration
441 if self.train:
--> 442 self.refine_grid(arr_res2)
443 return res, sigma
File ~/packages/miniconda3/lib/python3.9/site-packages/vegasflow/vflow.py:363, in VegasFlow.refine_grid(self, arr_res2)
361 for j in range(self.n_dim):
362 new_divisions = refine_grid_per_dimension(arr_res2[j, :], self.divisions[j, :])
--> 363 self.divisions[j, :].assign(new_divisions)
File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/ops/array_ops.py:1359, in strided_slice.<locals>.assign(val, name)
1356 if name is None:
1357 name = parent_name + "_assign"
-> 1359 return var._strided_slice_assign(
1360 begin=begin,
1361 end=end,
1362 strides=strides,
1363 value=val,
1364 name=name,
1365 begin_mask=begin_mask,
1366 end_mask=end_mask,
1367 ellipsis_mask=ellipsis_mask,
1368 new_axis_mask=new_axis_mask,
1369 shrink_axis_mask=shrink_axis_mask)
File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/ops/resource_variable_ops.py:1523, in BaseResourceVariable._strided_slice_assign(self, begin, end, strides, value, name, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask)
1518 def _strided_slice_assign(self, begin, end, strides, value, name, begin_mask,
1519 end_mask, ellipsis_mask, new_axis_mask,
1520 shrink_axis_mask):
1521 with _handle_graph(self.handle), self._assign_dependencies():
1522 return self._lazy_read(
-> 1523 gen_array_ops.resource_strided_slice_assign(
1524 ref=self.handle,
1525 begin=begin,
1526 end=end,
1527 strides=strides,
1528 value=ops.convert_to_tensor(value, dtype=self.dtype),
1529 name=name,
1530 begin_mask=begin_mask,
1531 end_mask=end_mask,
1532 ellipsis_mask=ellipsis_mask,
1533 new_axis_mask=new_axis_mask,
1534 shrink_axis_mask=shrink_axis_mask))
File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/ops/gen_array_ops.py:8826, in resource_strided_slice_assign(ref, begin, end, strides, value, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask, name)
8824 return _result
8825 except _core._NotOkStatusException as e:
-> 8826 _ops.raise_from_not_ok_status(e, name)
8827 except _core._FallbackException:
8828 pass
File ~/packages/miniconda3/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:5888, in raise_from_not_ok_status(e, name)
5886 def raise_from_not_ok_status(e, name) -> NoReturn:
5887 e.message += (" name: " + str(name if name is not None else ""))
-> 5888 raise core._status_to_exception(e) from None
InvalidArgumentError: Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceStridedSliceAssign: CPU
_Arg: GPU CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
ref (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0
Op: ResourceStridedSliceAssign
Node attrs: new_axis_mask=0, Index=DT_INT32, shrink_axis_mask=1, end_mask=2, ellipsis_mask=0, begin_mask=2, T=DT_DOUBLE
Registered kernels:
device='XLA_CPU_JIT'; Index in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN]
device='DEFAULT'; T in [DT_INT32]
device='CPU'; T in [DT_UINT64]
device='CPU'; T in [DT_INT64]
device='CPU'; T in [DT_UINT32]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_BOOL]
device='CPU'; T in [DT_STRING]
device='CPU'; T in [DT_RESOURCE]
device='CPU'; T in [DT_VARIANT]
device='CPU'; T in [DT_QINT8]
device='CPU'; T in [DT_QUINT8]
device='CPU'; T in [DT_QINT32]
[[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign
How did you install tensorflow? It should not report that a GPU is available if that is not true.
I'm also using an M1 Mac and, indeed, print(f"GPU available: {tf.test.is_gpu_available()}") returns False. I cannot reproduce the error; however, maybe you can control whether TensorFlow sees a GPU or not with CUDA_VISIBLE_DEVICES=""
What version of Python are you using? (I need to update that issue template!)
Also, since it is probably important on a Mac: are you using Python from the system, brew, or conda?
Hi @scarlehoff
How did you install tensorflow? It should not say that the GPU is available if it is not true.
By following this link.
What version of python are you using?
I'm using Python 3.9.18.
I cannot reproduce the error, however, maybe you can control whether tensorflow sees or not a GPU with
CUDA_VISIBLE_DEVICES=""
I tried setting os.environ["CUDA_VISIBLE_DEVICES"] = "" but I'm still getting the same error. I also tried selecting the CPU via
physical_devices = tf.config.list_physical_devices('CPU')
tf.config.set_visible_devices(physical_devices[0], 'CPU')
as well as
from vegasflow import configflow
configflow.DEFAULT_ACTIVE_DEVICES = ["CPU"]
...
vegas_instance = VegasFlow(dims, n_calls, list_devices=["CPU"])
...
But none of them helped.
Ah... I see, the problem is tensorflow-metal.
A possible solution is to do the following at the beginning of the code:
import tensorflow as tf
tf.config.set_visible_devices([], 'GPU')
Basically the opposite strategy of what you did, telling tf "your GPUs are... None" :P
However, I'd like to understand why the M1 doesn't work. As long as you are not using complex types it should be fine. Maybe we are doing something that tensorflow-metal doesn't support yet. I'll investigate. Thanks for the report (I'll change the title to make the bug evident!)
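For reference, the workaround written out as a minimal, self-contained sketch: hide the GPU from TensorFlow before any op or variable is created, so everything (including the ResourceStridedSliceAssign that crashed above) gets placed on the CPU.

```python
import tensorflow as tf

# Tell TensorFlow it has no visible GPUs. This must run before any op or
# variable is created, otherwise placements may already be fixed on the GPU.
tf.config.set_visible_devices([], 'GPU')

# Sanity check: the visible-GPU list should now be empty, even on an M1
# with tensorflow-metal installed.
print(tf.config.get_visible_devices('GPU'))
# []
```

After this, importing and running vegasflow as in the example should fall back entirely to the CPU kernels.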
By the way, the example itself in the docs is now wrong: it should be rnds, px = sampler.generate_random_array(100). I removed the extra output some time ago but it seems I forgot to update the docs, sorry about that.
Aha! Thanks, Juan; that indeed fixes the issue!