Jared Willard
I've run into this bug multiple times over the past year or so when I forget that I "shouldn't ever" update conda. I'm amazed this hasn't been fixed yet.
@albertvillanova Doesn't DirectRunner offer distributed processing though? https://beam.apache.org/documentation/runners/direct/ ``` Setting parallelism: the number of threads or subprocesses is defined by setting the `direct_num_workers` pipeline option. From 2.22.0, direct_num_workers = 0 is...
If you set the device parameter to `"cuda:0"` or simply move the model and data to the GPU (e.g. model = model.cuda(); data = data.cuda()) this runs fine...
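A minimal sketch of that pattern (the tiny linear model and random data here are stand-ins, not the original code; the device check makes it run on CPU too):

```python
import torch

# Stand-in model and data; the real objects come from the original code.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)   # equivalent to model.cuda() on a GPU box
data = torch.randn(8, 4, device=device)    # equivalent to data.cuda() on a GPU box
out = model(data)
print(out.shape)  # torch.Size([8, 2])
```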
Yes, exact same error using `dask.distributed.LocalCluster()`
I am running into the same issue with NCCL 2.21.5, it will not pass the nccl-tests (https://github.com/NVIDIA/nccl-tests) on a cluster where each node has 4xH100s with CUDA 12.4. The output...
Thanks, I saw it is an available flag in the reference code but it doesn't seem to work. Maybe that should be removed?
I am getting a similar error after installing into an NGC container with torch=2.4.0a0+f70bd71a48.nv24.6 torchao=0.11.0 torchtune==0.6.1 ``` thes@nid008232:/pscratch/sd/t/thes/jared/torchtune$ tune --help Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/torchtune/__init__.py", line 16, in import...