Jared Willard
I've run into this bug multiple times over the past year or so when I forget that I "shouldn't ever" update conda. I'm amazed this hasn't been fixed yet.
@albertvillanova Doesn't DirectRunner offer distributed processing though? https://beam.apache.org/documentation/runners/direct/ ``` Setting parallelism: the number of threads or subprocesses is defined by setting the `direct_num_workers` pipeline option. From 2.22.0, direct_num_workers = 0 is...
If you set the device parameter to `"cuda:0"` or simply move the model and data to the GPU (e.g. model = model.cuda(); data = data.cuda()) this runs fine...
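A minimal sketch of that pattern (the tiny linear model and random data here are stand-ins, not the original code; the device check makes it run on CPU too):

```python
import torch

# Stand-in model and data; the real objects come from the original code.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)   # equivalent to model.cuda() on a GPU box
data = torch.randn(8, 4, device=device)    # equivalent to data.cuda() on a GPU box
out = model(data)
print(out.shape)  # torch.Size([8, 2])
```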
Yes, exact same error using `dask.distributed.LocalCluster()`
I am running into the same issue with NCCL 2.21.5, it will not pass the nccl-tests (https://github.com/NVIDIA/nccl-tests) on a cluster where each node has 4xH100s with CUDA 12.4. The output...
Thanks, I saw it is an available flag in the reference code but it doesn't seem to work. Maybe that should be removed?
I am getting a similar error after installing into an NGC container with torch=2.4.0a0+f70bd71a48.nv24.6 torchao=0.11.0 torchtune==0.6.1 ``` thes@nid008232:/pscratch/sd/t/thes/jared/torchtune$ tune --help Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/torchtune/__init__.py", line 16, in import...