ionutmodo

11 comments by ionutmodo

Is there any update on this? I would also need to merge the runs in two groups. That would be awesome!

Are there any updates on this? The issue is still present.

I had a look at this error, which I also hit when training a ResNet-50 model. I got an error similar to @brando90's, except that the dimensions of my tensors...

Any updates on this? I am facing more or less the same issue. CMake fails with an error saying `Could NOT find CUDNN (missing: CUDNN_INCLUDE_DIR CUDNN_LIBRARY)`. However, these variables are...

Is there any update on this?

I would like to add an issue related to `torch.compile`: when using `compile=True` together with learning rate warmup, the code breaks because the scheduler expects to find the `__func__` attribute,...

@felipemello1 I am reviving this thread with an error I encountered. I am trying to run distributed full finetuning with a cosine learning rate schedule with warmup, using the default AdamW optimizer from PyTorch...

The error message says `step_fn` has no attribute `__func__`. Below you can find the config: ``` # Config for multi-device full finetuning in full_finetune_distributed.py # using a Llama3.1 8B Instruct...

@felipemello1 I started working on this again and I figured out what the issue is: when using `compile=True`, torchtune compiles the model, the loss, the optimizer step and scale grads...
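
To illustrate the kind of failure described in the comments above, here is a minimal plain-Python sketch. It is not torchtune or PyTorch code: `ToyOptimizer`, `plain_wrapper`, and `grab_underlying_function` are hypothetical names, and the wrapper only stands in for whatever replaces the optimizer step when `compile=True`. The point it shows is that a bound method carries `__func__`, while a plain function assigned over it does not, so any code that reads `step.__func__` raises `AttributeError`.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; not a PyTorch class."""

    def step(self):
        return "stepped"


def plain_wrapper(fn):
    """Hypothetical stand-in for a compile-style wrapper.

    It returns an ordinary function, which has no __func__ attribute.
    """
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped


def grab_underlying_function(optimizer):
    """Mimics code that assumes `optimizer.step` is a bound method."""
    method = optimizer.step
    # Bound methods expose the underlying function as __func__.
    return method.__func__


opt = ToyOptimizer()

# With an untouched optimizer, step is a bound method, so this works.
bound_ok = grab_underlying_function(opt) is ToyOptimizer.step

# Replace step on the instance with a wrapped plain function.
opt.step = plain_wrapper(opt.step)

try:
    grab_underlying_function(opt)
    wrapped_error = None
except AttributeError as exc:
    wrapped_error = str(exc)

print("bound method works:", bound_ok)
print("after wrapping:", wrapped_error)
```

The wrapped `opt.step()` still runs fine when called; only the attribute lookup fails. Under that assumption, a fix would have to either keep the step a bound method or avoid relying on `__func__`.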