Corey James Levinson
YuriCat, can you add an arg to the train function that evaluates against a different agent? Like I want to evaluate against my model from 20 epochs ago to...
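Something like the helper below is what I have in mind; this is just an illustrative sketch, not actual HandyRL API (the helper name and `snapshot_path` are made up):

```python
import copy

import torch


def make_frozen_opponent(model, snapshot_path):
    # Load a frozen snapshot (e.g. the checkpoint from 20 epochs ago)
    # so the current model can be evaluated against it. Hypothetical
    # helper, not part of the real HandyRL train() signature.
    opponent = copy.deepcopy(model)
    opponent.load_state_dict(torch.load(snapshot_path, map_location="cpu"))
    opponent.eval()
    return opponent
```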
@mcarilli I just watched a video that says you can use FusedAdam, FusedSGD, etc. for a faster optimizer when using amp. How do we use this in native PyTorch 1.6...
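For context, this is roughly how a fused optimizer slots into the native amp loop; the model and data here are dummies, the apex import is optional, and `fused=True` on `torch.optim.AdamW` only exists in much newer PyTorch than 1.6:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(512, 10).cuda()              # dummy model/data
data = torch.randn(32, 512, device="cuda")
target = torch.randint(0, 10, (32,), device="cuda")

# PyTorch 1.6 era: apex's fused optimizer, if apex is installed:
# from apex.optimizers import FusedAdam
# optimizer = FusedAdam(model.parameters(), lr=1e-3)

# Much newer PyTorch (~1.13+): core AdamW has a fused CUDA path built in.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, fused=True)

scaler = GradScaler()
with autocast():
    loss = torch.nn.functional.cross_entropy(model(data), target)
scaler.scale(loss).backward()
scaler.step(optimizer)   # unscales grads, then calls optimizer.step()
scaler.update()
optimizer.zero_grad()
```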
Judging by the newness of this issue, maybe it's because bitsandbytes does not support CUDA 12.3? What if you downgrade CUDA?
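A quick sanity check before downgrading, just to confirm which CUDA build PyTorch is actually using (not a bitsandbytes-specific diagnostic):

```python
import torch

# Report the CUDA version PyTorch was built with, to judge whether a
# bitsandbytes / CUDA 12.3 mismatch is plausible before downgrading.
print(torch.version.cuda)        # e.g. "12.1"
print(torch.cuda.is_available())
```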
I have the same issue, “Removed shared tensor”. Transformers 4.35.2, using DeepSpeed on 1 GPU. Following the comments here, I disabled DeepSpeed and now it is saving correctly. I imagine...
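For anyone who wants the exact toggle, the workaround was simply not handing a DeepSpeed config to the Trainer; a minimal sketch, where `out` and `ds_config.json` are placeholder names:

```python
from transformers import TrainingArguments

use_deepspeed = False  # workaround: drop DeepSpeed so the Trainer saves normally

args = TrainingArguments(
    output_dir="out",  # placeholder
    deepspeed="ds_config.json" if use_deepspeed else None,  # placeholder config path
)
```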
Agree. I got the same issue when I just ran it on my 8-GPU instance with DeepSpeed. I even downgraded to 4.35.0 and still have the same issue. Basically, my...
By the way, in case it matters, I am using DeepSpeed ZeRO stage 0, but the Trainer only began to use fp16 and gradient checkpointing and so on when I...
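For reference, a sketch of how ZeRO stage 0 plus fp16 and gradient checkpointing can be passed through the HF Trainer; the config dict follows DeepSpeed's schema, `"auto"` lets the Trainer fill values in, and `out` is a placeholder:

```python
from transformers import TrainingArguments

# ZeRO stage 0 = no partitioning, basically DDP run through DeepSpeed's engine.
ds_config = {
    "zero_optimization": {"stage": 0},
    "fp16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="out",            # placeholder
    fp16=True,
    gradient_checkpointing=True,
    deepspeed=ds_config,
)
```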
Conversation_chain_handler.py L140: could this be changed from a simple log to raising an error? There is so much stuff being printed in the log that the average person would miss the warning.
No, it doesn’t have to do with train vs valid. Just use any CSV file, and in your config.yaml for training, type system="column_that_doesnt_exist". The code will still run, it will...
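To make the suggestion concrete, something along these lines; an illustrative sketch of the log-vs-raise change, not the actual conversation_chain_handler.py code:

```python
def check_configured_column(df_columns, configured_column):
    # Illustrative only, not the real conversation_chain_handler.py code.
    if configured_column not in df_columns:
        # Current behavior (easy to miss in a busy log):
        #     logger.warning("Column %r not found in the data.", configured_column)
        # Suggested behavior: fail loudly so the misconfiguration cannot be missed.
        raise ValueError(f"Column {configured_column!r} not found in the data.")
```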
Example: after epoch 1 it saves checkpoint_ep01.pth; after epoch 2 it saves checkpoint_ep02.pth. When loading the model back in according to the config, by default it will load sorted(glob("checkpoint_ep*"))[-1], aka the...
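In other words, the default load looks roughly like this (a sketch, not the exact repo code):

```python
from glob import glob

import torch

# Take the lexicographically last checkpoint, which with zero-padded names
# (checkpoint_ep01.pth, checkpoint_ep02.pth, ...) is the most recent epoch.
checkpoints = sorted(glob("checkpoint_ep*"))
if checkpoints:
    state_dict = torch.load(checkpoints[-1], map_location="cpu")
```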
> We didn't do that by default as model weights take a ton of disk space.
>
> We could theoretically make it a separate setting to additionally save all...