GradCache
Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint
Dear author, your work is very helpful to me. I want to combine GradCache with SimCLR, but I don't know how to do it, because I find that GradCache takes no batch size, but the...
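In case it helps: `GradCache` never takes a batch size directly. You hand it the full large batch and tell it, via `chunk_sizes`, how big each sub-batch that actually fits in memory should be, so the effective contrastive batch size is simply the size of the tensors you pass in. A minimal sketch of wiring this up for SimCLR-style training, with a placeholder encoder and a simplified cross-view InfoNCE standing in for the full NT-Xent loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from grad_cache import GradCache

# Toy stand-in: any SimCLR backbone + projection head works here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

def info_nce_loss(z1, z2, temperature=0.1):
    # Simplified cross-view InfoNCE (full NT-Xent also uses in-view negatives).
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

gc = GradCache(
    models=[encoder, encoder],  # the same encoder processes both augmented views
    chunk_sizes=32,             # sub-batch size that actually fits in memory
    loss_fn=info_nce_loss,
)

optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)
view1 = torch.randn(256, 3, 32, 32)  # two augmented views of the same 256 images
view2 = torch.randn(256, 3, 32, 32)

loss = gc(view1, view2)  # GradCache chunks internally and runs backward itself
optimizer.step()
optimizer.zero_grad()
```

Passing the same encoder twice is how shared weights are expressed here: gradients from both views simply accumulate on the one module before `optimizer.step()`.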
Hi @luyug, any idea on how to fix this?

04/14/2022 15:48:04 - INFO - tevatron.trainer - Initializing Gradient Cache Trainer
Traceback (most recent call last):
  File "/home/odunayo/anaconda3/envs/tevatron_env/lib/python3.9/runpy.py", line 197, in...
Do you have an example of this with PyTorch Lightning, by any chance? Thanks for the beautiful work.
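As far as I know the repo does not ship a Lightning example, but the functional API fits Lightning's manual-optimization mode. A rough single-GPU sketch, where `call_model` and `contrastive_loss` follow the README's functional example and the `LightningModule` wiring is my own assumption:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from grad_cache.functional import cached, cat_input_tensor


@cached
def call_model(model, x):
    return model(x)  # assumes the model maps a tensor straight to its representation


@cat_input_tensor
def contrastive_loss(x, y):
    logits = x @ y.t()
    targets = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, targets)


class GradCacheModule(pl.LightningModule):
    def __init__(self, encoder, chunk_size=32):
        super().__init__()
        self.encoder = encoder
        self.chunk_size = chunk_size
        self.automatic_optimization = False  # GradCache needs manual backward

    def training_step(self, batch, batch_idx):
        xx, yy = batch  # one large batch; we chunk it ourselves
        reps_x, reps_y, closures_x, closures_y = [], [], [], []
        for cx, cy in zip(xx.split(self.chunk_size), yy.split(self.chunk_size)):
            rx, fx = call_model(self.encoder, cx)
            ry, fy = call_model(self.encoder, cy)
            reps_x.append(rx); closures_x.append(fx)
            reps_y.append(ry); closures_y.append(fy)

        loss = contrastive_loss(reps_x, reps_y)
        self.manual_backward(loss)           # fills .grad on the cached reps
        for f, r in zip(closures_x, reps_x):
            f(r)                             # replay forward, backprop the encoder
        for f, r in zip(closures_y, reps_y):
            f(r)

        opt = self.optimizers()
        opt.step()
        opt.zero_grad()
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.encoder.parameters(), lr=1e-4)
```

The key points are `self.automatic_optimization = False`, so Lightning does not try to run `backward` itself, and calling the cached closures after `manual_backward` to push the representation gradients through the encoder.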
Looking through the code, I notice that there are mini-batches consisting of just negative examples that appear to be ignored entirely. If the code ignores certain combinations, how does using...
Thank you for the great work! Could you please provide some examples of the functional approach with distributed multi-GPU training?
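For the object-based API the repo ships distributed contrastive losses, but with the functional approach you have to gather the cached representations across ranks yourself before computing the loss. A hedged sketch, assuming `torch.distributed` is already initialized; `gather_with_grad` and `train_step` are my own helpers, and DDP gradient-averaging details are glossed over:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from grad_cache.functional import cached


@cached
def call_model(model, x):
    return model(x)


def gather_with_grad(t):
    # all_gather returns detached copies, so splice the local shard back in
    # to keep the autograd path to this rank's cached representations.
    gathered = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t)
    gathered[dist.get_rank()] = t
    return torch.cat(gathered)


def train_step(model, chunks_x, chunks_y, optimizer):
    reps_x, reps_y, closures = [], [], []
    for cx, cy in zip(chunks_x, chunks_y):
        rx, fx = call_model(model, cx)
        ry, fy = call_model(model, cy)
        reps_x.append(rx)
        reps_y.append(ry)
        closures.extend([(fx, rx), (fy, ry)])

    # Contrastive loss over the batch gathered from every rank.
    x = gather_with_grad(torch.cat(reps_x))
    y = gather_with_grad(torch.cat(reps_y))
    logits = x @ y.t()
    targets = torch.arange(x.size(0), device=x.device)
    loss = F.cross_entropy(logits, targets)

    loss.backward()      # gradients reach only this rank's cached reps
    for f, r in closures:
        f(r)             # second forward + real backward, chunk by chunk
    optimizer.step()
    optimizer.zero_grad()
    return loss
```

Note that each closure call triggers its own backward; under DDP you would typically wrap all but the last one in `model.no_sync()` to avoid redundant all-reduces (the class-based API exposes `no_sync_except_last` for exactly this).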
BatchNorm is very common in CV models, and when training = True, the running statistics in the BatchNorm layers change with every chunk.
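Right, GradCache does nothing special for BatchNorm, so the running statistics get updated in both the representation pass and the gradient pass of every chunk. One possible mitigation (my own workaround, not part of the library) is to zero BatchNorm momentum during the first, no-grad pass so the stats advance only once per step; normalization in training mode still uses batch statistics, so the two passes over a chunk stay consistent:

```python
import torch.nn as nn
from contextlib import contextmanager

@contextmanager
def frozen_bn_stats(model):
    # Assumes every BatchNorm layer has a numeric momentum (the default 0.1),
    # not momentum=None (cumulative averaging), which this trick doesn't cover.
    bns = [m for m in model.modules()
           if isinstance(m, nn.modules.batchnorm._BatchNorm)]
    saved = [bn.momentum for bn in bns]
    for bn in bns:
        bn.momentum = 0.0  # running = (1 - 0) * running + 0 * batch: no update
    try:
        yield
    finally:
        for bn, m in zip(bns, saved):
            bn.momentum = m

# With the functional API: freeze stats during the cached (no-grad) pass so
# they are updated exactly once, in the closure's gradient pass:
#
# with frozen_bn_stats(encoder):
#     rep, closure = call_model(encoder, chunk)
# ...
# closure(rep)  # second pass: stats update once, as in ordinary training
```

The chunk-level batch statistics still differ from what one full-batch forward would compute; that approximation is inherent to chunking, and this only removes the double update.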
Hello, when reading the implementation, I noticed that in the forward-backward pass you use a dot product before running the backward pass, specifically in the following line: https://github.com/luyug/GradCache/blob/0c33638cb27c2519ad09c476824d550589a8ec38/src/grad_cache/grad_cache.py#L241 I can't understand...
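For what it's worth, that dot product is a standard vector-Jacobian trick: `torch.dot(reps.flatten(), cached_grads.flatten())` is a scalar whose gradient with respect to `reps` is exactly the cached gradient (the cache is detached), so backpropagating the surrogate scalar pushes the cached gradient through the encoder, just as `reps.backward(gradient=cached_grads)` would. A self-contained check with toy tensors, not the library's code:

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 3, requires_grad=True)
x = torch.randn(2, 4)

# Pretend `cached` holds dL/d(reps), computed earlier and detached.
reps = x @ w
cached = torch.randn_like(reps)

# GradCache-style surrogate: a scalar whose gradient w.r.t. `reps` is `cached`.
torch.dot(reps.flatten(), cached.flatten()).backward()
grad_via_dot = w.grad.clone()

# The direct vector-Jacobian product yields identical parameter gradients.
w.grad = None
reps = x @ w
reps.backward(gradient=cached)
assert torch.allclose(grad_via_dot, w.grad)
```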
Hi, this is great work! We have three inputs, designated `i1`, `i2`, and `i3`, which are to be processed by llama-7b. For input `i1`, I will extract two...
Hello, suppose my model returns multiple outputs. How should the functional approach be modified to handle this? Thanks.
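Since `@cached` expects the wrapped call to return a single representation tensor, the usual fix is to select, or pack, the outputs inside the decorated function. A sketch assuming a hypothetical two-output model:

```python
import torch
from grad_cache.functional import cached

# Hypothetical model returning (sequence_output, pooled_output).

@cached
def call_model_pooled(model, x):
    _, pooled = model(x)  # keep only the output the loss will consume
    return pooled

@cached
def call_model_packed(model, x):
    seq, pooled = model(x)
    # Pack both outputs into one tensor so a single rep gets cached;
    # split them apart again inside the loss function.
    return torch.cat([seq[:, 0], pooled], dim=-1)
```

If the loss needs both outputs, packing them into one tensor keeps a single cached forward per chunk; decorating two separate wrapper functions also works but doubles the forward passes.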