daejin
The code uploaded in this repo does not seem to support some parameter types, such as shortcut connections, batch-norm parameters, and bias terms. Does LAP only operate on the weights, excluding the terms above?
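Independent of the LAP repo's own code, here is a minimal PyTorch sketch of the behaviour the question describes: a magnitude-based mask is built only for Conv/Linear weight tensors, while batch-norm parameters, biases, and shortcut branches are left dense. The thresholding criterion is just a placeholder, not the method from the paper.

```python
import torch
import torch.nn as nn

def prune_weights_only(model: nn.Module, threshold: float = 1e-2) -> dict:
    """Illustrative sketch: mask only Conv/Linear weights, skip BN/bias/shortcuts."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            mask = (module.weight.abs() > threshold).float()
            module.weight.data.mul_(mask)      # bias (if any) is left untouched
            masks[f"{name}.weight"] = mask
        # BatchNorm layers and residual/shortcut parameters are simply skipped
    return masks

masks = prune_weights_only(nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8)))
print(list(masks.keys()))  # only the Linear weight receives a mask
```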
I'm encountering an issue where gradients become NaN while training the Gemma2 model with transformers and flash-attn. I used soft-capping during training.

Environment:
- transformers @ git+https://github.com/huggingface/transformers.git@ac946aac257cadfa8264fa4a284cd0ea1061c5b5
- flash-attn==2.6.1
- torch==2.3.1
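As a diagnostic sketch (not a confirmed fix), one can load Gemma2 with the `eager` attention implementation so the `attn_logit_softcapping` path in the modeling code is exercised, then run a single backward pass and look for NaN gradients. The checkpoint name `google/gemma-2-9b` is an assumption for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b"  # assumed checkpoint, for illustration only
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,     # bf16 is less overflow-prone than fp16
    attn_implementation="eager",    # eager path applies attn_logit_softcapping
)
print(model.config.attn_logit_softcapping, model.config.final_logit_softcapping)

batch = tok("Soft-capping NaN check", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()
nan_params = [n for n, p in model.named_parameters()
              if p.grad is not None and torch.isnan(p.grad).any()]
print("params with NaN grads:", nan_params)
```

Comparing this eager run against the flash-attn run may help narrow down whether the NaNs are tied to how soft-capping interacts with the flash-attn kernel in this version combination.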
In a recent commit, I noticed an inconsistency in the configuration of the `query_pre_attn_scalar` parameter between the 9B and 27B models in this repository. Specifically: in the 9B model, ...
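For anyone wanting to check the settings directly, here is a small sketch that reads `query_pre_attn_scalar` from the published configs and compares it with `hidden_size / num_attention_heads`. The checkpoint names are assumptions; the printed values come from whatever is currently on the Hub, not from this post.

```python
from transformers import AutoConfig

for model_id in ("google/gemma-2-9b", "google/gemma-2-27b"):  # assumed checkpoints
    cfg = AutoConfig.from_pretrained(model_id)
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    print(model_id,
          "query_pre_attn_scalar =", cfg.query_pre_attn_scalar,
          "| hidden_size / num_heads =", head_dim)
```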