chrisliu298

4 comments by chrisliu298

I have a distributed inference script for my own use case [here](https://gist.github.com/chrisliu298/57c9654f43a27b6538e1cc8e58611942), which might help. The code assumes `ds` is a Hugging Face dataset of the following structure: ``` Dataset({...

When you run the script, did you set `"attn_implementation": "flash_attention_2"`? I noticed a performance degradation when flash attention is not used for `gemma-2-27b-it`. However, this is not the case for...
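
For reference, here is a minimal sketch of how that setting is usually passed when loading a model with `transformers`. The helper names `attention_kwargs` and `load_classifier` are hypothetical and not from the script. The use of `AutoModelForSequenceClassification` and `torch.bfloat16` is an assumption based on the sequence-classifier setup mentioned in this thread. FlashAttention-2 also needs the `flash-attn` package and half-precision weights (fp16 or bf16).

```python
def attention_kwargs(use_flash_attention: bool = True) -> dict:
    """Build the from_pretrained keyword argument that selects the attention backend.

    Hypothetical helper for illustration; "eager" is the plain PyTorch fallback.
    """
    impl = "flash_attention_2" if use_flash_attention else "eager"
    return {"attn_implementation": impl}


def load_classifier(model_id: str, use_flash_attention: bool = True):
    """Load a sequence classifier with the chosen attention backend.

    Assumption: bf16 weights, since FlashAttention-2 requires fp16 or bf16.
    Imports are kept local so the helper above works without the heavy dependencies.
    """
    import torch
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        **attention_kwargs(use_flash_attention),
    )


if __name__ == "__main__":
    # Prints the keyword argument only; loading the 27B model needs suitable GPUs.
    print(attention_kwargs(True))
```

Calling `load_classifier("google/gemma-2-27b-it")` once with and once without flash attention would be one way to compare the two settings on the same inputs.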

@ToSev7en, the issue occurred when I used a gemma-2-27b-it trained as a sequence classifier. However, I'm not sure if the same problem would happen with a generative model. I'll try...

It seems like the symptom I described above is different from the one described by @ToSev7en, because I am training a sequence classifier. However, they might be related. In my...