lllyyyqqq comments


Results: 4 comments by lllyyyqqq

Example script for continued pre-training?

Hi, I am also interested in the pre-training part. A script would be ideal. Also, a quick question: what sequence length is used in pre-training, 4k or 32k?

Bad performance for three generation modes.

I can't get good results using Qwen3-Embedding-8B either.

Bad performance for three generation modes.

From what I can observe, the training loss decreases slowly to 1.3 after 30000 steps. The prediction and the ground truth are about different objectives, and the predicted sentence is incomplete. The reason I chose Qwen3...

Bad performance for three generation modes.

I have switched to the 4B model, and it already seems much better after 200 training steps. It looks like the embedding dimension is the issue.