lllyyyqqq

Results 4 comments of lllyyyqqq

Hi, I am also interested in the pre-training part. A script would be ideal. Also, a quick question: what sequence length was used in pre-training, 4k or 32k?

I can't get good results using Qwen3-Embedding-8B either.

From what I can observe, the training loss decreases slowly to 1.3 after 30000 steps. The prediction and ground truth are about different objectives, and the predicted sentence is incomplete. The reason I choose Qwen3...

I have switched to the 4B model, and it already seems much better after 200 training steps. It looks like the embedding dimension is the issue.