MarkYang
@hxs91 My hypothesis is that FoT uses a training strategy similar to the Recurrent Memory Transformer: if you want to train a local context of 2k with 4 segments, you...
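(For context, here is a rough sketch of how an RMT-style segmented pass typically works with a 2k local context and 4 segments. This is only an illustration, not the FoT or RMT code; the memory size, the zero-initialized memory, and the `model(seg_ids, memory=...)` interface are all assumptions.)

```python
# Hypothetical RMT-style segmented training step: an 8k sequence is processed
# as 4 segments of 2k, with a memory tensor carried across segments so that
# gradients flow through the whole sequence (BPTT over segments).
import torch
import torch.nn.functional as F

SEG_LEN, N_SEGMENTS, MEM_TOKENS, D_MODEL = 2048, 4, 16, 1024

def segmented_step(model, input_ids, labels):
    batch = input_ids.size(0)
    # Initial memory; zeros here purely for illustration (RMT uses learned memory tokens).
    memory = torch.zeros(batch, MEM_TOKENS, D_MODEL, device=input_ids.device)
    total_loss = 0.0
    for s in range(N_SEGMENTS):
        seg_ids = input_ids[:, s * SEG_LEN:(s + 1) * SEG_LEN]
        seg_labels = labels[:, s * SEG_LEN:(s + 1) * SEG_LEN]
        # Assumed model signature: returns logits for the segment and the updated memory.
        logits, memory = model(seg_ids, memory=memory)
        total_loss = total_loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), seg_labels.reshape(-1)
        )
    return total_loss / N_SEGMENTS
```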
@JasonZhu1313 Can you share your config for running flex attention on GSM8K?
Hi @jiaqiw09, I wonder if you ever fixed this issue?
@jiaqiw09 Thanks Jiaqi, I later managed to solve it by increasing TP (tensor parallelism) from 1 to 4, but I'll also try your method.
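(For anyone hitting the same thing: raising TP shards the model weights across more GPUs, which usually relieves per-GPU memory pressure. The thread doesn't name the framework, so this is a minimal sketch assuming the TP setting is exposed through vLLM; if your TP knob lives elsewhere, e.g. a Megatron or DeepSpeed config, the change is analogous.)

```python
# Hypothetical illustration of the TP 1 -> 4 change using vLLM, where tensor
# parallelism is set when the engine is constructed.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model name (assumption)
    tensor_parallel_size=4,            # previously 1
)
```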
@qingyujean Did you eventually fix this problem?