Mert Unsal
Same issue in Windows 10!
Yes - I agree with you that even if the function worked properly, it would not achieve anything beyond integrating 1/cos(x). I think at the very least it should be...
We have already implemented this feature; please check the `reward_model.launch_reward_fn_async=True` argument.
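For reference, a minimal sketch of passing that override on a verl PPO launch. Only `reward_model.launch_reward_fn_async=True` is the flag under discussion; the entry point and the other overrides are illustrative placeholders:

```shell
# Hypothetical verl PPO launch; the data/trainer overrides below are
# placeholders, only the async reward flag is the one being discussed.
python3 -m verl.trainer.main_ppo \
    reward_model.launch_reward_fn_async=True \
    data.train_files=train.parquet \
    trainer.n_gpus_per_node=8
```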
Probably the best way to do this is using the async chat scheduler and collecting the reward results at the end. For batch rewards, we have implemented a batch reward manager; please check...
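A sketch of what a batch reward function for such a manager might look like. The assumption here is that the manager calls the function once per batch with parallel lists rather than once per sample; the file name and exact parameter names are illustrative, not verl's confirmed API:

```python
# my_reward.py - hypothetical batch reward function. A batch reward
# manager is assumed to call this once per batch with parallel lists,
# so expensive work (e.g. a reward-model forward pass) can be batched.
def compute_score(data_sources, solution_strs, ground_truths, extra_infos):
    # Score the whole batch in one call instead of looping sample by sample.
    return [1.0 if sol.strip() == gt else 0.0
            for sol, gt in zip(solution_strs, ground_truths)]
```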
Is this issue still there?
The speed seems to be extremely slow on 8xH100 with this config (the only difference is long-context training):

```
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
megatron sft \
    --load 'Qwen3-Omni-30B-A3B-Instruct-mcore' \
...
```
I get no error in `ds_report` either, but when I try to run some scripts with the `verl` library and Ulysses parallelism I face the same error.
I want to use the qwen2 template - if I train with the qwen2 template, will the model's chat template be automatically modified when pushed to the hub? In other words, can...
I haven't - I will try. Either way, shouldn't 8xH100 memory be enough to train a 32B model? Also - what does `enable_liger_kernel` do?
I seem to get another error this time:

```
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
```

Will try fixing this.
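In my experience this error usually means DeepSpeed's CPU Adam C++ extension failed to build (often a missing or mismatched CUDA toolkit at JIT-compile time). A sketch of one common fix, prebuilding the op at install time - assuming a CUDA toolkit matching the installed torch build is available:

```shell
# Reinstall DeepSpeed with the CPU Adam op precompiled instead of
# relying on JIT compilation at first use.
DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache-dir --force-reinstall

# Then check that cpu_adam shows as installed/compatible:
ds_report
```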