heyzude
Sorry. Your implementation is right. My statement above is wrong.
See tf.reduce_sum() at line 462 of bmaml.py. I think the author's implementation is correct.
Hi, and thanks for sharing your work! Could you elaborate on which specific part of Megatron-LM (https://github.com/NVIDIA/Megatron-LM) you used?
No, I mean that the code does not seem to update the vLLM weights properly.