heyzude

4 comments by heyzude

Sorry. Your implementation is right. My statement above is wrong.

See tf.reduce_sum() at line 462 of bmaml.py. I think the author's implementation is correct.

Hi, and thanks for sharing your work! Could you elaborate on which specific part of Megatron-LM (https://github.com/NVIDIA/Megatron-LM) you used?

No, I mean the code does not seem to update the vLLM weights properly.