mollon650

Results 5 comments of mollon650

@jinfagang can you show the code you used to convert the model to ONNX? Thanks.

@siddharth9820 thanks for your reply. I have another question about this code: `if not fp16_master_weights_and_gradients: self.single_partition_of_fp32_groups.append(self.parallel_partitioned_bit16_groups[i][partition_id].to( self.device).clone().float().detach()) else: self.single_partition_of_fp32_groups.append(self.parallel_partitioned_bit16_groups[i][partition_id].to( self.device).clone().half().detach()) self.single_partition_of_fp32_groups[ i].requires_grad = True # keep this in case internal optimizer...

The device index is used for the export buffer. Is there a better way to get the index from the driver? @stellaraccident

@cuichenx why would changing the wrapper alone not work? RMSNorm (Root Mean Square Normalization) does not operate across tokens; it normalizes each token independently. Specifically, RMSNorm is applied...
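
To make the per-token point concrete, here is a minimal pure-Python sketch, assuming the standard RMSNorm formula `y = weight * x / sqrt(mean(x^2) + eps)`. The function name `rms_norm` and the toy inputs are hypothetical and are not taken from the library under discussion. It only shows that a token's output depends on that token's own features, not on the other tokens in the sequence.

```python
import math

def rms_norm(tokens, weight, eps=1e-6):
    """Apply RMSNorm to each token vector independently (hypothetical helper)."""
    out = []
    for x in tokens:
        # The statistic is computed over the feature dimension of ONE token.
        rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
        out.append([w * v / rms for v, w in zip(x, weight)])
    return out

weight = [1.0, 1.0, 1.0, 1.0]

# Two sequences that share the first token but differ in the second.
batch_a = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.5, -0.5]]
batch_b = [[1.0, 2.0, 3.0, 4.0], [100.0, 0.0, -100.0, 7.0]]

out_a = rms_norm(batch_a, weight)
out_b = rms_norm(batch_b, weight)

# The first token's output is unaffected by what the other token contains.
same_first_token = out_a[0] == out_b[0]
print(same_first_token)
```

Because no statistic is shared across tokens, splitting the sequence dimension across devices would not by itself change the result of the normalization in this sketch.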