X_Bee

Results 4 comments of X_Bee

My solution in `llama/convert_checkpoint.py`:

```python
# about line 666
def get_tllm_linear_weight(weight,
                           prefix,
                           bias=None,
                           use_weight_only=False,
                           plugin_weight_only_quant_type=torch.int8,
                           dtype='float32',
                           use_gemm_woq_plugin=True,
                           postfix='weight'):
    results = {}
    print(f"{weight.shape=}")
    if use_weight_only:
        if len(weight.shape) == 3:
            v = weight.permute(0, ...
```
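The snippet above branches on a 3-D weight and permutes it before weight-only quantization. As a hedged illustration (not the repository's actual code), the likely reason is that quantization kernels expect an `[in_features, out_features]` layout, while checkpoints store `[out_features, in_features]`; a stacked tensor such as MoE expert weights (`[num_experts, out, in]`) therefore needs its last two axes swapped per expert, which is what a `permute(0, 2, 1)` would do. The function name and shapes below are assumptions for the sketch:

```python
import numpy as np

def to_quant_layout(weight: np.ndarray) -> np.ndarray:
    """Transpose a checkpoint weight into the layout a weight-only
    quantization kernel would expect (illustrative only)."""
    if weight.ndim == 3:
        # stacked case (e.g. per-expert weights): swap the last two axes,
        # the NumPy equivalent of torch's weight.permute(0, 2, 1)
        return np.transpose(weight, (0, 2, 1))
    # plain 2-D linear weight: ordinary transpose
    return weight.T

w2d = np.arange(6).reshape(2, 3)      # [out=2, in=3]
w3d = np.arange(24).reshape(2, 3, 4)  # [experts=2, out=3, in=4]
print(to_quant_layout(w2d).shape)  # (3, 2)
print(to_quant_layout(w3d).shape)  # (2, 4, 3)
```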

No, that won't work. The author built the model with Java-serialized objects, so it can only be parsed using the encoder and decoder class objects of a `FeatureIndex` instance.

I had the same doubt. Why use `out_channel=8` but take the mean of only the first 4 channels, and drop the bias? Maybe update `latent_z` by the gradient without the bias:

```
dt = timesteps[i] - timesteps[i...
```

> That is deprecated code; in practice we use RoPE.

Then why does the cross-attention module set `use_rope=False`?
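For readers unfamiliar with the mechanism the reply refers to, here is a minimal, self-contained sketch of rotary position embeddings (RoPE). The function name, shapes, and pairing convention are illustrative assumptions, not this repository's API:

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate channel pairs of x (shape [seq_len, dim], dim even) by a
    position-dependent angle — the core idea of RoPE (sketch only)."""
    seq_len, dim = x.shape
    half = dim // 2
    # one rotation frequency per channel pair, geometrically spaced
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), inv_freq)  # [seq_len, half]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # standard 2-D rotation applied to each (x1, x2) channel pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

x = np.ones((4, 8))
out = apply_rope(x)
print(out.shape)               # (4, 8)
print(np.allclose(out[0], x[0]))  # True: position 0 has zero rotation angle
```

Because the rotation encodes *relative* position into the query/key dot product, a cross-attention module attending over a modality without a shared positional frame might deliberately disable it, which could explain a `use_rope=False` default.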