X_Bee

Results 4 comments of X_Bee

My solution in `llama/convert_checkpoint.py`:

```python
# about line 666
def get_tllm_linear_weight(weight,
                           prefix,
                           bias=None,
                           use_weight_only=False,
                           plugin_weight_only_quant_type=torch.int8,
                           dtype='float32',
                           use_gemm_woq_plugin=True,
                           postfix='weight'):
    results = {}
    print(f"{weight.shape=}")
    if use_weight_only:
        if len(weight.shape) == 3:
            v = weight.permute(0, ...
```
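The snippet above branches on a 3-D weight and permutes it before weight-only quantization. As a hedged illustration (not the repository's actual code), the likely reason is that quantization kernels expect an `[in_features, out_features]` layout, while checkpoints store `[out_features, in_features]`; a stacked tensor such as MoE expert weights (`[num_experts, out, in]`) therefore needs its last two axes swapped per expert, which is what a `permute(0, 2, 1)` would do. The function name and shapes below are assumptions for the sketch:

```python
import numpy as np

def to_quant_layout(weight: np.ndarray) -> np.ndarray:
    """Transpose a checkpoint weight into the layout a weight-only
    quantization kernel would expect (illustrative only)."""
    if weight.ndim == 3:
        # stacked case (e.g. per-expert weights): swap the last two axes,
        # the NumPy equivalent of torch's weight.permute(0, 2, 1)
        return np.transpose(weight, (0, 2, 1))
    # plain 2-D linear weight: ordinary transpose
    return weight.T

w2d = np.arange(6).reshape(2, 3)      # [out=2, in=3]
w3d = np.arange(24).reshape(2, 3, 4)  # [experts=2, out=3, in=4]
print(to_quant_layout(w2d).shape)  # (3, 2)
print(to_quant_layout(w3d).shape)  # (2, 4, 3)
```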

No, that won't work. The author built the model with Java-serialized objects, so it can only be parsed using the encoder and decoder class objects of a `FeatureIndex` instance.

I had the same doubt. Why use `out_channel=8` but take the mean of only the first 4 channels, and drop the bias? Maybe update `latent_z` by the gradient without the bias:

```
dt = timesteps[i] - timesteps[i...
```

> That is deprecated code; in practice we use RoPE.

Then why does the cross-attention module set `use_rope=False`?
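For readers unfamiliar with the mechanism the reply refers to, here is a minimal, self-contained sketch of rotary position embeddings (RoPE). The function name, shapes, and pairing convention are illustrative assumptions, not this repository's API:

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate channel pairs of x (shape [seq_len, dim], dim even) by a
    position-dependent angle — the core idea of RoPE (sketch only)."""
    seq_len, dim = x.shape
    half = dim // 2
    # one rotation frequency per channel pair, geometrically spaced
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), inv_freq)  # [seq_len, half]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # standard 2-D rotation applied to each (x1, x2) channel pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

x = np.ones((4, 8))
out = apply_rope(x)
print(out.shape)               # (4, 8)
print(np.allclose(out[0], x[0]))  # True: position 0 has zero rotation angle
```

Because the rotation encodes *relative* position into the query/key dot product, a cross-attention module attending over a modality without a shared positional frame might deliberately disable it, which could explain a `use_rope=False` default.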