ljy comments

Results 12 comments of

ljy

Ubuntu 18.04 reports the following error

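On Ubuntu 20.04, here is a summary of the fixes other folks posted that worked for me:

- First problem: run the installer as `/bin/bash install.sh`.
- Second problem, part 1: https://gist.github.com/BoWang816/c2e9ce52ce03c59450bcf587b7d0f456
- Second problem, part 2: add the following three lines to `~/.vimrc`:
  - `let Tlist_Show_One_File=1 " show tags for the current file only, not for several files at once`
  - `let Tlist_Exit_OnlyWindow=1 " exit vim if the taglist window is the last window`
  - `let Tlist_Ctags_Cmd="/usr/bin/ctags" " point taglist at ctags`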
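If you set up several machines, the `~/.vimrc` step can be applied idempotently with a small script. This is a minimal sketch in Python; `add_taglist_settings` is a hypothetical helper (not part of taglist or vim), and it assumes a setting is already present when any `let` line for the same variable exists, regardless of its value or trailing comment.

```python
from pathlib import Path
from typing import List

# The three taglist settings from the comment above (trailing comments omitted).
TAGLIST_SETTINGS = [
    "let Tlist_Show_One_File=1",
    "let Tlist_Exit_OnlyWindow=1",
    'let Tlist_Ctags_Cmd="/usr/bin/ctags"',
]


def _setting_name(line: str) -> str:
    """Return the normalized 'let <var>' part of a 'let <var>=<value>' line."""
    return " ".join(line.strip().split("=", 1)[0].split())


def add_taglist_settings(vimrc: Path) -> List[str]:
    """Append any missing taglist settings to vimrc and return the lines added."""
    raw = vimrc.read_text() if vimrc.exists() else ""
    present = {
        _setting_name(line)
        for line in raw.splitlines()
        if line.strip().startswith("let ")
    }
    missing = [s for s in TAGLIST_SETTINGS if _setting_name(s) not in present]
    if missing:
        with vimrc.open("a") as f:
            # Start on a fresh line if the file does not end with a newline.
            if raw and not raw.endswith("\n"):
                f.write("\n")
            f.write("\n".join(missing) + "\n")
    return missing
```

Call it as `add_taglist_settings(Path.home() / ".vimrc")`, ideally after backing up the file; running it a second time appends nothing.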

puck::PuckIndex::train: free(): invalid next size (normal)

> Hi dqxcj, Thank you for your feedback. We have tested with our dataset (80k, 32d) and did not encounter this issue. If possible, could you provide the specific parameters...

puck assign also occasionally dumps core: ```log I1119 04:14:54.293910 289901 puck_index.cpp:749] puck_assign, thread_params.start_id = 0 points_count = 250367 feature_file_name = /data/home/ann/test/data/puck/20241118/index/all_data.feat.bin threadId = 0 I1119 04:14:54.293926 289902 puck_index.cpp:749] puck_assign, thread_params.start_id = 250367 points_count =...

OK, thanks, I'll give it a try.

Still dumps core: ```log I1120 17:05:28.135869 589223 kmeans.cpp:182] true nsubset for KMEANS_PLUS_PLUS = 2048 I1120 17:05:42.524760 589223 puck_index.cpp:867] deviation error of init sub 27 pq codebook clusters is 6.39904e-06 I1120 17:05:42.530800 589223...

> Hi dqxcj, > > We have noticed that your log shows: `I1120 17:06:41.142906 589223 puck_index.cpp:867] deviation error of init sub 32 pq codebook clusters is -1` > > If...

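> This core dump is caused by k-means failing to converge. In general, the k-means training set should contain at least 20 times as many points as there are cluster centers, and dense-embedding datasets almost always converge.
>
> There are two things you can try:
> 1. Enlarge the dataset or reduce the number of cluster centers. By default, Puck trains the coarse and fine cluster centers on 5 million points and the PQ codebooks on 1 million points; at that scale k-means generally converges. If your dataset is too small, try reducing `coarse_cluster_count` and `fine_cluster_count` and check whether the "kmeans: reassigned" error still appears. PQ training uses 256 cluster centers, so for PQ the only option is to enlarge the training data.
> 2. If that doesn't help, this dataset's distribution is most likely not well suited to k-means. Even when clustering sometimes succeeds, the resulting grouping will be poor, which leads to low recall. For such datasets we recommend Tinker: set `index_type=2` and use smaller `coarse_cluster_count` and `fine_cluster_count` values.

OK, I'll give that a try, thanks.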
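The 20x rule of thumb from the reply can be checked before starting a training run. This is a minimal sketch, assuming each k-means stage (coarse, fine, PQ) is treated as an independent clustering on its own sample; `max_centers` and `training_shortfall` are hypothetical helpers, not part of Puck's API. Only `coarse_cluster_count`, `fine_cluster_count`, the 20x ratio, and the 256 PQ centers come from the reply.

```python
from typing import Dict

# Rule of thumb from the reply: at least 20 training points per cluster center.
MIN_POINTS_PER_CENTER = 20
# Each PQ codebook has 256 centers (per the reply), so this stage cannot shrink.
PQ_CENTERS = 256


def max_centers(n_train: int) -> int:
    """Largest cluster count the 20x rule allows for n_train training points."""
    return n_train // MIN_POINTS_PER_CENTER


def training_shortfall(
    n_cluster_train: int,
    n_pq_train: int,
    coarse_cluster_count: int,
    fine_cluster_count: int,
) -> Dict[str, int]:
    """Points still missing per stage under the 20x rule (0 means enough).

    Assumption: coarse and fine k-means share one training sample
    (n_cluster_train) while PQ uses its own (n_pq_train), mirroring the
    separate 5M / 1M default training sizes mentioned in the reply.
    """
    required = {
        "coarse": coarse_cluster_count * MIN_POINTS_PER_CENTER,
        "fine": fine_cluster_count * MIN_POINTS_PER_CENTER,
        "pq": PQ_CENTERS * MIN_POINTS_PER_CENTER,
    }
    available = {"coarse": n_cluster_train, "fine": n_cluster_train, "pq": n_pq_train}
    return {stage: max(0, required[stage] - available[stage]) for stage in required}
```

For an 80k-point set like the one the maintainers mention testing with, `max_centers(80000)` is 4000, and PQ needs at least 256 × 20 = 5,120 training points whatever the other settings are. A nonzero shortfall only signals that the heuristic is not met; it does not guarantee that k-means will fail.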
