adol001 issues

Results: 12 issues of adol001

Refine checkpoint with parallel zip compression and decompression

### What changes are proposed in this pull request?
Parallel zip compression and decompression can be used for RocksInodeStore.
### Why are the changes needed?
Checkpoint is too slow for...

API Change
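The PR text above is truncated, so its actual design is not shown. As a hedged illustration of the general idea only, the sketch below compresses every file of a checkpoint directory on its own worker thread using `java.util.zip` and a fixed thread pool, then decompresses them the same way. `ParallelGzipSketch`, `compressDir`, and `decompressDir` are hypothetical names, not Alluxio APIs, and per-file gzip is a swapped-in technique rather than the PR's zip format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not Alluxio code): gzips/gunzips each file of a directory in parallel. */
class ParallelGzipSketch {

  @FunctionalInterface
  interface IoTask {
    void run(Path file) throws IOException;
  }

  /** Writes srcDir/<name> to dstDir/<name>.gz, one file per task. */
  static void compressDir(Path srcDir, Path dstDir, int threads) throws Exception {
    Files.createDirectories(dstDir);
    runAll(listRegular(srcDir), threads, f -> {
      Path out = dstDir.resolve(f.getFileName() + ".gz");
      try (InputStream in = Files.newInputStream(f);
           OutputStream os = new GZIPOutputStream(Files.newOutputStream(out))) {
        in.transferTo(os);
      }
    });
  }

  /** Restores dstDir/<name> from every srcDir/<name>.gz, one file per task. */
  static void decompressDir(Path srcDir, Path dstDir, int threads) throws Exception {
    Files.createDirectories(dstDir);
    List<Path> gzFiles = listRegular(srcDir).stream()
        .filter(p -> p.getFileName().toString().endsWith(".gz"))
        .collect(Collectors.toList());
    runAll(gzFiles, threads, f -> {
      String name = f.getFileName().toString();
      Path out = dstDir.resolve(name.substring(0, name.length() - ".gz".length()));
      try (InputStream in = new GZIPInputStream(Files.newInputStream(f));
           OutputStream os = Files.newOutputStream(out)) {
        in.transferTo(os);
      }
    });
  }

  private static List<Path> listRegular(Path dir) throws IOException {
    try (Stream<Path> s = Files.list(dir)) {
      return s.filter(Files::isRegularFile).collect(Collectors.toList());
    }
  }

  /** Runs the task for every file on a fixed pool and waits for all of them. */
  private static void runAll(List<Path> files, int threads, IoTask task) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Object>> futures = new ArrayList<>();
      for (Path f : files) {
        futures.add(pool.submit(() -> { task.run(f); return null; }));
      }
      for (Future<Object> fut : futures) {
        fut.get(); // surfaces any worker failure as an ExecutionException
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

Per-file parallelism only pays off when the checkpoint consists of many files of comparable size, which is an assumption here; a real checkpoint would also need to bundle the compressed outputs (or a manifest) into a single artifact.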

Refine: add a new type for mPendingPaths in InodeSyncStream

### What changes are proposed in this pull request?
In the current implementation of InodeSyncStream, when synchronizing a large directory, because mPendingPaths is a queue, the entire synchronization process is...

API Change
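The description is cut off, so the actual problem and the proposed new type are not shown. As general background only, this toy model (not Alluxio code; `PendingPathsSketch` and the synthetic tree are hypothetical) measures how many entries a pending-paths container holds at its peak when a directory tree is walked with a FIFO queue (breadth-first) versus a LIFO stack (depth-first).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Toy model (not Alluxio code): peak size of a pending-paths container during a tree walk. */
class PendingPathsSketch {

  /** Synthetic tree: every directory shallower than maxDepth has `fanout` children. */
  static List<String> children(String path, int depth, int maxDepth, int fanout) {
    List<String> kids = new ArrayList<>();
    if (depth < maxDepth) {
      String prefix = path.equals("/") ? "" : path;
      for (int i = 0; i < fanout; i++) {
        kids.add(prefix + "/d" + i);
      }
    }
    return kids;
  }

  /** fifo=true takes from the head (queue, BFS); fifo=false takes from the tail (stack, DFS). */
  static int peakPending(boolean fifo, int maxDepth, int fanout) {
    Deque<Map.Entry<String, Integer>> pending = new ArrayDeque<>();
    pending.addLast(Map.entry("/", 0));
    int peak = pending.size();
    while (!pending.isEmpty()) {
      Map.Entry<String, Integer> cur = fifo ? pending.pollFirst() : pending.pollLast();
      for (String child : children(cur.getKey(), cur.getValue(), maxDepth, fanout)) {
        pending.addLast(Map.entry(child, cur.getValue() + 1));
      }
      peak = Math.max(peak, pending.size());
    }
    return peak;
  }
}
```

For a tree 3 levels deep with fanout 4, the queue peaks at all 64 leaves while the stack peaks at 10 entries. For a single flat directory, both orders hold every child at once, so traversal order alone does not bound memory there.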

Speed up rocksdb checkpoint

**Is your feature request related to a problem? Please describe.** We currently store metadata for 10 billion Alluxio inodes in RocksDB. The current Alluxio checkpoint packs the RocksDB data files into a tar.gz by...

priority-medium

type-feature

AlluxioJniFuseFileSystemTest#create fails

**Alluxio Version:** 2.8.1

**Describe the bug**

mvn clean test -Phadoop-2 -Dhadoop.version=2.7.3 -pl 'integration/fuse' -Dtest=alluxio.fuse.AlluxioJniFuseFileSystemTest#create

```java
when(mFileSystem.getStatus(any(AlluxioURI.class))).thenReturn(mock(URIStatus.class));
```

```java
return Optional.of(CommonUtils.waitForResult("file completed", () -> {
  try {
    return fileSystem.getStatus(uri);
  } catch...
```

area-fuse

type-bug
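The second snippet is cut off, so the wait condition is not visible. Assuming it checks `URIStatus#isCompleted()` (the description "file completed" suggests so, but this is an assumption), the failure can be reproduced in miniature: an unstubbed Mockito `mock(URIStatus.class)` returns `false` for every boolean method, so the wait never succeeds and times out. `WaitForResultSketch` and its `waitForResult` are hypothetical stand-ins, not Alluxio's `CommonUtils`.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical stand-in for a "poll until condition holds" helper; not Alluxio's CommonUtils. */
class WaitForResultSketch {

  /** Minimal stand-in for URIStatus with just the method the wait is assumed to check. */
  interface Status {
    boolean isCompleted();
  }

  /** Polls op every intervalMs until done accepts its result, or throws after timeoutMs. */
  static <T> T waitForResult(String description, Supplier<T> op, Predicate<T> done,
      long timeoutMs, long intervalMs) throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      T value = op.get();
      if (done.test(value)) {
        return value;
      }
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("Timed out waiting for " + description);
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

If the assumption holds, stubbing the status returned by the mocked `getStatus`, e.g. `URIStatus status = mock(URIStatus.class); when(status.isCompleted()).thenReturn(true);`, should let the wait return instead of timing out.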

Pretraining bge large 1.5: how low can the loss go?

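Once the loss reached 4, it has been dropping agonizingly slowly. When I evaluate the encoder model at this point on C-MTEB, the scores are very low. How low should the pretraining loss get before moving on to fine-tuning? Could the developers give some pointers? Back then, how long did you train with 3 A100s?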

Why does bge-reranker-large score much lower than bge large 1.5 on the C-MTEB T2Reranking dataset?

bge-reranker-large: "map": 0.5106724000079814, "mrr": 0.583640624496244
bge large 1.5: "map": 0.656140927550039, "mrr": 0.746971351731846
The C-MTEB version is the one from November 27; --add_instruction was not used.
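For reading the two score lines, here is a minimal sketch of MAP and MRR for binary relevance, assuming each ranked candidate list contains all of a query's relevant passages (as in a reranking setup). `RerankMetricsSketch` is a hypothetical name, and C-MTEB's own evaluator may differ in details such as tie handling.

```java
import java.util.List;

/** Minimal MAP/MRR over ranked lists of binary labels (true = relevant). Not the C-MTEB evaluator. */
class RerankMetricsSketch {

  /** Mean of precision@k taken at each relevant position k. */
  static double averagePrecision(boolean[] ranked) {
    int hits = 0;
    double sum = 0.0;
    for (int i = 0; i < ranked.length; i++) {
      if (ranked[i]) {
        hits++;
        sum += (double) hits / (i + 1); // precision at rank i + 1
      }
    }
    return hits == 0 ? 0.0 : sum / hits;
  }

  /** 1 / rank of the first relevant item, or 0 if none is relevant. */
  static double reciprocalRank(boolean[] ranked) {
    for (int i = 0; i < ranked.length; i++) {
      if (ranked[i]) {
        return 1.0 / (i + 1);
      }
    }
    return 0.0;
  }

  static double meanAveragePrecision(List<boolean[]> queries) {
    return queries.stream().mapToDouble(RerankMetricsSketch::averagePrecision).average().orElse(0.0);
  }

  static double meanReciprocalRank(List<boolean[]> queries) {
    return queries.stream().mapToDouble(RerankMetricsSketch::reciprocalRank).average().orElse(0.0);
  }
}
```

MRR only rewards the position of the first relevant passage, while MAP also penalizes relevant passages ranked lower, so the two can move somewhat independently between models.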

How can we fix the problem in BGE versions before 1.5 where dissimilar sentences get very high similarity scores?

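2. Similarity scores between dissimilar sentences are very high. We recommend using bge v1.5, which alleviates the similarity-distribution issue. Since we fine-tune the models by contrastive learning with a temperature of 0.01, the similarity distribution of the current BGE models lies roughly in the interval [0.6, 1]. So a similarity score greater than 0.6 does not mean the two sentences are similar. For downstream tasks such as passage retrieval or semantic similarity, what matters is the relative order of the scores, not their absolute values. If you need to filter similar sentences with a similarity threshold, choose an appropriate threshold based on the similarity distribution of your data (e.g. 0.8, 0.85, or even 0.9).
Was this solved, starting with 1.5, by changing the temperature to 0.02?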
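To see why a small temperature lets similarities crowd into a narrow high band, here is a toy InfoNCE loss for one query, `-log softmax(sims / tau)[positive]`. It is a generic illustration, not BGE's training code, and `TemperatureSketch` is a hypothetical name. With tau = 0.01, a cosine gap of only 0.1 between the positive and a negative already becomes a logit gap of 10, so the loss is nearly zero without pushing negatives to low cosine values; doubling tau halves every logit gap and raises the loss for the same similarities.

```java
/** Toy single-query InfoNCE loss with temperature; not BGE's training code. */
class TemperatureSketch {

  /** -log of the softmax probability of sims[positive] after dividing all sims by tau. */
  static double infoNceLoss(double[] sims, int positive, double tau) {
    double max = Double.NEGATIVE_INFINITY;
    for (double s : sims) {
      max = Math.max(max, s / tau);
    }
    double denom = 0.0;
    for (double s : sims) {
      denom += Math.exp(s / tau - max); // shift by max for numerical stability
    }
    // log-softmax(pos) = sims[pos]/tau - (max + log(denom))
    return -(sims[positive] / tau - max - Math.log(denom));
  }
}
```

For example, with sims {0.9, 0.8, 0.7} and the positive first, the loss is lower at tau = 0.01 than at tau = 0.02; if every similarity is equal, the loss is ln(N) regardless of tau. Whether v1.5 actually changed the temperature is not something this sketch answers.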

Raft journal system: pre-applying to the state machine may cause inconsistency

**Alluxio Version:** 2.7.2 **Describe the bug** Suppose a file is deleted and the log entry is pre-applied, but the entry is then rejected by a majority of nodes in the high-availability cluster (a new leader is elected), then...

priority-medium

type-bug

stale
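In Raft, an entry is safe to apply only once it is committed (stored on a majority); an uncommitted entry on a deposed leader can be discarded and overwritten by the new leader's log. The toy model below (hypothetical names, not Alluxio or Ratis code) shows the divergence the report describes: the old leader pre-applies `delete /a`, the entry never commits, and nothing undoes the deletion in its local state.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Alluxio/Ratis code) of applying a log entry before it is committed. */
class PreApplySketch {

  /** Applies a "create <path>" or "delete <path>" entry to a set of paths. */
  static void apply(Set<String> state, String entry) {
    String[] parts = entry.split(" ", 2);
    if (parts[0].equals("create")) {
      state.add(parts[1]);
    } else if (parts[0].equals("delete")) {
      state.remove(parts[1]);
    }
  }

  /** Correct behavior: state built only from committed entries, in order. */
  static Set<String> replay(List<String> committed) {
    Set<String> state = new TreeSet<>();
    for (String e : committed) {
      apply(state, e);
    }
    return state;
  }

  /** Old leader: pre-applies an entry that is later truncated, then follows the new leader. */
  static Set<String> oldLeaderState(List<String> committedPrefix, String preApplied,
      List<String> newLeaderEntries) {
    Set<String> state = replay(committedPrefix);
    apply(state, preApplied); // applied before commit
    // The entry is rejected by the majority and dropped from the log, but `state` keeps its effect.
    for (String e : newLeaderEntries) {
      apply(state, e);
    }
    return state;
  }
}
```

With a committed `create /a`, a pre-applied `delete /a`, and a new leader that appends `create /b`, the committed history yields {/a, /b} while the former leader ends up with {/b}.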

How to pretrain bge m3

Is there a pretraining script like the one for bge 1.5?

About the MLDR dataset

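I ran evaluations on it with models including bge-m3, https://huggingface.co/Alibaba-NLP/gte-multilingual-base and https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct. Judging by Chinese nDCG@10, none of them do well; they are all worse than BM25. Dense search at 8k does not yet seem able to match traditional search. Could you release a 2k version of MLDR?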
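For reference when comparing such scores, here is a minimal nDCG@k sketch with graded relevance and the common log2(rank + 1) discount. `NdcgSketch` is a hypothetical name; the MLDR/MTEB evaluation code may differ in details (gain function, tie handling).

```java
import java.util.Arrays;

/** Minimal nDCG@k; not the MLDR/MTEB evaluation code. */
class NdcgSketch {

  /** DCG@k: sum of gain / log2(rank + 1) over the first k ranks (rank is 1-based). */
  static double dcgAtK(double[] gains, int k) {
    double dcg = 0.0;
    for (int i = 0; i < Math.min(k, gains.length); i++) {
      dcg += gains[i] / (Math.log(i + 2) / Math.log(2)); // rank i + 1 -> log2(i + 2)
    }
    return dcg;
  }

  /**
   * gains: relevance of the retrieved documents in ranked order.
   * allRelevant: relevance of every judged-relevant document for the query (for the ideal ranking).
   */
  static double ndcgAtK(double[] gains, double[] allRelevant, int k) {
    double[] ideal = allRelevant.clone();
    Arrays.sort(ideal); // ascending
    for (int i = 0, j = ideal.length - 1; i < j; i++, j--) { // reverse to descending
      double t = ideal[i];
      ideal[i] = ideal[j];
      ideal[j] = t;
    }
    double idcg = dcgAtK(ideal, k);
    return idcg == 0.0 ? 0.0 : dcgAtK(gains, k) / idcg;
  }
}
```

With a single relevant document, nDCG@10 is 1.0 at rank 1, about 0.63 at rank 2, and 0 once the document falls below rank 10, so a handful of misses can pull the average down sharply.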