[Bug]: create hnsw index on a table with 10M rows reports internal error: commitUnsafe

Open heni02 opened this issue 4 months ago • 3 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

Branch Name

3.0-dev

Commit ID

630316e96

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Running create hnsw index on 10 million rows of vector data reports internal error: commitUnsafe. It occurs intermittently, not on every run.

2025-09-19 11:01:25 ERROR AnnInitializer:215 - context deadline exceeded internal error: commitUnsafe
java.sql.SQLException: context deadline exceeded internal error: commitUnsafe
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
    at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:764)
    at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:648)
    at io.mo.tcase.ann.AnnInitializer.createIndexes(AnnInitializer.java:212)
    at io.mo.tcase.ann.AnnInitializer.init(AnnInitializer.java:73)
    at io.mo.parser.CaseConfigParser.parseAnnTypeScripts(CaseConfigParser.java:632)
    at io.mo.parser.CaseConfigParser.parseTransaction(CaseConfigParser.java:155)
    at io.mo.parser.CaseConfigParser.parse(CaseConfigParser.java:69)
    at io.mo.RUN.parseAndInitCase(RUN.java:150)
    at io.mo.RUN.main(RUN.java:82)

job:https://github.com/matrixorigin/mo-nightly-regression/actions/runs/17832226240/job/50769740089

log:https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22TLD%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-branch-commit-630316e96-20250918%5C%22%7D%20%7C%3D%20%60commitUnsafe%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221758279618710%22,%22to%22:%221758279832925%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

Run the tke regression hnsw deep96 benchmark test.

Additional information

No response

heni02 avatar Sep 19 '25 11:09 heni02

This looks like a timeout: create index ran for more than 26 minutes. I have not yet found which timeout settings are involved.

LeftHandCold avatar Sep 22 '25 02:09 LeftHandCold

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22TLD%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-branch-commit-630316e96-20250918%5C%22%7D%20%7C%3D%20%60b669992ed17708e5186667cb13e02ad2%60%20%7C%3D%20%60INSERT%20INTO%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221758276043000%22,%22to%22:%221758279710000%22%7D%7D%7D&schemaVersion=1&orgId=1
The commit takes too long: in other, normal jobs it usually takes no more than 2 minutes, but in this one it needs 6 minutes.

LeftHandCold avatar Sep 22 '25 03:09 LeftHandCold

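Only after hnsw index create finishes is the SQL constructed, all at once, to INSERT INTO the index table. This writes a large amount of data in one continuous run; in this case it writes 8 GB continuously, which is why the commit takes so long. Index create and INSERT INTO should be able to run in parallel to some degree.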
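To illustrate the suggestion above, here is a minimal Python sketch of overlapping the build with the writes. It is not MatrixOne code: `build_index_chunks`, `pipelined_build_and_insert`, and the `insert_batch` callback are hypothetical names, and the sketch assumes the index data can be emitted in chunks while the build is still running, which may not hold for every part of an hnsw index.

```python
import queue
import threading


def build_index_chunks(num_rows, chunk_rows):
    """Hypothetical stand-in for the index build: yields chunks of row ids as they become ready."""
    for start in range(0, num_rows, chunk_rows):
        yield list(range(start, min(start + chunk_rows, num_rows)))


def pipelined_build_and_insert(num_rows, chunk_rows, insert_batch):
    """Overlap building and inserting instead of writing everything after the build ends.

    insert_batch is a caller-supplied callback (an assumption of this sketch) that
    writes and commits one chunk, so each commit covers a small amount of data.
    """
    pending = queue.Queue(maxsize=4)  # bounded, so the builder cannot run far ahead of the writer
    errors = []

    def writer():
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: the builder has finished
                return
            try:
                insert_batch(chunk)
            except Exception as exc:  # remember the failure but keep draining the queue
                errors.append(exc)

    thread = threading.Thread(target=writer)
    thread.start()
    for chunk in build_index_chunks(num_rows, chunk_rows):
        pending.put(chunk)  # blocks when the writer falls behind
    pending.put(None)
    thread.join()
    if errors:
        raise errors[0]
```

Calling `pipelined_build_and_insert(10, 4, committed.append)` with an empty list `committed` hands the chunks to the callback in build order. In a real system the callback would issue one INSERT INTO plus commit per chunk, so no single commit has to cover the full 8 GB.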

gouhongshen avatar Sep 24 '25 08:09 gouhongshen