[Bug]: [customer regression test] load data 'lost connection'
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
Branch Name
2.2-dev
Commit ID
9dc59d64
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
job:https://github.com/matrixorigin/mo-nightly-regression/actions/runs/15539067482/job/43747743176
Logs (time range: 2025-06-09 16:49:31 to 2025-06-09 17:01:15): https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%2210-222-1-128%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1
Expected Behavior
No response
Steps to Reproduce
Trigger the customer regression test.
Additional information
No response
OOM
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%22centos7-10-222-4-2%5C%22,%20filename%3D%5C%22%2Fdata%2Fcus_reg_bk%2Fmo%2Flogs%2Fmo-20250609_231554.log%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221749511224144%22,%22to%22:%221749511530199%22%7D%7D%7D&schemaVersion=1&orgId=1
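Each time, mo-service is OOM-killed at around 11G.
Peak memory during BVT is about 11G; a few minutes after the run finishes, RES is about 8G. Loading TPC-H adds about 4G of memory. The host has 15G and the docker limit is also 15G. If memory is not released promptly after BVT finishes and load tpch 1G starts right away, mo-service can easily exceed 14G, and the host then OOM-kills mo-service. In a manual test, memory was 7.7G after BVT, peaked at 11G during load tpch, and returned to 7.7G some time after the load completed. Increasing memory to 18G should probably resolve this, or alternatively set the mem cache back to the default 512M; it is currently 1.5G.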
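To make the memory budget above easier to check, here is a rough back-of-the-envelope sketch. It uses only the approximate figures from this comment (11G BVT peak, 8G settled RES, about 4G added by the TPC-H load, 15G limit). The function name and scenario labels are made up for illustration, and the last scenario assumes that shrinking the mem cache from 1.5G to 512M lowers resident memory by the full 1G, which has not been verified.

```python
def headroom_g(resident_before_load, load_delta, limit):
    """Memory left (in G) at the projected load peak; zero or less means OOM risk."""
    return limit - (resident_before_load + load_delta)


LOAD_DELTA = 4.0  # approximate extra memory used while loading TPC-H

scenarios = {
    # Load starts before BVT memory has been released.
    "load right after BVT peak, 15G limit": headroom_g(11.0, LOAD_DELTA, 15.0),
    # Load starts after RES has settled to about 8G.
    "load after memory settles, 15G limit": headroom_g(8.0, LOAD_DELTA, 15.0),
    # Proposed fix 1: raise the limit to 18G.
    "load right after BVT peak, 18G limit": headroom_g(11.0, LOAD_DELTA, 18.0),
    # Proposed fix 2: mem cache 1.5G -> 512M (assumed to save the full 1G).
    "load right after BVT peak, 15G limit, 512M cache": headroom_g(
        11.0 - (1.5 - 0.5), LOAD_DELTA, 15.0
    ),
}

for name, headroom in scenarios.items():
    print(f"{name}: {headroom:+.1f}G headroom")
```

Under these numbers, loading right after the BVT peak leaves no headroom at a 15G limit, which is consistent with the OOM kills. Raising the limit to 18G would leave about 3G, while the smaller cache would leave only about 1G.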
mem.zip: 007 is before the load, 009 is after the load, and 010 is a few minutes after the load.