
[Bug]: [customer regression test] load data 'lost connection'

Open Ariznawlll opened this issue 7 months ago • 2 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

Branch Name

2.2-dev

Commit ID

9dc59d64

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/15539067482/job/43747743176

Image

Logs (time range: 2025-06-09 16:49:31 to 2025-06-09 17:01:15): https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%2210-222-1-128%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-24h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

Trigger the customer regression test.

Additional information

No response

Ariznawlll avatar Jun 10 '25 04:06 Ariznawlll

OOM (mo-service ran out of memory).

Image

LeftHandCold avatar Jun 10 '25 09:06 LeftHandCold

https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%22centos7-10-222-4-2%5C%22,%20filename%3D%5C%22%2Fdata%2Fcus_reg_bk%2Fmo%2Flogs%2Fmo-20250609_231554.log%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221749511224144%22,%22to%22:%221749511530199%22%7D%7D%7D&schemaVersion=1&orgId=1

LeftHandCold avatar Jun 10 '25 09:06 LeftHandCold

Each time, mo-service hits OOM at roughly 11 GB.

LeftHandCold avatar Jul 18 '25 06:07 LeftHandCold

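Peak memory during bvt is about 11 GB. A few minutes after the run finishes, RES settles at about 8 GB. Loading tpch adds roughly 4 GB. The host has 15 GB, and the docker limit is also 15 GB. If memory is not released quickly enough after bvt and the tpch 1G load starts right away, mo-service is likely to exceed 14 GB, at which point the host OOM-kills mo-service. I tested this manually: memory was 7.7 GB after bvt, peaked at 11 GB during the tpch load, and returned to 7.7 GB some time after the load finished. I think raising memory to 18 GB would fix this. Alternatively, set the mem cache back to its default of 512 MB; it is currently 1.5 GB.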
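A rough sketch of the arithmetic behind this estimate, using only the figures quoted in this comment. The function and variable names are illustrative and not part of MatrixOne. It also assumes that the tpch load's extra ~4 GB simply stacks on top of whatever mo-service already holds, and that shrinking the mem cache frees its full size difference, which is an upper bound.

```python
# All figures are in GB and come from the comment above.
# Names are hypothetical, chosen for this illustration only.
BVT_PEAK = 11.0        # peak memory during bvt
BVT_SETTLED = 8.0      # RES a few minutes after bvt finishes
LOAD_DELTA = 4.0       # extra memory while loading tpch
CURRENT_LIMIT = 15.0   # host memory and docker limit
PROPOSED_LIMIT = 18.0  # suggested new limit
CACHE_NOW = 1.5        # current mem cache size
CACHE_DEFAULT = 0.5    # default mem cache size (512 MB)


def headroom(baseline_gb: float, delta_gb: float, limit_gb: float) -> float:
    """Memory left under the limit if the load lands on top of the baseline."""
    return limit_gb - (baseline_gb + delta_gb)


# Assumption: the cache is fully populated, so this is the most that
# shrinking it could save.
cache_savings = CACHE_NOW - CACHE_DEFAULT

scenarios = {
    "load right after bvt, current limit": headroom(BVT_PEAK, LOAD_DELTA, CURRENT_LIMIT),
    "load after memory settles, current limit": headroom(BVT_SETTLED, LOAD_DELTA, CURRENT_LIMIT),
    "load right after bvt, 18 GB limit": headroom(BVT_PEAK, LOAD_DELTA, PROPOSED_LIMIT),
    "load right after bvt, default cache": headroom(BVT_PEAK - cache_savings, LOAD_DELTA, CURRENT_LIMIT),
}

for name, gb in scenarios.items():
    print(f"{name}: {gb:.1f} GB of headroom")
```

Under these assumptions, the case where the load starts before bvt memory is released leaves no headroom at the current limit, which matches the OOM kill described above. Either proposed change restores some margin.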

LeftHandCold avatar Jul 28 '25 07:07 LeftHandCold

mem.zip: 007 is before the load, 009 is after the load, and 010 is a few minutes after the load.

LeftHandCold avatar Jul 28 '25 07:07 LeftHandCold

Image

LeftHandCold avatar Jul 28 '25 07:07 LeftHandCold