[Bug]: CN lost connection

Open fengttt opened this issue 2 years ago • 12 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93):
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Running the same test as in #9646, but on a larger machine (memory increased from 32G to 64G), the CN loses its connection:

ERROR 20503 (HY000) at line 40: stream closed

Expected Behavior

No response

Steps to Reproduce

No response

Additional information

No response

fengttt avatar May 23 '23 18:05 fengttt

in local test:

create database if not exists db1;

use db1;

drop table if exists t;

create table t (i int, j int);

insert into t values (1, 1), (2, 2), (3, 3), (4, 4), (5, null), (null, 5);

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

delete from t where i = 1;
delete from t where i = 2;
select count(*) from t;

insert into t select * from t;
select count(*) from t;

The script above results in: 2023/05/24 12:21:17.672100 +0800 ERROR logservicedriver/appender.go:70 append failed: internal error: message body 116720140 is too large, max is 104857600
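To put the numbers in context, here is a rough sketch of the arithmetic behind the script and the logged error. It assumes every statement in the script succeeds, which may not hold in practice, and it does not say which statement produced the oversized message. The variable names are illustrative only.

```python
# Expected row counts for the reproduction script,
# assuming (hypothetically) that every statement succeeds.
INITIAL_ROWS = 6               # rows from the first INSERT ... VALUES
DOUBLINGS_BEFORE_DELETE = 25   # five groups of five "insert into t select * from t"

rows_before_delete = INITIAL_ROWS * 2 ** DOUBLINGS_BEFORE_DELETE

# Each seed row is replicated equally, so i = 1 and i = 2
# each account for one sixth of the table.
rows_per_seed = rows_before_delete // INITIAL_ROWS
rows_after_delete = rows_before_delete - 2 * rows_per_seed
rows_after_final_insert = rows_after_delete * 2

# Sizes copied from the logged error message.
message_body = 116_720_140
max_body = 104_857_600

print(rows_before_delete)              # 201326592
print(rows_after_delete)               # 134217728
print(rows_after_final_insert)         # 268435456
print(max_body == 100 * 1024 * 1024)   # True: the limit is 100 MiB
print(message_body - max_body)         # 11862540 bytes over the limit
```

So the rejected message was roughly 11% larger than the 100 MiB limit reported in the log.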

volgariver6 avatar May 24 '23 06:05 volgariver6

Duplicate of #9447.

volgariver6 avatar May 31 '23 14:05 volgariver6

For an analysis of this issue, see https://github.com/matrixorigin/matrixone/issues/9447#issuecomment-1576675658

@triump2020 will follow up with further optimizations.

volgariver6 avatar Jun 05 '23 12:06 volgariver6

ERROR 20503 (HY000) at line 40: stream closed

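This error, like other connection-closed errors, occurs because a GC task in the RPC framework checks the liveness of every connection and closes any connection that has carried no data for longer than a certain period (1 minute by default).

The commit of the delete statement takes a long time, so its connection is closed by the GC and the statement fails. This needs to be optimized.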
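The mechanism described above can be sketched roughly as follows. This is a simplified Python illustration, not MatrixOne's actual RPC code: `Conn`, `gc_idle_connections`, and the timestamps are hypothetical, and only the 1-minute default comes from the comment.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_SECONDS = 60.0  # the "1 minute by default" mentioned above


@dataclass
class Conn:
    """Hypothetical stand-in for an RPC connection."""
    name: str
    last_active: float  # time in seconds when data was last seen
    closed: bool = False


def gc_idle_connections(conns, now, timeout=IDLE_TIMEOUT_SECONDS):
    """Close every open connection that has been silent for longer than `timeout`."""
    reaped = []
    for conn in conns:
        if not conn.closed and now - conn.last_active > timeout:
            conn.closed = True
            reaped.append(conn.name)
    return reaped


# One connection saw traffic 5 seconds ago; the other is waiting on a
# long-running delete commit and has carried no data for 90 seconds.
short_query = Conn("short-query", last_active=85.0)
long_commit = Conn("long-delete-commit", last_active=0.0)

reaped = gc_idle_connections([short_query, long_commit], now=90.0)
print(reaped)  # ['long-delete-commit']
```

The point of the sketch is that the check only looks at when data was last seen, so a connection that is busy waiting on a slow commit looks the same as an abandoned one.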

volgariver6 avatar Jun 06 '23 06:06 volgariver6

depends on #9996

triump2020 avatar Jun 26 '23 06:06 triump2020

@jiangxinmeng1 this is a delete-related issue.

XuPeng-SH avatar Jul 24 '23 05:07 XuPeng-SH

#10418 cannot fix this issue. This depends on relatively significant refactoring.

XuPeng-SH avatar Jul 31 '23 00:07 XuPeng-SH

No progress yet.

jiangxinmeng1 avatar Oct 07 '23 10:10 jiangxinmeng1

depends on V1.1

triump2020 avatar Oct 12 '23 10:10 triump2020

depends on V1.1

triump2020 avatar Oct 18 '23 10:10 triump2020

It depends on #11805 and #11804.

XuPeng-SH avatar Oct 23 '23 01:10 XuPeng-SH

depends on #11471

XuPeng-SH avatar Jun 26 '24 14:06 XuPeng-SH

depends on https://github.com/matrixorigin/matrixone/issues/11805 https://github.com/matrixorigin/matrixone/issues/11804

jiangxinmeng1 avatar Jun 27 '24 01:06 jiangxinmeng1