
[Bug]: insert into select is very slow.

Ariznawlll opened this issue (status: Open)

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch Name

main

Commit ID

ead69b441080825781ba62c0bc886f4f4b9f9ac5

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

An insert into select over 1 billion rows took about 80 minutes to complete. [screenshot]

Job URL (Load and insert test -> insert into select): https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8029862130/job/21936797207

log: http://175.178.192.213:30088/explore?panes=%7B%22ZYH%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22branch-big-data-nightly-ead69b4%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221708793400000%22,%22to%22:%221708800600000%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

create table  if not exists big_data_test.table_basic_for_load_1B(
col1 tinyint,
col2 smallint,
col3 int,
col4 bigint,
col5 tinyint unsigned,
col6 smallint unsigned,
col7 int unsigned,
col8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(16,6),
col17 text,
col18 json,
col19 blob,
col20 binary(255),
col21 varbinary(255),
col22 vecf32(3),
col23 vecf32(3),
col24 vecf64(3),
col25 vecf64(3)
);

load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='mo-big-data/1000000000_20_columns_load_data_pk.csv'} into table big_data_test.table_with_pk_for_load_1B fields terminated by '|' lines terminated by '\n' ignore 1 lines parallel 'true';

create table  if not exists big_data_test.table_with_pk_for_insert_1B(
id bigint primary key,
col1 tinyint,
col2 smallint,
col3 int,
col4 bigint,
col5 tinyint unsigned,
col6 smallint unsigned,
col7 int unsigned,
col8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(16,6),
col17 text,
col18 json,
col19 blob,
col20 binary(255),
col21 varbinary(255),
col22 vecf32(3),
col23 vecf32(3),
col24 vecf64(3),
col25 vecf64(3)
);


insert into big_data_test.table_with_pk_for_insert_1B(id,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25)  select * from big_data_test.table_with_pk_for_load_1B;
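To put a wall-clock number on a statement like the one above, the run can be wrapped in a simple timer from any MySQL-protocol client. The sketch below is illustrative only: the connection parameters and the pymysql usage in the comment are assumptions, not taken from this issue.

```python
import time

def timed(run) -> float:
    """Run a zero-argument callable and return its wall-clock duration in seconds."""
    start = time.monotonic()
    run()
    return time.monotonic() - start

# Hypothetical usage against a MySQL-compatible endpoint, e.g. with pymysql:
#   conn = pymysql.connect(host="127.0.0.1", port=6001, user="dump", password="111")
#   with conn.cursor() as cur:
#       elapsed = timed(lambda: cur.execute(
#           "insert into big_data_test.table_with_pk_for_insert_1B "
#           "select * from big_data_test.table_with_pk_for_load_1B"))

# Stand-in workload so the sketch runs on its own:
elapsed = timed(lambda: sum(range(1_000_000)))
print(f"elapsed: {elapsed:.3f}s")
```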

Additional information

No response

Ariznawlll · Feb 26 '24 10:02

The load SQL still fails locally because of permission issues, so I simulated it on a local Ubuntu machine with the SQL below, but could not reproduce the problem: an insert into select at the 100-million-row scale took only slightly over a minute, which is in line with expectations. big_data_test currently runs once a week, so we can wait and see whether it reproduces; if it does, we may need to further test the insert performance of specific data types such as vecf.

create table  if not exists t0(
col1 bigint,
col2 bigint,
col3 bigint,
col4 bigint,
col5 bigint,
col6 bigint,
col7 bigint,
col8 bigint,
col9 bigint,
col10 bigint,
col11 bigint,
col12 bigint,
col13 bigint,
col14 bigint,
col15 bigint,
col16 bigint,
col17 bigint,
col18 bigint,
col19 bigint,
col20 bigint,
col21 bigint,
col22 varchar(26),
col23 varchar(26),
col24 varchar(26),
col25 varchar(26)
);

insert into t0 select result, result, result, result, result, result, result, result, result, result, result, result,
 result, result, result, result, result, result, result, result, result, result || "22", result || "23", result || "24", result || "25" 
 from generate_series(1,100000000) g;
...
[screenshot]

jensenojs · Mar 27 '24 09:03

Running a test at the 1-billion-row scale to check the elapsed time:

without pk: 15 min 38.60s [screenshot]

with pk: 22 min 10.85s [screenshot]

jensenojs · Apr 01 '24 10:04

On vacation.

jensenojs · Apr 08 '24 10:04

No progress.

jensenojs · Apr 12 '24 10:04

not working on it

jensenojs · Apr 17 '24 10:04

Working on other S-1 issues.

jensenojs · Apr 22 '24 10:04

No progress.

jensenojs · Apr 25 '24 10:04

The PR below fixes the slow insert into select problem:

  • https://github.com/matrixorigin/matrixone/pull/15771

We still need to wait for the following PR to pass daily correctness verification:

  • https://github.com/matrixorigin/matrixone/pull/15763

jensenojs · Apr 29 '24 06:04

Pending testing.

jensenojs · May 06 '24 10:05

Same as above.

jensenojs · May 09 '24 10:05

Test results from April 29: the problem still persists and remains to be diagnosed. [screenshot: 企业微信截图_72fd4540-3be7-431e-82fc-36dc19eb9116]

jensenojs · May 09 '24 10:05

Working on:

  • https://github.com/matrixorigin/matrixone/issues/15809

jensenojs · May 23 '24 10:05

Judging from the pprof results, the memory-usage optimization yields a clear reduction in elapsed time at large data volumes:

https://github.com/matrixorigin/matrixone/pull/17020#issuecomment-2179749788

jensenojs · Jun 20 '24 03:06

Further optimization is still needed; the latest problem is tracked here: https://github.com/matrixorigin/matrixone/issues/17143#issuecomment-2188493489

jensenojs · Jun 25 '24 10:06

The latest results appear to be about thirty minutes faster.

[screenshot]

jensenojs · Jun 26 '24 10:06

Will take another look after the PR on 1.2-dev is merged.

jensenojs · Jul 01 '24 10:07

Working on the insert pprof for China Mobile IoT (中移物联).

jensenojs · Jul 04 '24 10:07

not working on it

jensenojs · Jul 09 '24 10:07

  • https://github.com/matrixorigin/matrixone/pull/17211

jensenojs · Jul 12 '24 10:07

1.2-dev commit-id: 74f04ebd0334312ef17f364fb498daa61585c582 [screenshot]

main commit-id: c1751aa204f5e21e1c247299b2a7a1b3366a991a [screenshot]

main and 1.2-dev performance is now in sync; the elapsed time has improved from roughly 80 minutes to about 45 minutes.

Ariznawlll · Jul 17 '24 02:07

This issue mainly tracks the slow CTAS performance problem, and on that front the elapsed time has been cut to roughly half of the original.

In addition, the OLTP workload performance metrics for the 1.3.0 release also cover insert-related scenarios. I suggest adding this scenario to the 1.3.0 test plan, then closing this issue and testing the scenario as part of 1.3.0, with the following targets:

  1. 1.3.0 performance does not regress relative to 1.2.2.
  2. 1.3.0 performance reaches at least 20% of MySQL's performance.

allengaoo · Jul 17 '24 05:07

> This issue mainly tracks the slow CTAS performance problem, and on that front the elapsed time has been cut to roughly half of the original.
>
> In addition, the OLTP workload performance metrics for the 1.3.0 release also cover insert-related scenarios. I suggest adding this scenario to the 1.3.0 test plan, then closing this issue and testing the scenario as part of 1.3.0, with the following targets:
>
> 1. 1.3.0 performance does not regress relative to 1.2.2.
> 2. 1.3.0 performance reaches at least 20% of MySQL's performance.

@allengaoo, the second target here may need another look: at large data volumes, MO's insert into select is not necessarily slower than MySQL's.

jensenojs · Jul 17 '24 06:07

OK. If MO's performance in this scenario is indeed higher than MySQL's, then I suggest changing the test criteria to the following:

  1. 1.3.0 performance does not regress relative to 1.2.2.
  2. 1.3.0 performance reaches at least 120% of MySQL's, i.e. MySQL CTAS time / MO CTAS time >= 1.2.
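The 120% criterion reduces to a simple ratio check. A minimal sketch (the function name and the example timings are illustrative, not from the issue):

```python
def meets_ctas_target(mysql_seconds: float, mo_seconds: float, ratio: float = 1.2) -> bool:
    """True when MySQL's CTAS time divided by MO's CTAS time is at least `ratio`,
    i.e. MO is at least `ratio` times as fast as MySQL."""
    return mysql_seconds / mo_seconds >= ratio

# Illustrative timings: if MySQL takes 60 min and MO takes 45 min,
# the ratio is 3600 / 2700 ≈ 1.33, which meets the 1.2 target.
print(meets_ctas_target(60 * 60, 45 * 60))  # → True
```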

allengaoo · Jul 17 '24 06:07