replace go-sql-connector/mysql with siddontang/go-mysql/client to improve dumpling's performance

Open lichunzhu opened this issue 5 years ago • 25 comments

Description

Background

Currently dumpling's performance is only 1/2 to 2/3 of mydumper. There are two parts that cost a lot of time for dumpling.

  1. After analyzing the torch graph and running some simple tests, we found that dumpling spends a lot of time fetching each row.

That's because driver.Value in the database/sql/driver package is defined as interface{}.

When a []byte variable is converted to interface{}, the runtime calls runtime.mallocgc inside runtime.convTslice, which costs a lot of time, and we can't change driver.Value.

One solution is to abandon database/sql and directly use the []byte values read from the MySQL server. But this is a huge change for dumpling.
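The conversion cost described above can be observed with a small standalone program. This is only a minimal sketch, not the linked gist: the helper names allocsToInterface and allocsToSlice are hypothetical, and it counts heap allocations with testing.AllocsPerRun rather than measuring wall-clock time.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks make the stored values escape, so the compiler
// cannot optimize the assignments away.
var (
	sinkIface interface{}
	sinkBytes []byte
)

// allocsToInterface reports the average number of heap allocations made
// when a non-nil []byte is stored in an interface{}, which is what
// filling a driver.Value requires.
func allocsToInterface(row []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		sinkIface = row
	})
}

// allocsToSlice reports the same figure for a plain []byte assignment.
func allocsToSlice(row []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		sinkBytes = row
	})
}

func main() {
	row := []byte("a value read from the MySQL server")
	fmt.Println("allocs per []byte -> interface{}:", allocsToInterface(row))
	fmt.Println("allocs per []byte -> []byte:", allocsToSlice(row))
}
```

The interface{} case should report about one allocation per assignment (the boxed slice header), while the plain []byte case should report none. A dump that scans many columns per row pays that allocation for every value.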

  2. Currently dumpling does these things serially:

read a row -> escape string -> write to buffer -> read next row ...

Actually, while we escape one value, we could already start reading the next row to improve performance. But this seems hard to implement on top of the database/sql package, which means we may have to implement the MySQL client ourselves.

Reference

  1. conversion code in go-sql-driver/mysql: https://github.com/go-sql-driver/mysql/blob/73dc904a9ece5c074295a77abb7a135797a351bf/packets.go#L770
  2. dumpling torch graph: image
  3. mydumper torch graph: image
  4. code to test the efficiency of assigning []byte to interface{}: in this test, assigning []byte to interface{} costs much more time than assigning it to []byte. https://gist.github.com/lichunzhu/2433d332b4bfc57fb7c1aa3f404b4c58
  5. Test: If we run dumpling with only the scan (escape and write disabled), it takes the same time as mydumper takes for both reading and writing. Relevant torch graph: image

Tasks

  • improve dumpling's performance, make it better than mydumper (for both single-threaded and multi-threaded running)
    • one possible approach is to replace go-sql-driver/mysql with siddontang/go-mysql/client to improve dumpling's performance. What's more, we need to refactor this package to parallelize reading from the database, escaping characters and writing to disk.

Score

  • 6600

Mentor

  • @kennytm

Recommended Skills

  • performance improvement for golang

lichunzhu avatar Aug 04 '20 08:08 lichunzhu

Adding some information that may be helpful:

runtime.mallocgc in runtime.convTslice uses the per-P mcache to allocate the space for storing the internal interface structure. The mcache reserves some fixed-size slots for allocation; when they run out, the mcache is refilled from other memory.

lance6716 avatar Aug 04 '20 13:08 lance6716

/ping

0xPoe avatar Sep 17 '20 06:09 0xPoe

pong! I am challenge bot.

ti-challenge-bot[bot] avatar Sep 17 '20 06:09 ti-challenge-bot[bot]

This issue does not belong to any SIG.

More

Tip : Currently, we only support sig labels starting with sig/, maybe you should add this type of label.

Warning: None

ti-challenge-bot[bot] avatar Sep 17 '20 06:09 ti-challenge-bot[bot]

@Rustin-Liu shouldn't the bot recognize that this entire repo belongs to SIG-Migrate? 😂

kennytm avatar Sep 17 '20 06:09 kennytm

@kennytm https://github.com/tikv/pd/pull/2981 We can add a config for it.

0xPoe avatar Sep 17 '20 06:09 0xPoe

ok thanks

kennytm avatar Sep 17 '20 06:09 kennytm

/assign @kennytm

AndreMouche avatar Sep 23 '20 06:09 AndreMouche

/pick-up

sylzd avatar Sep 27 '20 06:09 sylzd

The challenge program issue is already in the assign flow, so you cannot pick up this issue. But the current issue needs help, you can contact @kennytm to try to solve this issue together.

ti-challenge-bot[bot] avatar Sep 27 '20 06:09 ti-challenge-bot[bot]

/pick-up

sylzd avatar Sep 27 '20 06:09 sylzd

Pick up success.

ti-challenge-bot[bot] avatar Sep 27 '20 06:09 ti-challenge-bot[bot]

@sylzd You did not submit PR within 7 days, so give up automatically.

ti-challenge-bot[bot] avatar Oct 04 '20 07:10 ti-challenge-bot[bot]

/pick-up

sylzd avatar Oct 04 '20 09:10 sylzd

Pick up success.

ti-challenge-bot[bot] avatar Oct 04 '20 09:10 ti-challenge-bot[bot]

@sylzd You did not submit PR within 7 days, so give up automatically.

ti-challenge-bot[bot] avatar Oct 11 '20 09:10 ti-challenge-bot[bot]

/pick-up

sylzd avatar Oct 13 '20 06:10 sylzd

Pick up success.

ti-challenge-bot[bot] avatar Oct 13 '20 06:10 ti-challenge-bot[bot]

@sylzd You did not submit PR within 7 days, so give up automatically.

ti-challenge-bot[bot] avatar Oct 20 '20 07:10 ti-challenge-bot[bot]

@sylzd Hello, there haven't been many updates since the pick-up. Do you need some help?

kennytm avatar Oct 20 '20 07:10 kennytm

@kennytm Yes. Replacing the driver may not help; it takes twice as long. We are trying to replace database/sql or something else.

  1. Our debug flame graph shows that convTslice accounts for only 7.44% of the cost. Command:
time ./dumpling -h 10.162.1.1 -u dump_test2 -p* -P 4000 --filetype sql -r 10000 --threads 16 -B sbtest -o dumpling_output

FlameGraph: image

  2. When we replace go-sql-driver with go-mysql, convTslice costs only 1.22%, but the real time is double what it was before (the query itself costs too much). image image

sylzd avatar Oct 20 '20 08:10 sylzd

Did you replace go-sql-driver with go-mysql/driver? Or did you replace go-sql-driver and database/sql with go-mysql/client?

lichunzhu avatar Oct 20 '20 11:10 lichunzhu

/pick-up

sylzd avatar Oct 20 '20 13:10 sylzd

Pick up success.

ti-challenge-bot[bot] avatar Oct 20 '20 13:10 ti-challenge-bot[bot]