
option to auto adjust chunk size

Open igroene opened this issue 8 years ago • 8 comments

Similar to what pt-osc does, give the option to adjust the chunk size dynamically so that each chunk runs in roughly --chunk-time seconds. In our testing this can make a big difference in reducing the time it takes to run a schema change. From the pt-osc docs:

> The tool tracks the copy rate (rows per second) and adjusts the chunk size after each data-copy query, so that the next query takes this amount of time (in seconds) to execute. It keeps an exponentially decaying moving average of queries per second, so that if the server's performance changes due to changes in server load, the tool adapts quickly.

igroene avatar Apr 06 '17 12:04 igroene

Here are some numbers to support the request.

In doing some basic comparison of time-to-run between pt-online-schema-change and gh-ost, comparable flags were set to attempt apples-to-apples settings.

gh-ost from master:

```shell
time ./gh-ost \
  --allow-on-master \
  --max-load=Threads_running=25 \
  --critical-load=Threads_running=1000 \
  --chunk-size=1000 \
  --throttle-control-replicas=s1 \
  --max-lag-millis=1500 \
  --host=m1 \
  --user="ghost" \
  --password="ghost" \
  --database="d1" \
  --table="t1" \
  --verbose \
  --alter="add column whatever varchar(50)" \
  --cut-over=default \
  --default-retries=120 \
  --panic-flag-file=/tmp/ghost.panic.flag \
  --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
  --execute
```

and pt-osc:

```shell
time pt-online-schema-change \
  --alter "ADD COLUMN whatever varchar(50)" \
  D=d1,t=t1 \
  --user ghost --password ghost \
  --recurse 1 \
  --critical-load Threads_running=1000 \
  --max-load Threads_running=25 \
  --chunk-size 1000 \
  --max-lag 1s \
  --no-drop-old-table \
  --execute
```

With these settings, running against 1.5 million rows, time-to-run was about the same. (FWIW, running gh-ost from a slave is also about the same, a single-digit percentage increase in time.)

At one point in the testing, as a "mistake", pt-osc was executed with no tuning flags specified:

```shell
time pt-online-schema-change --alter "ADD COLUMN whatever varchar(50)" D=d1,t=t1 --user ghost --password ghost --recurse 1 --execute
```

With these settings, pt-osc was 40% faster. Using the following, it's possible to see the chunk size ratcheting up to around 35,000 as it progresses:

```shell
time PTDEBUG=1 pt-online-schema-change --alter "ADD COLUMN whatever varchar(50)" D=d1,t=t1 --user ghost --password ghost --recurse 1 --execute > ptosc.out2017Apr06 2>&1
grep -i "set new chunk size" ptosc.out2017Apr06
```

Although gh-ost allows easy adjustment of --chunk-size through its operational controls, this feature request is for the tool to self-adjust it, to optimize time-to-run.

dataindataout avatar Apr 06 '17 12:04 dataindataout

@arthurnn perhaps this would be of interest to you?

shlomi-noach avatar Apr 14 '17 05:04 shlomi-noach

This might come in handy for any implementer: https://github.com/VividCortex/ewma

igroene avatar Nov 07 '17 12:11 igroene

@igroene I actually already tried EWMA. It did not behave in a way I was satisfied with. The gh-ost progress can be very hectic, what with potentially long throttles on replication lag, followed by excellent progress, followed by, again, long throttles. EWMA does not model well around these.

Also, on very large tables migrations have the tendency to run slower and slower (because indexes get bloated); I don't know what algorithm would adapt to this behavior, but EWMA did not adapt nicely.

shlomi-noach avatar Nov 07 '17 12:11 shlomi-noach

That is interesting; I was expecting to see behaviour similar to pt-osc, where the feature helps a lot with reducing migration times. Do you have any thoughts on why that could be? I will continue to think about this.

igroene avatar Nov 07 '17 14:11 igroene

Any new progress on this issue? I have the same concern.

lujinke avatar Jan 15 '19 06:01 lujinke

There is no progress on this issue. I'm unlikely to tackle it in the near future.

shlomi-noach avatar Jan 15 '19 07:01 shlomi-noach

Hi, I understand the currently set limit for --chunk-size is 100K. Is there any way we can modify and increase that number to, say, 200K or so?

geddamurisatish avatar Mar 23 '23 09:03 geddamurisatish