A data-loss case occurs when chunk-key contains decimal-column
Hello:
We tried to recycle data-free space by alter table tab engine = innodb from a table.
After gh-ost executed, no error coccurs, but data is lost:
gh-ost version is 1.1.2
mysql version is 5.6
gh-ost cmd is:
./gh-ost \
--host=${IP} --port=${Port} --user=${user} --password=${password} --database=${db}--table=NewInstance \
--alter=engine=innodb \
--switch-to-rbr \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1000 \
--heartbeat-interval-millis=2000 \
--throttle-control-replicas=${IP_list} \
--cut-over=default --exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--dml-batch-size=10 \
--nice-ratio=0 \
--serve-socket-file=./tmp/gh-ost.sock \
--throttle-flag-file=./tmp/gh-ost.pause.flag \
--ok-to-drop-table \
--cut-over-exponential-backoff=true \
--exponential-backoff-max-interval=1024 \
--panic-flag-file=./tmp/gh-ost.panic.flag \
--postpone-cut-over-flag-file=flag \
--verbose \
--execute \
the create- table info is:
CREATE TABLE `NewInstance` (
`InstanceID` decimal(32,0) NOT NULL DEFAULT '0' ,
`InstanceStatus` tinyint(4) NOT NULL,
`TriggerTime` datetime NOT NULL,
`TriggerDate` date NOT NULL,
`BeginTime` datetime DEFAULT NULL,
`EndTime` datetime DEFAULT NULL,
`LastRetryTime` datetime DEFAULT NULL,
`Retry` int(11) DEFAULT NULL,
`Parameters` text,
`RunParameters` text,
`Status` tinyint(4) NOT NULL,
`TaskID` bigint(20) NOT NULL,
`job_id` varchar(100) NOT NULL DEFAULT '0',
PRIMARY KEY (`InstanceID`),
KEY `NewInstace_TaskID` (`TaskID`),
KEY `idx_InstanceStatus_Status` (`InstanceStatus`,`Status`),
KEY `idx_job_id` (`job_id`),
KEY `idx_TriggerTime` (`TriggerTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# before alter
> select count(1) from NewInstance;
+----------+
| count(1) |
+----------+
| 6367309 |
+----------+
1 row in set (0.76 sec)
# after alter
> select count(1) from NewInstance;
+----------+
| count(1) |
+----------+
| 6352445 |
+----------+
1 row in set (0.82 sec)
The cause has been located and fixed !
It's because the unique-key contains decimal-type-column, reference: 链接
- If one of the arguments is a decimal value, comparison depends on the other argument. The arguments are compared as decimal values if the other argument is a decimal or integer value, or as floating-point values if the other argument is a floating-point value.
- In all other cases, the arguments are compared as floating-point (double-precision) numbers. For example, a comparison of string and numeric operands takes place as a comparison of floating-point numbers.
the query-sql and insert-sql of row-copy sql is like:
select /* gh-ost `test_routine`.`NewInstance` iteration:6347 */
`InstanceID`
from
`test_routine`.`NewInstance`
where ((`InstanceID` > _binary'2708201202204012300000012')) and ((`InstanceID` < _binary'257980120220318000000004000') or ((`InstanceID` = _binary'257980120220318000000004000')))
order by `InstanceID` asc
limit 1
offset 999;
insert /* gh-ost `test_routine`.`NewInstance` */ ignore into `test_routine`.`_NewInstance_gho` (`InstanceID`, `InstanceStatus`, `TriggerTim
e`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskID`, `job_id`)
(select `InstanceID`, `InstanceStatus`, `TriggerTime`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskI
D`, `job_id` from `test_routine`.`NewInstance` force index (`PRIMARY`)
where (((`InstanceID` > _binary'2708201202204012300000012')) and ((`InstanceID` < _binary'2708201202204062200000015') or ((`InstanceID` = _binary'2708201202204062200
000015')))) lock in share mode
)
one of lost data records is primary-key InstanceID = _binary'80700020220314193000'
It should be contained in row-copy sql like this, BUT not
insert /* gh-ost `test_routine`.`NewInstance` */ ignore into `test_routine`.`_NewInstance1_gho` (`InstanceID`, `InstanceStatus`, `TriggerTime`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskID`, `job_id`)
(select `InstanceID`, `InstanceStatus`, `TriggerTime`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskID`, `job_id` from `test_routine`.`NewInstance` force index (`PRIMARY`)
where (((`InstanceID` > _binary'80700020220314190000')) and ((`InstanceID` < _binary'80700020220405100000') or ((`InstanceID` = _binary'80700020220405100000')))) lock in share mode
)
you can reproduce this case simplely by:
> CREATE TABLE `test_decimal` (
`id` decimal(32,0) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
> insert into test_decimal values('80700020220314193000');
# expect: 1 row, got 0 row
> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
one of the right solution is adding cast as follows:
# expect: 1 row, got 1 row -- perfect
select count(1) from test_decimal where (((`id` > cast(_binary'80700020220314190000' as decimal(32)))) and ((`id` < cast(_binary'80700020220405100000' as decimal(32,0))) or ((`id` = cast(_binary'80700020220405100000' as decimal(32,0))))));
we have fixed this bug and try to contribute patch gh-ost community.
Thank you! From wangzhanbing Baidu
Thanks for your detailed report! I tried to reproduce using MySQL 5.6, 5.7, and 8.0 and I'd like to share the results. It seems that this is only a problem in 5.6.
mysql> CREATE TABLE `test_decimal` (
-> `id` decimal(32,0) NOT NULL DEFAULT '0',
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.05 sec)
mysql> insert into test_decimal values('80700020220314193000');
Query OK, 1 row affected (0.01 sec)
mysql> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
+----------+
| count(1) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 5.6.51 |
+-----------+
1 row in set (0.00 sec)
mysql> CREATE TABLE `test_decimal` (
-> `id` decimal(32,0) NOT NULL DEFAULT '0',
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into test_decimal values('80700020220314193000');
Query OK, 1 row affected (0.00 sec)
mysql> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
+----------+
| count(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 5.7.23 |
+-----------+
1 row in set (0.00 sec)
mysql> CREATE TABLE `test_decimal` (
-> `id` decimal(32,0) NOT NULL DEFAULT '0',
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> insert into test_decimal values('80700020220314193000');
Query OK, 1 row affected (0.00 sec)
mysql> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
+----------+
| count(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.27 |
+-----------+
1 row in set (0.00 sec)