Generated column index replacement rule does not support JSON functions like MySQL
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
In MySQL, the optimizer looks for indexed virtual columns whose definitions match the expressions in JSON queries:
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.26    |
+-----------+
1 row in set (0.00 sec)
mysql> CREATE TABLE person (
-> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
-> NAME VARCHAR(255) NOT NULL,
-> address_info JSON,
-> city VARCHAR(64) AS (JSON_UNQUOTE(address_info->"$.city")),
-> KEY (city)
-> );
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO `person` (`id`, `NAME`, `address_info`) VALUES('1','a','{\"city\": \"Beijing\"}');
Query OK, 1 row affected (0.01 sec)
mysql> EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE address_info->>"$.city" = 'Beijing';
+----------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on person using city (city='Beijing') (cost=0.35 rows=1) (actual time=0.026..0.029 rows=1 loops=1) |
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
But for the same SQL in TiDB, the query is a TableFullScan:
mysql> SELECT VERSION();
+--------------------+
| VERSION()          |
+--------------------+
| 5.7.25-TiDB-v6.1.0 |
+--------------------+
1 row in set (0.00 sec)
mysql> CREATE TABLE person (
-> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
-> NAME VARCHAR(255) NOT NULL,
-> address_info JSON,
-> city VARCHAR(64) AS (JSON_UNQUOTE(address_info->"$.city")),
-> KEY (city)
-> );
Query OK, 0 rows affected (0.08 sec)
mysql> INSERT INTO `person` (`id`, `NAME`, `address_info`) VALUES('1','a','{\"city\": \"Beijing\"}');
Query OK, 1 row affected (0.01 sec)
mysql> EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE address_info->>"$.city" = 'Beijing';
+---------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+---------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------+-----------+------+
| Projection_4 | 8000.00 | 1 | root | | time:640µs, loops:2, Concurrency:5 | test.person.name, test.person.id, test.person.city | 19.2 KB | N/A |
| └─TableReader_7 | 8000.00 | 1 | root | | time:586.1µs, loops:2, cop_task: {num: 1, max: 489.7µs, proc_keys: 1, rpc_num: 1, rpc_time: 472.4µs, copr_cache_hit_ratio: 0.00} | data:Selection_6 | 328 Bytes | N/A |
| └─Selection_6 | 8000.00 | 1 | cop[tikv] | | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 72, total_keys: 2, rocksdb: {delete_skipped_count: 1, key_skipped_count: 2, block: {cache_hit_count: 1, read_count: 0, read_byte: 0 Bytes}}} | eq(json_unquote(cast(json_extract(test.person.address_info, "$.city"), var_string(16777216))), "Beijing") | N/A | N/A |
| └─TableFullScan_5 | 10000.00 | 1 | cop[tikv] | table:person | tikv_task:{time:0s, loops:1} | keep order:false, stats:pseudo | N/A | N/A |
+---------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------+-----------+------+
It seems the generated column index replacement rule does not work in this case.
2. What did you expect to see? (Required)
The generated column index replacement rule supports JSON functions, like MySQL.
3. What did you see instead (Required)
The current version of TiDB does not support this.
4. What is your TiDB version? (Required)
6.1.0
Hi @sunnyfun888, JSON_UNQUOTE is a special function to which this optimization is hard to apply. You can work around it with:
mysql> alter table person add index idx((cast(json_extract(`address_info`, _utf8mb4'$.city') as char(64))));
Query OK, 0 rows affected (2.54 sec)
mysql> EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE cast(json_extract(`address_info`, _utf8mb4'$.city') as char(64)) = 'Beijing';
+------------------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+------------------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-----------+------+
| Projection_4 | 10.00 | 0 | root | | time:892.8µs, loops:1, Concurrency:OFF | test.person.name, test.person.id, test.person.city | 2.23 KB | N/A |
| └─Projection_12 | 10.00 | 0 | root | | time:889µs, loops:1, Concurrency:OFF | test.person.id, test.person.name, test.person.city, cast(json_extract(test.person.address_info, $.city), var_string(64)) | 2.86 KB | N/A |
| └─IndexLookUp_11 | 10.00 | 0 | root | | time:887.8µs, loops:1, | | 163 Bytes | N/A |
| ├─IndexRangeScan_9(Build) | 10.00 | 0 | cop[tikv] | table:person, index:idx(cast(json_extract(`address_info`, _utf8mb4'$.city') as char(64))) | time:577.9µs, loops:1, cop_task: {num: 1, max: 376.3µs, proc_keys: 0, rpc_num: 1, rpc_time: 301.6µs, copr_cache_hit_ratio: 0.00}, tikv_task:{time:229.8µs, loops:0} | range:["Beijing","Beijing"], keep order:false, stats:pseudo | N/A | N/A |
| └─TableRowIDScan_10(Probe) | 10.00 | 0 | cop[tikv] | table:person | | keep order:false, stats:pseudo | N/A | N/A |
+------------------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-----------+------+
5 rows in set (0.01 sec)
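The same cast-based expression index pattern can be extended to non-string JSON values by choosing a matching cast target. A sketch, using hypothetical `$.age` and `$.birth` paths that are not in the original table data; the cast in the query predicate must be identical to the one in the index definition for the optimizer to use it:

```sql
-- Hypothetical JSON paths ($.age as a number, $.birth as a datetime).
-- The cast target must exactly match the expression used in WHERE.
ALTER TABLE person ADD INDEX idx_age ((CAST(address_info->'$.age' AS SIGNED)));
ALTER TABLE person ADD INDEX idx_birth ((CAST(address_info->'$.birth' AS DATETIME)));

-- The query must repeat the identical cast expression:
SELECT id FROM person WHERE CAST(address_info->'$.age' AS SIGNED) = 30;
```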
Hi @wjhuang2016, we're trying to migrate a legacy system from MySQL to TiDB, and this issue was found during the migration research. There are a variety of data types in the JSON (string, number, datetime), and all was fine when using indexed virtual columns in MySQL. With this workaround, the application would need to track every data type instead of the DB, so the cost may be a bit high.
The problem is that the expression type and the generated column type need to be strictly equal.
MySQL doesn't enforce this, and it causes a bug in MySQL.
For example, the result of JSON_UNQUOTE(address_info->"$.city") may be longer than 64 characters, so it cannot be stored in city without truncation.
If we replace the expression with city, the result may be wrong.
I think all other virtual columns using expressions will hit the same problem. In fact, in MySQL, if the string is longer than 64 on insert or update, you get a "Data too long for column" error and the statement is rolled back. The behavior of MySQL is more like syntactic sugar.
@sunnyfun888 Under the strict sql-mode, it would report an error. But under the empty sql-mode, the insert succeeds.
if the string is longer than 64 when insert or update, will get Data too long for column error, and the truncation will be rollbacked
It's not true if the sql_mode is "".
set @@sql_mode="";
If users can accept the possibly wrong result, it's OK to do this optimization. But when we designed this feature, we chose correctness.
I agree; we could add a switch and leave the choice to users.
It is a MySQL bug. Consider the following case:
CREATE TABLE person (id INT PRIMARY KEY,address_info JSON,city VARCHAR(2) AS (JSON_UNQUOTE(address_info->"$.city")),KEY (city));
set @@sql_mode='';
INSERT INTO `person` (`id`, `address_info`) VALUES('1','{\"city\": \"Beijing\"}'); -- success with warning
set @@sql_mode=default;
SELECT id FROM person ignore index(`city`) WHERE address_info->>"$.city" = 'Beijing'; -- 1 record
SELECT id FROM person force index(`city`) WHERE address_info->>"$.city" = 'Beijing'; -- 0 record
Using the index or not gives different results; it is definitely a bug.
Closing, since MySQL has confirmed it is a bug: https://bugs.mysql.com/bug.php?id=108114