mysql_random_data_load
mysql_random_data_load copied to clipboard
FR: Use SHOW INDEX STATUS output to derive data distribution
I believe we could use output of SHOW INDEX STATUS to find out how many unique values each column should present.
+------------------------+------------+--------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+--------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| performance_by_affcode | 0 | PRIMARY | 1 | id | A | 2183073712 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 0 | affcode_per_day | 1 | profile_id | A | 1113 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 0 | affcode_per_day | 2 | ad_id_in_target | A | 8838355 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 0 | affcode_per_day | 3 | affcode | A | 33076874 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 0 | affcode_per_day | 4 | date | A | 2183073712 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 1 | campaign_id_date | 1 | campaign_id | A | 447 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 1 | campaign_id_date | 2 | date | A | 326563 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 1 | last_updated | 1 | last_updated | A | 13311425 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 1 | profile_ad_id_date | 1 | profile_id | A | 919 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 1 | profile_ad_id_date | 2 | ad_id_in_target | A | 10754057 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 1 | profile_ad_id_date | 3 | date | A | 2183073712 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 1 | profile_affcode | 1 | profile_id | A | 763 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 1 | profile_affcode | 2 | affcode | A | 31638749 | NULL | NULL | | BTREE | | |
| performance_by_affcode | 1 | profile_id_date | 1 | profile_id | A | 723 | NULL | NULL | YES | BTREE | | |
| performance_by_affcode | 1 | profile_id_date | 2 | date | A | 265387 | NULL | NULL | | BTREE | | |
+------------------------+------------+--------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The maximum cardinality value would be the total rows to generate, and you can nest-loop from largest to smallest. So if you had columns A, B and C with cardinality 2, 3 and 6 respectively you would end up with
A | B | C
---+---+----
1 | 1 | 1
2 | 2 | 2
1 | 3 | 3
2 | 1 | 4
1 | 2 | 5
2 | 3 | 6
[note for usage: if an index does not exists customer could provide SELECT COUNT(DISTINCT...) for the relevant columns and we could manually add lines to the output based on that to test on our side]