doris-flink-connector icon indicating copy to clipboard operation
doris-flink-connector copied to clipboard

[Bug] MySQL blob type should not be converted to base64 encoded string in Doris

Open trikker opened this issue 2 years ago • 0 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Version

1.4.0

What's Wrong?

After syncing mysql data to doris, the blob type is converted to doris text type because doris doesn't support blob type. But the data should not be converted to base64 encoded.

What You Expected?

The data should be hex encoded in doris text type or a config should be added to specify the encoder of blob/binary data from source database(e.g. MySQL).

How to Reproduce?

suppose there is a table tt in MySQL which contains blob data type:

CREATE TABLE `tt` (
  `id` int NOT NULL AUTO_INCREMENT primary key,
  `name` varchar(20),
  `bb` blob,
  `tt` text);

start flink and run the following flink job, you need the below two jars to run the job:

flink-doris-connector-1.17-1.4.0.jar flink-sql-connector-mysql-cdc-2.4.2.jar

bin/flink run -d \
    -Dexecution.checkpointing.interval=10s \
    -Dparallelism.default=1 \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    lib/flink-doris-connector-1.17-1.4.0.jar \
    mysql-sync-database \
    --database testdb \
    --job-name flink_sync_mysql_to_doris_testdb \
    --mysql-conf hostname=xxxx \
    --mysql-conf port=3360 \
    --mysql-conf username=flink \
    --mysql-conf password=flink \
    --mysql-conf database-name=testdb \
    --including-tables ".*" \
    --sink-conf fenodes=xxxx:8030 \
    --sink-conf username=flink \
    --sink-conf password=flink \
    --sink-conf jdbc-url=jdbc:mysql://xxxx:9030 \
    --sink-conf sink.label-prefix=labeltest1898 \
    --table-conf replication_num=1

insert a record into MySQL:

mysql> insert into tt  values(1, 'aaa', 'abcd中国', 'abcd中国');
Query OK, 1 row affected (0.00 sec)

mysql> select * from tt;
+----+------+------------------------+------------+
| id | name | bb                     | tt         |
+----+------+------------------------+------------+
|  1 | aaa  | 0x61626364E4B8ADE59BBD | abcd中国   |
+----+------+------------------------+------------+
1 row in set (0.00 sec)

The data in doris:

mysql> select * from tt;
+------+------+------------------+------------+
| id   | name | bb               | tt         |
+------+------+------------------+------------+
|    1 | aaa  | YWJjZOS4reWbvQ== | abcd中国   |
+------+------+------------------+------------+
1 row in set (0.03 sec)

The data "YWJjZOS4reWbvQ==" is base64 encoded.

mysql> select id, name, from_base64(bb), tt from tt;
+------+------+-----------------+------------+
| id   | name | from_base64(bb) | tt         |
+------+------+-----------------+------------+
|    1 | aaa  | abcd中国        | abcd中国   |
+------+------+-----------------+------------+
1 row in set (0.03 sec)

Anything Else?

It is strange and I haven't found the problem in the code. I searched "base64" in the code and don't find anything suspicious. Maybe the code is in its dependency?

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

trikker avatar Nov 20 '23 09:11 trikker