seatunnel icon indicating copy to clipboard operation
seatunnel copied to clipboard

[Feature][Connector-V2] Should multi-table source can set `result_table_name` independently?

Open Jetiaime opened this issue 2 years ago • 5 comments

Search before asking

  • [X] I had searched in the feature and found no similar feature requirement.

Description

I'm working on a project that it required to move data from the oracle to Clickhouse, and one database has hundreds or thousands of tables, and I'm not going to write it by my hand. So I'm looking forward to multi-table source. But I found that I cann't set the result_table_name on the table_list for each table, I wish I can write conf like this:

env {
  job.mode = "BATCH"
  parallelism = 4
}

source {
  Jdbc {
    url = "jdbc:oracle:thin:@host116:8085:XE"
    driver = "oracle.jdbc.OracleDriver"
    connection_check_timeout_sec = 100
    user = "system"
    password = "wdp123"
    table_list = [
        {
            table_path = "XE.WHS.TEST2"
            result_table_name = "test2"
        },
        {
            table_path = "XE.WHS.ADMINPRINCIPAL"
            result_table_name = "admin_principal"
        }
    ]
  }
}

sink {
  Console {
    source_table_name = "test2"
  }
  Console {
    source_table_name = "admin_principal"
  }
}

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

Jetiaime avatar Apr 12 '24 07:04 Jetiaime

You can define the source like this:

source { Mysql-CDC { result_table_name = "tb1" base-url = "jdbc:mysql://127.0.0.1:3306/seatunnel" username = "root" password = "xxx" table-names = ["seatunnel.role"] } Mysql-CDC { result_table_name = "tb2" base-url = "jdbc:mysql://hadoop05:3306/db1" username = "root" password = "xxxx" table-names = ["db1.role"] } }

dailai avatar Apr 12 '24 07:04 dailai

You can define the source like this:

source { Mysql-CDC { result_table_name = "tb1" base-url = "jdbc:mysql://127.0.0.1:3306/seatunnel" username = "root" password = "xxx" table-names = ["seatunnel.role"] } Mysql-CDC { result_table_name = "tb2" base-url = "jdbc:mysql://hadoop05:3306/db1" username = "root" password = "xxxx" table-names = ["db1.role"] } }

Yeah, this is another way to finish it. But for my example, there are many common configs in my source. Is it a better way that I specific result_table_name for each table, rather than I copy the same config for many times. I know that result_table_name is a very basic configuration in the code. Is there any chance to put it into a table level configuration, not a datasource level.

Jetiaime avatar Apr 12 '24 07:04 Jetiaime

This is a demo maybe can help you. https://github.com/apache/seatunnel/blob/790150a54b559148657a599fa16bf00539a5ab74/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-1/src/test/resources/jdbc_mysql_source_and_sink_with_multiple_tables.conf

In this way, that's meaning we read two table in one source, then we can write data into one sink but with two table data:

source {
  Jdbc {
    url = "jdbc:oracle:thin:@host116:8085:XE"
    driver = "oracle.jdbc.OracleDriver"
    connection_check_timeout_sec = 100
    user = "system"
    password = "wdp123"
    table_list = [
        {
            table_path = "XE.WHS.TEST2"
        },
        {
            table_path = "XE.WHS.ADMINPRINCIPAL"
        }
    ]
  }
}

If you want write two table into two different sink, you should config two different source.

Hisoka-X avatar Apr 12 '24 13:04 Hisoka-X

This is a demo maybe can help you. https://github.com/apache/seatunnel/blob/790150a54b559148657a599fa16bf00539a5ab74/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-jdbc-e2e/connector-jdbc-e2e-part-1/src/test/resources/jdbc_mysql_source_and_sink_with_multiple_tables.conf

In this way, that's meaning we read two table in one source, then we can write data into one sink but with two table data:

source {
  Jdbc {
    url = "jdbc:oracle:thin:@host116:8085:XE"
    driver = "oracle.jdbc.OracleDriver"
    connection_check_timeout_sec = 100
    user = "system"
    password = "wdp123"
    table_list = [
        {
            table_path = "XE.WHS.TEST2"
        },
        {
            table_path = "XE.WHS.ADMINPRINCIPAL"
        }
    ]
  }
}

If you want write two table into two different sink, you should config two different source.

This demo can solve my problem,but I found that the Clickhouse Sink haven't impleted SupportSaveMode, I might implete related interface for my project and submit a pr.

Jetiaime avatar Apr 15 '24 09:04 Jetiaime

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.

github-actions[bot] avatar May 16 '24 00:05 github-actions[bot]

This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.

github-actions[bot] avatar May 23 '24 00:05 github-actions[bot]