[FEATURE]: Migrate tables in unsupported filesystem

Open qziyuan opened this issue 1 year ago • 1 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Problem statement

External tables stored in adl:// and wasbs:// are crawled and marked with What.EXTERNAL_NO_SYNC. We need more What enum values to differentiate the following scenarios:

  • Hiveserde tables, such as those using ParquetHiveSerDe, which cannot be migrated with SYNC but can be migrated in place by creating a UC table with a supported data source (for example, create external table ... using parquet ... location).
  • Hiveserde tables that have to be migrated using CTAS.
  • Tables in an unsupported filesystem, such as adl:// or wasbs://. These require either:
    • migrating the storage to ADLS Gen2 first, updating the HMS table location, and then migrating to UC;
    • or deep cloning or CTAS-ing the table into UC.
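
To make the three scenarios above concrete, here is a rough sketch of how a finer-grained enum and a classifier might look. It is not the actual UCX code: only `What.EXTERNAL_NO_SYNC` comes from this issue, and every other member name, the `classify` function, and its parameters are hypothetical names chosen for illustration.

```python
from enum import Enum, auto
from typing import Optional


class What(Enum):
    # Only EXTERNAL_NO_SYNC is named in this issue; the other members
    # are hypothetical names for the scenarios listed above.
    EXTERNAL_SYNC = auto()
    EXTERNAL_NO_SYNC = auto()
    HIVESERDE_IN_PLACE = auto()      # hypothetical: recreate as a UC table in place
    HIVESERDE_CTAS = auto()          # hypothetical: must be copied with CTAS
    UNSUPPORTED_FILESYSTEM = auto()  # hypothetical: adl:// or wasbs:// location


# Location schemes this issue calls unsupported.
UNSUPPORTED_SCHEMES = ("adl://", "wasbs://")

# Assumption: serdes that map onto a supported data source. Only
# ParquetHiveSerDe is mentioned in the issue.
IN_PLACE_SERDES = {
    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
}


def classify(location: str, table_format: str, serde: Optional[str] = None) -> What:
    """Pick a migration category for one external table (illustrative only)."""
    # The filesystem check comes first: the data must move regardless of format.
    if location.lower().startswith(UNSUPPORTED_SCHEMES):
        return What.UNSUPPORTED_FILESYSTEM
    if table_format.upper() == "HIVE":
        if serde in IN_PLACE_SERDES:
            return What.HIVESERDE_IN_PLACE
        return What.HIVESERDE_CTAS
    return What.EXTERNAL_SYNC
```

Checking the filesystem before the serde is a design choice in this sketch: a Hiveserde table on wasbs:// needs its data moved either way, so it is reported as an unsupported-filesystem table.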

Proposed Solution

  • Add more What enum values.
  • Discuss the strategy of how to migrate those tables in the future.
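
As a starting point for that discussion, the three migration approaches mentioned in the problem statement could be sketched as SQL statement builders. The function name, the strategy labels, and the table names below are hypothetical; the statements are built by plain string formatting without any identifier escaping, and whether DEEP CLONE applies depends on the source table format.

```python
def migration_sql(strategy: str, src: str, dst: str,
                  location: str = "", fmt: str = "parquet") -> str:
    """Build one SQL statement for a migration strategy (illustrative only)."""
    if strategy == "in_place":
        # Re-register the existing files as a UC external table.
        return f"CREATE TABLE {dst} USING {fmt} LOCATION '{location}'"
    if strategy == "ctas":
        # Copy the data by selecting from the HMS table.
        return f"CREATE TABLE {dst} AS SELECT * FROM {src}"
    if strategy == "deep_clone":
        # Copy data and metadata with a deep clone.
        return f"CREATE TABLE {dst} DEEP CLONE {src}"
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical table names, for illustration.
print(migration_sql("ctas", "hive_metastore.db.t", "main.db.t"))
```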

Additional Context

Related issues:

  • #355, which reports unsupported tables in the dashboard.
  • #1064, Migrate UC External Location should skip unsupported filesystem.

qziyuan avatar Mar 15 '24 23:03 qziyuan

Effort might be 3 weeks for a cloud-level copy, or a few days for CTAS.

nfx avatar Apr 22 '24 17:04 nfx

@HariGS-DB @FastLee to triage and find a better time estimate

nfx avatar Nov 05 '24 15:11 nfx