
Tracking issues of Data Lake with Iceberg Support

Open • Xuanwo opened this issue 2 years ago • 2 comments

With https://github.com/datafuselabs/databend/issues/11947 closed, Databend has completed all the preparation work required to implement data lake support!

Databend now has multi-catalog support!

We can create a new catalog like this:

CREATE CATALOG iceberg_ctl
TYPE=ICEBERG
CONNECTION=(
    URL='s3://testbucket/iceberg_ctl/'
    AWS_KEY_ID='minioadmin'
    AWS_SECRET_KEY='minioadmin'
    ENDPOINT_URL='${STORAGE_S3_ENDPOINT_URL}'
);

We can list the databases and tables inside a catalog, and drop the catalog:

SHOW DATABASES IN iceberg_ctl;
SHOW TABLES IN iceberg_ctl.iceberg_db;
DROP CATALOG IF EXISTS iceberg_ctl;

Databend can now read existing Iceberg tables!

We can query data in an existing Iceberg table like this:

SELECT count(*) FROM iceberg_ctl.iceberg_db.iceberg_tbl;

With this foundation in place, we can start adding data lake features to Databend. Here are some ideas we can start working on:

Tasks

Our current goal is to make reading from Iceberg tables fast and reliable.

  • [ ] Implement partitioning for Iceberg tables
  • [ ] Implement push-down for Iceberg tables
  • [ ] Implement Iceberg REST catalog support
  • [ ] Work with the Iceberg community to build iceberg-rust
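For the push-down item: Iceberg manifests record per-column `lower_bounds` and `upper_bounds` for every data file, so a pushed-down range filter can skip files whose value range cannot match before any Parquet is opened. The sketch below is a minimal illustration of that idea in Python, not Databend's implementation; the `DataFile` record, `may_match` and `prune` are hypothetical, and the bounds are assumed to be already decoded and keyed by column name (real manifests store them as binary values keyed by field ID).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataFile:
    """Hypothetical, simplified view of one data file entry in a manifest."""
    path: str
    lower_bounds: Dict[str, int]  # assumed already decoded, keyed by column name
    upper_bounds: Dict[str, int]


def may_match(f: DataFile, column: str, lo: Optional[int], hi: Optional[int]) -> bool:
    """Return False only if the file provably has no row with lo <= column <= hi."""
    file_lo = f.lower_bounds.get(column)
    file_hi = f.upper_bounds.get(column)
    # Without statistics nothing can be proven, so the file must be read.
    if file_lo is None or file_hi is None:
        return True
    if lo is not None and file_hi < lo:
        return False
    if hi is not None and file_lo > hi:
        return False
    return True


def prune(files: List[DataFile], column: str, lo: Optional[int], hi: Optional[int]) -> List[str]:
    """Keep only the paths of files that a pushed-down range filter might match."""
    return [f.path for f in files if may_match(f, column, lo, hi)]


files = [
    DataFile("a.parquet", {"id": 1}, {"id": 100}),
    DataFile("b.parquet", {"id": 101}, {"id": 200}),
    DataFile("c.parquet", {}, {}),  # no statistics recorded
]

# Equivalent of: WHERE id BETWEEN 150 AND 180
print(prune(files, "id", 150, 180))  # ['b.parquet', 'c.parquet']
```

The check is deliberately conservative: a file is dropped only when its bounds prove no row can match, so pruning never changes query results, it only avoids reads.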

Future

  • [ ] Implement write operations for Iceberg tables (users can ingest data into Iceberg directly!)
  • [ ] Implement optimize operations for Iceberg tables (users can use Databend Cloud as a serverless table optimizer!)

Xuanwo, Jul 31 '23 12:07

Hi @Xuanwo, this is an exciting feature! I was wondering, though, whether the initial implementation supports Iceberg's temporal (as-of) queries?

Regards, Chris Whelan

chrisfw, Nov 27 '23 22:11
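For context on the question: Iceberg table metadata keeps a `snapshot-log`, a list of `timestamp-ms`/`snapshot-id` entries recording when each snapshot became current, and an as-of timestamp is typically resolved to the last snapshot that became current at or before that time. The Python sketch below shows only that lookup, assuming the log has already been parsed into tuples sorted by timestamp; `snapshot_as_of` is a hypothetical helper and says nothing about whether or how Databend exposes time travel.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical parsed form of the metadata's snapshot-log:
# (timestamp_ms, snapshot_id) pairs, sorted by timestamp.
SnapshotLog = List[Tuple[int, int]]


def snapshot_as_of(log: SnapshotLog, as_of_ms: int) -> Optional[int]:
    """Return the snapshot that was current at as_of_ms, or None if the
    table had no snapshot yet at that time."""
    timestamps = [ts for ts, _ in log]
    # Index of the last entry whose timestamp is <= as_of_ms.
    i = bisect_right(timestamps, as_of_ms) - 1
    return log[i][1] if i >= 0 else None


log = [(1_000, 11), (2_000, 22), (3_000, 33)]
print(snapshot_as_of(log, 2_500))  # 22
print(snapshot_as_of(log, 500))    # None
```

Once the snapshot ID is known, the rest of the read path can stay the same: the scan starts from that snapshot's manifest list instead of the current one.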

Does Databend currently support querying Iceberg tables partitioned on a timestamp column with the day/month/year transforms, or is that what the task "Implement partitioning for Iceberg tables" refers to?

atifiu, Apr 16 '24 06:04
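For reference, the Iceberg table spec defines the `year`, `month` and `day` partition transforms as the number of years, months and days since 1970-01-01. The Python sketch below is a minimal illustration of those definitions, assuming timestamps are normalized to UTC (naive values are treated as UTC); `months_to_scan` is a hypothetical helper that only shows how a range filter on the source column maps onto partition values. It does not describe which transforms Databend handled at the time.

```python
from datetime import datetime, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _to_utc(ts: datetime) -> datetime:
    # Assumption: naive values (Iceberg `timestamp`) are read as UTC;
    # aware values (Iceberg `timestamptz`) are converted to UTC.
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def year_transform(ts: datetime) -> int:
    """`year`: whole years since 1970."""
    return _to_utc(ts).year - 1970


def month_transform(ts: datetime) -> int:
    """`month`: whole months since 1970-01."""
    t = _to_utc(ts)
    return (t.year - 1970) * 12 + (t.month - 1)


def day_transform(ts: datetime) -> int:
    """`day`: whole days since 1970-01-01 (floored, so pre-1970 values are negative)."""
    return (_to_utc(ts) - EPOCH).days


def months_to_scan(partition_months: List[int], lower: datetime) -> List[int]:
    """Hypothetical: keep month partitions that may hold rows with ts >= lower."""
    m = month_transform(lower)
    return [p for p in partition_months if p >= m]


ts = datetime(2024, 4, 16, 6, 4)
print(year_transform(ts), month_transform(ts), day_transform(ts))  # 54 651 19829
```

Because each transform is monotonic in the source timestamp, a filter such as `ts >= X` can be rewritten as `month(ts) >= month(X)` on the partition values, which is what allows whole partitions to be skipped.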