databend: Tracking issue for Data Lake with Iceberg Support

With https://github.com/datafuselabs/databend/issues/11947 closed, Databend has completed all the preparatory work required to implement data lake support!

Databend now has multi-catalog support!

We can create a new catalog like this:

CREATE CATALOG iceberg_ctl
TYPE=ICEBERG
CONNECTION=(
    URL='s3://testbucket/iceberg_ctl/'
    AWS_KEY_ID='minioadmin'
    AWS_SECRET_KEY='minioadmin'
    ENDPOINT_URL='${STORAGE_S3_ENDPOINT_URL}'
);

We can also list the databases and tables in a catalog, and drop it:

SHOW DATABASES IN iceberg_ctl;
SHOW TABLES IN iceberg_ctl.iceberg_db;
DROP CATALOG IF EXISTS iceberg_ctl;

Databend can now read existing Iceberg tables!

We can query data in an existing Iceberg table like this:

SELECT count(*) FROM iceberg_ctl.iceberg_db.iceberg_tbl;

We have found a way to add data lake features to Databend. Here are some ideas we can start working on:

Tasks

Our current goal is to make reading from Iceberg tables fast and reliable.

[ ] Implement partition support for Iceberg tables
[ ] Implement push_down for Iceberg tables
[ ] Implement Iceberg REST catalog support
[ ] Work with the Iceberg community to build iceberg-rust

Future

[ ] Implement write operations for Iceberg tables (users can ingest data into Iceberg directly!)
[ ] Implement optimize operations for Iceberg tables (users can use Databend Cloud as a serverless table optimizer!)

Jul 31 '23 12:07 Xuanwo

Hi @Xuanwo, this is an exciting feature! I was wondering, though, whether the initial implementation supports Iceberg's temporal/as-of queries?

Regards, Chris Whelan

Nov 27 '23 22:11 chrisfw

Does Databend currently support querying Iceberg tables partitioned on a timestamp column with a day/month/year transform, or is that what the partition task ("Implement partition for iceberg table") refers to?

Apr 16 '24 06:04 atifiu