Tracking issue: Data Lake with Iceberg support
After the close of https://github.com/datafuselabs/databend/issues/11947, Databend has completed all preparation work required for implementing data lake support!
Databend now has multi-catalog support!
We can create a new catalog like:
```sql
CREATE CATALOG iceberg_ctl
TYPE = ICEBERG
CONNECTION = (
    URL = 's3://testbucket/iceberg_ctl/'
    AWS_KEY_ID = 'minioadmin'
    AWS_SECRET_KEY = 'minioadmin'
    ENDPOINT_URL = '${STORAGE_S3_ENDPOINT_URL}'
);
```
And we can show/drop them:
```sql
SHOW DATABASES IN iceberg_ctl;
SHOW TABLES IN iceberg_ctl.iceberg_db;
DROP CATALOG IF EXISTS iceberg_ctl;
```
Databend can now read existing Iceberg tables!
We can query data in an existing Iceberg table like the following:
```sql
SELECT count(*) FROM iceberg_ctl.iceberg_db.iceberg_tbl;
```
We have found a way to add data lake features to Databend, and I have some ideas we can start working on:
Tasks
Our current goal is to make reading from Iceberg tables fast and reliable.
- [ ] Implement partition support for Iceberg tables
- [ ] Implement push_down for Iceberg tables
- [ ] Implement Iceberg REST catalog support
- [ ] Work with the Iceberg community to build iceberg-rust
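To illustrate why the partition and push_down tasks matter, here is a hypothetical query (the `events` table, its columns, and its `day(event_time)` partitioning are assumptions for illustration) where both features would let Databend skip whole data files instead of scanning the entire table:

```sql
-- Hypothetical Iceberg table partitioned by day(event_time).
-- With partition pruning, only data files in the 2023-06-01 partition are read;
-- with predicate push-down, the status filter is applied at the scan layer
-- instead of after all rows have been materialized.
SELECT count(*)
FROM iceberg_ctl.iceberg_db.events
WHERE event_time >= '2023-06-01 00:00:00'
  AND event_time <  '2023-06-02 00:00:00'
  AND status = 'ok';
```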
Future
- [ ] Implement write operations for Iceberg tables (users can ingest data into Iceberg directly!)
- [ ] Implement optimize operations for Iceberg tables (users can use Databend Cloud as a serverless table optimizer!)
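For the write and optimize paths, a natural user-facing shape might look like the following. This syntax is an assumption, not something Databend implements today:

```sql
-- Assumed future syntax: ingest rows into an existing Iceberg table directly.
INSERT INTO iceberg_ctl.iceberg_db.iceberg_tbl
SELECT * FROM staging_tbl;

-- Assumed future syntax: compact small files and rewrite metadata
-- so that subsequent reads touch fewer objects.
OPTIMIZE TABLE iceberg_ctl.iceberg_db.iceberg_tbl;
```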
Hi @Xuanwo, this is an exciting feature! I was wondering, though: does the initial implementation support Iceberg's temporal/as-of queries?
Regards, Chris Whelan
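For reference, in engines that already support Iceberg time travel (such as Spark SQL), as-of queries look like the sketch below. Whether and how Databend exposes this is an open question; the table name here is an assumption:

```sql
-- Spark SQL style Iceberg time travel (not Databend syntax):
-- read the table as of a wall-clock timestamp.
SELECT count(*) FROM iceberg_db.iceberg_tbl TIMESTAMP AS OF '2023-06-01 00:00:00';

-- Read the table as of a specific snapshot ID.
SELECT count(*) FROM iceberg_db.iceberg_tbl VERSION AS OF 10963874102873;
```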
Does Databend currently support querying Iceberg tables partitioned on a timestamp column with day/month/year transforms, or is that what the partition task above covers?