iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

[DISCUSS] A catalog loader api.

Open liurenjie1024 opened this issue 9 months ago • 7 comments

When developing iceberg-playground and the integration test framework, I realized that it would be convenient to have a catalog loader api. I have following design in mind:

pub trait CatalogBuilder {
   type C: Catalog;
   fn with_name(self, name: &str) -> Self;
   fn with_uri(self, uri: &str) -> Self;
   fn with_warehouse(self, warehouse: &str) -> Self;
   fn with_prop(self, key: &str, value: &str) -> Self;
   fn build(self) -> Result<Self::C>;
}

And we need to have all catalogs to implement this trait:

pub struct RestCatalogBuilder {
  ...
}

impl CatalogBuilder for RestCatalogBuilder {
  ....
}

impl RestCatalogBuilder {
  fn with_rest_client(self, client: Client) -> Self {
  ...
 }
}

Then we will have a catalog loader api like this:

fn load_catalog(r#type: &str) -> CatalogBuilder {
  match r#type {
     "rest" => RestCatalogBuilder::new()
     "hive" => ....
  }
}

liurenjie1024 avatar Apr 18 '25 13:04 liurenjie1024

cc @Xuanwo @sdd @ZENOTME @c-thiel @Fokko

liurenjie1024 avatar Apr 18 '25 13:04 liurenjie1024

Thanks @liurenjie1024 for raising this discussion. According to my understanding, the CatalogBuilder is used to let us have a unified way to create catalog like following, right?

let catalog_builder = load_catalog(xxx);
catalog_builder.with_name()
               .with_xxx()
               ...
let catalog = catalog_builder.build()

One thing I'm concerned about is that there may be a specific parameter to create the catalog, e.g. S3Catalog and GlueCatalog have different catalog configs. In this case, looks like it's hard to unify the with_xxx of them?

ZENOTME avatar Apr 18 '25 13:04 ZENOTME

I share the same concern as @ZENOTME.

For example, the REST Catalog supports setting an HttpClient. How can we enable CatalogBuilder to support this as well? Alternatively, should we treat CatalogBuilder as an entry point, and recommend that advanced users—such as those who need to configure the HttpClient—use RestCatalogBuilder directly?

Xuanwo avatar Apr 18 '25 14:04 Xuanwo

One thing I'm concerned about is that there may be a specific parameter to create the catalog, e.g. S3Catalog and GlueCatalog have different catalog configs. In this case, looks like it's hard to unify the with_xxx of them?

This configuration could all be configured by keys. For example, table_bucket_arn in S3Catalog could be configured with a key like s3.catalog.table.bucket.arn.

For example, the REST Catalog supports setting an HttpClient. How can we enable CatalogBuilder to support this as well? Alternatively, should we treat CatalogBuilder as an entry point, and recommend that advanced users—such as those who need to configure the HttpClient—use RestCatalogBuilder directly?

Yeah, I totally agree with it. The load_catalog api is to allow user to have a convenient way to creat catalog, but advanced user should still use concrete catalog builder directly.

liurenjie1024 avatar Apr 19 '25 02:04 liurenjie1024

+1 this is similar to pyiceberg's load_catalog, i like it

kevinjqliu avatar Apr 21 '25 18:04 kevinjqliu

One thing also worth discussion is whether we need to remove obsolete builder api, for example RestCatalogConfigBuilder. I prefer to remove them, but it's a breaking api change.

liurenjie1024 avatar Apr 22 '25 01:04 liurenjie1024

Meeting conclusion: it would be fine to break the obsolete config builder api as we are not 1.0, and we need to provide docs/release notes about the migration.

liurenjie1024 avatar Apr 24 '25 15:04 liurenjie1024

@liurenjie1024 For the load_catalog function, how do you plan to avoid cyclic dependency ? Catalog does not include packages of specific implementations.

I can help with this API. Trying to see what you are going for.

gsoundar avatar May 20 '25 22:05 gsoundar

Hi, @gsoundar I currently have a prototype in https://github.com/apache/iceberg-rust/pull/1231 . Per offline discussion with @Xuanwo he has some different opinions and planning to refine the proposal. We can continue discussion in #1231

liurenjie1024 avatar May 22 '25 03:05 liurenjie1024

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Nov 19 '25 00:11 github-actions[bot]

Completed.

liurenjie1024 avatar Nov 19 '25 09:11 liurenjie1024