[DISCUSS] A catalog loader api.
When developing iceberg-playground and the integration test framework, I realized that it would be convenient to have a catalog loader api. I have following design in mind:
pub trait CatalogBuilder {
type C: Catalog;
fn with_name(self, name: &str) -> Self;
fn with_uri(self, uri: &str) -> Self;
fn with_warehouse(self, warehouse: &str) -> Self;
fn with_prop(self, key: &str, value: &str) -> Self;
fn build(self) -> Result<Self::C>;
}
And we need to have all catalogs to implement this trait:
pub struct RestCatalogBuilder {
...
}
impl CatalogBuilder for RestCatalogBuilder {
....
}
impl RestCatalogBuilder {
fn with_rest_client(self, client: Client) -> Self {
...
}
}
Then we will have a catalog loader api like this:
fn load_catalog(r#type: &str) -> CatalogBuilder {
match r#type {
"rest" => RestCatalogBuilder::new()
"hive" => ....
}
}
cc @Xuanwo @sdd @ZENOTME @c-thiel @Fokko
Thanks @liurenjie1024 for raising this discussion. According to my understanding, the CatalogBuilder is used to let us have a unified way to create catalog like following, right?
let catalog_builder = load_catalog(xxx);
catalog_builder.with_name()
.with_xxx()
...
let catalog = catalog_builder.build()
One thing I'm concerned about is that there may be a specific parameter to create the catalog, e.g. S3Catalog and GlueCatalog have different catalog configs. In this case, looks like it's hard to unify the with_xxx of them?
I share the same concern as @ZENOTME.
For example, the REST Catalog supports setting an HttpClient. How can we enable CatalogBuilder to support this as well? Alternatively, should we treat CatalogBuilder as an entry point, and recommend that advanced users—such as those who need to configure the HttpClient—use RestCatalogBuilder directly?
One thing I'm concerned about is that there may be a specific parameter to create the catalog, e.g. S3Catalog and GlueCatalog have different catalog configs. In this case, looks like it's hard to unify the with_xxx of them?
This configuration could all be configured by keys. For example, table_bucket_arn in S3Catalog could be configured with a key like s3.catalog.table.bucket.arn.
For example, the REST Catalog supports setting an HttpClient. How can we enable CatalogBuilder to support this as well? Alternatively, should we treat CatalogBuilder as an entry point, and recommend that advanced users—such as those who need to configure the HttpClient—use RestCatalogBuilder directly?
Yeah, I totally agree with it. The load_catalog api is to allow user to have a convenient way to creat catalog, but advanced user should still use concrete catalog builder directly.
+1 this is similar to pyiceberg's load_catalog, i like it
One thing also worth discussion is whether we need to remove obsolete builder api, for example RestCatalogConfigBuilder. I prefer to remove them, but it's a breaking api change.
Meeting conclusion: it would be fine to break the obsolete config builder api as we are not 1.0, and we need to provide docs/release notes about the migration.
@liurenjie1024 For the load_catalog function, how do you plan to avoid cyclic dependency ? Catalog does not include packages of specific implementations.
I can help with this API. Trying to see what you are going for.
Hi, @gsoundar I currently have a prototype in https://github.com/apache/iceberg-rust/pull/1231 . Per offline discussion with @Xuanwo he has some different opinions and planning to refine the proposal. We can continue discussion in #1231
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Completed.