HIVE-28059: Iceberg REST Catalog
What changes were proposed in this pull request?
This is a basic implementation of the Iceberg REST Catalog as a service embedded within HMS.
Why are the changes needed?
For Hive users that have started using Iceberg to store data and want to expose those same tables to external query engines, implementing the Iceberg REST catalog embedded within HMS allows an easy upgrade and deployment path. Although a dedicated service to host the REST catalog is certainly desirable, it also implies deploying and managing another service.
Does this PR introduce any user-facing change?
Yes: HMS then hosts a servlet that implements the Iceberg REST Catalog API.
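To make "a servlet hosted by an existing process" concrete, here is a minimal sketch of an in-process HTTP endpoint answering the REST catalog's `GET /v1/config` call with the `defaults`/`overrides` body that the Iceberg REST specification defines. It is not the code from this patch: it uses the JDK's built-in `HttpServer` as a stand-in for the HMS servlet container, and the class name, the `/iceberg` base path, and the empty configuration maps are all assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class EmbeddedRestCatalogSketch {
  // Minimal body for GET /v1/config; the REST spec defines "defaults" and "overrides" maps.
  // Both maps are left empty here (assumption: no server-side configuration to advertise).
  static String configJson() {
    return "{\"defaults\":{},\"overrides\":{}}";
  }

  // Starts an in-process HTTP endpoint; port 0 asks the OS for any free port.
  // The "/iceberg" base path is hypothetical, not taken from the patch.
  static HttpServer start(int port) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
    server.createContext("/iceberg/v1/config", exchange -> {
      byte[] body = configJson().getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().add("Content-Type", "application/json");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }

  public static void main(String[] args) throws IOException {
    HttpServer server = start(0);
    System.out.println("listening on port " + server.getAddress().getPort());
    server.stop(0);
  }
}
```

The point of the sketch is only that the endpoint shares the lifecycle of the hosting process, which is the deployment trade-off this PR is making.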
Is the change a dependency upgrade?
No.
How was this patch tested?
Unit tests.
Quality Gate passed: 60 new issues, 0 accepted issues, 0 security hotspots, no coverage data, 0.0% duplication on new code.
Quality Gate passed: 92 new issues, 0 accepted issues, 0 security hotspots, no coverage data, 0.0% duplication on new code.
Are you familiar with Hive HCatalog?
HCatalog is a tool for accessing metadata that resides in the Hive metastore. It acts as an API that exposes the metastore to external tools through a REST interface.
https://www.linkedin.com/pulse/hive-metastore-hcatalog-hcat-haotian-zhang
If Hive already provides a REST interface for HMS, we could just extend it with the Iceberg API, the same way Databricks does in Unity Catalog:
sb.annotatedService(basePath + "iceberg",
    new IcebergRestCatalogService(catalogService, schemaService, tableService),
    icebergRequestConverter, icebergResponseConverter);
https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/UnityCatalogServer.java
Under hive-webhcat there is already a REST server for the HMS API (its API list probably needs updating); it could be extended with Iceberg API support.
HCatalog seems almost abandoned. HIVE-28059 is just trying to expose the Iceberg API - nothing more - on an existing HMS instance. This patch allows Hive 4 users to consume their Iceberg data with other engines and at least 14 people seem to agree this would be/have been useful.
It is supposed to be a decoupled REST service solution, not a hacky enabler. You could check existing projects to see how it's done: Gravitino REST Catalog, Polaris, Unity, etc.
Iceberg tables still necessitate a Catalog, which is what HMS provides; it is sad that you cannot see value in this practical approach that does not require depending on yet another service that ultimately will drive people away from Hive. Had you not embraced such a narrow view but played the community enabler role, Hive 4 would have had the first implementation of an Iceberg catalog 6 months ago... Since you are judge and jury, you should force close the JIRA and the PR.
that does not require depending
- I don't see any other community members supporting your "solution"; Did you get feedback or review comments from anyone else?
- 6 months ago what we saw was a "broken hello world", nothing more. No doubt it improved after numerous review comments, but not all of them were resolved;
- the "rest catalog" solution implemented here is tightly coupled with HMS, significantly limiting its usage patterns;
PS: Failure to accept criticism does you no credit;
- I don't see any other community members supporting your "solution"; Did you get feedback or review comments from anyone else?
10 thumbs up, 4 rockets on the PR - but I guess that is not "the process".
- the "rest catalog" solution implemented here is tightly coupled with HMS, significantly limiting its usage patterns;
If you already have HMS... but this does not fit your dogma.
PS: Failure to accept criticism does you no credit;
PS: Failure to accept other views does you no credit either;
10 thumbs up, 4 rockets on the PR - but I guess that is not "the process".
It seems like you can't comprehend the difference between an idea/feature and the way it is implemented.
If you already have HMS... but this does not fit your dogma.
WHY should it be coupled with HMS? Haven't we discussed this internally and reached a consensus to have a "decoupled REST service"? Why do you keep pushing your individual vision that is not supported by the majority, a vision that "only fits your dogma"?
- Have you thought about the upgrade process? Releasing a new version of RestCatalog would require an HMS upgrade.
- What if someone decides to use a different Catalog implementation in Hive?
- What if we decide to use RestCatalog as a central place for all metadata, serving it from multiple catalogs?
- What if we need to scale them differently?
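The decoupling these questions ask for can be sketched as a REST-facing facade that routes each request to a pluggable backend by catalog prefix (the Iceberg REST specification allows a `{prefix}` segment in its paths). This is only an illustration under stated assumptions: `CatalogBackend`, `RestFacade`, the prefixes `hms` and `other`, and the table names are all hypothetical and do not come from this patch or from any of the projects mentioned.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CatalogRoutingSketch {
  /** Hypothetical backend abstraction: anything that can list tables in a namespace. */
  interface CatalogBackend {
    List<String> listTables(String namespace);
  }

  /** Hypothetical REST-facing facade that routes a catalog prefix to a backend. */
  static final class RestFacade {
    private final Map<String, CatalogBackend> backends = new LinkedHashMap<>();

    RestFacade register(String prefix, CatalogBackend backend) {
      backends.put(prefix, backend);
      return this;
    }

    List<String> listTables(String prefix, String namespace) {
      CatalogBackend backend = backends.get(prefix);
      if (backend == null) {
        throw new IllegalArgumentException("unknown catalog prefix: " + prefix);
      }
      return backend.listTables(namespace);
    }
  }

  // Two stub backends with made-up tables; a real deployment would wrap HMS or another catalog.
  static RestFacade demo() {
    return new RestFacade()
        .register("hms", ns -> List.of(ns + ".orders"))
        .register("other", ns -> List.of(ns + ".events"));
  }

  public static void main(String[] args) {
    System.out.println(demo().listTables("hms", "sales")); // prints [sales.orders]
  }
}
```

With this shape the facade can be released and scaled separately from any single backend, while an HMS-embedded deployment is just the case of one registered backend living in the same process.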
Dissing, ad-hominem attacks, soon you'll go (back) to insults; always delightful. I will not talk about 'internal' products since this is not the place; I will just note that your view of the majority amounts to claiming (first) that you have it... Since you obviously need to be right and cannot accept the benefit of simplicity, I'll leave you be in your kingdom.
@henrib, I don't get you at all. I am trying to give you some suggestions, but you just make it personal. OK, I'll give it a last shot and leave it up to you:
If you think HMS users would benefit from the REST API exposed directly from the HMS server, please move your RestCatalog implementation under the standalone-metastore-server (IDK, create a package rest; it shouldn't be part of the compute engine) and get reviews from the HMS folks: @nrg4878, @dengzhhu653, @saihemanth-cloudera
Quality Gate passed: 98 new issues, 0 accepted issues, 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code.
Any updates here? This is a very useful implementation that will simplify access to metadata and some external integrations.
Work on it just resumed; I need to move the whole thing under standalone-metastore and adapt it to the latest trunk (4.1) before presenting it to new reviewers. Please add a vote on the JIRA; this might help at review time.
In my case, we are currently going to use HMS plus an external Iceberg REST catalog, exposed as external Hive tables in HMS, because we use the REST API for external integrations.
Just thinking aloud: how can we expand the capability of the Iceberg REST catalog? For example, by supporting Hive native tables, and Iceberg tables from different sources/places, not only from HMS.
Please check my comments!
Will do :-)