HIVE-28059: Iceberg REST Catalog

Open henrib opened this issue 1 year ago • 12 comments

What changes were proposed in this pull request?

This is a basic implementation of the Iceberg REST Catalog as a service embedded within HMS.

Why are the changes needed?

For Hive users that have started using Iceberg to store data and want to expose those same tables to external query engines, implementing the Iceberg REST catalog embedded within HMS allows an easy upgrade and deployment path. Although a dedicated service to host the REST catalog is certainly desirable, it also implies deploying and managing another service.

Does this PR introduce any user-facing change?

It does since HMS then 'hosts' a servlet that implements the Iceberg REST Catalog API.
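To make the user-facing change concrete, here is a minimal sketch of the first request an external engine typically sends to an Iceberg REST catalog: `GET {prefix}/v1/config`, whose response carries `defaults` and `overrides` maps per the REST catalog specification. This is not the servlet from this PR. The JDK's built-in `HttpServer` stands in for HMS, and the `/iceberg` mount point and the class name are assumptions made for illustration only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RestCatalogConfigSketch {
    // Hypothetical mount point; the path HMS actually uses is not stated here.
    static final String BASE_PATH = "/iceberg";

    // Smallest valid-looking config body: two empty maps named "defaults" and "overrides".
    static final String CONFIG_JSON = "{\"defaults\":{},\"overrides\":{}}";

    /** Starts a stand-in server for {BASE_PATH}/v1/config; port 0 picks a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext(BASE_PATH + "/v1/config", exchange -> {
            // The sketch answers every HTTP method the same way; a real catalog would not.
            byte[] body = CONFIG_JSON.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Plain HTTP GET, as any external engine or script could issue it. */
    static String get(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0);
        try {
            int port = server.getAddress().getPort();
            System.out.println(get("http://localhost:" + port + BASE_PATH + "/v1/config"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against a real deployment only the `get(...)` half applies, pointed at whatever host, port and path the HMS servlet is configured to listen on.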

Is the change a dependency upgrade?

No.

How was this patch tested?

Unit tests.

henrib avatar Mar 20 '24 22:03 henrib

Are you familiar with Hive HCatalog?

HCatalog is a tool for accessing metadata that resides in the Hive metastore. It acts as an API that exposes the metastore as a REST interface to external tools.

https://www.linkedin.com/pulse/hive-metastore-hcatalog-hcat-haotian-zhang

If Hive already provides a REST interface for HMS, we could just extend it with the Iceberg API, the same way Databricks does in Unity Catalog:

sb.annotatedService(basePath + "iceberg",
      new IcebergRestCatalogService(catalogService, schemaService, tableService),
      icebergRequestConverter, icebergResponseConverter);

https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/UnityCatalogServer.java

Under hive-webhcat there is already an HMS REST API server (its API list probably needs to be updated); it could be extended with Iceberg API support.

deniskuzZ avatar Jun 14 '24 15:06 deniskuzZ

HCatalog seems almost abandoned. HIVE-28059 is just trying to expose the Iceberg API - nothing more - on an existing HMS instance. This patch allows Hive 4 users to consume their Iceberg data with other engines and at least 14 people seem to agree this would be/have been useful.
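As an illustration of what "consume with other engines" could look like on the client side, the sketch below builds the properties Spark's Iceberg integration uses to attach a REST catalog: the `SparkCatalog` implementation class plus the `type=rest` and `uri` catalog properties. The catalog name, host, port and `/iceberg` path in the usage are placeholders, not values taken from this patch, and the helper class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RestClientPropsSketch {
    /**
     * Builds Spark configuration entries for an Iceberg REST catalog.
     * catalogName and uri are caller-supplied; nothing here is specific to HMS.
     */
    static Map<String, String> sparkConf(String catalogName, String uri) {
        String prefix = "spark.sql.catalog." + catalogName;
        Map<String, String> conf = new LinkedHashMap<>();
        // Registers the catalog under the given name using Iceberg's Spark catalog class.
        conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        // Selects the REST catalog client instead of the Hive or Hadoop ones.
        conf.put(prefix + ".type", "rest");
        // Base URI of the REST endpoint (placeholder in the usage below).
        conf.put(prefix + ".uri", uri);
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder endpoint; substitute the address of the actual REST catalog.
        sparkConf("hms", "http://hms-host:9001/iceberg")
            .forEach((key, value) -> System.out.println("--conf " + key + "=" + value));
    }
}
```

The same three pieces of information (catalog implementation, `rest` type, endpoint URI) are what other engines ask for too, under their own property names.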

henrib avatar Aug 19 '24 09:08 henrib

> HCatalog seems almost abandoned. HIVE-28059 is just trying to expose the Iceberg API - nothing more - on an existing HMS instance. This patch allows Hive 4 users to consume their Iceberg data with other engines and at least 14 people seem to agree this would be/have been useful.

It is supposed to be a decoupled REST service solution, not a hacky enabler. You could check existing projects to see how it's done: Gravitino REST Catalog, Polaris, Unity, etc.

deniskuzZ avatar Sep 13 '24 12:09 deniskuzZ

> it is supposed to be a decoupled REST service solution, not a hacky enabler. You could check existing projects to see how it's done: Gravitino REST Catalog, Polaris, Unity, etc

Iceberg tables still necessitate a Catalog - what HMS provides; it is sad that you can not see value in this practical approach that does not require depending on yet another service that ultimately will drive people away from Hive. Had you not embraced such a narrow view but played the community enabler role, Hive 4 would have had the 1st implementation of an Iceberg catalog 6 months ago... Since you are judge and jury, you should force close the JIRA and the PR.

henrib avatar Sep 13 '24 15:09 henrib

> that does not require depending

  1. I don't see any other community members supporting your "solution". Did you get feedback or review comments from anyone else?
  2. 6 months ago what we saw was a "broken hello world", nothing more. No doubt it improved after numerous review comments, but not all of them were resolved;
  3. The "rest catalog" solution implemented here is tightly coupled with HMS, which significantly limits its usage patterns;

PS: Failure to accept criticism does you no credit;

deniskuzZ avatar Sep 13 '24 16:09 deniskuzZ

> 1. I don't see any other community members supporting your "solution"; Did you get feedback or review comments from anyone else?

10 thumbs up, 4 rockets on the PR - but I guess that is not "the process".

> 3. implemented here "rest catalog" solution is tightly coupled with HMS significantly limiting its usage patterns;

If you already have HMS... but this does not fit your dogma.

> PS: Failure to accept criticism does you no credit;

PS: Failure to accept other views does you no credit either;

henrib avatar Sep 13 '24 16:09 henrib

> 10 thumbs up, 4 rockets on the PR - but I guess that is not "the process".

It seems like you can't comprehend the difference between an idea/feature and the way it is implemented.

> If you already have HMS... but this does not fit your dogma.

WHY should it be coupled with HMS? Haven't we discussed this internally and reached a consensus to have a "decoupled REST service"? Why do you keep pushing your individual vision that is not supported by the majority, a vision that "only fits your dogma"?

  1. Have you thought about the upgrade process? Releasing a new version of RestCatalog would require an HMS upgrade.
  2. What if someone decides to use a different Catalog implementation in Hive?
  3. What if we decide to use RestCatalog as a central place for all metadata, serving it from multiple catalogs?
  4. What if we need to scale them differently?

deniskuzZ avatar Sep 13 '24 17:09 deniskuzZ

Dissing, ad-hominem attacks, soon you'll go (back) to insults; always delightful. I will not talk about 'internal' products since this is not the place; I will just notice that your view of the majority is just claiming (first) you have it... Since you obviously need to be right and can not accept the benefit of simplicity, I'll leave you be in your kingdom.

henrib avatar Sep 13 '24 18:09 henrib

@henrib, I don't get you at all. I am trying to give you some suggestions, but you just make it personal. OK, I'll give it a last shot and leave it up to you:

If you think HMS users would benefit from the REST API exposed directly from the HMS server, please move your RestCatalog implementation under standalone-metastore-server, for example into a new `rest` package (it shouldn't be part of the compute engine), and get reviews from the HMS folks: @nrg4878, @dengzhhu653, @saihemanth-cloudera

deniskuzZ avatar Sep 14 '24 10:09 deniskuzZ

Any updates here? This is a very useful implementation that will simplify access to metadata and some external integrations.

vladislav-kuryata avatar Oct 30 '24 17:10 vladislav-kuryata

Work on it just resumed; I need to move the whole thing under standalone-metastore and adapt it to the latest trunk (4.1) before presenting it to new reviewers. Please add your vote on the JIRA, this might help at review time.

henrib avatar Oct 30 '24 18:10 henrib

In my case, we are currently going to use HMS plus an external Iceberg REST catalog, registered as an external Hive table in HMS, because we use the REST API for external integrations.

vladislav-kuryata avatar Oct 30 '24 18:10 vladislav-kuryata

Just thinking aloud: how can we expand the capabilities of the Iceberg REST catalog, for example to support Hive native tables and Iceberg tables from different sources/places, not only from HMS?

dengzhhu653 avatar Nov 18 '24 08:11 dengzhhu653

> Please check my comments!

Will do :-)

henrib avatar Dec 16 '24 17:12 henrib