HIVE-29016: Disable caching on the Iceberg REST Catalog
What changes were proposed in this pull request?
Remove caching on the Iceberg REST API server. It's mainly because the current code didn't work with caching correctly, as stated in HIVE-29016.
This is an alternative to https://github.com/apache/hive/pull/5871
Why are the changes needed?
Disable caching.
Does this PR introduce any user-facing change?
No. This feature is not yet shipped.
How was this patch tested?
Unit test
+1 We should revisit the caching catalog implementation, as it seems to be broken.
@deniskuzZ the unit test that creates a table through http does not fail though. ( TestHMSCatalog.testTableAPI() ). Any other unit test that reproduces that issue would be much appreciated.
+1 We should revisit the caching catalog implementation, as it seems to be broken.
@deniskuzZ the unit test that creates a table through http does not fail though. ( TestHMSCatalog.testTableAPI() ). Any other unit test that reproduces that issue would be much appreciated.
@okumin, how did you create the table? have you used RESTCatalog instead of manual http calls construction as in TestHMSCatalog?
CreateTableRequest create = CreateTableRequest.builder().
withName(tblName).
withLocation(location).
withSchema(schema).build();
URL url = iceUri.resolve("namespaces/" + DB_NAME + "/tables").toURL();
Object response = clientCall(jwt, url, "POST", create);
@henrib, transaction API doesn't work at all:
org.apache.iceberg.exceptions.ServiceFailureException: Server error: IllegalStateException: Cannot wrap catalog that does not produce BaseTransaction
that is what @okumin mentioned
reproduce:
@Test
public void testTableAPI() throws Exception {
URI iceUri = URI.create("http://localhost:" + catalogPort + "/"+catalogPath);
Schema schema = getTestSchema();
final String tblName = "tbl_" + Integer.toHexString(RND.nextInt(65536));
final TableIdentifier TBL = TableIdentifier.of(DB_NAME, tblName);
String location = temp.newFolder(TBL.toString()).toString();
Map<String, String> properties = ImmutableMap.of(
"type", "rest",
"uri", iceUri.toString());
final String catalogName = catalog.name();
RESTCatalog restCatalog = (RESTCatalog) CatalogUtil.buildIcebergCatalog(catalogName, properties, new Configuration());
restCatalog.initialize(catalogName, properties);
restCatalog
.buildTable(TBL, schema)
.withLocation(location)
.createTransaction()
.commitTransaction();
Table table = catalog.loadTable(TBL);
Assert.assertEquals(location, table.location());
}
MetastoreConf.setLongVar(conf, MetastoreConf.ConfVars.ICEBERG_CATALOG_CACHE_EXPIRY, -1L); solves the issue
but i am not sure if HMSCachingCatalog can be even used with transaction API. If not, we should revert
cc @okumin
With this modification in HMSCachingCatalog:
@Override
public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) {
tableCache.invalidate(identifier);
return nsCatalog.buildTable(identifier, schema);
}
This test succeeds:
@Test
public void testTableAPI2() throws Exception {
String cname = catalog.name();
URI iceUri = URI.create("http://localhost:" + catalogPort + "/"+catalogPath);
String jwt = generateJWT();
Schema schema = getTestSchema();
final String tblName = "tbl_" + Integer.toHexString(RND.nextInt(65536));
final TableIdentifier TBL = TableIdentifier.of(DB_NAME, tblName);
String location = temp.newFolder(TBL.toString()).toString();
Configuration configuration = new Configuration();
configuration.set("iceberg.catalog", cname);
configuration.set("iceberg.catalog."+cname+".type", "rest");
configuration.set("iceberg.catalog."+cname+".uri", iceUri.toString());
configuration.set("iceberg.catalog."+cname+".token", jwt);
String catalogName = configuration.get(CATALOG_NAME);
Assert.assertEquals(cname, catalogName);
Map<String, String> properties = getCatalogPropertiesFromConf(configuration, catalogName);
Assert.assertFalse(properties.isEmpty());
RESTCatalog restCatalog = (RESTCatalog) CatalogUtil.buildIcebergCatalog(catalogName, properties, configuration);
restCatalog.initialize(catalogName, properties);
restCatalog
.buildTable(TBL, schema)
.withLocation(location)
.createTransaction()
.commitTransaction();
Table table = catalog.loadTable(TBL);
Assert.assertEquals(location, table.location());
}
With this modification in HMSCachingCatalog:
@Override public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) { tableCache.invalidate(identifier); return nsCatalog.buildTable(identifier, schema); }This test succeeds:
@Test public void testTableAPI2() throws Exception { String cname = catalog.name(); URI iceUri = URI.create("http://localhost:" + catalogPort + "/"+catalogPath); String jwt = generateJWT(); Schema schema = getTestSchema(); final String tblName = "tbl_" + Integer.toHexString(RND.nextInt(65536)); final TableIdentifier TBL = TableIdentifier.of(DB_NAME, tblName); String location = temp.newFolder(TBL.toString()).toString(); Configuration configuration = new Configuration(); configuration.set("iceberg.catalog", cname); configuration.set("iceberg.catalog."+cname+".type", "rest"); configuration.set("iceberg.catalog."+cname+".uri", iceUri.toString()); configuration.set("iceberg.catalog."+cname+".token", jwt); String catalogName = configuration.get(CATALOG_NAME); Assert.assertEquals(cname, catalogName); Map<String, String> properties = getCatalogPropertiesFromConf(configuration, catalogName); Assert.assertFalse(properties.isEmpty()); RESTCatalog restCatalog = (RESTCatalog) CatalogUtil.buildIcebergCatalog(catalogName, properties, configuration); restCatalog.initialize(catalogName, properties); restCatalog .buildTable(TBL, schema) .withLocation(location) .createTransaction() .commitTransaction(); Table table = catalog.loadTable(TBL); Assert.assertEquals(location, table.location()); }
@okumin, can we add the proposed change to this PR?
@henrib, why do we need to call tableCache.invalidate(identifier);? isn't it enough to call nsCatalog directly?
@Override
public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) {
return nsCatalog.buildTable(identifier, schema);
}
@henrib, why do we need to call
tableCache.invalidate(identifier);? isn't it enough to callnsCatalogdirectly? @deniskuzZ you are right, it is certainly not necessary to invalidate the cache for a table that does not exist yet
Hi @okumin, could you please incorporate the above minor change here as well? Thanks!
I added the recommended change. I did not add a test case because we're replacing the existing ones with the more comprehensive test suite in https://issues.apache.org/jira/browse/HIVE-29019. I am feeling the current implementation can't pass the RCK, so I will check failure cases at once in another PR.
May be we can try and fix the actual cache instead ? Have a look at 5882
Quality Gate passed
Issues
8 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
May be we can try and fix the actual cache instead ? Have a look at 5882
Since it's a blocker for Hive-4.1 and we don't have extensive test coverage, I would propose to disable, at least until we cut the 4.1 branch
Merged. @deniskuzZ @henrib Thanks for your review comments!