
HIVE-29017: Restore ability to prevent metastore metrics startup queries

Open cnauroth opened this issue 7 months ago • 5 comments

What changes were proposed in this pull request?

Restore the metastore.initial.metadata.count.enabled configuration property so that the metastore metrics startup queries can be disabled.

Why are the changes needed?

For releases as late as Hive 2.3.10, there was support for a metastore.initial.metadata.count.enabled configuration property. This could be set to false to prevent running queries to determine the counts of existing databases, tables and partitions. In large-scale ephemeral/serverless deployments with many HiveMetaStore processes starting up, this can be a source of unnecessary load spikes on the database. It appears this property was accidentally removed during the Standalone Metastore separation work, specifically in the HIVE-17307 metrics refactoring.
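As a rough illustration of the setting described above, a deployment that wants to skip the startup count queries might add something like the following to the metastore's site configuration. The property name and the `false` value come from this description; the choice of file (for example `metastore-site.xml` or `hive-site.xml`) is an assumption and depends on how the deployment is configured.

```xml
<!-- Sketch only: place inside the <configuration> element of the metastore's site file. -->
<!-- Which file applies (metastore-site.xml, hive-site.xml, ...) is an assumption here. -->
<property>
  <name>metastore.initial.metadata.count.enabled</name>
  <!-- false: do not query database, table and partition counts at startup -->
  <value>false</value>
</property>
```

Leaving the property unset keeps the existing behavior, so only deployments that explicitly opt out are affected.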

Does this PR introduce any user-facing change?

Yes, this restores the ability to configure whether these metrics queries run at metastore startup.

How was this patch tested?

Manual testing.

cnauroth avatar Jun 14 '25 04:06 cnauroth

@dengzhhu653 and @nrg4878, this is another request to review a bug fix in a similar area of the metastore metrics. Thank you.

cnauroth avatar Jun 14 '25 04:06 cnauroth

I'm not sure if we need this property since HIVE-27692, we don't run these count queries on embedded HMS client.

dengzhhu653 avatar Jun 14 '25 07:06 dengzhhu653

> I'm not sure if we need this property since HIVE-27692, we don't run these count queries on embedded HMS client.

Thanks for the reply, @dengzhhu653 . I still have a use for this configuration, even after HIVE-27692. I have a customer whose architecture involves starting many ephemeral clusters (potentially thousands concurrently), running a few jobs, and then deleting the clusters. Each cluster runs the HiveMetaStore process connected to a centralized database. This is not embedded HMS, but it triggers the same kind of unnecessary load pattern. I would like to be able to disable these queries when we upgrade to Hive 4.

cnauroth avatar Jun 16 '25 21:06 cnauroth

> > I'm not sure if we need this property since HIVE-27692, we don't run these count queries on embedded HMS client.
>
> Thanks for the reply, @dengzhhu653 . I still have a use for this configuration, even after HIVE-27692. I have a customer whose architecture involves starting many ephemeral clusters (potentially thousands concurrently), running a few jobs, and then deleting the clusters. Each cluster runs the HiveMetaStore process connected to a centralized database. This is not embedded HMS, but it triggers the same kind of unnecessary load pattern. I would like to be able to disable these queries when we upgrade to Hive 4.

What do you think? cc @deniskuzZ @nrg4878 @saihemanth-cloudera

dengzhhu653 avatar Jun 17 '25 02:06 dengzhhu653

@ayushtkn , thank you for reviewing.

@dengzhhu653 , what are your thoughts on committing this, based on the +1 from Ayush?

cnauroth avatar Jun 20 '25 22:06 cnauroth

Thank you for the review and commit everyone!

cnauroth avatar Jun 23 '25 16:06 cnauroth