HIVE-29017: Restore ability to prevent metastore metrics startup queries
What changes were proposed in this pull request?
Restore the metastore.initial.metadata.count.enabled configuration property so that the metastore metrics startup queries can be disabled.
Why are the changes needed?
For releases as late as Hive 2.3.10, there was support for a metastore.initial.metadata.count.enabled configuration property. This could be set to false to prevent running queries to determine the counts of existing databases, tables and partitions. In large-scale ephemeral/serverless deployments with many HiveMetaStore processes starting up, this can be a source of unnecessary load spikes on the database. It appears this property was accidentally removed during the Standalone Metastore separation work, specifically in the HIVE-17307 metrics refactoring.
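To illustrate the intended behavior, here is a minimal, self-contained sketch of gating a startup count query behind this property. It is not the code in this patch: the class `InitialMetadataCountSketch`, the helper methods, the use of `java.util.Properties` in place of Hive's configuration classes, the `-1` placeholder value, and the default of `true` are all assumptions made for illustration. Only the property name comes from the description above.

```java
import java.util.Properties;
import java.util.function.LongSupplier;

public class InitialMetadataCountSketch {
    // Property name taken from the PR description.
    static final String COUNT_ENABLED_KEY = "metastore.initial.metadata.count.enabled";

    // Hypothetical helper: reads the flag, treating a missing value as "true"
    // (assumed default). Any value other than "true" (case-insensitive) disables it.
    static boolean countEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(COUNT_ENABLED_KEY, "true"));
    }

    // Hypothetical helper: runs the count query only when the flag is enabled.
    // Returns -1 as a placeholder when the query is skipped.
    static long initialCount(Properties conf, LongSupplier countQuery) {
        return countEnabled(conf) ? countQuery.getAsLong() : -1L;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(COUNT_ENABLED_KEY, "false");

        // Stand-in for a database query; it must not run when the flag is false.
        LongSupplier tableCountQuery = () -> {
            throw new IllegalStateException("count query should have been skipped");
        };

        System.out.println("initial table count: " + initialCount(conf, tableCountQuery));
    }
}
```

With the flag set to `false`, the supplier standing in for the database query is never invoked, which is the effect wanted for the many concurrent HiveMetaStore startups described above.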
Does this PR introduce any user-facing change?
Yes. This restores the configuration property that controls whether these metrics queries run at startup.
How was this patch tested?
Manual testing.
@dengzhhu653 and @nrg4878, this is another request to review a bug fix in a similar area of the metastore metrics. Thank you.
I'm not sure we need this property: since HIVE-27692, we don't run these count queries on the embedded HMS client.
Thanks for the reply, @dengzhhu653 . I still have a use for this configuration, even after HIVE-27692. I have a customer whose architecture involves starting many ephemeral clusters (potentially thousands concurrently), running a few jobs, and then deleting the clusters. Each cluster runs the HiveMetaStore process connected to a centralized database. This is not embedded HMS, but it triggers the same kind of unnecessary load pattern. I would like to be able to disable these queries when we upgrade to Hive 4.
What do you think? cc @deniskuzZ @nrg4878 @saihemanth-cloudera
@ayushtkn , thank you for reviewing.
@dengzhhu653 , what are your thoughts on committing this, based on the +1 from Ayush?
Thank you for the review and commit, everyone!