Spark: Iceberg_Spark Add delegation token for HiveCatalog
Implements delegation token fetching for HiveCatalog in the Iceberg_Spark module. Closes #13116.
@gaborgsomogyi Could you please take a look at this when you're free? Thanks!
@zhangwl9 have you tested it in a real multiple kerberized HMS env? From my first glance, I don't think it works.
@pan3793 I ran kinit to obtain a Kerberos TGT, then started the Spark application with HiveDelegationTokenProvider enabled and executed a query against an Iceberg table (on HiveCatalog). When the Kerberos ticket expired, the query still worked normally. However, with HiveDelegationTokenProvider disabled, the query failed with Kerberos-related errors once the ticket expired.
I think using kinit is not the way forward since it's ephemeral and long-running jobs are not able to obtain such credentials. I would suggest a keytab since it's the standard when we speak about Kerberos...
@gaborgsomogyi Spark allows using either Keytab or TGT for Kerberos authN, they are different ways.
@zhangwl9 DT is not required when TGT is available, and it's the user's responsibility to refresh TGT, for details, refer to SPARK-26595 and
https://github.com/apache/spark/blob/fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4/sql/hive/src/main/scala/org/apache/spark/sql/hive/security/HiveDelegationTokenProvider.scala#L67-L73
And the reason for my words "I don't think it works" is that your current implementation would simply erase the HMS token fetched by Spark's built-in HiveDelegationTokenProvider, right? If so, how does it work if your Spark built-in Hive catalog and Iceberg Hive catalog use different HMSs?
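The overwrite concern above can be made concrete with a minimal sketch. Hadoop's Credentials object is essentially a map from token alias to token, so two providers writing under the same alias clobber each other, while per-URI aliases can coexist. This is an illustrative Python model, not the real Hadoop/Spark API; the class and method names are hypothetical.

```python
# Minimal model of Hadoop Credentials: a map from token alias -> token.
# Illustrative only; names here are not the real Hadoop/Spark API.

class Credentials:
    def __init__(self):
        self._tokens = {}

    def add_token(self, alias, token):
        # A later write under the same alias silently replaces the earlier one.
        self._tokens[alias] = token

    def get_token(self, alias):
        return self._tokens.get(alias)

creds = Credentials()

# Spark's built-in provider stores the HMS token under one fixed alias.
creds.add_token("hive.server2.delegation.token", "token-for-hms-A")

# A provider reusing the same fixed alias erases the earlier token:
creds.add_token("hive.server2.delegation.token", "token-for-hms-B")
assert creds.get_token("hive.server2.delegation.token") == "token-for-hms-B"

# Keying by metastore URI instead lets tokens for different HMSs coexist:
creds.add_token("thrift://hms-a:9083", "token-for-hms-A")
creds.add_token("thrift://hms-b:9083", "token-for-hms-B")
assert creds.get_token("thrift://hms-a:9083") == "token-for-hms-A"
```

This is why keying tokens only by the fixed alias cannot support two different kerberized HMSs in one application.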
@gaborgsomogyi Spark allows using either Keytab or TGT for Kerberos authN, they are different ways.
Since I've written it, I know that TGT does not support many use cases, like long-running streaming workloads and client mode, amongst others. Adding something that gives only partial support doesn't make sense.
@gaborgsomogyi TGT is quite a useful way to simplify keytab management; we heavily use TGT with spark-submit --proxy-user xxx to run Spark jobs. Most of them complete within the DT lifetime, so no renewal is required; for long-running jobs, we have an external service [1] to refresh the DT and a custom RPC to send the DT to the driver. I know you wrote the Spark Kafka data source DT provider. I'm not sure how much Keytab and DT Kerberos authN differ for Kafka, but there are not many differences for Hive; what we need to do here is just follow the behavior of Spark's built-in HiveDelegationTokenProvider, right?
[1] https://github.com/apache/kyuubi/issues/913
Had a look at the HiveDelegationTokenProvider... it doesn't do much, but if that fills the actual needs then go on 🙂
@pan3793 The IcebergHiveConnectorDelegationTokenProvider gets and writes delegation tokens for each HMS based on the metastore URI. If the Spark built-in Hive catalog and the Iceberg Hive catalog point to different HMSs, they don't affect each other; however, if the URIs are the same, the token will be overwritten.
erase the HMS token fetched by Spark's built-in HiveDelegationTokenProvider
@pan3793 Currently, each HMS's metastore URI is used as the key and is saved into the credentials along with the corresponding token, so it should not erase the HMS token fetched by Spark's built-in HiveDelegationTokenProvider, which uses "hive.server2.delegation.token" as its key.
@gaborgsomogyi As per your review suggestion, I have modified the code so that #obtainDelegationTokens does not return the next refresh time, which I found reasonable based on the explanation there. https://github.com/apache/spark/blob/fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4/sql/hive/src/main/scala/org/apache/spark/sql/hive/security/HiveDelegationTokenProvider.scala#L67-L73 Based on the explanation in lines 75-79 of that file, there is no need for periodic token acquisition here either.
@gaborgsomogyi Could you take another look at the code in your spare time for any other issues I haven't found yet? Thanks!
@zhangwl9 keep the original commit history and stop squashing unless requested; otherwise, reviewers have no idea what your change is after each round of review. Also, provide valid manual test results after each change, since it is not covered by UT.
the current value of each HMS's metastore URI is used as the key, and is saved to credentials along with the corresponding token
Okay, you use metastore uris as the signature in the token producer to distinguish different HMSs. Then what about the consumer side? Is Hive client smart enough to know which token it should pick?
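The consumer-side question above can be sketched with a rough model of how a Hive client picks a delegation token: the client looks tokens up by alias, and hive.metastore.token.signature, when set, tells it which alias to use. This is an illustrative Python model, not Hive's actual code; the function name is hypothetical.

```python
# Illustrative model of consumer-side token selection (not Hive's real code).
DEFAULT_ALIAS = "hive.server2.delegation.token"

def select_token(credentials, conf):
    # When hive.metastore.token.signature is set, the client uses it as the
    # lookup alias; otherwise it falls back to the default alias.
    alias = conf.get("hive.metastore.token.signature", DEFAULT_ALIAS)
    return credentials.get(alias)

creds = {
    "thrift://hms-a:9083": "token-for-hms-A",
    "thrift://hms-b:9083": "token-for-hms-B",
}

# Without the signature hint, a token stored under a per-URI alias is never found:
assert select_token(creds, {}) is None

# With the signature set to the catalog's metastore URI, the right token is picked:
conf = {"hive.metastore.token.signature": "thrift://hms-b:9083"}
assert select_token(creds, conf) == "token-for-hms-B"
```

In other words, keying tokens by metastore URI on the producer side only works if the consumer is told the same signature, which is exactly the gap the question points at.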
@pan3793 Thank you very much for the reminder; I still need to use the metastoreUri as the service for the token.
The change itself looks good, manual testing is required however. Since I'm not in a situation that I can do that somebody else is needed for the job.
@pan3793 I have now added the metastoreUri as the service for the token; does that solve your query?
Not yet, if you do a real test, you will find it does not work.
@pan3793 Yes, I additionally need to set the parameter "hive.metastore.token.signature=metaStoreUri" for the HMS client when Iceberg creates the HiveCatalog.
You finally got it, and this is the key point of how to make delegation tokens work with multiple HMSs.
Your current implementation still has some drawbacks compared to https://github.com/apache/kyuubi/pull/4560, for example:
- how to simplify the hive.metastore.token.signature configuration?
- how to explicitly disable token obtaining for specific HMS(s) when multiple Iceberg Hive catalogs are configured?
- one HMS token request failure should not block token obtaining from other HMSs
- etc.
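The failure-isolation point in the list above can be sketched as follows (hypothetical helper names, not the PR's actual code): iterate over all configured catalogs, catch and log per-HMS failures, and continue, so one unreachable HMS does not block token obtaining for the rest.

```python
import logging

def obtain_tokens(catalog_uris, fetch_token, credentials):
    """Fetch a delegation token per HMS. A failure for one HMS is logged
    and skipped so the remaining HMSs still get their tokens.
    fetch_token(uri) -> token string; may raise for an unreachable HMS."""
    for uri in catalog_uris:
        try:
            credentials[uri] = fetch_token(uri)
        except Exception:
            logging.warning("Failed to obtain delegation token from %s", uri,
                            exc_info=True)

def flaky_fetch(uri):
    # Simulates one unreachable HMS among several healthy ones.
    if uri == "thrift://hms-down:9083":
        raise ConnectionError("HMS unreachable")
    return f"token-for-{uri}"

creds = {}
obtain_tokens(
    ["thrift://hms-a:9083", "thrift://hms-down:9083", "thrift://hms-b:9083"],
    flaky_fetch,
    creds,
)
# Tokens for the healthy HMSs are still obtained despite one failure.
assert sorted(creds) == ["thrift://hms-a:9083", "thrift://hms-b:9083"]
```

Whether a per-HMS failure should be fatal or merely logged is a policy choice; the kyuubi PR referenced above treats it as non-fatal so a single down HMS does not break job submission.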
@pan3793 Thank you very much for your guidance. I've filled in the missing logic; can you review it again when you have time?
@pan3793 I've filled in the missing logic; can you help me review it when you're free?
@zhangwl9 sorry, a little bit busy these days, I will take a look next week
@pan3793 can you help me to review when you are free?
@gaborgsomogyi @pvary @bryanck I've pushed a fixed version based on the review and self-tested it on a cluster with two HMSs. Could you take a look at the code when you're available to see if there are any other issues? Thank you very much! If there are no issues with the code, what should I do to move forward with merging the PR?
@zhangwl9 I had a quick glance; I don't think my previous comments are fully addressed or rejected.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.