hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-28826: Beeline fails to connect with spaces in the jdbc url

Open mszurap opened this issue 10 months ago • 6 comments

What changes were proposed in this pull request?

In some environments we need to customize the javax.jdo.option.ConnectionURL, and typically with Oracle databases the JDBC connection URL might contain spaces, like

jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS_LIST=...

This is not handled by beeline, so hive schema upgrades are failing.

For the Oracle JDBC driver at least, the URL escaping does not work. The proposed solution is to remove the whitespaces from the connection url before passing the url to the Commands.java: https://github.com/mszurap/hive/blob/2d528556b69c1ec011e83f69d2a2fdcfb78b356e/beeline/src/java/org/apache/hive/beeline/Commands.java#L1521.

Spaces were not respected before in the connection url. However if spaces are needed in the connection URL (like in ";sslTrustStore=/path/to/mydir with spaces/truststore.jks" or "?tez.application.tags=A B") then the URL escaping with "%20" can be still used in those places, those are decoded properly back to spaces.

Why are the changes needed?

To make the upgades more robust, we should handle these situations.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Added a new beeline test "testSimpleArgsWithSpaceInUrl" to test the functionality that the spaces are now removed from the connection url ("-u" argument only). The beeline unit tests were successful.

mszurap avatar Mar 18 '25 17:03 mszurap

thanks for this patch @mszurap this patch makes sense to me, only 1 note regarding:

sslTrustStore=/path/to/mydir with spaces/truststore.jks

even if I hate to see spaces in folder names, we need to clarify if those scenarios will fail after this patch or not, I mean it wasn't clear that the URL encoding is a manual workaround for this or if it's handled by beeline somehow

abstractdog avatar Mar 21 '25 06:03 abstractdog

Yeah.

  • Right now, beeline fails to connect if there are spaces anywhere - because the "string".split(" ") skews the parameters
  • With the fix, we remove the spaces, so that could cause problems in such situations (in the given example the directory without spaces might not exist), but as mentioned we have a manual workaround at least with urlencoding the spaces.

I'm not sure if we need to address those very edge cases.

mszurap avatar Mar 21 '25 09:03 mszurap

@mszurap, i believe that removing all the whitespaces is not the way forward. As JDBC URL consists of: JDBC URL: jdbc:hive2://<host>:<port>/dbName;sess_var_list?hive_conf_list#hive_var_list

in hive_var_list, users can have custom configs based on their use-case. If any config value expect space separated values then it becomes a breaking change for users and they have to explicilty provide "%20" as you mentioned above.

Aggarwal-Raghav avatar Mar 25 '25 15:03 Aggarwal-Raghav

Hi @Aggarwal-Raghav , thanks for looking at this. Not as an excuse, but even now, if there is a space anywhere in the ";sess_var_list?hive_conf_list#hive_var_list" part, the string.split will split it and this usecase breaks even now, anybody wanting that already needs to escape the spaces.

If we want to support spaces in the ";sess_var_list?hive_conf_list#hive_var_list" part too, then:

  1. we either need to refactor the whole code to avoid the string splitting by space and pass the parameters in a different way to the "Commands" (haven't explored how complex is this)
  2. or we use the same approach but carefully, we can skip removing the spaces after the semicolon, the question mark or the hashtag sign. Would you agree to this rather?

mszurap avatar Mar 25 '25 17:03 mszurap

@abstractdog @Aggarwal-Raghav as mentioned, this fix does not break the current workloads as even now the spaces are not tolerated by beeline. Can you advise how should we proceed? Do we need to refactor everything just to support the spaces in the ";sess_var_list?hive_conf_list#hive_var_list" section?

mszurap avatar Apr 29 '25 12:04 mszurap

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Jul 20 '25 00:07 github-actions[bot]

@nrg4878 @abstractdog do you have suggestions how to proceed further? Do you have objections agains removing all the spaces from the URL? (by definition, URLs cannot contain spaces, they need to be encoded).

mszurap avatar Aug 15 '25 07:08 mszurap

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Nov 15 '25 00:11 github-actions[bot]