[BUG] URL variables are not properly escaped
What is the bug?
When a GetRequest is used for a GET <index>/_doc/<id> call, the ID is only partially URL-encoded: some special characters are percent-encoded, but punctuation that is legal inside a URL path segment is left as-is. This causes problems for IDs that are JSON-formatted strings. The brace and quote characters get encoded, but the comma (,) and colon (:) do not. Here is a sample GET request from the logs (scrubbed):
Executing request GET /<index>/_doc/%7B%22id%22:%22test-id%22,%22version%22:0%7D HTTP/1.1
This results in a 404 from the Java client. When I send the same request from the OpenSearch Dashboards dev console, I get the same result. However, when I properly encode the , and : in the console request, the document I wanted is returned.
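To make the observed behavior concrete, here is a minimal, self-contained Java sketch. It does not call the Apache or OpenSearch code; it assumes the client's encoder behaves roughly like RFC 3986 path-segment ("pchar") encoding, which keeps unreserved characters, sub-delimiters such as , and the characters : and @ unescaped. The class and method names are hypothetical. Under that assumption it reproduces the segment in the logged request above.

```java
import java.nio.charset.StandardCharsets;

public class LenientSegmentEncoding {

    // Hypothetical approximation of path-segment encoding per RFC 3986:
    // "pchar" = unreserved / sub-delims / ":" / "@" are kept as-is,
    // every other UTF-8 byte is percent-encoded.
    static String encodePathSegment(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            char c = (char) v;
            boolean pchar = v < 0x80 && (Character.isLetterOrDigit(c)
                    || "-._~!$&'()*+,;=:@".indexOf(c) >= 0);
            if (pchar) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String id = "{\"id\":\"test-id\",\"version\":0}";
        // Braces and quotes are escaped; ',' and ':' are not.
        System.out.println("GET /<index>/_doc/" + encodePathSegment(id));
        // prints GET /<index>/_doc/%7B%22id%22:%22test-id%22,%22version%22:0%7D
    }
}
```

Note that , and : are technically allowed in a path segment by RFC 3986, so this output is a syntactically valid URI; the problem is how it is interpreted on the server side.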
How can one reproduce the bug?
Write a document into the OpenSearch collection with a JSON string as the ID.
Pass that JSON string as the ID in an OpenSearchClient.get() request.
What is the expected behavior?
The ID gets encoded properly into this request:
GET /<index>/_doc/%7B%22id%22%3A%22test-id%22%2C%22version%22%3A0%7D
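The expected request line corresponds to escaping every character outside RFC 3986's "unreserved" set (letters, digits, -, ., _ and ~). The sketch below is one hypothetical way to produce it, not the client's actual code; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class StrictSegmentEncoding {

    // Hypothetical strict encoder: only RFC 3986 unreserved characters
    // (ALPHA, DIGIT, '-', '.', '_', '~') are left unescaped, so reserved
    // characters such as ',' and ':' are percent-encoded too.
    static String encodeIdSegment(String id) {
        StringBuilder out = new StringBuilder();
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            char c = (char) v;
            if (v < 0x80 && (Character.isLetterOrDigit(c) || "-._~".indexOf(c) >= 0)) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String id = "{\"id\":\"test-id\",\"version\":0}";
        System.out.println("GET /<index>/_doc/" + encodeIdSegment(id));
        // prints GET /<index>/_doc/%7B%22id%22%3A%22test-id%22%2C%22version%22%3A0%7D
    }
}
```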
What is your host/environment?
AWS Lambda, Java 17
Do you have any additional context?
While digging into the codebase, I found that the encoding is done by org.apache.http.client.utils.URLEncodedUtils.formatSegment, which only escapes characters that would otherwise affect how the path is parsed. I used java.net.URLEncoder.encode to get the expected behavior, but I'm not sure how that would affect other use cases (such as spaces).
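On the question about spaces: java.net.URLEncoder implements application/x-www-form-urlencoded encoding, so it turns a space into +. In a URL path, + is a literal plus sign, so a space in an ID would not round-trip. A minimal sketch of one common workaround, assuming that is the only difference that matters here, is to replace + with %20 after encoding. The helper name encodePathParam is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncoderForPaths {

    // Hypothetical helper: URLEncoder emits '+' for ' ', which a path would
    // read as a literal plus sign, so it is rewritten to "%20". A literal '+'
    // in the input is already emitted as "%2B", so this replacement cannot
    // corrupt it.
    static String encodePathParam(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String id = "{\"id\":\"test-id\",\"version\":0}";
        System.out.println(URLEncoder.encode("my id", StandardCharsets.UTF_8)); // my+id
        System.out.println(encodePathParam("my id"));                            // my%20id
        System.out.println(encodePathParam(id));
        // %7B%22id%22%3A%22test-id%22%2C%22version%22%3A0%7D
    }
}
```

URLEncoder also leaves * unescaped and escapes ~ as %7E; both are harmless in a path segment, but a dedicated path encoder would be a cleaner long-term fix than adapting a form encoder.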
Is this encoding happening incorrectly in the client? Can you encode the ID before calling the client?
Want to try to add a test that reproduces this, and maybe a fix?
When I encode the ID before calling OpenSearchClient.get(), the % characters in the pre-encoded ID get encoded again as well, so the ID ends up double-encoded.
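A small sketch of why pre-encoding does not help, assuming (a) the client's encoder behaves like the lenient path-segment encoder approximated earlier, which escapes % itself as %25, and (b) the server percent-decodes the path exactly once. The method clientEncode is a hypothetical stand-in, not the real client method.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncoding {

    // Hypothetical stand-in for the client's path encoding: keeps RFC 3986
    // pchar characters (unreserved, sub-delims, ':' and '@') and
    // percent-encodes everything else, including '%' itself.
    static String clientEncode(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            char c = (char) v;
            boolean keep = v < 0x80 && (Character.isLetterOrDigit(c)
                    || "-._~!$&'()*+,;=:@".indexOf(c) >= 0);
            if (keep) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String id = "{\"id\":\"test-id\",\"version\":0}";
        String preEncoded = URLEncoder.encode(id, StandardCharsets.UTF_8);
        String onTheWire = clientEncode(preEncoded);
        // Assumed single decode on the server only removes the client's layer.
        String seenByServer = URLDecoder.decode(onTheWire, StandardCharsets.UTF_8);

        System.out.println(onTheWire);
        // %257B%2522id%2522%253A%2522test-id%2522%252C%2522version%2522%253A0%257D
        System.out.println(seenByServer.equals(preEncoded)); // true
        System.out.println(seenByServer.equals(id));         // false
    }
}
```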
The problem is this line of code: https://github.com/opensearch-project/opensearch-java/blob/main/java-client/src/main/java/org/opensearch/client/opensearch/core/GetRequest.java#L507
That line calls this method: https://github.com/opensearch-project/opensearch-java/blob/388447604a0a5e115c2f6052f5904fc6c1f13bd5/java-client/src/main/java/org/opensearch/client/transport/endpoints/SimpleEndpoint.java#L135
Unfortunately, this encoding isn't configurable by users of the OpenSearch client, so I have no way to override this behavior.
Thanks, so it's just a bug where those arguments should not be encoded at all? I would write some tests of the expected behavior and fix it.
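As a starting point for such tests, here is a hedged sketch of two properties they might assert, assuming the fix is to fully escape the ID (as in the expected request in the report) rather than to skip encoding. It uses a plain main method instead of a test framework so it stays self-contained; encodeId and checkId are hypothetical names, and the sample IDs are illustrative only.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class IdEncodingRoundTripCheck {

    // Hypothetical strict encoder: only RFC 3986 unreserved characters
    // (ALPHA, DIGIT, '-', '.', '_', '~') are left unescaped.
    static String encodeId(String id) {
        StringBuilder out = new StringBuilder();
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            char c = (char) v;
            if (v < 0x80 && (Character.isLetterOrDigit(c) || "-._~".indexOf(c) >= 0)) {
                out.append(c);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    // Property 1: no reserved delimiter (or space) survives unescaped.
    // Property 2: a single percent-decode restores the original ID.
    static boolean checkId(String id) {
        String encoded = encodeId(id);
        boolean noReserved = encoded.chars()
                .noneMatch(ch -> ":/?#[]@!$&'()*+,;= ".indexOf(ch) >= 0);
        boolean roundTrips = URLDecoder.decode(encoded, StandardCharsets.UTF_8).equals(id);
        return noReserved && roundTrips;
    }

    public static void main(String[] args) {
        List<String> ids = List.of(
                "{\"id\":\"test-id\",\"version\":0}",
                "my id",
                "a+b",
                "100%",
                "caf\u00e9/\u00fcber");
        for (String id : ids) {
            System.out.println(id + " -> " + encodeId(id) + " ok=" + checkId(id));
        }
    }
}
```

Covering spaces, a literal +, a literal % and non-ASCII characters targets the cases where form encoding and path encoding tend to diverge.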