opensearch-java

[BUG] URL variables are not properly escaped

Open ssm951 opened this issue 2 years ago • 3 comments

What is the bug?

When a GetRequest is sent as a GET <index>/_doc/<id> call, the ID is URL-encoded for all special characters except punctuation. This causes problems for IDs that are JSON-formatted strings: the braces and quotes get encoded, but the , and the : do not. Here is a sample GET request as logged (scrubbed):

Executing request GET /<index>/_doc/%7B%22id%22:%22test-id%22,%22version%22:0%7D HTTP/1.1

This results in a 404 returned by the Java client. When I entered the same request in the OpenSearch dashboard dev console, I got the same result. However, when I encoded the , and : properly in the console request, the document I wanted was returned.

How can one reproduce the bug?

Write a document into the OpenSearch collection with a JSON string as the ID.

Pass that JSON string as the ID in an OpenSearchClient.get() request.

What is the expected behavior?

The ID should be fully encoded, producing this request:

GET /<index>/_doc/%7B%22id%22%3A%22test-id%22%2C%22version%22%3A0%7D

What is your host/environment?

AWS Lambda Java 17

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

When digging into the codebase, I found that the encoding is done by org.apache.http.client.utils.URLEncodedUtils.formatSegment, which only escapes characters that would affect the path. I used java.net.URLEncoder.encode to get the expected behavior, but I'm not sure how it affects other use cases (such as spaces).
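For illustration, here is a minimal sketch of the java.net.URLEncoder.encode approach, using only the JDK. The class and method names (PathSegmentEncoding, encodePathSegment) are hypothetical and not part of opensearch-java. One caveat on spaces: URLEncoder targets form encoding and turns a space into +, which a server would read as a literal plus sign inside a path, so the sketch rewrites + to %20. This is an assumption about what a fix could look like, not the client's actual implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of opensearch-java.
final class PathSegmentEncoding {

    private PathSegmentEncoding() {}

    // Percent-encodes a single path segment (for example a document ID).
    // URLEncoder leaves only letters, digits, '.', '-', '*' and '_' as-is,
    // so ':' and ',' are escaped along with braces and quotes.
    static String encodePathSegment(String segment) {
        String formEncoded = URLEncoder.encode(segment, StandardCharsets.UTF_8);
        // URLEncoder emits '+' for a space; a literal '+' in the input has
        // already become "%2B", so every remaining '+' stands for a space.
        return formEncoded.replace("+", "%20");
    }

    public static void main(String[] args) {
        String jsonId = "{\"id\":\"test-id\",\"version\":0}";
        // prints %7B%22id%22%3A%22test-id%22%2C%22version%22%3A0%7D
        System.out.println(encodePathSegment(jsonId));
        // prints a%20b%2Bc
        System.out.println(encodePathSegment("a b+c"));
    }
}
```

The first output matches the expected request path above. Whether encoding this aggressively is safe for every endpoint in the client (for example comma-separated index lists) is not covered by this sketch.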

ssm951 avatar Feb 03 '24 16:02 ssm951

Is this encoding happening incorrectly in the client? Can you encode before calling a client?

Want to try to add a test that reproduces this, and maybe a fix?

dblock avatar Feb 05 '24 19:02 dblock

If I encode the ID before calling OpenSearchClient.get(), the % characters in the already-encoded ID then get encoded as well.

The problem is this line of code: https://github.com/opensearch-project/opensearch-java/blob/main/java-client/src/main/java/org/opensearch/client/opensearch/core/GetRequest.java#L507

Which calls this method: https://github.com/opensearch-project/opensearch-java/blob/388447604a0a5e115c2f6052f5904fc6c1f13bd5/java-client/src/main/java/org/opensearch/client/transport/endpoints/SimpleEndpoint.java#L135

Unfortunately, this encoding isn't configurable by users of the OpenSearch client, so I don't have a way to override this behavior.

ssm951 avatar Feb 05 '24 21:02 ssm951

Thanks, so it's just a bug where those arguments should not be encoded at all? I would write some tests of the expected behavior and fix it.

dblock avatar Feb 06 '24 18:02 dblock