sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] CSV output doesn't handle entries with quotes but not commas.

Open Michael-S opened this issue 1 year ago • 1 comments

What is the bug? CSV output of entries like "a" b or a " b should, according to RFC 4180, escape them to be """a"" b" and "a "" b", respectively. See section 2.5, https://www.rfc-editor.org/rfc/rfc4180

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Add values to a test index.
#!/usr/bin/env bash
# Assuming OPENSEARCH_SERVER is the server,
# OPENSEARCH_USER is the host, OPENSEARCH_PASSWORD is the password.
# and the user has the rights to create and query this index.
# Add an entry to the index that has two quotes in it, not at the boundaries.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/1" \
    -H 'Content-Type: application/json' \
    -d'{  "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Add an entry that has a quote in it.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/2" \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has no quotes or commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/3" \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "a b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/4" \
    -H 'Content-Type: application/json' \
    -d'{ "field1": "a, b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
  1. Query the index to CSV output.
#!/usr/bin/env bash
curl -XPOST "https://${OPENSEARCH_SERVER}:9200/_plugins/_sql?format=csv" \
    -H 'Content-Type: application/json' \
    -d'{ "query": "SELECT * FROM csv_error LIMIT 50" }' \
     --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
  1. Response data is:
"a" b  # expected to be """a"" b"
a " b  # expected to be "a "" b"
a b     # correct
"a, b" # correct

What is the expected behavior? The CSV output escapes double quotes properly.

What is your host/environment?

  • OS: Ubuntu 20.04
  • Version - 2.17.0.0
  • Plugins - default

Do you have any additional context? As far as I can tell, the problem appears to be in function quoteIfRequired in FlatResponseBase.java, https://github.com/opensearch-project/sql/blob/c1e623d6d3464056a4b16ae7e51dca218d73a5b6/protocol/src/main/java/org/opensearch/sql/protocol/response/format/FlatResponseBase.java#L85 but it's possible changing it there might affect other output formats.

Michael-S avatar Oct 01 '24 19:10 Michael-S

[Catch All Triage - 1, 2]

dblock avatar Oct 21 '24 16:10 dblock