[BUG] CSV output doesn't handle entries with quotes but not commas.
What is the bug?
CSV output of entries like "a" b or a " b should, according to RFC 4180, escape them to be """a"" b" and "a "" b", respectively. See section 2.5, https://www.rfc-editor.org/rfc/rfc4180
How can one reproduce the bug? Steps to reproduce the behavior:
- Add values to a test index.
#!/usr/bin/env bash
# Assuming OPENSEARCH_SERVER is the server,
# OPENSEARCH_USER is the host, OPENSEARCH_PASSWORD is the password.
# and the user has the rights to create and query this index.
# Add an entry to the index that has two quotes in it, not at the boundaries.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/1" \
-H 'Content-Type: application/json' \
-d'{ "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Add an entry that has a quote in it.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/2" \
-H 'Content-Type: application/json' \
-d'{ "field1": "\"a\" b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has no quotes or commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/3" \
-H 'Content-Type: application/json' \
-d'{ "field1": "a b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
# Optional, add an entry that has commas.
curl -XPUT "https://${OPENSEARCH_SERVER}:9200/csv_error/_doc/4" \
-H 'Content-Type: application/json' \
-d'{ "field1": "a, b" }' --user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
- Query the index to CSV output.
#!/usr/bin/env bash
curl -XPOST "https://${OPENSEARCH_SERVER}:9200/_plugins/_sql?format=csv" \
-H 'Content-Type: application/json' \
-d'{ "query": "SELECT * FROM csv_error LIMIT 50" }' \
--user "${OPENSEARCH_USER}":"${OPENSEARCH_PASSWORD}"
- Response data is:
"a" b # expected to be """a"" b"
a " b # expected to be "a "" b"
a b # correct
"a, b" # correct
What is the expected behavior? The CSV output escapes double quotes properly.
What is your host/environment?
- OS: Ubuntu 20.04
- Version - 2.17.0.0
- Plugins - default
Do you have any additional context?
As far as I can tell, the problem appears to be in function quoteIfRequired in FlatResponseBase.java, https://github.com/opensearch-project/sql/blob/c1e623d6d3464056a4b16ae7e51dca218d73a5b6/protocol/src/main/java/org/opensearch/sql/protocol/response/format/FlatResponseBase.java#L85 but it's possible changing it there might affect other output formats.