dash icon indicating copy to clipboard operation
dash copied to clipboard

Search param removes values after ampersand, introduced in 2.18.2

Open ezwc opened this issue 1 year ago • 6 comments

I pass to one of my pages a search string, like: ?param1=something&param2=something2

Accessing it uding: def layout(**kwargs):

In 2.18.1, this works for values that included ampersand. For example: ?param1=something&bla&param2=something2 would result in kwargs[param1]='something&bla'

With 2.18.2 I get just: kwargs[param1]='something' with anything after the ampersand removed.

I would guess this is related to #2991 .

To be clear, I specifically downgraded dash to 2.18.1 and the issue went away.

ezwc avatar Dec 12 '24 11:12 ezwc

Thank you for reporting this issue.

Root Cause

The behavior you’re experiencing stems from a change in the query string parsing logic introduced in Dash 2.18.2. Specifically, the _parse_query_string function uses unquote(search) in combination with parse_qs. While this combination adheres to general query string parsing rules, it introduces problems when handling unencoded reserved characters like & within values.

For example:

  • In ?param1=something&bla&param2=something2, parse_qs treats &bla as a new key (bla) instead of part of the value for param1.
  • As per the RFC 3986 standard, & is a reserved character and must be percent-encoded (%26) if it’s meant to be part of a value. So, the correctly encoded query string should be: ?param1=something%26bla&param2=something2

Observed Issue with unquote

Additionally, even a properly encoded query string like ?param1=something%26bla&param2=something2 results in the value bla being dropped in Dash 2.18.2. This happens because unquote(search) decodes %26 back to &, and parse_qs again interprets & as a delimiter rather than part of the value.

Proposed Solution

To address this issue, while supporting duplicate keys and empty values, _parse_query_string can be updated to use parse_qsl instead of parse_qs. The parse_qsl method offers better control, especially when handling encoded characters within query strings.

Updated _parse_query_string Implementation

from urllib.parse import parse_qsl

def _parse_query_string(search):
  if search and len(search) > 0 and search[0] == "?":
      search = search[1:]
  else:
      return {}

  parsed_qs = {}
  for k, v in parse_qsl(search, keep_blank_values=True):
      if k in parsed_qs:
          if isinstance(parsed_qs[k], list):
              parsed_qs[k].append(v)
          else:
              parsed_qs[k] = [parsed_qs[k], v]
      else:
          parsed_qs[k] = v
  return parsed_qs



query_strings = [
    "?param1=something&bla&param2=something2",
    "?param1=something%26bla&param2=something2",
    "?param1=value1&param1=value2&param2=value3",
    "?param1=&param2=something2",
    "",
]

for qs in query_strings:
    print(f"Query: {qs}")
    print(f"Parsed: {_parse_query_string(qs)}")

Output:


Query: ?param1=something&bla&param2=something2
Parsed: {'param1': 'something', 'bla': '', 'param2': 'something2'}

Query: ?param1=something%26bla&param2=something2
Parsed: {'param1': 'something&bla', 'param2': 'something2'}

Query: ?param1=value1&param1=value2&param2=value3
Parsed: {'param1': ['value1', 'value2'], 'param2': 'value3'}

Query: ?param1=&param2=something2
Parsed: {'param1': '', 'param2': 'something2'}

Query: 
Parsed: {}

Recommendations

  1. Encourage Proper URI Encoding According to the URI specification, reserved characters like & and = must be percent-encoded when used as part of values. For example: ?param1=something%26bla&param2=something2 This ensures predictable parsing behavior.

  2. Discussions @gvwilson @T4rk1n Given the above, should Dash enforce URI encoding compliance, or should we handle these edge cases within the parsing logic instead?
    If we decide to enforce URI encoding compliance, parse_qsl seems to be a better solution as it provides better handling for encoded characters, duplicate keys, and empty values. On the other hand, if we choose not to enforce URI compliance, we may need to consider removing the search = unquote(search) line from _parse_query_string to prevent issues like misinterpreting %26 (decoded as &) as a delimiter rather than part of a value.

Looking forward to your thoughts on how best to proceed.

C0deBeez avatar Dec 16 '24 03:12 C0deBeez

@T4rk1n which of these options should we pursue (or do you prefer another entirely)?

gvwilson avatar Jan 03 '25 17:01 gvwilson

I'd like to follow the rfc standard and expect the values to be properly encoded. But it should parse it in the values and not do what is described here:

This happens because unquote(search) decodes %26 back to &, and parse_qs again interprets & as a delimiter rather than part of the value.

T4rk1n avatar Jan 06 '25 15:01 T4rk1n

thanks @T4rk1n - @ezwc if you'd like to create a PR we will prioritize review.

gvwilson avatar Jan 08 '25 15:01 gvwilson

Hi @ezwc,

Thank you for raising this issue and for providing detailed examples.

Why This Issue Requires RFC Compliance

The core problem arises from the interpretation of & in the query string. According to RFC 3986, & is defined as the delimiter for separating key-value pairs in query strings. If you want & to be part of a value, it must be percent-encoded as %26. This ensures unambiguous parsing.

For example:

?param1=something&bla&param2=something2

Under RFC-compliant parsing, this would result in:

{
    "param1": "something",
    "bla": '',
    "param2": "something2"
}

If your intended behavior is to include & within a value, it should be encoded as %26:

?param1=something%26bla&param2=something2

This will correctly parse as:

{
    "param1": "something&bla",
    "param2": "something2"
}

Why RFC Compliance Is Necessary

  1. Consistency: Adhering to RFC 3986 ensures Dash behaves consistently with other tools, libraries, and systems.
  2. Compatibility: Non-compliance may lead to unexpected behavior in different environments.
  3. Avoiding Ambiguity: Without encoding &, there is no clear distinction between a delimiter and a value.

Proposed Fix

Based on @T4rk1n's suggestion, I propose the following implementation for _parse_query_string, which fully complies with the RFC:

from urllib.parse import parse_qs

def _parse_query_string(search):
    if not search or not search.startswith("?"):
        return {}

    query_string = search[1:]

    parsed_qs = parse_qs(query_string, keep_blank_values=True)

    return {
        k: v[0] if len(v) == 1 else v
        for k, v in parsed_qs.items()
    }

Test Cases

Here are some test cases with the proposed solution:

query_strings = [ 
    "?param1=something&bla&param2=something2",
    "?param1=something%26bla&param2=something2",
    "?param1=value1&param1=value2&param2=value3",
    "?param1=&param2=something2",
    "",
]

for qs in query_strings:
    print(f"Query: {qs}")
    print(f"Parsed: {_parse_query_string(qs)}")

Output:

Query: ?param1=something&bla&param2=something2
Parsed: {'param1': 'something', 'bla': '', 'param2': 'something2'}

Query: ?param1=something%26bla&param2=something2
Parsed: {'param1': 'something&bla', 'param2': 'something2'}

Query: ?param1=value1&param1=value2&param2=value3
Parsed: {'param1': ['value1', 'value2'], 'param2': 'value3'}

Query: ?param1=&param2=something2
Parsed: {'param1': '', 'param2': 'something2'}

Query: 
Parsed: {}

@gvwilson @T4rk1n If there are no objections, I am happy to submit a PR with this fix to resolve the issue.

Let me know if you have any concerns or additional feedback. 😊

C0deBeez avatar Jan 17 '25 02:01 C0deBeez

@C0deBeez thanks very much Ethan - if you'd like to put together a PR I will try to expedite review. Cheers - @gvwilson

gvwilson avatar Jan 17 '25 14:01 gvwilson