Double encoding of percent-encoded characters in URL path segments when path ends with a file

Open JIAQIA opened this issue 7 months ago • 0 comments

Bug Report Title

Bug: Incorrect Double-Encoding of Percent Characters (%20 → %2520) When Path Ends with Filename Segment

Bug Description

The urlpath.URL class handles percent-encoding inconsistently when processing S3 URLs whose directory names contain percent-encoded spaces. The bug appears only when the URL path ends with a filename segment: valid percent-encoded spaces (%20) are then double-encoded as %2520.

Steps to Reproduce

    from urlpath import URL

    # Test 1: Path ends with filename -> BUG
    url1 = URL('s3://host/object/with%20with%20space/file')
    print(url1)
    # Actual:   s3://host/object/with%2520with%2520space/file
    # Expected: s3://host/object/with%20with%20space/file

    # Test 2: Path ends with directory -> Correct
    url2 = URL('s3://host/object/with%20with%20space')
    print(url2)
    # Output: s3://host/object/with%20with%20space (correct)

Expected Behavior

Percent-encoded sequences (like %20 for space) should be preserved intact in all path segments, regardless of whether the path ends with a file or directory.

Actual Behavior

  1. Path ends with file: Existing %20 sequences are corrupted to %2520
    (the % is itself encoded as %25, so %20 becomes %2520)
  2. Path ends with directory: Encoding is preserved correctly
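
For reference, the same %20 → %2520 corruption can be reproduced with the standard library alone. This is only a sketch of the general mechanism (quoting a string that is already percent-encoded); it assumes nothing about how urlpath builds its output internally, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

segment = "with%20with%20space"  # already percent-encoded

# Quoting an already-encoded segment escapes the "%" itself as "%25".
double_encoded = quote(segment)

# Decoding once only recovers the encoded form, not the original text.
decoded_once = unquote(double_encoded)

print(double_encoded)  # with%2520with%2520space
print(decoded_once)    # with%20with%20space
```

A consumer that decodes the path once therefore sees a literal "%20" in the key instead of a space.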

Impact Analysis

This conflicts with RFC 3986 (URI standard), Section 2.4, which states that implementations must not percent-encode or decode the same string more than once.

The bug corrupts valid URLs and causes downstream failures when used with AWS S3 (e.g., s3:// URLs become unusable with boto3).

Environment

• Library Version: urlpath 1.2.0

• Python Version: 3.11

• OS: macOS 13.6.7

Proposed Fix Direction

The URL parser should:

  1. Preserve existing percent-encoded sequences
  2. Only encode characters that are not allowed in a path segment and are not already part of a percent-encoded sequence
  3. Handle trailing segments consistently regardless of file/directory type
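
A minimal sketch of points 1 and 2, using only the standard library. The helper name quote_segment_once is hypothetical and not part of urlpath; it assumes that any "%" followed by two hex digits is an existing escape that should be kept as-is.

```python
import re
from urllib.parse import quote

# Matches a valid percent-encoded octet such as %20 or %2F.
_PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def quote_segment_once(segment: str) -> str:
    """Hypothetical helper: encode a path segment without re-encoding %XX escapes."""
    parts = []
    pos = 0
    for match in _PCT_ENCODED.finditer(segment):
        # Encode the raw text before the existing escape, then keep the escape untouched.
        parts.append(quote(segment[pos:match.start()], safe=""))
        parts.append(match.group(0))
        pos = match.end()
    # Encode whatever follows the last escape (or the whole segment if there was none).
    parts.append(quote(segment[pos:], safe=""))
    return "".join(parts)


print(quote_segment_once("with%20with%20space"))  # with%20with%20space
print(quote_segment_once("with space"))           # with%20space
print(quote_segment_once("100%"))                 # 100%25
```

Applying the helper twice gives the same result as applying it once, which is the property the bug breaks. The trade-off is that a segment whose raw text really contains "%20" cannot be told apart from an encoded space, so the caller has to decide whether the input is raw or already encoded.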

JIAQIA · Jul 22 '25 05:07