Double encoding of percent-encoded characters in URL path segments when path ends with a file
Bug Report Title
Bug: Incorrect Double-Encoding of Percent Characters (%20 → %2520) When Path Ends with Filename Segment
Bug Description
The urlpath.URL class percent-encodes inconsistently when processing S3 URLs whose directory names contain percent-encoded spaces. The bug appears when the URL path ends with a filename segment: valid percent-encoded spaces (%20) in the preceding segments are double-encoded as %2520.
Steps to Reproduce
from urlpath import URL

# Test 1: Path ends with filename -> BUG
url1 = URL('s3://host/object/with%20with%20space/file')
print(url1)
# Actual:   s3://host/object/with%2520with%2520space/file
# Expected: s3://host/object/with%20with%20space/file

# Test 2: Path ends with directory -> Correct
url2 = URL('s3://host/object/with%20with%20space')
print(url2)
# Output: s3://host/object/with%20with%20space (correct)
Expected Behavior
Percent-encoded sequences (like %20 for space) should be preserved intact in all path segments, regardless of whether the path ends with a file or directory.
Actual Behavior
- Path ends with file: Existing %20 sequences are corrupted to %2520 (the % is wrongly re-encoded as %25, so %20 becomes %2520)
- Path ends with directory: Encoding is preserved correctly
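The %20 → %2520 corruption can be reproduced with the standard library alone. This is only an illustration of the mechanism; it assumes nothing about how urlpath quotes paths internally, which this report has not confirmed.

```python
from urllib.parse import quote

# A path segment that is already percent-encoded.
segment = "with%20with%20space"

# quote() does not treat '%' as safe, so each '%' is re-encoded as '%25'.
double_encoded = quote(segment)
print(double_encoded)

# Quoting the raw, decoded text exactly once gives the intended form.
single_encoded = quote("with with space")
print(single_encoded)
```

If urlpath applies a quoting step like this to a path that is already encoded, it would produce exactly the output shown in Test 1.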
Impact Analysis
This conflicts with RFC 3986 (the URI standard). Section 2.4 states that implementations must not percent-encode or decode the same string more than once, since re-encoding an already percent-encoded string turns each "%" into "%25".
The bug corrupts valid URLs and causes downstream failures when used with AWS S3 (e.g., s3:// URLs become unusable with boto3).
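A rough sketch of how the corruption could surface downstream. The helper `s3_key` is hypothetical and only assumes that a caller derives the object key by decoding the URL path once; it does not call boto3.

```python
from urllib.parse import urlsplit, unquote

def s3_key(url: str) -> str:
    """Hypothetical helper: derive an object key from an s3:// URL."""
    path = urlsplit(url).path
    # Drop the leading slash and decode percent-escapes exactly once.
    return unquote(path.lstrip("/"))

# Key derived from the URL as originally written.
intended_key = s3_key("s3://host/object/with%20with%20space/file")

# Key derived from the double-encoded URL from Test 1.
corrupted_key = s3_key("s3://host/object/with%2520with%2520space/file")

print(intended_key)
print(corrupted_key)
```

The second key contains a literal "%20" instead of a space, so it names a different object than the one intended.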
Environment
• Library Version: urlpath 1.2.0
• Python Version: 3.11
• OS: macOS 13.6.7
Proposed Fix Direction
The URL parser should:
- Preserve existing percent-encoded sequences
- Only encode characters that are not allowed in a path segment and are not already part of a percent-encoded sequence
- Handle trailing segments consistently regardless of file/directory type
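One possible shape for the first two points, as a minimal sketch using only the standard library. The function name `quote_segment_once` is an assumption made for illustration and is not part of urlpath.

```python
import re
from urllib.parse import quote

# Matches one well-formed percent-escape such as %20 or %2F.
_PCT_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")

def quote_segment_once(segment: str) -> str:
    """Quote a path segment while leaving existing percent-escapes untouched."""
    # With a capturing group, split() returns literal text at even indexes
    # and the matched escapes at odd indexes.
    parts = _PCT_ESCAPE.split(segment)
    return "".join(
        part if i % 2 else quote(part, safe="")
        for i, part in enumerate(parts)
    )

print(quote_segment_once("with%20with%20space"))  # escapes kept as-is
print(quote_segment_once("with space"))           # raw space gets encoded
print(quote_segment_once("100%"))                 # stray '%' gets encoded
```

This approach cannot tell an intentional literal "%20" in the data from an existing escape, so it only suits inputs documented as already being URLs. Applying the same function to every segment, including the last one, would address the third point.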