Reduce memory usage of urllib.unquote and unquote_to_bytes
| BPO | 44334 |
|---|---|
| Nosy | @terryjreedy, @gpshead, @orsenthil, @mustafaelagamey |
| PRs | python/cpython#26576 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2021-06-07.15:07:38.990>
labels = ['extension-modules', '3.11', '3.9', '3.10', 'performance']
title = 'Use bytearray in urllib.unquote_to_bytes'
updated_at = <Date 2021-06-07.20:09:11.522>
user = 'https://github.com/mustafaelagamey'
bugs.python.org fields:
activity = <Date 2021-06-07.20:09:11.522>
actor = 'gregory.p.smith'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation = <Date 2021-06-07.15:07:38.990>
creator = 'eng.mustafaelagamey'
dependencies = []
files = []
hgrepos = []
issue_num = 44334
keywords = ['patch']
message_count = 2.0
messages = ['395280', '395281']
nosy_count = 4.0
nosy_names = ['terry.reedy', 'gregory.p.smith', 'orsenthil', 'eng.mustafaelagamey']
pr_nums = ['26576']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue44334'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
Linked PRs: gh-96763
The OP claimed in the original title that "urllib.parse.parse_qsl cannot parse large data"; on the original PR, they said the problem occurred with 6-7 million bytes.
That claim should be backed up by a generated example that fails with the original code and succeeds with the new code. Claims of 'faster' also need some examples.
PRs must, with rare exceptions, propose merging a branch created from main into main. Performance enhancements are often not backported.
FWIW, this sort of thing may be reasonable to backport to 3.9, as it is more than just a performance enhancement: it is also a resource-consumption fix, and it should result in no behavior change.
""" In case of form contain very large data ( in my case the string to parse was about 6000000 byte ) Old code use list of bytes during parsing consumes a lot of memory New code will use bytearry , which use less memory """ - text from the original PR
The PR was closed due to technicalities (pointing to the wrong branch, CLA) and the OP didn’t follow up.
Unless someone objects, I will close this issue as well.
I created a new PR and included fixing a similar legacy design issue in unquote() as well as the original report's unquote_to_bytes(). Some performance microbenchmarks need running before I'll consider moving forward with it.
If someone wanted to consider this a security issue, it could be backported. It is at most a fixed constant factor, roughly $len(input) * sizeof(PyObject)$ memory consumption, even for a maximally antagonistic input. That doesn't smell DoS-worthy.
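The constant factor above can be estimated with a quick back-of-envelope calculation. The sizes below are CPython- and platform-specific assumptions measured via `sys.getsizeof`, not figures from the original report; only the input length (about 6,000,000 bytes) comes from the issue.

```python
import sys

n = 6_000_000                              # input size from the original report

# Worst case for the list approach: every escape decodes to a separate
# one-byte bytes object, plus an 8-byte list slot pointing at it.
one_byte_obj = sys.getsizeof(b'\x00')      # per-object overhead on this build
list_cost = n * (one_byte_obj + 8)

# The bytearray approach holds one contiguous buffer instead.
buffer_cost = sys.getsizeof(bytearray(n))

print(f"list of bytes objects: ~{list_cost / 1e6:.0f} MB")
print(f"single bytearray:      ~{buffer_cost / 1e6:.0f} MB")
```

On a typical 64-bit CPython this works out to a constant factor in the tens, which is unpleasant but bounded, consistent with the "not DoS-worthy" assessment.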