Reduce memory usage of urllib.unquote and unquote_to_bytes

Open ad186d5a-3642-4a78-96eb-191b051d514d opened this issue 4 years ago • 4 comments

BPO 44334
Nosy @terryjreedy, @gpshead, @orsenthil, @mustafaelagamey
PRs
  • python/cpython#26576
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-06-07.15:07:38.990>
    labels = ['extension-modules', '3.11', '3.9', '3.10', 'performance']
    title = 'Use bytearray in urllib.unquote_to_bytes'
    updated_at = <Date 2021-06-07.20:09:11.522>
    user = 'https://github.com/mustafaelagamey'
    

    bugs.python.org fields:

    activity = <Date 2021-06-07.20:09:11.522>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2021-06-07.15:07:38.990>
    creator = 'eng.mustafaelagamey'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44334
    keywords = ['patch']
    message_count = 2.0
    messages = ['395280', '395281']
    nosy_count = 4.0
    nosy_names = ['terry.reedy', 'gregory.p.smith', 'orsenthil', 'eng.mustafaelagamey']
    pr_nums = ['26576']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue44334'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
    
    • PR: gh-96763

    'eng' claimed in the original title that "urllib.parse.parse_qsl cannot parse large data". On the original PR, they said the problem appears with 6-7 million bytes.

    The claim should be backed up by a generated example that fails with the original code and succeeds with the new code. Claims of 'faster' also need some examples.

    Original PRs must nearly all propose merging a branch created from main into main. Performance enhancements are often not backported.

    terryjreedy avatar Jun 07 '21 19:06 terryjreedy
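A minimal sketch of the kind of generated example requested above, assuming a payload made entirely of percent escapes is a fair stand-in for the large form data in the report. The helper names `make_payload` and `peak_bytes_while_unquoting` are hypothetical; only `urllib.parse.unquote_to_bytes` and the `tracemalloc` calls are standard library. It reports the peak traced memory for whichever Python version runs it, so the same script could be run before and after a patch.

```python
import tracemalloc
from urllib.parse import unquote_to_bytes


def make_payload(n_escapes: int) -> bytes:
    """Build an input in which every decoded byte is a percent escape.

    Each escape is three input bytes, so the payload is 3 * n_escapes long.
    """
    return b"%41" * n_escapes


def peak_bytes_while_unquoting(payload: bytes) -> tuple[bytes, int]:
    """Decode payload and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = unquote_to_bytes(payload)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # 200_000 escapes is about 600,000 input bytes. Raising this to
    # 2_000_000 approximates the ~6,000,000-byte case from the report,
    # at the cost of a much slower run under tracemalloc.
    payload = make_payload(200_000)
    result, peak = peak_bytes_while_unquoting(payload)
    print(f"input={len(payload)} bytes, output={len(result)} bytes, peak={peak} bytes")
```

Running it under two interpreter builds and comparing the printed peaks would give the before/after numbers; the script itself makes no claim about which is lower.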

    fwiw this sort of thing may be reasonable to backport to 3.9 as it is more than just a performance enhancement but also a resource consumption bug and should result in no behavior change.

    """ In case of form contain very large data ( in my case the string to parse was about 6000000 byte ). Old code use list of bytes during parsing, consumes a lot of memory. New code will use bytearray, which use less memory. """ - text from the original PR

    gpshead avatar Jun 07 '21 20:06 gpshead
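To illustrate the difference the PR text describes, here is a simplified sketch of the two accumulation strategies. It is not CPython's actual implementation: the function names and the `_decode_pair` helper are hypothetical, and `binascii.unhexlify` stands in for whatever hex lookup the real code uses. Both versions split on `%` and decode the two hex digits that follow; they differ only in whether the pieces are collected in a list and joined, or appended to one `bytearray`.

```python
import binascii


def _decode_pair(item: bytes) -> bytes:
    """Decode the two hex digits at the start of item, or raise ValueError."""
    if len(item) < 2:
        raise ValueError("incomplete percent escape")
    # binascii.Error is a subclass of ValueError, raised for non-hex digits.
    return binascii.unhexlify(item[:2])


def unquote_with_list(data: bytes) -> bytes:
    """Collect every decoded piece as a separate object, then join."""
    parts = data.split(b"%")
    res = [parts[0]]
    for item in parts[1:]:
        try:
            decoded = _decode_pair(item)
        except ValueError:
            # Not a valid escape: keep the percent sign and the text as-is.
            res.append(b"%")
            res.append(item)
        else:
            res.append(decoded)
            res.append(item[2:])
    return b"".join(res)


def unquote_with_bytearray(data: bytes) -> bytes:
    """Append every decoded piece to a single growing buffer."""
    parts = data.split(b"%")
    res = bytearray(parts[0])
    for item in parts[1:]:
        try:
            decoded = _decode_pair(item)
        except ValueError:
            res += b"%"
            res += item
        else:
            res += decoded
            res += item[2:]
    return bytes(res)
```

The list version holds one reference per appended piece until the final join, while the `bytearray` version stores the output bytes contiguously, which is where the memory saving would come from.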

    The PR was closed due to technicalities (pointing to the wrong branch, CLA) and the OP didn’t follow up.

    Unless someone objects, I will close this issue as well.

    iritkatriel avatar Sep 11 '22 10:09 iritkatriel

    I created a new PR and included fixing a similar legacy design issue in unquote() as well as the original report's unquote_to_bytes(). Some performance microbenchmarks need running before I'll consider moving forward with it.

    If someone wanted to consider this a security issue it could be backported. It is at most a fixed constant factor (roughly $len(input) * sizeof(PyObject)$) of memory consumption vs a maximally antagonistic input, though. That doesn't smell DoS worthy.

    gpshead avatar Sep 12 '22 08:09 gpshead
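A rough way to see the constant factor mentioned above, assuming CPython and looking only at the accumulator container itself. The function name `accumulator_sizes` is hypothetical. A list needs one pointer-sized slot per appended piece, whereas a `bytearray` needs about one byte per output byte; the sketch does not count the bytes objects the list refers to, so it understates the list's total cost.

```python
import sys


def accumulator_sizes(n: int) -> tuple[int, int]:
    """Return (list size, bytearray size) in bytes for n one-byte pieces.

    sys.getsizeof reports the container only, not the objects it refers to.
    """
    as_list = [b"A"] * n        # n references, one per decoded piece
    as_buffer = bytearray(n)    # n bytes stored contiguously
    return sys.getsizeof(as_list), sys.getsizeof(as_buffer)


if __name__ == "__main__":
    list_size, buffer_size = accumulator_sizes(1_000_000)
    print(f"list: {list_size} bytes, bytearray: {buffer_size} bytes")
```

The gap grows linearly with the input length, which matches the description of a fixed constant factor rather than anything superlinear.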