cpython gh-121192: Add fast path to `deepcopy()` for empty list/tuple/dict/set

deepcopy() can be surprisingly slow when called with empty containers like lists, tuples, dicts, sets or frozensets.

This adds a fast path for this case similar to #114266 which speeds up these cases by about 4x - 21x while having a small impact on the default path.

Here are some benchmarks comparing this PR against main. Let me know if there are more benchmarks that I can run to verify that there a no performance regressions for the common case:

import pyperf
runner = pyperf.Runner()

setup = """
import copy

a = {"list": [1, 2 ,3 ,4], "t": (1, 2, 3), "str": "hello", "subdict": {"a": True}}

t = ()
fs = frozenset()
l = []
s = set()
d = {}
deep = [[], (), {}, set(), frozenset()]
"""

runner.timeit(name="deepcopy dict", stmt=f"b = copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy empty tuple", stmt=f"b = copy.deepcopy(t)", setup=setup)
runner.timeit(name="deepcopy empty frozenset", stmt=f"b = copy.deepcopy(fs)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt=f"b = copy.deepcopy(l)", setup=setup)
runner.timeit(name="deepcopy empty set", stmt=f"b = copy.deepcopy(s)", setup=setup)
runner.timeit(name="deepcopy empty dict", stmt=f"b = copy.deepcopy(d)", setup=setup)
runner.timeit(name="deepcopy multiple empty containers", stmt=f"b = copy.deepcopy(deep)", setup=setup)

deepcopy empty tuple: Mean +- std dev: [baseline] 287 ns +- 4 ns -> [opt-empty] 49.8 ns +- 1.7 ns: 5.77x faster
deepcopy empty frozenset: Mean +- std dev: [baseline] 1.46 us +- 0.02 us -> [opt-empty] 67.1 ns +- 1.8 ns: 21.70x faster
deepcopy empty list: Mean +- std dev: [baseline] 327 ns +- 6 ns -> [opt-empty] 67.1 ns +- 8.2 ns: 4.88x faster
deepcopy empty set: Mean +- std dev: [baseline] 1.44 us +- 0.02 us -> [opt-empty] 67.9 ns +- 3.8 ns: 21.26x faster
deepcopy empty dict: Mean +- std dev: [baseline] 329 ns +- 4 ns -> [opt-empty] 68.1 ns +- 3.0 ns: 4.83x faster

Benchmark hidden because not significant (2): deepcopy dict, deepcopy multiple empty containers

Geometric mean: 4.85x faster

Issue: gh-121192

Jul 01 '24 00:07 lgeiger

~~I ran the relevant pyperformance deepcopy benchmark which does seem to suggest that this change makes the non empty case slightly slower:~~EDIT: I re-ran the pyperformance benchmarks with the latest changes in d87901edccdb811075711f7a6d1651044afc8086 and I now can't see any performance regression for the common case.

Performance version: 1.11.0
Python version: 3.14.0a0 (64-bit) revision 1a84bdc237
Report on macOS-14.5-arm64-arm-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2024-07-02 01:23:14.690599
End date: 2024-07-02 01:23:51.844329

### deepcopy ###
Mean +- std dev: 123 us +- 12 us -> 120 us +- 1 us: 1.03x faster
Significant (t=2.25)

### deepcopy_memo ###
Mean +- std dev: 14.1 us +- 0.2 us -> 14.1 us +- 0.3 us: 1.00x slower
Not significant

### deepcopy_reduce ###
Mean +- std dev: 1.22 us +- 0.06 us -> 1.21 us +- 0.01 us: 1.00x faster
Not significant

Jul 01 '24 01:07 lgeiger

The optimization applied in the latest iteration of the PR is only applied when no memo is passed to deepcopy. The advantage is that any recursive structures like {'a': [1, 2, 3, 4, ,5]} will have no (or near zero) overhead. So I agree that performance regression for many common cases is not a big concern.

The question then is whether a deepcopy directly on an empty builtin like like and set (and not a deepcopy on a recursive structure containing empty builtins, e.g, ( {}, [], ())) is a common enough operation to apply this PR. I am not convinced on this, In the corresponding issue an example is given for pydantic basemodels. But there the deepcopy overhead can be avoided by using a default factory. Also the corresponding situation for plain dataclasses cannot occur:

from dataclasses import dataclass

@dataclass
class DC:
    a : list = []
    
x=DC()

raises a ValueError: mutable default <class 'list'> for field a is not allowed: use default_factory. With attrs an empty list is allowed, but no deepcopy is performed on instantiation. For example

import attrs
@attrs.define
class DC:
    a : list = []
    
x=DC()
y=DC()

x.a.append(1)

print(x)
print(y)

results in

DC(a=[1])
DC(a=[1])

Jul 02 '24 20:07 eendebakpt

For reference: I performed a benchmark of main, this PR and the deepcopy C implementation from https://github.com/python/cpython/pull/91610. On benchmark

import pyperf
runner = pyperf.Runner()

setup="""
import copy
from dataclasses import dataclass

@dataclass
class A:
    a : list
    b : str
    c : bool
    d: tuple
    e : set
    
dc=A([1,2,3], 'hello', True, (), {})

empty_tuple = tuple()
empty_list = []
"""

runner.timeit(name="deepcopy empty tuple", stmt="b=copy.deepcopy(empty_tuple)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt="b=copy.deepcopy(empty_list)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b=copy.deepcopy(dc)", setup=setup)

Results are

deepcopy empty tuple: Mean +- std dev: [dc_main] 827 ns +- 37 ns -> [dc_fastpath_empty] 203 ns +- 7 ns: 4.06x faster
deepcopy empty list: Mean +- std dev: [dc_main] 1.00 us +- 0.09 us -> [dc_fastpath_empty] 233 ns +- 24 ns: 4.29x faster
deepcopy dataclass: Mean +- std dev: [dc_main] 8.02 us +- 0.12 us -> [dc_fastpath_empty] 8.22 us +- 0.07 us: 1.02x slower

Geometric mean: 2.57x faster

(env312) c:\projects\misc\cpython\PCbuild>python -m pyperf compare_to dc_main.json dc_c.json
deepcopy empty tuple: Mean +- std dev: [dc_main] 827 ns +- 37 ns -> [dc_c] 90.6 ns +- 9.8 ns: 9.13x faster
deepcopy empty list: Mean +- std dev: [dc_main] 1.00 us +- 0.09 us -> [dc_c] 216 ns +- 24 ns: 4.63x faster
deepcopy dataclass: Mean +- std dev: [dc_main] 8.02 us +- 0.12 us -> [dc_c] 4.06 us +- 0.43 us: 1.97x faster

Geometric mean: 4.37x faster

Jul 02 '24 20:07 eendebakpt

@eendebakpt it can happen for dataclass, example:

from dataclasses import dataclass, as_dict

@dataclass 
class Some:
    field: set[int]

as_dict(Some(set()))

Jul 03 '24 09:07 sobolevn

@eendebakpt it can happen for dataclass, example:

Indeed. Here a more complete example, including the cases where this PR helps

from dataclasses import dataclass, asdict

@dataclass 
class Some:
    a: str
    t: tuple[int]
    field: set[int] # benefits
    l : list[int]
    d : dict[str, str]
    fs: frozenset[int] # benefits
    nonempty : set[int]

s=Some('x', tuple(), set(), [], d={}, fs=frozenset(), nonempty={1, 2})
asdict(s)

Jul 03 '24 09:07 eendebakpt

Indeed. Here a more complete example, including the cases where this PR helps

Thanks for the example! This PR indeed shows a decent improvement here:

import pyperf
runner = pyperf.Runner()

setup = """
from dataclasses import dataclass, asdict

@dataclass
class Some:
    a: str
    t: tuple[int]
    field: set[int] # benefits
    l : list[int]
    d : dict[str, str]
    fs: frozenset[int] # benefits
    nonempty : set[int]

s = Some('x', tuple(), set(), [], d={}, fs=frozenset(), nonempty={1, 2})
"""

runner.timeit(name="dataclass asdict", stmt=f"b = asdict(s)", setup=setup)

Mean +- std dev: [asdict-baseline] 6.26 us +- 0.07 us -> [asdict-opt] 3.39 us +- 0.18 us: 1.85x faster

Jul 03 '24 12:07 lgeiger

Other use-cases that we can find in our own code:

dataclass: https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/dataclasses.py#L1487 also as_tuple for set and frozenset
weakref: WeakValueDictionary https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/weakref.py#L191 and WeakKeyDictionary: https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/weakref.py#L448 where values can be any of the given types, and keys can be empty tuples and empty frozendicts (which is rather rare, I agree)
http: https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/http/cookiejar.py#L1819 where self._cookies is empty by default https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/http/cookiejar.py#L1267
coping nested structures where keys/values are empty collections, like: {'data': []}

Jul 03 '24 12:07 sobolevn