gh-121192: Add fast path to `deepcopy()` for empty list/tuple/dict/set
deepcopy() can be surprisingly slow when called with empty containers like lists, tuples, dicts, sets or frozensets.
This adds a fast path for this case similar to #114266 which speeds up these cases by about 4x - 21x while having a small impact on the default path.
Here are some benchmarks comparing this PR against main. Let me know if there are more benchmarks that I can run to verify that there a no performance regressions for the common case:
import pyperf
runner = pyperf.Runner()
setup = """
import copy
a = {"list": [1, 2 ,3 ,4], "t": (1, 2, 3), "str": "hello", "subdict": {"a": True}}
t = ()
fs = frozenset()
l = []
s = set()
d = {}
deep = [[], (), {}, set(), frozenset()]
"""
runner.timeit(name="deepcopy dict", stmt=f"b = copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy empty tuple", stmt=f"b = copy.deepcopy(t)", setup=setup)
runner.timeit(name="deepcopy empty frozenset", stmt=f"b = copy.deepcopy(fs)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt=f"b = copy.deepcopy(l)", setup=setup)
runner.timeit(name="deepcopy empty set", stmt=f"b = copy.deepcopy(s)", setup=setup)
runner.timeit(name="deepcopy empty dict", stmt=f"b = copy.deepcopy(d)", setup=setup)
runner.timeit(name="deepcopy multiple empty containers", stmt=f"b = copy.deepcopy(deep)", setup=setup)
deepcopy empty tuple: Mean +- std dev: [baseline] 287 ns +- 4 ns -> [opt-empty] 49.8 ns +- 1.7 ns: 5.77x faster
deepcopy empty frozenset: Mean +- std dev: [baseline] 1.46 us +- 0.02 us -> [opt-empty] 67.1 ns +- 1.8 ns: 21.70x faster
deepcopy empty list: Mean +- std dev: [baseline] 327 ns +- 6 ns -> [opt-empty] 67.1 ns +- 8.2 ns: 4.88x faster
deepcopy empty set: Mean +- std dev: [baseline] 1.44 us +- 0.02 us -> [opt-empty] 67.9 ns +- 3.8 ns: 21.26x faster
deepcopy empty dict: Mean +- std dev: [baseline] 329 ns +- 4 ns -> [opt-empty] 68.1 ns +- 3.0 ns: 4.83x faster
Benchmark hidden because not significant (2): deepcopy dict, deepcopy multiple empty containers
Geometric mean: 4.85x faster
- Issue: gh-121192
~~I ran the relevant pyperformance deepcopy benchmark which does seem to suggest that this change makes the non empty case slightly slower:~~EDIT: I re-ran the pyperformance benchmarks with the latest changes in d87901edccdb811075711f7a6d1651044afc8086 and I now can't see any performance regression for the common case.
Performance version: 1.11.0
Python version: 3.14.0a0 (64-bit) revision 1a84bdc237
Report on macOS-14.5-arm64-arm-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2024-07-02 01:23:14.690599
End date: 2024-07-02 01:23:51.844329
### deepcopy ###
Mean +- std dev: 123 us +- 12 us -> 120 us +- 1 us: 1.03x faster
Significant (t=2.25)
### deepcopy_memo ###
Mean +- std dev: 14.1 us +- 0.2 us -> 14.1 us +- 0.3 us: 1.00x slower
Not significant
### deepcopy_reduce ###
Mean +- std dev: 1.22 us +- 0.06 us -> 1.21 us +- 0.01 us: 1.00x faster
Not significant
The optimization applied in the latest iteration of the PR is only applied when no memo is passed to deepcopy. The advantage is that any recursive structures like {'a': [1, 2, 3, 4, ,5]} will have no (or near zero) overhead. So I agree that performance regression for many common cases is not a big concern.
The question then is whether a deepcopy directly on an empty builtin like like and set (and not a deepcopy on a recursive structure containing empty builtins, e.g, ( {}, [], ())) is a common enough operation to apply this PR. I am not convinced on this, In the corresponding issue an example is given for pydantic basemodels. But there the deepcopy overhead can be avoided by using a default factory. Also the corresponding situation for plain dataclasses cannot occur:
from dataclasses import dataclass
@dataclass
class DC:
a : list = []
x=DC()
raises a ValueError: mutable default <class 'list'> for field a is not allowed: use default_factory. With attrs an empty list is allowed, but no deepcopy is performed on instantiation. For example
import attrs
@attrs.define
class DC:
a : list = []
x=DC()
y=DC()
x.a.append(1)
print(x)
print(y)
results in
DC(a=[1])
DC(a=[1])
For reference: I performed a benchmark of main, this PR and the deepcopy C implementation from https://github.com/python/cpython/pull/91610. On benchmark
import pyperf
runner = pyperf.Runner()
setup="""
import copy
from dataclasses import dataclass
@dataclass
class A:
a : list
b : str
c : bool
d: tuple
e : set
dc=A([1,2,3], 'hello', True, (), {})
empty_tuple = tuple()
empty_list = []
"""
runner.timeit(name="deepcopy empty tuple", stmt="b=copy.deepcopy(empty_tuple)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt="b=copy.deepcopy(empty_list)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b=copy.deepcopy(dc)", setup=setup)
Results are
deepcopy empty tuple: Mean +- std dev: [dc_main] 827 ns +- 37 ns -> [dc_fastpath_empty] 203 ns +- 7 ns: 4.06x faster
deepcopy empty list: Mean +- std dev: [dc_main] 1.00 us +- 0.09 us -> [dc_fastpath_empty] 233 ns +- 24 ns: 4.29x faster
deepcopy dataclass: Mean +- std dev: [dc_main] 8.02 us +- 0.12 us -> [dc_fastpath_empty] 8.22 us +- 0.07 us: 1.02x slower
Geometric mean: 2.57x faster
(env312) c:\projects\misc\cpython\PCbuild>python -m pyperf compare_to dc_main.json dc_c.json
deepcopy empty tuple: Mean +- std dev: [dc_main] 827 ns +- 37 ns -> [dc_c] 90.6 ns +- 9.8 ns: 9.13x faster
deepcopy empty list: Mean +- std dev: [dc_main] 1.00 us +- 0.09 us -> [dc_c] 216 ns +- 24 ns: 4.63x faster
deepcopy dataclass: Mean +- std dev: [dc_main] 8.02 us +- 0.12 us -> [dc_c] 4.06 us +- 0.43 us: 1.97x faster
Geometric mean: 4.37x faster
@eendebakpt it can happen for dataclass, example:
from dataclasses import dataclass, as_dict
@dataclass
class Some:
field: set[int]
as_dict(Some(set()))
@eendebakpt it can happen for dataclass, example:
Indeed. Here a more complete example, including the cases where this PR helps
from dataclasses import dataclass, asdict
@dataclass
class Some:
a: str
t: tuple[int]
field: set[int] # benefits
l : list[int]
d : dict[str, str]
fs: frozenset[int] # benefits
nonempty : set[int]
s=Some('x', tuple(), set(), [], d={}, fs=frozenset(), nonempty={1, 2})
asdict(s)
Indeed. Here a more complete example, including the cases where this PR helps
Thanks for the example! This PR indeed shows a decent improvement here:
import pyperf
runner = pyperf.Runner()
setup = """
from dataclasses import dataclass, asdict
@dataclass
class Some:
a: str
t: tuple[int]
field: set[int] # benefits
l : list[int]
d : dict[str, str]
fs: frozenset[int] # benefits
nonempty : set[int]
s = Some('x', tuple(), set(), [], d={}, fs=frozenset(), nonempty={1, 2})
"""
runner.timeit(name="dataclass asdict", stmt=f"b = asdict(s)", setup=setup)
Mean +- std dev: [asdict-baseline] 6.26 us +- 0.07 us -> [asdict-opt] 3.39 us +- 0.18 us: 1.85x faster
Other use-cases that we can find in our own code:
- dataclass: https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/dataclasses.py#L1487 also
as_tupleforsetandfrozenset - weakref:
WeakValueDictionaryhttps://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/weakref.py#L191 andWeakKeyDictionary: https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/weakref.py#L448 where values can be any of the given types, and keys can be empty tuples and empty frozendicts (which is rather rare, I agree) - http: https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/http/cookiejar.py#L1819 where
self._cookiesis empty by default https://github.com/python/cpython/blob/26d24eeb90d781e381b97d64b4dcb1ee4dd891fe/Lib/http/cookiejar.py#L1267 - coping nested structures where keys/values are empty collections, like:
{'data': []}