config: support checkout_jobs
Bug Report
checkout: slow checkouts
Description
Checkout copies all files in parallel, which saturates the disk and leads to excessive checkout times. For example, while the checkout is running, lsof shows 331 open files for the dvc process.
Reproduce
dvc pull
Expected
Moderate parallelization that respects the jobs: parameter in .dvc/config, or some similar parameter.
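To make the expected behaviour concrete, here is a minimal sketch of a bounded parallel copy. It is not DVC's checkout code. The function name `copy_files` and its `jobs` argument are hypothetical and only illustrate capping the number of copies in flight with a fixed-size thread pool.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_files(pairs, jobs=4):
    """Copy (src, dst) pairs using at most `jobs` worker threads.

    Hypothetical helper for illustration; not part of DVC.
    """

    def _copy(pair):
        src, dst = pair
        # Create the destination directory if it does not exist yet.
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        return str(dst)

    # max_workers caps how many copies run at the same time, so the
    # number of simultaneously open files stays proportional to `jobs`.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(_copy, pairs))
```

With something like this, `copy_files(pairs, jobs=2)` would keep at most two copies running at once, regardless of how many files are in `pairs`.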
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 3.11.1 (pip)
-------------------------
Platform: Python 3.10.10 on Linux-6.1.0-11-amd64-x86_64-with-glibc2.36
Subprojects:
dvc_data = 2.10.1
dvc_objects = 0.24.1
dvc_render = 0.5.3
dvc_task = 0.3.0
scmrepo = 1.1.0
Supports:
http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
ssh (sshfs = 2023.7.0)
Config:
Global: /home/john/.config/dvc
System: /etc/xdg/dvc
Additional Information (if any):
https://discuss.dvc.org/t/is-jobs-n-ignored-on-local-stores/1768
I had a quick look and still need to dig deeper, but at first sight the jobs parameter (which is renamed to batch_size at some point) seems to get lost between here: https://github.com/iterative/dvc-data/blob/aea2be100b0cf4c8bcdb1dc0755bcee10bff296c/src/dvc_data/hashfile/transfer.py#L224-L237
and here:
https://github.com/iterative/dvc-data/blob/aea2be100b0cf4c8bcdb1dc0755bcee10bff296c/src/dvc_data/hashfile/transfer.py#L58-L67
It may be picked up again later through **kwargs, but I haven't had time to look into that yet.
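The suspected failure mode can be sketched in isolation. The code below is not taken from dvc-data. Every name in it (`DEFAULT_JOBS`, `_do_transfer`, `transfer_lossy`, `transfer_forwarding`) is made up, and it only shows how a named argument can silently fall back to a default when a caller renames it and then does not forward it.

```python
DEFAULT_JOBS = 16  # hypothetical default, not DVC's real value


def _do_transfer(items, jobs=None, **kwargs):
    # Inner function: falls back to the default when `jobs` is missing.
    return jobs if jobs is not None else DEFAULT_JOBS


def transfer_lossy(items, jobs=None, **kwargs):
    # `jobs` is bound to a named parameter here, so it is no longer
    # part of **kwargs and is dropped unless forwarded explicitly.
    batch_size = jobs  # renamed locally, then never passed on
    return _do_transfer(items, **kwargs)


def transfer_forwarding(items, jobs=None, **kwargs):
    # Same signature, but the value is forwarded to the inner call.
    return _do_transfer(items, jobs=jobs, **kwargs)


print(transfer_lossy([], jobs=2))       # prints 16: the caller's value is lost
print(transfer_forwarding([], jobs=2))  # prints 2: the caller's value is respected
```

If the real code follows the lossy pattern, the fix would be to forward the value explicitly at the call site. Whether that is actually what happens in transfer.py still needs to be confirmed.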