Exp run: reformats dvc.yaml
Description
In DVC 3.33.3, dvc exp run tends to reformat the dvc.yaml file. It was mentioned on Discord that the cause is the default line width in ruamel.yaml. Unfortunately, besides breaking long lines, it also merges lines, which is pretty inconvenient. In my experience, dvc.yaml is hardly readable after this reformatting (I attached an example below):
I think that dvc.yaml should not be altered during dvc exp run.
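Both symptoms, breaking and merging, are what you would expect if the file is re-emitted at a fixed line width regardless of how the user wrapped it. YAML folds a multi-line plain scalar into a single string with spaces, so the original line breaks are gone after loading and the emitter chooses its own. The sketch below is a simplified model built on Python's textwrap module, not ruamel.yaml's actual emitter. The helper names, the 80-column default and the example command are all assumptions for illustration.

```python
import textwrap


def unfold_plain_scalar(lines: list[str]) -> str:
    """Join the lines of a multi-line plain YAML scalar the way YAML line
    folding does: strip each line and join with single spaces.
    Simplification: blank lines (which YAML turns into newlines) are ignored."""
    return " ".join(line.strip() for line in lines if line.strip())


def refold(key: str, value: str, width: int = 80, indent: int = 2) -> list[str]:
    """Re-emit `key: value` wrapped at `width` columns, with continuation
    lines indented two spaces deeper than the key. A hypothetical stand-in
    for a YAML emitter's line folding, not ruamel.yaml's algorithm."""
    first = " " * indent + key + ": "
    rest = " " * (indent + 2)
    return textwrap.wrap(
        value,
        width=width,
        initial_indent=first,
        subsequent_indent=rest,
        break_long_words=False,  # never split a single token such as a path
        break_on_hyphens=False,  # keep --flags intact
    )
```

For a hypothetical command written by hand as the three lines python train.py / --epochs 10 / --data data/prepared, refold at width 80 returns a single line (the lines are merged), while at width 40 it returns a different two-line split (the lines are re-broken). Either way the author's layout is lost, which matches the behavior described above.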
Reproduce
git clone https://github.com/Danila89/dvc_empty.git && cd dvc_empty && git pull --all && git checkout dsavenkov/dvc_yaml_formatting && dvc exp run -n something
After running this command, dvc.yaml has unstaged changes.
Expected
dvc.yaml is unchanged
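The expected behavior can be checked mechanically by fingerprinting dvc.yaml before and after the command. This is a minimal sketch; file_digest and command_leaves_file_unchanged are hypothetical helpers, not part of DVC.

```python
import hashlib
import subprocess
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def command_leaves_file_unchanged(cmd: list[str], path: Path) -> bool:
    """Run `cmd` and report whether `path` is byte-for-byte identical afterwards.

    Raises subprocess.CalledProcessError if the command itself fails, so a
    crash is not mistaken for "file unchanged".
    """
    before = file_digest(path)
    subprocess.run(cmd, check=True)
    return file_digest(path) == before
```

Inside the cloned reproduction repo, command_leaves_file_unchanged(["dvc", "exp", "run", "-n", "something"], Path("dvc.yaml")) should return True once dvc.yaml is no longer rewritten; per this report, DVC 3.33.3 would make it return False.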
Environment information
Output of dvc doctor:
(base) danila.savenkov@RS-UNIT-0099 dvc_empty % dvc doctor
DVC version: 3.33.3 (pip)
-------------------------
Platform: Python 3.10.9 on macOS-13.3.1-arm64-arm-64bit
Subprojects:
	dvc_data = 2.22.6
	dvc_objects = 1.4.9
	dvc_render = 1.0.0
	dvc_task = 0.3.0
	scmrepo = 1.5.0
Supports:
	http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2023.5.0, boto3 = 1.26.76)
Config:
	Global: /Users/danila.savenkov/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk3s3s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/64bbbded2e55036b006c56ceaefa98e1
A similar issue I've struggled with: something like dvc repro -s some_stage updates dvc.lock not only in the part relevant to some_stage but also reformats it elsewhere, generally leaving trailing whitespace throughout the file. For example, if dvc.lock contains something like this
some_stage:
  cmd: python -m src.process_zip
    long/path/to/data.zip
  deps:
  - path:
      long/path/to/data.zip
    hash: md5
    md5: a5074fdca2d1bf921dd9ea26c61646a3
    size: 13013258
  outs:
  - path: path/to/out.zip
    hash: md5
    md5: a5074fdca2d1bf921dd9ea26c61646a3
    size: 13013258
other_stage:
  cmd: python -m src.other_stuff
  deps:
  - path: src/other_stuff.py
    hash: md5
    md5: a5074fdca2d1bf921dd9ea26c61646a3
    size: 13013258
  outs:
  # ...
then running dvc repro -s other_stage also modifies the entries under some_stage, appending a trailing space to the line
cmd: python -m src.process_zip
and to the - path: line that precedes long/path/to/data.zip.
It seems to happen when the single-line form
cmd: python -m src.process_zip long/path/to/data.zip
would exceed 140 characters, and when
- path: long/path/to/data.zip
would exceed 90 characters (but I'm not sure).
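To test the 140/90-character guess against a real dvc.lock, one could list every line that ends in whitespace together with the length it would have had if it had not been wrapped. This is a stdlib-only sketch; trailing_whitespace_report is a hypothetical helper, and it assumes each wrapped value continues on exactly one following line, as in the example above.

```python
def trailing_whitespace_report(text: str) -> list[tuple[int, int]]:
    """Return (line number, unfolded length) for every line ending in spaces/tabs.

    The unfolded length is the line with its trailing whitespace removed and
    the next line appended after a single space, i.e. how wide the line would
    be on one line. Leading indentation is included in the count.
    """
    lines = text.splitlines()
    report = []
    for index, line in enumerate(lines):
        stripped = line.rstrip(" \t")
        if stripped == line:
            continue  # no trailing whitespace on this line
        if index + 1 < len(lines):
            unfolded = stripped + " " + lines[index + 1].strip()
        else:
            unfolded = stripped  # last line: nothing to join
        report.append((index + 1, len(unfolded)))
    return report
```

Running it as trailing_whitespace_report(Path("dvc.lock").read_text()) on a few affected lock files (with from pathlib import Path) would show whether the reported lengths cluster above a consistent threshold.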
What I end up doing is running pre-commit run --files dvc.lock || git add dvc.lock after every update to dvc.lock to fix it.
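Without pre-commit, the same cleanup can be sketched with the standard library alone. strip_trailing_whitespace is a hypothetical helper that mimics what a trailing-whitespace fixer hook does; it is not part of DVC or pre-commit. It returns True when it changed the file, similar to how pre-commit reports a failure when a hook modifies files (which is why the || git add in the command above stages the fixed file).

```python
import re
from pathlib import Path

# Spaces or tabs immediately before a line break (LF or CRLF) or at end of file.
_TRAILING_WS = re.compile(r"[ \t]+(?=\r?\n|\Z)")


def strip_trailing_whitespace(path: Path) -> bool:
    """Remove trailing spaces/tabs from every line of `path` in place.

    Returns True if the file was modified. newline="" disables newline
    translation, so existing LF/CRLF line endings are preserved.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        original = handle.read()
    cleaned = _TRAILING_WS.sub("", original)
    if cleaned == original:
        return False
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(cleaned)
    return True
```

Because a line break inside a plain YAML scalar is folded into a single space on load, removing the trailing space before the break should not change the parsed value of cmd or path.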