Unable to copy running-config to non-existing startup-config
Current Behavior
No user can copy running to startup if startup doesn't exist or the full path name is given (copy running /cfg/startup-config.cfg)
root@ix-00-00-00:/> copy running-config startup-config
Error: setting group owner wheel (10) on /cfg/startup-config.cfg: No such file or directory
No startup-config.cfg is created here ^
root@ix-00-00-00:~$ ls /cfg/startup-config.cfg
ls: /cfg/startup-config.cfg: No such file or directory
Expected Behavior
root@ix-00-00-00:/> remove startup-config.cfg
Remove /cfg/startup-config.cfg, are you sure (y/N)? y
root@ix-00-00-00:/> copy running-config startup-config
Steps To Reproduce
root@ix-00-00-00:/> remove startup-config.cfg
Remove /cfg/startup-config.cfg, are you sure (y/N)? y
root@ix-00-00-00:/> copy running-config startup-config
Error: setting group owner wheel (10) on /cfg/startup-config.cfg: No such file or directory
Additional information
This problem manifests itself by throwing this chown error but the main problem is that no start-config is created. The command copy running startup takes a different code path than copy running /cfg/startup-config.cfg.
I think the root cause is that following should be && instead of || -- so that we can enter the first if() statement to reach systemf("sysrepocfg -d %s -X%s -f json", srcds->sysrepocfg, fn);, but I don't understand how this could work if the file exists ... 🤔
https://github.com/kernelkit/infix/blob/dd788f5611b1cac232673c0a2655edb0368e5f9a/src/bin/copy.c#L241-L244
I had a quick look.
startup-config is the only infix_config with a path, that causes the error printout from the permission code. But I'm not sure this is purely a problem with the copy code. If the file exists it's updated properly but I don't see any explicit writes to it in the copy code? Is it sysrepo who updates the backing file (/cfg/startup-config.cfg)?
EDIT: To clarify, the copy running /cfg/startup-config.cfg works as it takes a totally different code path as /cfg/startup-config.cfg is treated as a file and not a data store.
CCB: Future
Found the root cause for this now. Turns out the copy command does "nothing" when we save running to startup, instead it's a trigger in confd:core.c that runs -- but only if there have been any changes to the startup datastore.
Took me a while, but I'm now recalling that one of the reasons for this weird design is that we have users that script Infix using sysrepocfg, who expect sysrepocfg -d startup -C running to persist across boots.
The trigger in confd, core_startup_save(), runs for each change registered. E.g., for nine changes in running config, we will call sysrepocfg -X/cfg/startup-config.cfg -d startup -f json nine times. Less than ideal, obviously.
So if /cfg/startup-config.cfg is removed manually, running-config is updated, followed by another copy running-config startup-config from the CLI, the file will be created, but with the wrong permissions.
There are multiple ways forward here, a radical one is to remove the tools sysrepocfg and sysrepoctl -- do we really need them if we ensure the CLI and our own command line tools link with libsysrepo and use the C API instead?
Another, perhaps less invasive which would allow us to drop core_startup_save() is to patch sysrepo to support an alternative (single) backend-store file for startup, /cfg/startup-config.cfg.
Personally I vote for the section option, either way a decision needs to be made before the copy command can be simplified/refactored.
After discussions during today's CCB we agreed we need something fundamental to fix this issue, and @wkz suggested instead patch sysrepo to add a post-hook with a datastore argument. That way we could at the same time remove our own num_changes hack in confd:core.c
Despite the alternatives, I've found a way just tell sysrepocfg to export the source datastore in the case where the destination in the copy command is a file. This works and fixes the bug.