Reading Config from a Symlink
I'm working with the sparklyr package and trying to write a custom config.yml file.
As part of my workflow, I work across multiple devices on a Linux system. To use a single config.yml file, I'm trying to symlink it from one drive to another. When I create the symlink, the config package cannot read the file through the symlink on the other drive.
Has anyone else run into this problem?
Also, I'm pretty sure config isn't traversing the directory tree correctly.
I tested putting an actual file (not a symlink) in a parent directory, and config is unable to find that file either.
@bradleyfay: If you pass sparklyr::spark_config either an absolute path or a path relative to R's current working directory (see the caveats below), the underlying config::get call should be able to find and read your configuration file, provided the file (or a symlink to it) exists at that location.
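As a quick sanity check of the claim above, here is a minimal base-R sketch showing that reading a file through a symlink returns the target's contents. The directory names ("drive_a", "drive_b") and the YAML key are hypothetical placeholders. The commented-out last line shows how the path would be handed to config::get through its file argument; that part needs the config package installed.

```r
# Minimal sketch: a symlinked config file is read transparently by base R.
# Directory names and YAML content are hypothetical placeholders.
real_dir <- file.path(tempdir(), "drive_a")
link_dir <- file.path(tempdir(), "drive_b")
dir.create(real_dir, showWarnings = FALSE)
dir.create(link_dir, showWarnings = FALSE)

# The real file lives in "drive A".
real_file <- file.path(real_dir, "config.yml")
writeLines(c("default:", "  example_key: example_value"), real_file)

# "Drive B" only holds a symlink pointing at that file.
link_file <- file.path(link_dir, "config.yml")
if (!file.exists(link_file)) file.symlink(real_file, link_file)

# Reading through the link yields the real file's contents.
stopifnot(identical(readLines(link_file), readLines(real_file)))

# Sys.readlink() reports the link target; normalizePath() resolves it fully.
print(Sys.readlink(link_file))
print(normalizePath(link_file))

# With the config package installed, pass the path via the `file` argument:
# config::get(file = link_file)
```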
However, I am surprised that sparklyr::spark_config relies on config::get to locate the configuration file. For example, if you have a config file at /tmp/config.yml but none at /tmp/subdir/config.yml, calling config::get(file = "/tmp/subdir/config.yml") silently loads /tmp/config.yml, with no error or warning shown to the user. This behavior of config::get is documented (its use_parent argument defaults to TRUE), so it isn't a config::get bug. In my view, it is a sparklyr::spark_config bug, because sparklyr does not warn users about this surprising behavior. @bradleyfay, I suggest you open an issue about this on sparklyr's issue tracker.
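To make the fallback concrete, the sketch below uses a hypothetical helper, find_config_upwards(), that mimics a use_parent-style search in base R. It is not config's actual code, just an illustration under that assumption: ask for subdir/config.yml, and if it is missing, the parent's config.yml comes back without any complaint.

```r
# Hypothetical helper mimicking a use_parent-style search (NOT config's code):
# look for basename(file) in dirname(file), then in each parent directory.
find_config_upwards <- function(file) {
  name <- basename(file)
  dir <- normalizePath(dirname(file), mustWork = FALSE)
  repeat {
    # Avoid producing "//name" when dir is the root directory.
    candidate <- if (dir == "/") paste0("/", name) else file.path(dir, name)
    if (file.exists(candidate)) return(candidate)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(NULL)  # reached the root: give up
    dir <- parent
  }
}

# Reproduce the /tmp layout from the example inside a temporary directory.
root <- file.path(tempdir(), "cfg_demo")
dir.create(file.path(root, "subdir"), recursive = TRUE, showWarnings = FALSE)
writeLines("default:", file.path(root, "config.yml"))

# Ask for subdir/config.yml, which does not exist...
found <- find_config_upwards(file.path(root, "subdir", "config.yml"))
# ...and silently get the parent directory's config.yml instead.
print(found)
```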
All that said, after a brief look at the code in config::get responsible for locating a config file, I see a few issues:
1. Take the file and directory structure from the example in the second paragraph above. Make /tmp/subdir R's current working directory and invoke config::get(file = "config.yml"). config::get is unable to ascend to the parent directory because both normalizePath calls (lines 31 and 35) return "config.yml": file_dir <- dirname(file) therefore assigns "." to file_dir, parent_dir <- dirname(file_dir) assigns "." to parent_dir, and the while loop exits immediately.
2. If the config file is located in the root directory, parent_dir can equal "/", in which case the assignment file <- file.path(parent_dir, basename(file)) assigns "//config.yml" to file, a malformed path (POSIX leaves the meaning of a leading double slash implementation-defined).
3. Assuming no symlinks are involved, one fix for (1) would be to change file_dir <- dirname(file) to file_dir <- normalizePath(dirname(file), mustWork = FALSE). However, this fix is fundamentally broken because it does not account for symlinked directories. For example, suppose /tmp is a regular directory containing a regular file config.yml, but /tmp/subdir is now a symlink that ultimately resolves to a directory with no /tmp anywhere above it in the hierarchy, say /home/user/subdir. The modified assignment then sets file_dir to /home/user/subdir, so the search traverses /home/user/subdir, /home/user, /home, and /. Either no config file is found or, worse, a different config.yml is unexpectedly loaded. Users would probably expect the behavior obtained by replacing the normalizePath calls with something like Python's os.path.abspath, but as an R newbie, I don't know whether R has such a function.
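Issues (1) and (2) can be reproduced with plain string operations, no filesystem needed. The sketch also includes a hypothetical abspath() helper in the spirit of Python's os.path.abspath: it makes a path absolute and collapses "." and ".." lexically, without resolving symlinks. It is an illustration for POSIX-style paths only, not an existing R or config function.

```r
# (1) dirname() cannot climb out of a bare relative file name.
file <- "config.yml"
file_dir <- dirname(file)        # "."
parent_dir <- dirname(file_dir)  # "." as well
print(identical(file_dir, parent_dir))  # TRUE: a "stop when parent == current" loop ends at once

# (2) file.path() simply pastes with "/", so the root yields a double slash.
print(file.path("/", "config.yml"))  # "//config.yml"

# Hypothetical abspath(), in the spirit of Python's os.path.abspath:
# make the path absolute and collapse "." and ".." purely lexically,
# WITHOUT resolving symlinks (unlike normalizePath()). POSIX-style paths only.
abspath <- function(path, wd = getwd()) {
  if (!startsWith(path, "/")) path <- file.path(wd, path)
  out <- character()
  for (part in strsplit(path, "/", fixed = TRUE)[[1]]) {
    if (part == "" || part == ".") next
    if (part == "..") {
      out <- head(out, -1)  # drop the last component (no-op at the root)
    } else {
      out <- c(out, part)
    }
  }
  paste0("/", paste(out, collapse = "/"))
}

print(abspath("../config.yml", wd = "/tmp/subdir"))  # "/tmp/config.yml"
print(abspath("..", wd = "/"))                       # "/"
```

Note that the default wd is getwd(), so the result is only as symlink-preserving as the working directory R itself reports.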
I just noticed that when R is launched from within or under a symlinked directory, getwd() returns the current working directory as an absolute path with all symlinks resolved. I imagine this is what causes issue (3) above, and it makes my suggested improvement (something like Python's os.path.abspath) unworkable.
Surprising that R does this!
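The observation is easy to reproduce with setwd() instead of relaunching R. A minimal sketch, assuming a POSIX system where file.symlink() works (the directory names are hypothetical): enter a directory through a symlink and compare what getwd() reports. As far as I can tell, R's getwd() on Unix is backed by the C library's getcwd(), which reports the physical directory, so the symlink name is lost.

```r
# A real directory and a symlink pointing at it (names are hypothetical).
real_dir <- file.path(tempdir(), "physical_dir")
dir.create(real_dir, showWarnings = FALSE)
link_dir <- file.path(tempdir(), "symlinked_dir")
if (!file.exists(link_dir)) file.symlink(real_dir, link_dir)

# Enter the directory *through the symlink*, then ask R where we are.
old_wd <- setwd(link_dir)
wd <- getwd()
setwd(old_wd)  # restore the previous working directory

print(link_dir)  # the path we changed into
print(wd)        # the physical path, with the symlink resolved
```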