
Reading Config from SymLink

Status: Open. bradleyfay opened this issue 9 years ago (3 comments).

I'm working with the sparklyr package and trying to write a custom config.yml file.

As part of my workflow, I work across multiple devices on a Linux system. To use a single config.yml file, I'm symlinking it from one drive to another. Once the symlink is in place, however, the config package cannot read the file through it.

Has anyone else run into this problem?

bradleyfay · Dec 01 '16 14:12

Also, I'm pretty sure config isn't traversing parent directories correctly.

I tested this by putting an actual file (not a symlink) in a parent directory, and config is unable to find that file either.

bradleyfay · Dec 01 '16 15:12

@bradleyfay: If you pass sparklyr::spark_config either an absolute path or a path relative to R's current working directory (more on this below), the underlying config::get call should be able to find and read your configuration file, provided the file (or a symlink to it) exists at that location (again, more on this below).

However, I am surprised that sparklyr::spark_config relies on config::get to locate the configuration file. For example, if you have a config file at /tmp/config.yml but none at /tmp/subdir/config.yml, and you call config::get(file = "/tmp/subdir/config.yml"), then /tmp/config.yml is loaded without any error or warning. This config::get behavior is documented, so it isn't a config::get bug. In my view, it is a sparklyr::spark_config bug, since sparklyr does not warn users about this surprising behavior. @bradleyfay, I suggest you open an issue about this on sparklyr's issue tracker.
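As a rough illustration of the kind of guard a caller could add, here is a minimal base R sketch. `resolve_config_strict()` is a hypothetical helper, not part of config or sparklyr: it stops unless the exact file requested exists, so a config.yml in a parent directory can never be picked up silently. The example builds the layout under `tempdir()` rather than /tmp.

```r
# Hypothetical helper (not part of config or sparklyr): stop with an error
# unless the exact file requested exists, so that a config.yml in a parent
# directory is never used as a silent fallback.
resolve_config_strict <- function(file) {
  if (!file.exists(file)) {
    stop("config file not found at '", file, "'", call. = FALSE)
  }
  normalizePath(file, mustWork = TRUE)
}

# Recreate the layout from the example: a config.yml in the parent directory
# and none in the subdirectory (built under tempdir() instead of /tmp).
root <- file.path(tempdir(), "cfg-demo")
dir.create(file.path(root, "subdir"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("default:", "  x: 1"), file.path(root, "config.yml"))

# Asking for the missing subdirectory file now fails instead of falling back.
res <- tryCatch(
  resolve_config_strict(file.path(root, "subdir", "config.yml")),
  error = function(e) conditionMessage(e)
)
print(res)
```

A caller could then pass the returned absolute path on as `file`, so the file at that exact location is the one read.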

All that said, after a brief look at the code in config::get responsible for locating a config file, I see a few issues.

  1. Take the file and directory layout from the example above (/tmp/config.yml exists, /tmp/subdir/config.yml does not). Set R's current working directory to /tmp/subdir and call config::get(file = "config.yml"). config::get cannot ascend to the parent directory: since no config.yml exists in the working directory, both normalizePath calls (line 31, line 35) return the relative path "config.yml" unchanged, so file_dir <- dirname(file) assigns "." to file_dir, parent_dir <- dirname(file_dir) also assigns "." to parent_dir, and the while loop exits.
  2. If the config file is located in the root directory, it's possible for parent_dir to equal "/", in which case the assignment file <- file.path(parent_dir, basename(file)) will assign "//config.yml" to file, an invalid path that normalizePath can't fix.
  3. Assuming no symlinks are involved, one fix for (1) would be to change file_dir <- dirname(file) to file_dir <- normalizePath(dirname(file), mustWork = FALSE). However, this fix is fundamentally broken because it fails to account for symlinked directories. For example, assume that /tmp is a regular directory containing a regular file config.yml, but that /tmp/subdir is now a symlink that ultimately resolves to a directory with no /tmp anywhere above it in the directory hierarchy; say, /home/user/subdir. Then the modified statement will assign /home/user/subdir to file_dir, so the search traverses /home/user/subdir, /home/user, /home, and /. Either no config file will be found or, even worse, a different config.yml file may be unexpectedly loaded. Users would probably expect the behavior obtained by replacing the normalizePath calls with something like Python's os.path.abspath (which makes a path absolute without resolving symlinks), but as an R newbie, I don't know whether R has such a function.
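Issues (1) and (2) come down to base R behavior that is easy to check directly. The sketch below reproduces both, then shows one possible upward search. `find_config_upward()` is a hypothetical function, not config's actual code: it makes the starting directory absolute and stops cleanly at the root. Because it still uses normalizePath, it resolves symlinks and therefore does not address issue (3).

```r
# Issue (1): for a bare relative file name, dirname() returns "." and
# dirname(".") is still ".", so a loop that walks up via dirname() stops at once.
print(dirname("config.yml"))
print(dirname(dirname("config.yml")))

# Issue (2): file.path() pastes with "/", so joining onto "/" doubles the slash.
print(file.path("/", "config.yml"))

# Hypothetical upward search: start from the absolute form of the directory
# part of `file`, check each ancestor, and stop once dirname() stops changing.
# normalizePath() resolves symlinks, so issue (3) is NOT addressed here.
find_config_upward <- function(file = "config.yml") {
  name <- basename(file)
  dir <- normalizePath(dirname(file), mustWork = FALSE)
  repeat {
    candidate <- if (dir == "/") paste0("/", name) else file.path(dir, name)
    if (file.exists(candidate)) return(candidate)
    parent <- dirname(dir)
    if (parent == dir) return(NULL)  # reached "/" (or ".") without a match
    dir <- parent
  }
}

# Usage: config.yml in the parent, working directory set to the subdirectory.
root <- file.path(tempdir(), "cfg-walk")
dir.create(file.path(root, "subdir"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "config.yml")))
root <- normalizePath(root)
old <- setwd(file.path(root, "subdir"))
found <- find_config_upward("config.yml")
setwd(old)
print(found)
```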

manselmi · Dec 01 '16 22:12

I just noticed that when R is launched from within or under a symlinked directory, getwd() reports the current working directory as an absolute path with all symlinks resolved. I suspect this is what's behind issue (3) above, and it makes my suggested improvement (something like Python's os.path.abspath) unworkable.
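The getwd() behavior can be observed with a small base R experiment on a Unix-like system (symlinks created with file.symlink are not generally available on Windows). The directory names are arbitrary. After setwd() into a symlinked directory, getwd() reports the physical path, matching what normalizePath() returns for the link.

```r
# Unix-like systems only: create a real directory and a symlink pointing at it.
base <- normalizePath(tempdir())
target <- file.path(base, "real-subdir")
link <- file.path(base, "link-subdir")
dir.create(target, showWarnings = FALSE)
if (!file.exists(link)) invisible(file.symlink(target, link))

# Enter the directory through the symlink, then ask R where we are.
old <- setwd(link)
wd <- getwd()
setwd(old)

print(link)                # the path we asked for
print(wd)                  # getwd() reports the symlink-free path (target)
print(normalizePath(link)) # normalizePath() resolves the link the same way
```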

Surprising that R does this!

manselmi · Dec 02 '16 13:12