Keep secondary storage cache of installed modules

Open klmr opened this issue 4 years ago • 0 comments

Currently, loading a module that isn’t already in the cache parses and evaluates all of its source files. This is potentially time-consuming, especially compared to loading a package: installed R packages aren’t loaded from source but from a lazy-load database.

‘box’ could maintain a secondary storage cache (unless disabled) that is queried before a module is loaded from source. If the source has a more recent timestamp than the cached version, the cache entry would be invalidated, the source version loaded, and the result subsequently cached.
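
To make the proposed lookup concrete, here is a minimal sketch of the timestamp check. It is not how ‘box’ works today: `load_module_cached`, `cache_dir` and the `load_source` callback are hypothetical names, and keying the cache by file name alone is a simplification.

```r
# Sketch only: `load_source` is a hypothetical stand-in for whatever parses
# and evaluates a module's source file and returns the loaded module.
load_module_cached <- function (source_path, cache_dir, load_source) {
    # Simplification: key the cache entry by file name only.
    cache_path <- file.path(cache_dir, paste0(basename(source_path), '.rds'))

    # The entry is usable only if it exists and the source is not newer.
    cache_is_fresh <- file.exists(cache_path) &&
        file.mtime(cache_path) >= file.mtime(source_path)

    if (cache_is_fresh) return(readRDS(cache_path))

    # Missing or stale: load from source, then (re)write the cache entry.
    module <- load_source(source_path)
    dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
    saveRDS(module, cache_path)
    module
}
```

Note that a source file modified within the timestamp resolution of the cache write would still be treated as fresh, which is one reason the notes below ask whether a hash is needed.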

R doesn’t seem to provide a public API for generating lazy-load databases, but I don’t understand the purpose of lazy loading for exported names anyway — using RDS with a custom serialisation hook for package/module dependencies seems easier.
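
As a rough illustration of the RDS idea: `saveRDS` and `readRDS` accept a `refhook` argument that is called for reference objects such as environments. The sketch below uses it to store a dependency by name instead of by value. The `deps` registry and the helper names are made up for this example; a real implementation would need to resolve names against the actual loaded modules and packages.

```r
# Hypothetical registry of already-loaded dependencies, keyed by name.
deps <- new.env(parent = emptyenv())
deps$utils_mod <- new.env(parent = baseenv())
deps$utils_mod$helper <- function (x) x + 1

# A module that holds a reference to one of its dependencies.
mod <- new.env(parent = baseenv())
mod$dep <- deps$utils_mod
mod$value <- 42

# Return the registry name of `env`, or NULL if it is not a known dependency.
dep_name <- function (env) {
    for (name in ls(deps)) {
        if (identical(deps[[name]], env)) return(name)
    }
    NULL
}

# Writing: replace known dependencies by their name; returning NULL tells
# R to serialise the object as usual.
out_hook <- function (obj) {
    if (is.environment(obj)) dep_name(obj) else NULL
}

# Reading: map a stored name back to the live dependency.
in_hook <- function (name) deps[[name]]

path <- tempfile(fileext = '.rds')
saveRDS(mod, path, refhook = out_hook)
restored <- readRDS(path, refhook = in_hook)

# The restored module points at the live dependency rather than a copy.
identical(restored$dep, deps$utils_mod)
```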

Lastly, keeping modules cached also means we can finally implement byte-compilation of modules without a prohibitive overhead on loading.
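
For illustration, byte-compiling a module before it is written to the cache could look roughly like this. `compile_functions` is a hypothetical helper and a plain environment stands in for a module; only `compiler::cmpfun` is an existing R function.

```r
# Sketch only: replace every closure in `env` with its byte-compiled version.
compile_functions <- function (env) {
    for (name in ls(env, all.names = TRUE)) {
        value <- get(name, envir = env, inherits = FALSE)
        # Primitives have no R-level body to compile, so skip them.
        if (is.function(value) && !is.primitive(value)) {
            assign(name, compiler::cmpfun(value), envir = env)
        }
    }
    invisible(env)
}

# A plain environment standing in for a loaded module.
mod <- new.env(parent = baseenv())
mod$square <- function (x) x * x

compile_functions(mod)
mod$square(4)  # 16
```

The compilation cost would then be paid once, when the cache entry is created, rather than on every load.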

Some notes:

  • Cache path: box.cache (overridden by R_BOX_CACHE)
    • defaults to XDG_CACHE_HOME/R/%v/%p/box (placeholders as for R_LIBS_*) or equivalent
    • explicitly set to NULL to disable
  • Is a modification timestamp sufficient to establish cache validity or is a hash required?
  • Figure out how to customise serialisation of dependencies.
  • Terminology in API: term “cache” is now overloaded because we unfortunately already have the function purge_cache.
  • Cache module help as well?
  • What about integration of compiled native code?
  • Hook to run on “installation” of a module into the cache? (see #163)
  • #14
  • Add exported function to explicitly add/remove modules to/from cache (e.g. install/uninstall)?
    • As an included module? As a command line utility?
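
The cache path note could be sketched as follows. All function names are hypothetical, and several details are assumptions: that `R_BOX_CACHE` wins even when the option disables caching, that `~/.cache` is the fallback when `XDG_CACHE_HOME` is unset, and that only `%v` and `%p` need expanding. Since `options(box.cache = NULL)` removes the option rather than storing `NULL`, the sketch takes the configured value as an argument and leaves open how "unset" and "disabled" would be told apart in practice.

```r
# Expand %v (R version as "major.minor") and %p (build platform), following
# the convention documented for R_LIBS_USER.
expand_placeholders <- function (path) {
    minor <- sub('\\..*$', '', R.version$minor)
    version <- paste(R.version$major, minor, sep = '.')
    path <- gsub('%v', version, path, fixed = TRUE)
    gsub('%p', R.version$platform, path, fixed = TRUE)
}

default_cache_dir <- function () {
    base <- Sys.getenv('XDG_CACHE_HOME')
    # Assumption: fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    if (!nzchar(base)) base <- file.path('~', '.cache')
    expand_placeholders(file.path(base, 'R', '%v', '%p', 'box'))
}

# `configured` stands in for the value of the proposed `box.cache` option:
# missing means "not configured", NULL means "caching disabled".
resolve_cache_dir <- function (configured, env_value = Sys.getenv('R_BOX_CACHE')) {
    # Assumption: the environment variable takes precedence over everything.
    if (nzchar(env_value)) return(expand_placeholders(env_value))
    if (missing(configured)) return(default_cache_dir())
    if (is.null(configured)) return(NULL)
    expand_placeholders(configured)
}
```

With this, `resolve_cache_dir(NULL, env_value = '')` returns `NULL` (cache disabled), while `resolve_cache_dir(env_value = '')` returns the expanded default path.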

klmr, Jan 20 '22 23:01