Keep secondary storage cache of installed modules
Currently, loading a module that isn’t already in the cache parses and evaluates all its source files. This is potentially time-consuming, especially compared to loading a package: installed R packages aren’t loaded from source but from a lazy-load database.
‘box’ could maintain a secondary storage cache (unless disabled) that is queried before the source version of a module is loaded, unless the latter has a more recent timestamp. In that case, the cache would be invalidated, the source version loaded, and subsequently cached.
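The lookup described above could be sketched roughly as follows. This is only an illustration, not how ‘box’ works today: `load_cached` and `load_source` are hypothetical names, and the cache is assumed to be a single RDS file per module, with validity decided purely by modification time.

```r
# Hypothetical helper: return the cached module if the cache file is at least
# as recent as the source file; otherwise load from source and refresh the cache.
load_cached <- function (source_file, cache_file, load_source) {
    fresh <- file.exists(cache_file) &&
        file.mtime(cache_file) >= file.mtime(source_file)
    if (fresh) return(readRDS(cache_file))

    # Cache is missing or stale: load the source version, then cache it.
    module <- load_source(source_file)
    dir.create(dirname(cache_file), recursive = TRUE, showWarnings = FALSE)
    saveRDS(module, cache_file)
    module
}

# Stand-in for the real (expensive) loader: evaluate the file, return its names.
load_source <- function (path) {
    env <- new.env()
    sys.source(path, envir = env)
    as.list(env)
}

source_file <- tempfile(fileext = '.r')
writeLines('answer <- 42', source_file)
cache_file <- tempfile(fileext = '.rds')

first <- load_cached(source_file, cache_file, load_source)   # loads from source
second <- load_cached(source_file, cache_file, load_source)  # served from cache
```

The second call never touches the source file; it is only re-evaluated once its timestamp becomes newer than that of the cache file.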
R doesn’t seem to provide a public API for generating lazy-load databases, but I don’t understand the purpose of lazy loading for exported names anyway — using RDS with a custom serialisation hook for package/module dependencies seems easier.
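As a rough sketch of the RDS idea: `saveRDS` and `readRDS` accept a `refhook` argument that is called for reference objects such as environments, which allows a dependency to be stored by name instead of by value. The registry `loaded`, the `module_name` attribute and the module name `utils/strings` are made up for this example and are not part of ‘box’.

```r
# Hypothetical registry of modules that are already loaded, keyed by name.
loaded <- new.env(parent = emptyenv())

# A dependency, tagged with an illustrative attribute carrying its name.
dep <- new.env(parent = emptyenv())
attr(dep, 'module_name') <- 'utils/strings'
dep$value <- 1
loaded[['utils/strings']] <- dep

# The module to cache; it holds a reference to its dependency.
mod <- new.env(parent = emptyenv())
mod$strings <- dep
mod$answer <- 42

# On write: replace tagged environments by their name. Returning NULL tells R
# to serialise the object normally.
write_hook <- function (x) {
    name <- attr(x, 'module_name')
    if (is.environment(x) && ! is.null(name)) name else NULL
}

# On read: resolve the stored name against the registry.
read_hook <- function (name) loaded[[name]]

path <- tempfile(fileext = '.rds')
saveRDS(mod, path, refhook = write_hook)
restored <- readRDS(path, refhook = read_hook)

identical(restored$strings, dep)
```

In a real implementation the read hook would presumably load the dependency on demand rather than look it up in a pre-filled registry.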
Lastly, keeping modules cached also means we can finally implement byte-compilation of modules without a prohibitive overhead on loading.
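To illustrate why caching makes this cheap: a function can be byte-compiled once with `compiler::cmpfun` and then stored, since R serialises the byte code along with the function. The sketch below assumes a module export is a plain function; `square_sum` is just a placeholder.

```r
square_sum <- function (x) sum(x ^ 2)

# Compile once, at the time the module is written to the cache.
compiled <- compiler::cmpfun(square_sum)

path <- tempfile(fileext = '.rds')
saveRDS(compiled, path)

# Later loads read the stored function instead of compiling again.
restored <- readRDS(path)
restored(1 : 3)
```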
Some notes:
- Cache path: `box.cache` (overridden by `R_BOX_CACHE`)
  - defaults to `XDG_CACHE_HOME/R/%v/%p/box` (placeholders as for `R_LIBS_*`) or equivalent
  - explicitly set to `NULL` to disable
- Is a modification timestamp sufficient to establish cache validity or is a hash required?
- Figure out how to customise serialisation of dependencies.
- Terminology in API: term “cache” is now overloaded because we unfortunately already have the function `purge_cache`.
- Cache module help as well?
- What about integration of compiled native code?
- Hook to run on “installation” of a module into the cache? (see #163)
- #14
- Add exported function to explicitly add/remove modules to/from cache (e.g. `install`/`uninstall`)?
  - As an included module? As a command line utility?
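For the cache path note, the resolution order could look something like the sketch below. `cache_path` and `expand_placeholders` are hypothetical helpers. Falling back to `~/.cache` when `XDG_CACHE_HOME` is unset or empty is an assumption standing in for the “or equivalent” part, and disabling the cache is left out of the sketch.

```r
# Expand %v (R version without patch level) and %p (platform), mirroring the
# placeholders R supports in R_LIBS_USER.
expand_placeholders <- function (path) {
    version <- paste(R.version$major, sub('\\..*$', '', R.version$minor), sep = '.')
    path <- gsub('%v', version, path, fixed = TRUE)
    gsub('%p', R.version$platform, path, fixed = TRUE)
}

cache_path <- function () {
    # 1. The environment variable takes precedence over the option.
    env <- Sys.getenv('R_BOX_CACHE')
    if (nzchar(env)) return(expand_placeholders(env))

    # 2. The R option.
    opt <- getOption('box.cache')
    if (! is.null(opt)) return(expand_placeholders(opt))

    # 3. Default location below the XDG cache directory (assumed fallback).
    base <- Sys.getenv('XDG_CACHE_HOME')
    if (! nzchar(base)) base <- file.path('~', '.cache')
    expand_placeholders(file.path(base, 'R', '%v', '%p', 'box'))
}

cache_path()
```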