Keep secondary storage cache of installed modules

Open klmr opened this issue 4 years ago • 0 comments

Currently, loading a module that isn’t already in the cache parses and evaluates all of its source files. This is potentially time-consuming, especially compared to loading a package: installed R packages aren’t loaded from source but from a lazy-load database.

‘box’ could maintain a secondary storage cache (unless disabled) that is queried before the source version of a module is loaded. If the source has a more recent timestamp than the cached version, the cache entry would be invalidated, the source version loaded, and the result subsequently cached.
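
To make the lookup logic concrete, here is a minimal sketch of the timestamp check. The names `load_with_cache`, `cache_path` and `load_from_source` are hypothetical and not part of ‘box’; the sketch also assumes one RDS file per module and a module that is a plain list.

```r
# Hypothetical helper, not part of 'box': use the cached copy unless the
# source file is newer, otherwise reload from source and refresh the cache.
load_with_cache <- function (source_path, cache_path, load_from_source) {
    cache_is_fresh <- file.exists(cache_path) &&
        file.mtime(cache_path) >= file.mtime(source_path)
    if (cache_is_fresh) {
        return(readRDS(cache_path))
    }
    module <- load_from_source(source_path)
    saveRDS(module, cache_path)
    module
}

# Demo with a stand-in loader that counts how often the source is evaluated.
src <- tempfile(fileext = '.r')
writeLines('x <- 1', src)
cache <- tempfile(fileext = '.rds')
calls <- 0
loader <- function (path) {
    calls <<- calls + 1
    env <- new.env()
    sys.source(path, envir = env)
    as.list(env)
}

first <- load_with_cache(src, cache, loader)   # evaluates the source
second <- load_with_cache(src, cache, loader)  # served from the cache
```

A real implementation would also need to track every source file of a module, not just one.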

R doesn’t seem to provide a public API for generating lazy-load databases, but I don’t understand the purpose of lazy loading for exported names anyway — using RDS with a custom serialisation hook for package/module dependencies seems easier.
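
As a rough illustration of the RDS idea: `saveRDS` and `readRDS` accept a `refhook` argument that is called for reference objects such as ordinary environments. The sketch below uses it to store a dependency by name instead of by value. The `loaded` registry and both hook functions are made up for this example and say nothing about how ‘box’ tracks modules internally.

```r
# Hypothetical registry of already loaded dependencies, keyed by name.
loaded <- new.env()
loaded$dep_a <- new.env()
assign('answer', 42, envir = loaded$dep_a)

# A stand-in "module" that holds a reference to its dependency.
mod <- list(name = 'mod', dep = loaded$dep_a)

# Write hook: replace a known dependency environment by its name.
# Returning NULL falls back to the default serialisation.
write_hook <- function (x) {
    for (name in ls(loaded)) {
        if (identical(x, loaded[[name]])) return(name)
    }
    NULL
}

# Read hook: resolve the stored name back to the live dependency.
read_hook <- function (name) loaded[[name]]

path <- tempfile(fileext = '.rds')
saveRDS(mod, path, refhook = write_hook)
restored <- readRDS(path, refhook = read_hook)
```

With this, `restored$dep` is the same environment object as `loaded$dep_a` rather than a deserialised copy.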

Lastly, keeping modules cached also means we can finally implement byte-compilation of modules without a prohibitive overhead on loading.
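
A small sketch of what that could look like, assuming functions are compiled once with `compiler::cmpfun` before being written to the cache. As far as I can tell the byte code survives the round trip through RDS, but that should be verified before relying on it.

```r
# Compile a function once, then store the compiled version in the cache.
square <- function (x) x * x
compiled <- compiler::cmpfun(square)

path <- tempfile(fileext = '.rds')
saveRDS(compiled, path)

# Later sessions read the already compiled function back.
restored <- readRDS(path)
result <- restored(4)
```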

Some notes:

- Cache path: box.cache (overridden by R_BOX_CACHE)
  - defaults to XDG_CACHE_HOME/R/%v/%p/box (placeholders as for R_LIBS_*) or equivalent
  - explicitly set to NULL to disable
- Is a modification timestamp sufficient to establish cache validity or is a hash required?
- Figure out how to customise serialisation of dependencies.
- Terminology in API: the term “cache” is now overloaded because we unfortunately already have the function purge_cache.
- Cache module help as well?
- What about integration of compiled native code?
- Hook to run on “installation” of a module into the cache? (see #163)
- #14
- Add exported function to explicitly add/remove modules to/from cache (e.g. install/uninstall)?
  - As an included module? As a command line utility?
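
On the timestamp versus hash question in the notes above, the following sketch shows the practical difference, using `tools::md5sum` as one possible content hash. `source_signature` is a hypothetical helper name.

```r
# Hypothetical helper: content hash of one or more source files.
source_signature <- function (paths) {
    unname(tools::md5sum(paths))
}

src <- tempfile(fileext = '.r')
writeLines('f <- function () 1', src)
before <- source_signature(src)

# Changing only the modification time leaves the hash unchanged ...
Sys.setFileTime(src, Sys.time() + 3600)
after_touch <- source_signature(src)

# ... whereas changing the content changes it.
writeLines('f <- function () 2', src)
after_edit <- source_signature(src)
```

A hash avoids needless invalidation when files are merely touched (for example by a fresh checkout), at the cost of reading every source file on load.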

Jan 20 '22 23:01 klmr