Caching interfaces
All packages made for performance (ForwardDiff.jl, FiniteDiff.jl, SparseDiffTools.jl) include some kind of caching interface. For example, instead of ForwardDiff.jacobian(f,x), you should call ForwardDiff.jacobian(f,x,config). The config, cache, etc. objects matter because they hold the preallocated work vectors, so reusing them avoids reallocating on every call. So it would be good to extend the interface to allow each backend to have an (optional) config struct, which is created on demand if the user does not supply one (which is how it's done in those packages anyway).
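As a concrete illustration of the pattern described above, here is a minimal sketch using ForwardDiff.jl. ForwardDiff.JacobianConfig and the three-argument ForwardDiff.jacobian are part of ForwardDiff.jl; the function f, the inputs, and the wrapper cached_jacobian are made up for this example and are not part of any package.

```julia
using ForwardDiff

# Example function R^2 -> R^2 (hypothetical, for illustration only).
f(x) = [x[1]^2 + x[2], x[1] * x[2]]
x = [1.0, 2.0]

# Build the config once; it holds the dual-number work buffers.
config = ForwardDiff.JacobianConfig(f, x)

# Reuse it across calls with inputs of the same length and element type.
J1 = ForwardDiff.jacobian(f, x, config)
J2 = ForwardDiff.jacobian(f, [3.0, 4.0], config)

# Hypothetical wrapper showing the "optional config, created on demand" idea:
# if the caller passes no config, one is built for this call only.
cached_jacobian(f, x, config = ForwardDiff.JacobianConfig(f, x)) =
    ForwardDiff.jacobian(f, x, config)

J3 = cached_jacobian(f, x)          # config created on demand
J4 = cached_jacobian(f, x, config)  # config supplied by the caller
```

The default-argument form keeps the simple call working while letting performance-sensitive callers hoist the allocation out of their loop.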
Agreed. One complication here is that the end user may construct a backend to pass to some package with no knowledge of which sequence of AD functions is called within that package. For example, the package may call value_and_pullback_function, which some backends implement natively as their "pullback", while for others it calls gradient on an anonymous function constructed internally in AD.jl. So how could even the constructor of such a backend know which cache to create? And making the struct mutable so the cache can be swapped when necessary just loses type-stability.
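To make the type-stability concern concrete, here is a small sketch in plain Julia. MutableBackend, TypedBackend, and getcache are hypothetical names invented for this example; they are not AD.jl types. The sketch assumes the mutable variant needs an untyped field so that a cache of a different type can be stored later.

```julia
# Hypothetical backend whose cache can be replaced by one of a different type.
mutable struct MutableBackend
    cache::Any        # flexible, but the field type is not concrete
end

# Hypothetical backend whose cache type is fixed when it is constructed.
struct TypedBackend{C}
    cache::C          # concrete, but cannot change to another cache type
end

getcache(b) = b.cache

# Ask inference what `getcache` returns for each backend type.
mutable_ret = only(Base.return_types(getcache, (MutableBackend,)))
typed_ret = only(Base.return_types(getcache, (TypedBackend{Vector{Float64}},)))

# The mutable backend can swap in a different kind of cache at run time.
b = MutableBackend(zeros(3))
b.cache = zeros(2, 2)
```

Here mutable_ret is Any while typed_ret is Vector{Float64}, so code downstream of the mutable backend's cache falls back to dynamic dispatch unless a function barrier is added.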
A similar complication arose when considering how to support compiled tapes (https://github.com/JuliaDiff/AbstractDifferentiation.jl/pull/29#issuecomment-1017482219 and following).
SparseDiffTools now has a caching interface built around the ADTypes.jl backend types: https://github.com/JuliaDiff/SparseDiffTools.jl/blob/master/src/highlevel/common.jl