
Kernel function extension

Srceh opened this issue 3 years ago • 8 comments

This PR aims to increase the number of supported kernels and associated operations.

The objectives of the PR can be summarized as follows:

  • [x] Allow different kernels to be built upon a base kernel class (e.g. a periodic kernel).
  • [x] Allow each kernel to select a subset of the feature dimensions, so that different feature dimensions can be treated differently (e.g. use a periodic kernel on time-related features and an RBF kernel on the other numeric features).
  • [x] Implement combined kernels by adding or multiplying two or more kernels.
  • [x] Provide generic methods to initialise and train the parameters of different kernels as a whole set.
  • [x] Allow the base and composite kernels to be serialised.

The work can therefore be divided into the following stages:

  • [x] Design and change the current kernel implementation, adding base classes for kernels and combined kernels.
  • [x] Change the parameter initialisation and training behaviour among existing kernel-based detectors.
  • [x] Add docs and tutorial notebooks on the new kernels and their usage.
  • [x] Add tests for the newly introduced functions and procedures.
  • [x] Ensure the serialisation for all introduced kernels is compatible with the existing framework.

Design notes

As in several Gaussian process libraries (e.g. GPy, GPyTorch), a base kernel class is the typical mechanism for supporting different built-in and user-defined kernel functions.

For alibi-detect, we aim to support the following functionalities: (1) provide a generalised template for all kernels (initialisation, kernel-value computation); (2) provide unified management of the relevant kernel parameters (initialisation, inference, training, inspection and modification); (3) allow the kernel value to be computed on specific feature dimensions only.

While ideally the above functions would be integrated into a single base class, we have decided to distribute them across separate components, given the current implementation of alibi-detect.

On (1), at the moment the base kernel class is kept minimal. It only inherits from the backend nn module and provides a holder for kernel parameters, parameter_dict. This dictionary is useful when a kernel has multiple parameters, as we can loop over the dictionary keys to operate on every parameter without referencing any of them (e.g. sigma) by name.
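
A minimal sketch of the idea with the PyTorch backend (the class name `BaseKernel` and the loop below are illustrative of the design described here, not the PR's exact code; the parameter wrapper is introduced in (2)):

```python
import torch
import torch.nn as nn


class BaseKernel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Holds every kernel parameter by name; generic routines can then
        # iterate over the dict instead of referencing e.g. `sigma` directly.
        self.parameter_dict: dict = {}

    def init_parameters(self, x: torch.Tensor, y: torch.Tensor) -> None:
        # Loop over the dictionary to initialise each parameter that asks
        # for it (assumes the KernelParameter wrapper described in (2)).
        for param in self.parameter_dict.values():
            if param.requires_init and param.init_fn is not None:
                param.value.data = param.init_fn(x, y)
```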

On (2), we previously hard-coded the initialisation, inference and training of the sigma parameter within the RBF kernel (and the detectors). This behaviour is now lifted into a separate class, KernelParameter, implemented as a wrapper over the corresponding backend variable (tf.Variable, torch.nn.Parameter). The class exposes init_fn, requires_grad and requires_init as attributes, so each parameter can be inspected individually and the corresponding procedures invoked. As a result, any manipulation of kernel parameters can be moved outside of the kernel implementation; writing a kernel only requires declaring an instance of KernelParameter. At the same time, this implementation preserves the arguments and logic of the previous RBF kernel, so only limited modification of the existing detectors is required.
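
A rough sketch of the wrapper (PyTorch backend; signatures are illustrative rather than the exact ones merged in the PR):

```python
from typing import Callable, Optional

import torch
import torch.nn as nn


class KernelParameter:
    def __init__(
        self,
        value: torch.Tensor,
        init_fn: Optional[Callable[..., torch.Tensor]] = None,
        requires_grad: bool = False,
        requires_init: bool = False,
    ) -> None:
        # Wrap the backend variable (torch.nn.Parameter here; tf.Variable
        # for the TensorFlow backend) so that initialisation and
        # trainability can be handled outside the kernel itself.
        self.value = nn.Parameter(value, requires_grad=requires_grad)
        self.init_fn = init_fn              # data-driven initialiser
        self.requires_init = requires_init  # run init_fn before use?
```

A kernel then only declares its parameters, e.g. `self.parameter_dict['log-sigma'] = KernelParameter(torch.zeros(1), init_fn=my_init_fn, requires_grad=True, requires_init=True)` (with `my_init_fn` standing in for e.g. a median-heuristic initialiser), while generic code outside the kernel inspects `requires_grad` / `requires_init` to decide how to treat each parameter.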

On (3), following the review comments from @ojcobb, coding the selection within the base class (and hence within each kernel) would duplicate code and make it harder for users to implement customised kernels. The solution now is a wrapper kernel, DimensionSelectKernel, that performs the selection before passing the inputs to a given kernel. Following the convention of GPy, we refer to the selection argument as active_dims.
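
A sketch of the wrapper (PyTorch backend; illustrative rather than verbatim):

```python
from typing import List

import torch
import torch.nn as nn


class DimensionSelectKernel(nn.Module):
    def __init__(self, kernel: nn.Module, active_dims: List[int]) -> None:
        super().__init__()
        self.kernel = kernel
        # Feature dimensions this kernel should act on (GPy convention).
        self.active_dims = torch.as_tensor(active_dims)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Select the relevant feature columns, then delegate to the wrapped
        # kernel -- no selection logic needed inside each kernel.
        x = torch.index_select(x, -1, self.active_dims)
        y = torch.index_select(y, -1, self.active_dims)
        return self.kernel(x, y)
```

Usage would then look like e.g. `DimensionSelectKernel(PeriodicKernel(), active_dims=[0])` for a time-related feature and `DimensionSelectKernel(GaussianRBF(), active_dims=[1, 2])` for the remaining numeric features (kernel names illustrative).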

Srceh avatar Jul 18 '22 10:07 Srceh

Nice work! Indeed, the previous coupling to GaussianRBF got a bit nasty in places, and it's nice to have got rid of all that and extended things with some new possibilities!

A few things that may be worth further consideration:

  1. At the moment, specifying parameters, their initialisation functions and their trainability is a bit tricky. It seems they currently must be either all trainable or all fixed. Moreover, if some are specified and some aren't, they all get the initialisation function applied regardless. Is there something we can do to make this a bit cleaner?
  2. Following on from the brief discussion we had on call -- I realise that using dunder methods to define the addition and multiplication of kernels syntactically raises the issue of making the associated weights trainable. However, given that the vast majority of use cases won't require trainable combinations, it seems that perhaps a favourable approach is to proceed with the syntactic approach and then deal with trainability separately when desired. For example, the deep kernel would be a lightweight class that defines a composite_kernel = w_a * kernel_a + w_b * kernel_b, where w_a and w_b have been defined as trainable parameters (see the sketch after this list).
  3. I know we mentioned this before (and I commented above), but unless there's a compelling reason I'm not aware of, we should try to remove the need to duplicate active_dims and feature_axis logic within each kernel.
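
A hypothetical sketch of the syntactic approach in point 2 (PyTorch; `ComposableKernel`, `SumKernel` and `ScaledKernel` are illustrative names, not the PR's API):

```python
import torch
import torch.nn as nn


class ComposableKernel(nn.Module):
    # Dunder methods make `kernel_a * w_a + kernel_b * w_b` valid syntax.
    def __add__(self, other: nn.Module) -> "SumKernel":
        return SumKernel(self, other)

    def __mul__(self, w) -> "ScaledKernel":
        return ScaledKernel(w, self)

    __rmul__ = __mul__


class SumKernel(ComposableKernel):
    def __init__(self, k_a: nn.Module, k_b: nn.Module) -> None:
        super().__init__()
        self.k_a, self.k_b = k_a, k_b

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.k_a(x, y) + self.k_b(x, y)


class ScaledKernel(ComposableKernel):
    def __init__(self, w, kernel: nn.Module) -> None:
        super().__init__()
        # `w` may be a plain float, or an nn.Parameter when the
        # combination weights should be trainable.
        self.w, self.kernel = w, kernel

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.w * self.kernel(x, y)


# e.g. a lightweight deep kernel with trainable combination weights:
# w_a, w_b = nn.Parameter(torch.tensor(0.5)), nn.Parameter(torch.tensor(0.5))
# composite_kernel = kernel_a * w_a + kernel_b * w_b
```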

ojcobb avatar Aug 08 '22 13:08 ojcobb

Self-requesting a review for this since I expect this PR will require moderate changes to the save/load functionality.

ascillitoe avatar Aug 08 '22 16:08 ascillitoe

@Srceh is support for combining more than 2 kernels out of scope for this PR? Similarly for using the Python data model with dunder methods for adding/multiplying etc. kernels?

I actually prefer building up slowly, but if we have these extensions in mind for the future then it would be good to clarify that and take it into account in the design.

jklaise avatar Aug 18 '22 15:08 jklaise

> @Srceh is support for combining more than 2 kernels out of scope for this PR? Similarly for using the Python data model with dunder methods for adding/multiplying etc. kernels?
>
> I actually prefer building up slowly, but if we have these extensions in mind for the future then it would be good to clarify that and take it into account in the design.

I mentioned this earlier to @Srceh: we would want an arbitrary number of kernels to be combinable, e.g. by passing an nn.ModuleList with any number of kernels (possibly incl. weights for weighted combinations) instead of exactly 2 kernels.
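
Something along these lines, as a rough sketch (`WeightedSumKernel` is a hypothetical name, PyTorch backend):

```python
from typing import List, Optional

import torch
import torch.nn as nn


class WeightedSumKernel(nn.Module):
    def __init__(self, kernels: List[nn.Module],
                 weights: Optional[torch.Tensor] = None) -> None:
        super().__init__()
        self.kernels = nn.ModuleList(kernels)  # any number of kernels
        if weights is None:
            weights = torch.ones(len(kernels)) / len(kernels)
        # Trainable weights for a weighted combination.
        self.weights = nn.Parameter(weights)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return sum(w * k(x, y) for w, k in zip(self.weights, self.kernels))
```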

arnaudvl avatar Aug 18 '22 15:08 arnaudvl

Are none of the other examples affected at all? Do they still work with the updated kernels and produce the same results? It would be great if true, but it sounds almost too good to be true...

arnaudvl avatar Oct 21 '22 14:10 arnaudvl

> Are none of the other examples affected at all? Do they still work with the updated kernels and produce the same results? It would be great if true, but it sounds almost too good to be true...

Should be fine at the pytest level, but it might be worth some further investigation.

Srceh avatar Oct 24 '22 11:10 Srceh