emukit Parameter: objective and model representation

In my understanding, parameter representation in emukit is similar to GPyOpt (though GPyOpt states it more explicitly): each parameter has two representations, 1. the domain/objective representation and 2. the model representation. For example, a categorical parameter can be represented as ('car', 'bike', 'train') in the objective representation and as ([1,0,0], [0,1,0], [0,0,1]) in the model representation. The objective function (called user_function in emukit) should take the objective representation as input, and the model should take the model representation.
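To make the two representations concrete, here is a minimal sketch in plain numpy of the round trip for the categorical example above. The helper names `to_model` and `to_objective` are hypothetical and not part of emukit or GPyOpt.

```python
import numpy as np

# Objective (domain) representation of the categorical parameter.
CATEGORIES = ["car", "bike", "train"]

def to_model(category):
    """Objective -> model representation: a one-hot vector (hypothetical helper)."""
    vec = np.zeros(len(CATEGORIES))
    vec[CATEGORIES.index(category)] = 1.0
    return vec

def to_objective(vec):
    """Model -> objective representation: the category at the hot position (hypothetical helper)."""
    return CATEGORIES[int(np.argmax(vec))]

print(to_model("bike"))               # [0. 1. 0.]
print(to_objective([0.0, 0.0, 1.0]))  # train
```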

I've got several issues with this:

1. Input data format of an objective function. Objective representations of parameters are numbers (e.g. for a continuous parameter) or strings (e.g. for a categorical parameter), but user functions expect a 2d array as input. As far as I know, numpy arrays with mixed types are only possible as structured arrays. The same applies to GPyOpt.
2. We use the model representation as the objective function input. In sensitivity analysis, for example, the result is quite odd: you get n sensitivity effects, one for each one-hot encoded feature dimension of a categorical parameter, rather than one for the parameter itself. And with string objective representations, the algorithm would not work as it is written.
3. There is no generic way to transform the objective representation into the model representation and back. GPyOpt provides this at the parameter and space level, which is quite handy!
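The first issue can be seen directly in numpy. The sketch below uses hypothetical parameter names `speed` and `vehicle`: a plain 2d array silently converts a mixed row to strings, while a structured array keeps one type per field but is one-dimensional, so it does not fit the 2d-array convention for user functions.

```python
import numpy as np

# A plain 2d array cannot mix types: numpy converts every entry to a string.
plain = np.array([[0.5, "car"], [1.2, "train"]])
print(plain.dtype.kind, plain[0, 0])

# A structured array keeps one dtype per field...
record = np.dtype([("speed", float), ("vehicle", "U10")])
structured = np.array([(0.5, "car"), (1.2, "train")], dtype=record)
print(structured["speed"].dtype, structured["vehicle"][1])

# ...but it is 1d (one record per row), not the 2d array user functions expect.
print(structured.shape)
```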

Do I have a conceptual misunderstanding, or are these real issues?

Thanks in advance. Cheers, David

Apr 09 '19 13:04 dekuenstle

Very good questions!

This all stems from the fact that Emukit is model-agnostic. When users come to Emukit, they are expected to have already done the preparation needed to create a model. In particular, it is assumed that they have already dealt with encoding their categorical features. Emukit needs to know which encoding the model uses, hence the Encoding class and a few of the most common implementations.
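One illustration of why a library might need to know the encoding: if the one-hot columns are treated as continuous model inputs, an optimiser can propose a point such as [0.2, 0.7, 0.1] that corresponds to no category. The sketch below assumes a one-hot encoding and snaps each row to the nearest valid one-hot vector, which is the one with a 1 at the argmax position. `round_one_hot` is a hypothetical helper, not emukit's Encoding API.

```python
import numpy as np

def round_one_hot(x):
    """Snap each row of relaxed one-hot values to the nearest valid one-hot vector.

    Hypothetical helper; assumes the columns of x are the one-hot dimensions
    of a single categorical parameter. The nearest one-hot vector in Euclidean
    distance is the one whose hot position is the row's largest entry.
    """
    x = np.atleast_2d(x)
    rounded = np.zeros_like(x, dtype=float)
    rounded[np.arange(x.shape[0]), np.argmax(x, axis=1)] = 1.0
    return rounded

print(round_one_hot([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]))
```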

Emukit essentially takes both the objective and the model as inputs. But it is reasonable to expect that the two don't take input in the same form, especially when there are categorical parameters. Emukit tries to be as non-opinionated as possible about how people handled that step while building their model.

It is briefly touched upon in this notebook, but we probably could have done a better job explaining it. Is there anything that you, as a user, would find most helpful?

It is also possible that there is a way for the library to cover this question better, but at the moment I don't see one.

Apr 11 '19 10:04 apaleyes

Thanks for your detailed answer. The emukit_friendly_objective_function in the linked notebook is a good example.
So basically emukit expects the model and the objective to use the same encoding, and all transformations are up to the user. I think the explanations in the library are fine for this; I just confused myself with the terms used in GPyOpt (objective/model encoding and objective function). The user has to define their config space in emukit anyway, so for an end-to-end experience (bring an objective and a model, define the space and the loop, and emukit deals with the rest), I think it would help if emukit provided more utilities to encode/decode the data, e.g. as user_function wrappers and/or methods of the space and parameters.
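A user_function wrapper of the kind suggested here could look like the sketch below. It assumes a hypothetical objective `travel_time(speed, vehicle)` that works in the objective representation, and a model representation whose columns are [speed, one-hot car, one-hot bike, one-hot train]. The wrapper takes a 2d float array, decodes each row, and returns an (n, 1) array; `model_space_wrapper` is likewise a hypothetical name.

```python
import numpy as np

CATEGORIES = ["car", "bike", "train"]

def travel_time(speed, vehicle):
    """Hypothetical user objective working in the objective representation."""
    factor = {"car": 1.0, "bike": 3.0, "train": 0.5}[vehicle]
    return factor / speed

def model_space_wrapper(x):
    """Accept a 2d float array in model representation and return (n, 1) objective values.

    Assumed column layout: [speed, one-hot(car), one-hot(bike), one-hot(train)].
    """
    x = np.atleast_2d(x)
    results = []
    for row in x:
        # Decode the one-hot block back into a category string.
        vehicle = CATEGORIES[int(np.argmax(row[1:]))]
        results.append(travel_time(row[0], vehicle))
    return np.array(results).reshape(-1, 1)

X = np.array([[2.0, 1, 0, 0], [2.0, 0, 0, 1]])
print(model_space_wrapper(X))
```

The objective itself stays unaware of the encoding; only the wrapper needs to know the column layout.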

Apr 12 '19 08:04 dekuenstle

We can definitely consider encode/decode utility methods in emukit. Space seems like the best place for these, but I haven't thought it through yet. Stay tuned for updates.
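As a sketch of what space-level encode/decode could look like, in the spirit of what GPyOpt offers, here is a hypothetical `SketchSpace` class. It is not emukit's `ParameterSpace`; it only assumes that each parameter is either continuous or categorical with a one-hot encoding, and that model columns follow parameter order.

```python
import numpy as np
from typing import List, Optional, Sequence, Tuple

class SketchSpace:
    """Hypothetical space converting between objective and model representation.

    Each parameter is (name, categories); categories=None means continuous.
    Categorical parameters are assumed to be one-hot encoded.
    """

    def __init__(self, parameters: List[Tuple[str, Optional[List[str]]]]):
        self.parameters = parameters

    def encode(self, points: Sequence[Sequence]) -> np.ndarray:
        """Objective -> model: rows of mixed values -> 2d float array."""
        rows = []
        for point in points:
            row = []
            for value, (_, categories) in zip(point, self.parameters):
                if categories is None:
                    row.append(float(value))
                else:
                    one_hot = [0.0] * len(categories)
                    one_hot[categories.index(value)] = 1.0
                    row.extend(one_hot)
            rows.append(row)
        return np.array(rows)

    def decode(self, x: np.ndarray) -> List[tuple]:
        """Model -> objective: 2d float array -> rows of mixed values."""
        points = []
        for row in np.atleast_2d(x):
            col, point = 0, []
            for _, categories in self.parameters:
                if categories is None:
                    point.append(float(row[col]))
                    col += 1
                else:
                    width = len(categories)
                    point.append(categories[int(np.argmax(row[col:col + width]))])
                    col += width
            points.append(tuple(point))
        return points

space = SketchSpace([("speed", None), ("vehicle", ["car", "bike", "train"])])
X = space.encode([(2.0, "bike"), (0.5, "train")])
print(X)
print(space.decode(X))  # [(2.0, 'bike'), (0.5, 'train')]
```

Keeping both directions on the space means the column layout is defined in one place, so user_function wrappers and models do not each have to hard-code it.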

Apr 12 '19 11:04 apaleyes