[WIP] Minimal changes necessary for Spock integration
Encompasses the minimal set of changes needed to integrate Spock.
Unit tests were adjusted for all changes; 100% pass.
The only fundamental change to the API is that the TreeBandit parameters are now specified explicitly (matching the sklearn>=1.0.2 signature) instead of being passed as an empty Dict. This is due to the nature of Spock type checking: the underlying type of a dictionary value cannot be `typing.Any`, because `isinstance` raises an exception on such values (`typing.Any` is not really a runtime type per se).
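As a quick illustration of the Spock limitation (a minimal snippet; the exact `TypeError` message may vary across Python versions):

```python
import typing

# typing.Any is a special typing form rather than a runtime class, so
# isinstance() rejects it -- Spock therefore cannot validate Dict[str, Any].
try:
    isinstance(0.0, typing.Any)
except TypeError as err:
    print(err)  # e.g. "typing.Any cannot be used with isinstance()"
```

Thus, the configuration for TreeBandit is now: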
```python
from enum import Enum
from typing import Optional

from spock import spock
from spock.utils import ge  # spock's bounded-value validation helper


class _DTCCriterion(Enum):
    squared_error = "squared_error"
    friedman_mse = "friedman_mse"
    absolute_error = "absolute_error"
    poisson = "poisson"


class _DTCSplitter(Enum):
    best = "best"
    random = "random"


@spock
class TreeBandit:
"""TreeBandit Neighborhood Policy.
This policy fits a decision tree for each arm using context history.
It uses the leaves of these trees to partition the context space into regions
and keeps a list of rewards for each leaf.
To predict, it receives a context vector and goes to the corresponding
leaf at each arm's tree and applies the given context-free MAB learning policy
to predict expectations and choose an arm.
The TreeBandit neighborhood policy is compatible with the following
context-free learning policies only: EpsilonGreedy, ThompsonSampling and UCB1.
The TreeBandit neighborhood policy is a modified version of
the TreeHeuristic algorithm presented in:
Adam N. Elmachtoub, Ryan McNellis, Sechan Oh, Marek Petrik
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees, UAI 2017
Attributes
----------
tree_parameters: Dict, **kwarg
Parameters of the decision tree.
The keys must match the parameters of sklearn.tree.DecisionTreeClassifier.
When a parameter is not given, the default parameters from
sklearn.tree.DecisionTreeClassifier will be chosen.
Default value is an empty dictionary.
Example
-------
>>> from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy
>>> list_of_arms = ['Arm1', 'Arm2']
>>> decisions = ['Arm1', 'Arm1', 'Arm2', 'Arm1']
>>> rewards = [20, 17, 25, 9]
>>> contexts = [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 1, 0], [3, 2, 1, 0]]
>>> mab = MAB(list_of_arms, LearningPolicy.EpsilonGreedy(epsilon=0), NeighborhoodPolicy.TreeBandit())
>>> mab.fit(decisions, rewards, contexts)
>>> mab.predict([[3, 2, 0, 1]])
'Arm2'
"""
    # Explicit sklearn.tree.DecisionTreeRegressor parameters (sklearn>=1.0.2),
    # defaulting to the sklearn defaults.
    criterion: Optional[_DTCCriterion] = _DTCCriterion.squared_error
    splitter: Optional[_DTCSplitter] = _DTCSplitter.best
    max_depth: Optional[int] = None
    min_samples_split: int = 2
    min_samples_leaf: int = 1
    min_weight_fraction_leaf: float = 0.0
    max_features: Optional[int] = None
    random_state: Optional[int] = None
    max_leaf_nodes: Optional[int] = None
    min_impurity_decrease: float = 0.0
    ccp_alpha: float = 0.0

    def __post_hook__(self):
        # Spock post-init hook: validate that the pruning parameter is non-negative.
        try:
            if self.ccp_alpha is not None:
                ge(self.ccp_alpha, bound=0.0)
        except Exception as e:
            raise ValueError(
                f"`{self.__class__.__name__}` could not be instantiated -- spock message: {e}"
            )
```
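For reference, a minimal sketch of how the explicit fields could be mapped back to the kwargs expected by `sklearn.tree.DecisionTreeRegressor` (the `tree_parameters` helper below is hypothetical and not part of this PR):

```python
from sklearn.tree import DecisionTreeRegressor


def tree_parameters(config: TreeBandit) -> dict:
    # Enum members carry the sklearn string value; plain fields pass through.
    return {
        "criterion": config.criterion.value,
        "splitter": config.splitter.value,
        "max_depth": config.max_depth,
        "min_samples_split": config.min_samples_split,
        "min_samples_leaf": config.min_samples_leaf,
        "min_weight_fraction_leaf": config.min_weight_fraction_leaf,
        "max_features": config.max_features,
        "random_state": config.random_state,
        "max_leaf_nodes": config.max_leaf_nodes,
        "min_impurity_decrease": config.min_impurity_decrease,
        "ccp_alpha": config.ccp_alpha,
    }


# Assumes the spock config can be instantiated with its defaults here;
# in practice spock builds it from config files / CLI arguments.
tree = DecisionTreeRegressor(**tree_parameters(TreeBandit()))
```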
Notable Fixes
- Fixed failing examples (Random, CustomizedMAB, ParallelMAB)
- Fixed incorrect definitions of the TreeBandit parameters -- the implementation uses `DecisionTreeRegressor` while the configuration was using `DecisionTreeClassifier`, and the two have different parameters
- Fixed all `NoReturn` return type hints, which were incorrect (see the sketch below)
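As an aside, a minimal sketch of the `NoReturn` correction (the function names here are illustrative, not actual MABWiser signatures):

```python
from typing import NoReturn


# Incorrect: NoReturn asserts that a function can never return normally
# (e.g. it always raises), which is not true of fit-style methods.
def fit_wrong(decisions: list, rewards: list) -> NoReturn:
    ...


# Correct: a function that returns normally without a value is hinted None.
def fit_right(decisions: list, rewards: list) -> None:
    ...
```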
@bkleyn @skadio
This is the minimal set of changes needed for Spock integration without heavily refactoring the backend and/or the front-end API.
Closing this one as it has run over time (many thanks to @ncilfone!)