Problem with ExpectedErrorReduction for instance selection

Open JuliaMasche opened this issue 5 years ago • 1 comments

Hi,

I am running experiments for multi-class classification and get the following error with ExpectedErrorReduction:

    File "/home/julia/master_thesis/env/lib/python3.6/site-packages/alipy/query_strategy/query_labels.py", line 829, in select
      score.append(pv[i, yi] * self.log_loss(prob))
    IndexError: index 4 is out of bounds for axis 1 with size 4

I think the error occurs because my initial seed set (`label_index`) does not contain all the labels that appear in `y`. In the following code, shouldn't it be `classes = np.unique(label_y)` instead of `classes = np.unique(self.y)`?

```python
    if self.X is None or self.y is None:
        raise Exception('Data matrix is not provided.')
    if model is None:
        model = LogisticRegression(solver='liblinear')
        model.fit(self.X[label_index if isinstance(label_index, (list, np.ndarray)) else label_index.index],
                  self.y[label_index if isinstance(label_index, (list, np.ndarray)) else label_index.index])

    unlabel_x = self.X[unlabel_index]
    label_y = self.y[label_index]
    ##################################

    classes = np.unique(self.y)
    pv, spv = _get_proba_pred(unlabel_x, model)
    scores = []
    for i in range(spv[0]):
        new_train_inds = np.append(label_index, unlabel_index[i])
        new_train_X = self.X[new_train_inds, :]
        unlabel_ind = list(unlabel_index)
        unlabel_ind.pop(i)
        new_unlabel_X = self.X[unlabel_ind, :]
        score = []
        for yi in classes:
            new_model = copy.deepcopy(model)
            new_model.fit(new_train_X, np.append(label_y, yi))
            prob = new_model.predict_proba(new_unlabel_X)
            score.append(pv[i, yi] * self.log_loss(prob))
        scores.append(np.sum(score))

    return unlabel_index[nsmallestarg(scores, batch_size)]
```

Jun 21 '20 18:06 JuliaMasche
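
For readers hitting the same error, here is a minimal sketch of why the index goes out of bounds. In scikit-learn, the columns of `predict_proba` follow the order of `model.classes_`, that is, the classes seen during `fit`. If the seed set covers only four of five classes, the probability matrix has four columns, and indexing it with the label value 4 fails. The sketch uses plain Python lists as stand-ins for the arrays in `select`. The helper `prob_of` and all the data are hypothetical and not part of alipy. Note that looking up by column position also covers the case where the missing class is not the largest label, in which `np.unique(label_y)` alone would still index the wrong column.

```python
# Stand-ins for the variables in select(); all values here are made up.
seen_classes = [0, 1, 2, 3]       # plays the role of model.classes_
all_classes = [0, 1, 2, 3, 4]     # plays the role of np.unique(self.y)
pv = [                            # plays the role of the predict_proba output
    [0.1, 0.2, 0.3, 0.4],
    [0.25, 0.25, 0.25, 0.25],
]


def prob_of(row, label, classes):
    """Return the probability of `label`, looked up by column position.

    Labels the model never saw get probability 0.0. This is an assumption
    of the sketch, not alipy's behaviour.
    """
    if label in classes:
        return row[classes.index(label)]
    return 0.0


# Indexing by label value fails for the unseen class, as in the traceback.
try:
    pv[0][all_classes[-1]]
    failed = False
except IndexError:
    failed = True

print(failed)
print([prob_of(pv[0], yi, seen_classes) for yi in all_classes])
```

Mapping labels to columns this way avoids the crash, but the model still cannot say anything useful about a class it has never seen.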

Hi, the problem is exactly what you think. Note that most existing AL methods do not expect examples from unseen classes unless they are designed for open-set problems.

So please re-split your dataset and ensure that the initial labeled set contains at least one example of each class. (The split function in alipy can do this :) )

Jun 23 '20 01:06 tangypnuaa
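
The advice above can be checked before running the query strategy. The following is a minimal sketch in plain Python, assuming labels are stored in a list `y` and the seed set is a list of indices. The helpers `missing_classes` and `seed_with_all_classes` are hypothetical names for illustration and are not alipy functions. In practice alipy's own split function is the intended tool.

```python
import random
from collections import defaultdict


def missing_classes(y, label_index):
    """Return the classes present in y but absent from the labeled subset."""
    seen = {y[i] for i in label_index}
    return sorted(set(y) - seen)


def seed_with_all_classes(y, n_initial, seed=0):
    """Pick n_initial indices so that every class in y appears at least once."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    if n_initial < len(by_class):
        raise ValueError("n_initial is smaller than the number of classes")
    # Take one randomly chosen example per class first ...
    chosen = [rng.choice(indices) for indices in by_class.values()]
    # ... then fill the remainder at random from the indices left over.
    chosen_set = set(chosen)
    rest = [i for i in range(len(y)) if i not in chosen_set]
    chosen.extend(rng.sample(rest, n_initial - len(chosen)))
    return sorted(chosen)


y = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(missing_classes(y, [0, 2, 4, 6]))   # this seed set lacks one class
label_index = seed_with_all_classes(y, n_initial=6)
print(missing_classes(y, label_index))    # nothing missing after re-seeding
```

Running `missing_classes` on the seed set before the first query turns the late `IndexError` into an early, readable check.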