clusters= in cross_validated doesn't seem to work properly with more than 3 clusters
import optunity
import optunity.cross_validation
import numpy as np

def f(x_train, y_train, x_test, y_test):
    if(bool(set(x_test)&set(x_train))):
        print("test and set clusters overlap:",set(x_test)&set(x_train))
        print("train data:\t" + str(x_train) + "\t train labels:\t" + str(y_train))
        print("test data:\t" + str(x_test) + "\t test labels:\t" + str(y_test))
    return 0.0

# function to create a list of clusters from a group/cluster variable
def ind_group(gr3):
    i1 = []
    for n1 in set(gr3):
        i2 = np.in1d(gr3, n1).nonzero()[0].astype(int).tolist()
        i1.append(i2)
    return i1

# create data with group/cluster structure:
data = np.repeat(range(4), 2)
print("data:", data)
groups = ind_group(data)
print("groups", groups)
f_clustered = optunity.cross_validated(x=data, y=data, clusters=groups[1:], num_folds=3)(f)
f_clustered()
('data:', array([0, 0, 1, 1, 2, 2, 3, 3]))
('groups', [[0, 1], [2, 3], [4, 5], [6, 7]])
('test and set clusters overlap:', set([0]))
train data:    [2 2 1 1 0]     train labels:    [2 2 1 1 0]
test data:     [3 3 0]         test labels:     [3 3 0]
('test and set clusters overlap:', set([0]))
train data:    [2 2 3 3 0]     train labels:    [2 2 3 3 0]
test data:     [1 1 0]         test labels:     [1 1 0]
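Since f compares data values rather than row indices, a check on indices might make it easier to see which clusters, if any, end up split across folds. This is only a minimal sketch, assuming the folds are available as lists of row indices. The helper name split_clusters and both example fold layouts are made up for illustration and are not part of optunity.

```python
def split_clusters(folds, clusters):
    """Return the clusters whose rows land in more than one fold.

    folds:    list of lists of row indices, one list per fold
    clusters: list of lists of row indices, as passed to clusters=
    """
    # Map every row index to the fold that holds it.
    fold_of = {}
    for fold_id, rows in enumerate(folds):
        for row in rows:
            fold_of[row] = fold_id

    broken = []
    for cluster in clusters:
        # Collect the distinct folds this cluster's rows were assigned to.
        fold_ids = set(fold_of[row] for row in cluster if row in fold_of)
        if len(fold_ids) > 1:
            broken.append(cluster)
    return broken


clusters = [[2, 3], [4, 5], [6, 7]]            # same indices as groups[1:]
intact_folds = [[6, 7, 0], [2, 3, 1], [4, 5]]  # hypothetical fold layout
leaky_folds = [[6, 2, 0], [7, 3, 1], [4, 5]]   # hypothetical fold layout

print(split_clusters(intact_folds, clusters))  # []
print(split_clusters(leaky_folds, clusters))   # [[2, 3], [6, 7]]
```

An empty list means every cluster stayed within a single fold. Rows that belong to no cluster are ignored by this check.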
Thanks for reporting this bug and for the MWE. I will look into this.