toad
Combiner and transer produce inconsistent binning results when training on the first 100 rows of iris
Code:
import pandas as pd
from sklearn.datasets import load_iris
import toad
iris = load_iris()
X = iris.data[:100]
y = iris.target[:100]
target_names = ['slen', 'swid', 'plen', 'pwid']
data = pd.DataFrame(X, columns =target_names)
data["label"] =y
c = toad.transform.Combiner()
# Fit the combiner with equal-width binning (method='step'); empty_separate=True keeps empty values in a separate bin.
c.fit(data, y = "label", method = 'step',empty_separate = True)
c_trans = c.transform(data)
transer = toad.transform.WOETransformer()
train_woe = transer.fit_transform(c_trans, data["label"], exclude=["label"])
card = toad.ScoreCard(
    combiner=c,
    transer=transer,
    tol=1e-8,
    max_iter=50,
    n_jobs=1,
)
card.fit(data[target_names], data["label"])
toad version 0.0.64.0, Python 3.6
Error message: Exception: column 'plen' is not matched, assert 9 bins but given 10
Training works with equal-frequency binning, but fails with equal-width binning.
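@topxxuki Equal-width binning can produce empty bins, and an empty bin is what causes this error. I suggest switching to a different binning method.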
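
To illustrate the empty-bin effect, here is a minimal pure-Python sketch. It does not use toad or reproduce its internals; the helper functions and the sample values are made up for illustration, with a gap in the middle loosely similar to the gap between the two classes in a feature like plen. Equal-width cut points are spaced evenly over the value range, so some bins can end up with no samples, while equal-frequency cut points follow the data. A component that counts only populated bins would then report fewer bins than the number defined by the cut points.

```python
# Hypothetical helpers, not part of toad: they only illustrate how
# equal-width binning can leave bins empty on data with a gap.

def equal_width_edges(values, n_bins):
    """Interior cut points spaced evenly between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def equal_freq_edges(values, n_bins):
    """Interior cut points taken at evenly spaced ranks of the sorted data."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

def assign_bins(values, edges):
    """Bin index of a value = number of cut points less than or equal to it."""
    return [sum(v >= e for e in edges) for v in values]

# Made-up values with a gap between 1.6 and 4.0 (not real iris data).
values = [1.0, 1.2, 1.4, 1.5, 1.6, 4.0, 4.3, 4.5, 4.7, 5.0]
n_bins = 4

width_bins = assign_bins(values, equal_width_edges(values, n_bins))
freq_bins = assign_bins(values, equal_freq_edges(values, n_bins))

# Bins that were defined by the cut points but received no samples.
empty_width = [b for b in range(n_bins) if b not in set(width_bins)]
empty_freq = [b for b in range(n_bins) if b not in set(freq_bins)]

print("equal-width empty bins:", empty_width)
print("equal-frequency empty bins:", empty_freq)
```

On this sample the equal-width split defines 4 bins but only 2 of them contain data, whereas the equal-frequency split populates all 4. That kind of difference between defined bins and populated bins is consistent with the "assert 9 bins but given 10" mismatch above.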