Combiner and transer produce inconsistent binning results when training on the first 100 rows of iris

Open. topxxuki opened this issue 4 years ago • 1 comment

code:

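import pandas as pd
import toad
from sklearn.datasets import load_iris

iris = load_iris()
# Keep only the first 100 rows (classes 0 and 1) so the target is binary
X = iris.data[:100]
y = iris.target[:100]
target_names = ['slen', 'swid', 'plen', 'pwid']
data = pd.DataFrame(X, columns=target_names)
data["label"] = y
c = toad.transform.Combiner()
# Original comment (apparently carried over from a chi-square binning example): train on the feature-selected data using stable chi-square binning, with at least 5% of the data in each bin; nulls are assigned to the best bin automatically.
# Note: this call actually uses method='step' (equal-width binning) with empty_separate=True.
c.fit(data, y="label", method='step', empty_separate=True)
c_trans = c.transform(data)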

transer = toad.transform.WOETransformer()
train_woe = transer.fit_transform(c_trans, data["label"], exclude=["label"])
card = toad.ScoreCard(
    combiner = c,
    transer = transer,
    tol=1e-8,
    max_iter=50,
    n_jobs=1
)
card.fit(data[target_names], data["label"])

toad version 0.0.64.0, Python 3.6

Error message: Exception: column 'plen' is not matched, assert 9 bins but given 10

Training works with equal-frequency binning, but fails with equal-width binning.

topxxuki, Nov 25 '21 08:11

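@topxxuki Equal-width binning can produce empty bins, and an empty bin is what causes this error. I recommend switching to a different binning method.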

Secbone, Nov 27 '21 12:11
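
To illustrate the empty-bin explanation above, here is a minimal sketch in plain Python. It does not use toad or reproduce its internals: the helper functions and the sample values are hypothetical, and the equal-frequency helper is a simple rank-based stand-in rather than toad's 'quantile' method. It only shows how equal-width cut points can define a bin that no sample falls into, while rank-based bins stay populated.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of identical width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    # Interior cut points; the outer edges are implied by min and max.
    edges = [lo + step * i for i in range(1, n_bins)]
    # A value's bin index is the number of cut points at or below it.
    return [sum(v >= e for e in edges) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign each value to a bin by rank, so bins hold similar counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def empty_bins(bins, n_bins):
    """List the bin indices that received no samples."""
    used = set(bins)
    return [b for b in range(n_bins) if b not in used]


# Hypothetical feature values with a sparse region between two clusters.
values = [1.0, 1.3, 1.5, 1.6, 1.7, 1.9, 2.9, 3.1, 3.5, 3.9, 4.3, 4.7, 5.0]
n_bins = 10

width_empty = empty_bins(equal_width_bins(values, n_bins), n_bins)
freq_empty = empty_bins(equal_frequency_bins(values, n_bins), n_bins)

print("equal-width empty bins:", width_empty)
print("equal-frequency empty bins:", freq_empty)
```

In this sketch ten equal-width bins are defined but only nine receive samples, which has the same shape as the "assert 9 bins but given 10" message. Whether toad reaches the error through exactly this path is an assumption; the sketch only shows why a step that counts observed bins can disagree with one that counts defined bins.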