pure-predict
Error when predicting with a converted model built with CountVectorizer(binary=True)
Describe the bug
An error is raised when running inference with a converted sklearn pipeline whose CountVectorizer was created with binary=True. The same pipeline converts and predicts without error when binary=False.
To Reproduce
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from pure_sklearn.map import convert_estimator
vectorizer = CountVectorizer(binary=True)
model = LogisticRegression(random_state=0)
pipeline = Pipeline([
    ('vect', vectorizer),
    ('clf', model)
])
X_train = ['one text', 'two text', 'three text']
y_train = ['1', '2', '3']
pipeline.fit(X_train, y_train)
# Convert the fitted pipeline to its pure-Python equivalent
converted = convert_estimator(pipeline)
# This call raises the error
converted.predict(['four'])
If the vectorizer in the snippet above is created with binary=False instead, the same predict call succeeds.
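For context, the only thing binary=True changes in CountVectorizer is that every non-zero term count is set to 1, so the feature vector records presence rather than frequency. Below is a minimal, dependency-free sketch of that behaviour. It is not pure-predict's or sklearn's implementation: count_features is a hypothetical helper, it tokenizes by plain whitespace splitting rather than sklearn's token pattern, and the vocabulary is written out by hand to match what the training texts above would produce.

```python
from collections import Counter


def count_features(text, vocabulary, binary=False):
    """Build a dense term vector for `text`.

    `vocabulary` maps token -> column index. Hypothetical helper for
    illustration only; tokenization is a simple lowercase whitespace split.
    """
    counts = Counter(text.lower().split())
    row = [0] * len(vocabulary)
    for token, index in vocabulary.items():
        n = counts.get(token, 0)
        # binary=True clips any non-zero count to 1
        row[index] = min(n, 1) if binary else n
    return row


# Hand-written vocabulary for ['one text', 'two text', 'three text'],
# with columns in alphabetical order
vocabulary = {"one": 0, "text": 1, "three": 2, "two": 3}

print(count_features("text text one", vocabulary))               # [1, 2, 0, 0]
print(count_features("text text one", vocabulary, binary=True))  # [1, 1, 0, 0]
print(count_features("four", vocabulary, binary=True))           # [0, 0, 0, 0]
```

Note that 'four' does not occur in the training texts, so its feature vector is all zeros in both modes.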
Expected behavior
converted.predict(['four']) should return a prediction without raising an error, as it does when binary=False.