Error while using the train.fit() function in MultiTrain
I am trying to implement an example of using MultiTrain, from this article: https://www.analyticsvidhya.com/blog/2022/09/make-model-training-and-testing-easier-with-multitrain/
I receive an error in this part of the code:
After the features and labels are split into train and test sets, the result is assigned to a variable named split. This variable then holds X_train, X_test, y_train, and y_test; we need it in the next function below.
train.fit(X: str = None, y: str = None, split_self: bool = False, X_train: str = None, X_test: str = None, y_train: str = None, y_test: str = None, split_data: str = None, splitting: bool = False, kf: bool = False, fold: int = 5, excel: bool = False, return_best_model: str = None, show_train_score: bool = False)
fit = train.fit(X=features, y=labels, splitting=True, split_data=split)
The error:
LinearRegression(n_jobs=-1) fitting LinearRegression(n_jobs=-1)
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/MultiTrain/regression/regression_models.py in fit(self, X, y, split_self, X_train, X_test, y_train, y_test, split_data, splitting, kf, fold, excel, return_best_model, show_train_score)
    396 mae = mean_absolute_error(true, pred)
    397 rmse = np.sqrt(mean_squared_error(true, pred))
--> 398 r2 = r2_score(true, pred, force_finite=True)
    399 try:
    400     rmsle = np.sqrt(mean_squared_log_error(true, pred))
TypeError: r2_score() got an unexpected keyword argument 'force_finite'
Can you give me some guidance on this?
-
Could you tell me which version of MultiTrain you are working with? The force_finite argument was removed in the latest version of MultiTrain (v0.13.11).
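To confirm where the TypeError comes from, it may help to check whether the r2_score installed in your environment accepts force_finite at all. As far as I know, that keyword was added in scikit-learn 1.1, which requires Python 3.8 or newer, so a Python 3.7 environment like the one in the traceback would be stuck on an older scikit-learn. The snippet below is only a diagnostic sketch; accepts_keyword is a hypothetical helper name, not part of MultiTrain or scikit-learn.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter would accept any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        import sklearn
        from sklearn.metrics import r2_score
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        print("scikit-learn version:", sklearn.__version__)
        print("r2_score accepts force_finite:",
              accepts_keyword(r2_score, "force_finite"))
```

If the last line prints False, the installed scikit-learn predates the keyword, and the call in regression_models.py will fail no matter how the data was split.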
-
You need to split your dataset with the train.split() method; only then can you pass split_data=split and splitting=True.
import pandas as pd
from MultiTrain import MultiClassifier
train = MultiClassifier()
df = pd.read_csv('file.csv')
features = df.drop("nameOflabelcolumn", axis = 1)
labels = df["nameOflabelcolumn"]
split = train.split(X=features,
                    y=labels,
                    sizeOfTest=0.3,
                    randomState=42,
                    strat=True,
                    shuffle_data=True)
fit = train.fit(splitting=True,
                split_data=split)
- If you instead used the train_test_split function directly from sklearn, you have to pass each resulting variable to its corresponding argument, e.g.:
import pandas as pd
from sklearn.model_selection import train_test_split
from MultiTrain import MultiClassifier
train = MultiClassifier()
df = pd.read_csv('filename.csv')
features = df.drop('labelName', axis=1)
labels = df['labelName']
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
fit = train.fit(X_train=X_train,
                X_test=X_test,
                y_train=y_train,
                y_test=y_test,
                split_self=True,  # always set this to True if you used the traditional train_test_split
                )
I installed it like this:
- https://github.com/LOVE-DOCTOR/MultiTrain#installation
  !pip install MultiTrain
- Successfully installed MultiTrain-0.1.30 catboost-1.1 pyaml-21.10.1 scikit-optimize-0.9.0
And after that I did:
- If you experience issues or come across a bug while using MultiTrain, make sure to update to the latest version with !pip install --upgrade MultiTrain
Now I am not sure which version I have at this point.
You can check your version of MultiTrain like this:
import MultiTrain
print(MultiTrain.__version__)
Ensure that your version is 0.13.11. If it is lower, run:
pip install MultiTrain==0.13.11
If that doesn't work, please provide me with your OS, Python version, and MultiTrain version.
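A quick way to gather those details in one go might look like the sketch below. It assumes the package is installed under the distribution name MultiTrain (as in the pip command above) and also reports scikit-learn, since the traceback points at r2_score. The helper names installed_version and environment_report are made up for illustration.

```python
import platform


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 fallback; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def environment_report(packages=("MultiTrain", "scikit-learn")):
    """Collect OS, Python, and package versions into a dictionary."""
    report = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        report[name] = installed_version(name) or "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into your reply should cover everything asked for above.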
@jbdatascience
Have you been able to fix this error?