SparkXGBClassifier: Update params getting ignored
Hi folks. Is there a way to update a SparkXGBClassifier? Some of the other APIs have process_type, updater, and refresh_leaf params but I don't see those on the pyspark API. And if I set those params (along with xgb_model) they get ignored and added to arbitrary_params_dict.
"arbitrary_params_dict: This parameter holds all of the additional parameters which are not exposed as the XGBoost Spark estimator params but can be recognized by the underlying XGBoost library. It is stored as a dictionary. (default: {}, current: {'process_type': 'update', 'updater': 'refresh', 'refresh_leaf': True})"
I put this into 2.1 roadmap, hopefully we can drop the experimental tag for pyspark.
Thanks @trivialfis. So right now there's no way to get SparkXGBClassifier to use these params?
I think you can use it:

clf = SparkXGBClassifier(
    max_depth=10,
    n_estimators=2,
    xgb_model=init_booster,   # booster to continue training from
    process_type="update",    # note: the key is process_type, not process
    updater="refresh",
    refresh_leaf=True,
)
Sorry, been traveling.
As I said above, I used the following: 'process_type': 'update', 'updater': 'refresh', 'refresh_leaf': True. And all three got pushed into arbitrary_params_dict. (I also passed in the original xgb_model, of course; that param was recognized.) Is there something else I should be doing here?
It's fine for the parameters to end up in that dictionary; it's just a Python trick so we don't have to declare every parameter on every interface. Anything XGBoost itself recognizes still gets passed through.
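A rough sketch of the routing pattern described above, assuming the estimator splits keyword arguments into declared params and an extra dict, then merges them back at fit time. The names `KNOWN_PARAMS`, `split_params`, and `booster_params` are illustrative only, not the real SparkXGBClassifier internals:

```python
# Hypothetical illustration: keywords the estimator does not declare
# explicitly are collected into an "arbitrary params" dict, then merged
# back into the parameters handed to the underlying XGBoost library.
KNOWN_PARAMS = {"max_depth", "n_estimators", "xgb_model"}

def split_params(**kwargs):
    """Split kwargs into declared estimator params and everything else."""
    known = {k: v for k, v in kwargs.items() if k in KNOWN_PARAMS}
    extra = {k: v for k, v in kwargs.items() if k not in KNOWN_PARAMS}
    return known, extra

def booster_params(known, extra):
    """At fit time both dicts are merged, so the 'extra' params still
    reach XGBoost even though they sat in the arbitrary dict."""
    merged = dict(known)
    merged.update(extra)
    return merged

known, extra = split_params(
    max_depth=10, process_type="update", updater="refresh", refresh_leaf=True
)
print(extra)   # the three update params land in the arbitrary dict
print(booster_params(known, extra))  # but all four reach the booster
```

So a parameter landing in `arbitrary_params_dict` is a storage detail, not a sign it was dropped.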
Oh, I see. So even though those params are going into arbitrary_params_dict xgboost still recognizes and uses the params?
Apologies for missing the reply. Yes, they're recognized.