SparkXGBClassifier: Update params getting ignored
Hi folks. Is there a way to update a SparkXGBClassifier? Some of the other APIs have process_type, updater, and refresh_leaf params but I don't see those on the pyspark API. And if I set those params (along with xgb_model) they get ignored and added to arbitrary_params_dict.
"arbitrary_params_dict: This parameter holds all of the additional parameters which are not exposed as the XGBoost Spark estimator params but can be recognized by the underlying XGBoost library. It is stored as a dictionary. (default: {}, current: {'process_type': 'update', 'updater': 'refresh', 'refresh_leaf': True})"
I put this into 2.1 roadmap, hopefully we can drop the experimental tag for pyspark.
Thanks @trivialfis. So right now there's no way to get SparkXGBClassifier to use these params?
I think you can use it:

clf = SparkXGBClassifier(
    max_depth=10,
    n_estimators=2,
    xgb_model=init_booster,   # booster to continue training from
    process_type="update",    # note: the key is process_type, not process
    updater="refresh",
    refresh_leaf=True,
)
Sorry, been traveling.
As I said above, I used the following: 'process_type': 'update', 'updater': 'refresh', 'refresh_leaf': True. And all three got pushed into arbitrary_params_dict. (I also passed in the original xgb_model, of course; that param was recognized.) Is there something else I should be doing here?
It's fine for the parameters to end up in that dictionary; it's just a Python trick so we don't have to declare every parameter on every interface. Anything XGBoost itself recognizes still gets passed through.
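A rough sketch of the routing pattern described above, assuming the estimator splits keyword arguments into declared params and an extra dict, then merges them back at fit time. The names `KNOWN_PARAMS`, `split_params`, and `booster_params` are illustrative only, not the real SparkXGBClassifier internals:

```python
# Hypothetical illustration: keywords the estimator does not declare
# explicitly are collected into an "arbitrary params" dict, then merged
# back into the parameters handed to the underlying XGBoost library.
KNOWN_PARAMS = {"max_depth", "n_estimators", "xgb_model"}

def split_params(**kwargs):
    """Split kwargs into declared estimator params and everything else."""
    known = {k: v for k, v in kwargs.items() if k in KNOWN_PARAMS}
    extra = {k: v for k, v in kwargs.items() if k not in KNOWN_PARAMS}
    return known, extra

def booster_params(known, extra):
    """At fit time both dicts are merged, so the 'extra' params still
    reach XGBoost even though they sat in the arbitrary dict."""
    merged = dict(known)
    merged.update(extra)
    return merged

known, extra = split_params(
    max_depth=10, process_type="update", updater="refresh", refresh_leaf=True
)
print(extra)   # the three update params land in the arbitrary dict
print(booster_params(known, extra))  # but all four reach the booster
```

So a parameter landing in `arbitrary_params_dict` is a storage detail, not a sign it was dropped.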
Oh, I see. So even though those params are going into arbitrary_params_dict xgboost still recognizes and uses the params?
Apologies for missing the reply. Yes, they're recognized.