Support LightGBM categorical features
I was trying to convert a LightGBMRegressor model that has categorical features and I got the following error:
Exception has occurred: ValueError
could not convert string to float: '0||1||2||3||4||10||13||14||21||22||23||25||27||29||30'
  File "../python3.7/site-packages/hummingbird/ml/operator_converters/_tree_implementations.py", line 227, in __init__
It seems you do not support LightGBM categorical features, because their thresholds are represented as a string (integers separated by ||) rather than as a single number.
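To make the failure concrete, here is a minimal sketch in plain Python of why the threshold string from the error above cannot be read as a float, and how it splits into integer category indices. The helper name `parse_categorical_threshold` is hypothetical and is not part of Hummingbird or LightGBM.

```python
from typing import List


def parse_categorical_threshold(threshold: str) -> List[int]:
    """Split a '||'-separated threshold string into integer category indices.

    Hypothetical helper for illustration only; not a Hummingbird API.
    """
    return [int(part) for part in threshold.split("||")]


# Threshold string copied from the error message above.
raw = "0||1||2||3||4||10||13||14||21||22||23||25||27||29||30"

# A numeric split stores one float threshold, so float() works there.
# For a categorical split the same call raises ValueError.
try:
    float(raw)
    float_ok = True
except ValueError:
    float_ok = False

# Splitting on '||' recovers the category indices listed in the string.
categories = parse_categorical_threshold(raw)
```

As far as I can tell, a categorical split is a set-membership test on these indices rather than a single numeric comparison, so it likely needs separate handling in the converter.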
Do you have plans to add support for this?
Edit:
I got the error message above when debugging. The actual error message I get when running the conversion is:
Unable to find converter for SklearnLGBMRegressor type <class 'NoneType'> with extra config: {}. It usually means the pipeline being converted contains a transformer or a predictor with no corresponding converter implemented. Please fill an issue at https://github.com/microsoft/hummingbird.
Perhaps the error message could also be improved. It wasn't really obvious what was going on without debugging.
Hi @klesouza, thanks for reporting this. Yes, we don't support categorical features in LightGBM yet. +1 on better error reporting. As a temporary workaround, once #244 is in, we can one-hot encode the categorical features before LightGBM. String support in PyTorch is not that great, so adding native support for categorical features is not that straightforward.
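For reference, a rough sketch of what the one-hot workaround does to a single categorical column, written in plain Python so it runs without dependencies. The function `one_hot_encode` is a hypothetical illustration. In a real pipeline one would more likely use an existing encoder such as scikit-learn's `OneHotEncoder`, and whether Hummingbird can convert that step depends on #244.

```python
from typing import Dict, List, Sequence


def one_hot_encode(column: Sequence[int]) -> List[List[float]]:
    """Turn a column of integer category codes into 0/1 indicator columns.

    Hypothetical helper for illustration; columns follow sorted category order.
    """
    categories = sorted(set(column))
    index: Dict[int, int] = {cat: i for i, cat in enumerate(categories)}
    rows: List[List[float]] = []
    for value in column:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # mark the column for this row's category
        rows.append(row)
    return rows


# Example: four rows with three distinct categories (2, 5, 13).
encoded = one_hot_encode([2, 5, 13, 5])
```

Training on the indicator columns as ordinary numeric features means every split becomes a plain numeric threshold, so the model should no longer contain "||" thresholds. The trade-off is a wider feature matrix and possibly different splits from LightGBM's native categorical handling.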
Hi, I am trying to convert my LightGBM trained model using hummingbird but I am getting the same error (ValueError: could not convert string to float: '2||5||13||18'). Are the categorical features still not supported?
Version: hummingbird-ml = 0.0.4
Yes, sorry, categoricals still aren't implemented in LGBM as of 0.4.4.