support for categorical features and missing values in LGBM #28

paulperry · 2019-11-28T13:02:18Z

When loading a model I get:

rankeval_lgb_model = RTEnsemble('lgb.model', name="LightGBM model", format="LightGBM")
[...]
AssertionError: Decision Tree not supported. RankEval does not support categorical features and missing values.

Is there a way to work around this? Or will there be support for LGBM cat features and missing values?


strani · 2019-12-02T16:35:32Z

This error can be raised under two conditions:

1. your dataset has missing values and/or categorical features;
2. you are using an old version of the LightGBM library. At some point they slightly changed the model file format, and that change broke the reader. I fixed it by making the reader comply with the new format, but as a result the old format no longer works.

If you are in the second case, it should be enough to update LightGBM to the new version. If you are in the first case, rankeval currently does not support missing values or categorical features, and I'm not sure when these features will be added. Both features are framework dependent, while rankeval should stay agnostic of the framework adopted.
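To tell which of the two cases applies, one option is to inspect the saved model file directly. The following is a minimal sketch, not rankeval's actual check. It assumes that the LightGBM text format stores one integer per split on each `decision_type=` line, with bit 0 marking a categorical split, bit 1 the default direction, and bits 2-3 the missing-value mode (0 none, 1 zero, 2 NaN). That layout should be verified against your LightGBM version. The function name `summarize_decision_types` and the sample text are hypothetical.

```python
from collections import Counter

# Assumed bit layout of each `decision_type` entry (verify for your LightGBM version).
CATEGORICAL_MASK = 1                               # bit 0: split is categorical
MISSING_TYPES = {0: "none", 1: "zero", 2: "nan"}   # bits 2-3: missing-value mode


def summarize_decision_types(model_text):
    """Count categorical splits and missing-value modes found in a model dump."""
    summary = Counter()
    for line in model_text.splitlines():
        if not line.startswith("decision_type="):
            continue
        # Everything after '=' is a space-separated list of integer codes.
        for token in line.split("=", 1)[1].split():
            code = int(token)
            if code & CATEGORICAL_MASK:
                summary["categorical"] += 1
            missing = MISSING_TYPES.get((code >> 2) & 3, "unknown")
            summary["missing_" + missing] += 1
    return summary


# Hypothetical two-tree excerpt, not taken from a real model file.
example = "Tree=0\ndecision_type=2 10 1\nTree=1\ndecision_type=2 8\n"
print(summarize_decision_types(example))
```

If the summary reports no categorical splits and only `missing_none`, the first case probably does not apply and the LightGBM version is the more likely culprit. To run it on a real model, read the file into a string first, for example with `open('lgb.model').read()`.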

anuragreddygv323 · 2019-12-02T22:40:54Z

If I fill the missing values with -999, will it work?

Also, regarding categorical variables: LightGBM has support for categorical variables, so why does rankeval not support them?

paulperry · 2019-12-02T23:24:37Z

I'm using LightGBM 2.3.0. I'm in the first case: I have missing values and categorical data. As @anuragreddygv323 asks, can I transform my input to numerical and still have a reasonable comparison?

strani · 2019-12-03T09:18:28Z

Machine learning algorithms, especially those based on decision trees, handle missing values differently from ordinary values. So, to answer your question: if you modify the dataset by removing missing values, you also need to refit the model accordingly. However, doing so could hurt the performance of the final model. Transforming categorical data into numerical features, on the other hand, is not possible, since categories are discrete and unordered, while traditional features are continuous and impose an ordering.
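The difference between a missing value and a fixed sentinel can be sketched with a single split. This is an illustration only, not LightGBM's or rankeval's code. It assumes a numerical split that sends values less than or equal to the threshold to the left and routes missing values in a default direction learned at training time. The function `go_left` and the numbers are hypothetical.

```python
import math


def go_left(value, threshold, default_left):
    """Route one feature value through a numerical split.

    Missing values (NaN) follow the default direction learned at training time;
    every other value is compared against the threshold.
    """
    if math.isnan(value):
        return default_left
    return value <= threshold


# Hypothetical split: threshold 0.5, missing values learned to go right.
threshold, default_left = 0.5, False

# A truly missing value follows the learned default direction.
print(go_left(float("nan"), threshold, default_left))
# The sentinel -999 is treated as a very small number instead.
print(go_left(-999.0, threshold, default_left))
```

Here the NaN row goes right while the -999 row goes left, so the same row can land in a different leaf after imputation. That is why a model trained on data with missing values cannot simply be scored on the imputed data and has to be refit.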

Regarding categorical variables, the reason they are still not supported by rankeval is that the standard learning-to-rank datasets (WEB30K, Istella, Yahoo) do not have this kind of variable. But we could start thinking about introducing this feature in rankeval sooner or later.
