Internal CatBoost Error (contact developers for assistance): Should be IDynamicBlockIteratorPtr<TCatValue>
Problem:
When running get_feature_importance, it fails with the following error:
CatBoostError: /src/catboost/catboost/private/libs/algo/features_data_helpers.h:118: Internal CatBoost Error (contact developers for assistance): Should be IDynamicBlockIteratorPtr<TCatValue>
Any idea what may be causing this issue? Here is the code I am running:
from catboost import Pool

# model_c is a previously trained CatBoost model
train_pool = Pool('quantized://' + '/local_disk0/' + 'train_pool_1.pool')
feature_importance = model_c.get_feature_importance(train_pool)
feature_importance
catboost version: 1.2.7
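For context, the 'quantized://' prefix loads a pool that was quantized and saved to disk earlier. The file referenced above would have been produced with something along these lines (a sketch; the actual quantization parameters used to build the file may differ):

from catboost import Pool

train_pool = Pool(
    data=X_train,
    label=y_train,
    group_id=qid_train,
    cat_features=category_index,
)
# Quantize in place, then save the binary quantized pool to disk.
train_pool.quantize()
train_pool.save('/local_disk0/train_pool_1.pool')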
OK, it seems to be an issue with the quantized pool. If I recreate the pool, it works. For example, this works:
test_pool = Pool(
    data=X_test_pd,
    label=y_test,
    group_id=qid_test,
    cat_features=category_index,
)
feature_importance = model_c.get_feature_importance(test_pool)
feature_importance
This is a bug that occurs when the quantized pool borders contain -/+ inf. CatBoost predict cannot be called on the quantized training pool if the borders file contains inf for any feature value. If you recreate the pool and do not call quantize, then predict works. If you call predict on the same pool that was quantized and used for training, you get this error.
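One way to check whether the borders of an existing quantized pool contain inf is to dump them and scan the file. A sketch, assuming save_quantization_borders is available on a quantized pool and that the output is tab-separated as feature index, border value, optional nan mode:

import math

from catboost import Pool

train_pool = Pool('quantized://' + '/local_disk0/' + 'train_pool_1.pool')
# Write the borders the pool was quantized with to a TSV file.
train_pool.save_quantization_borders('borders.tsv')

with open('borders.tsv') as f:
    for line in f:
        parts = line.split('\t')
        border = float(parts[1])
        if not math.isfinite(border):
            print('inf border for feature', parts[0], ':', border)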
Actually, it seems to happen even with quantized pools whose borders are all finite. If you recreate the pool and do not quantize it, it works, but that defeats the purpose of quantizing the dataframe to save memory. Minimal repro:
import catboost
from catboost import Pool

train_pool = Pool(
    data=X_train,
    label=y_train,
    group_id=qid_train,
    cat_features=category_index,
)
train_pool.quantize(border_count=10, feature_border_type='UniformAndQuantiles', random_seed=1)

model = catboost.CatBoostClassifier()
model.fit(train_pool)
model.predict(train_pool)  # fails with the IDynamicBlockIteratorPtr<TCatValue> error
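For what it's worth, keeping the quantized pool for training but building a plain, non-quantized pool from the same data for prediction and feature importance avoids the error. A minimal sketch reusing the variables above:

from catboost import Pool

# Non-quantized copy of the same data, used only for evaluation.
eval_pool = Pool(
    data=X_train,
    label=y_train,
    group_id=qid_train,
    cat_features=category_index,
)
model.predict(eval_pool)                 # works
model.get_feature_importance(eval_pool)  # works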
I am also seeing this. Opening borders.tsv, I see lines like the following:
122 270
122 279.5
122 290.5
122 297.5
123 -3.402823466e+38 Min
125 1.5
126 0.5
126 1.5
126 2.5
126 3.5
Will attempt to find suspicious values in the data (like inf) and report back.
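For example, something along these lines (assuming X_train is a pandas DataFrame) should surface inf values, plus magnitudes near the float32 limit like the -3.402823466e+38 border above:

import numpy as np

# Only numeric columns can contain inf; categorical columns are left out.
numeric = X_train.select_dtypes(include=[np.number])

# True inf values, plus finite values at or beyond the float32 maximum.
suspicious = np.isinf(numeric) | (numeric.abs() >= np.finfo(np.float32).max)

print(suspicious.sum())                         # per-column count of suspicious values
print(list(numeric.columns[suspicious.any()]))  # columns that contain any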