catboost Internal CatBoost Error (contact developers for assistance): Should be IDynamicBlockIteratorPtr<TCatValue>

Problem:

Calling get_feature_importance on a quantized pool fails with the following error.

CatBoostError: /src/catboost/catboost/private/libs/algo/features_data_helpers.h:118: Internal CatBoost Error (contact developers for assistance): Should be IDynamicBlockIteratorPtr<TCatValue>

Any idea what may be causing this issue? Here is the code that I am running.

from catboost import Pool
train_pool = Pool('quantized://' + '/local_disk0/' + 'train_pool_1.pool')
feature_importance = model_c.get_feature_importance(train_pool)
feature_importance

catboost version: 1.2.7

Sep 11 '24 04:09 timpiperseek

Okay, it seems to be an issue with the quantized pool. If I recreate the pool without quantizing, it works. For example, this works:

test_pool = Pool(
    data=X_test_pd,
    label=y_test,
    group_id=qid_test,
    cat_features=category_index,
)

feature_importance = model_c.get_feature_importance(test_pool)
feature_importance

Sep 11 '24 04:09 timpiperseek

This is a bug that occurs when the quantized pool's borders contain -inf or +inf. CatBoost predict cannot be called on the quantized training pool if the borders file contains inf for any feature. If you recreate the pool and do not call quantize, catboost.predict works. If you call catboost.predict(trainpool_quantized) on the same pool that was quantized before training and then used for training, you get this error.

Sep 17 '24 21:09 antipisa

Actually, it seems to happen even with quantized pools that have finite borders. If you recreate the pool and do not quantize it, prediction works, but that defeats the purpose of quantizing the dataframe to save memory.

Sep 18 '24 13:09 antipisa

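import catboost
from catboost import Pool

train_pool = Pool(
    data=X_train,
    label=y_train,
    group_id=qid_train,
    cat_features=category_index,
)
# Quantize the pool in place before training.
train_pool.quantize(border_count=10, feature_border_type='UniformAndQuantiles', random_seed=1)
model = catboost.CatBoostClassifier()
model.fit(train_pool)
# Predicting on the same quantized pool is the call that fails.
model.predict(train_pool)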

Sep 18 '24 13:09 antipisa

I am also seeing this. Opening borders.tsv, I see lines like the following:

122	270
122	279.5
122	290.5
122	297.5
123	-3.402823466e+38	Min
125	1.5
126	0.5
126	1.5
126	2.5
126	3.5

Will attempt to find suspicious values in the data (like inf) and report back.

Nov 29 '24 23:11 8W9aG
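
For anyone who wants to check their own borders file for entries like the one for feature 123 above, here is a minimal sketch that flags borders that are non-finite or sit at the float32 limit (-3.402823466e+38 is the negative of the largest finite float32). It assumes each line is whitespace-separated as "feature index, border value, optional third column" (the third column, "Min" above, is presumably the NaN mode). The function name find_suspicious_borders and the tolerance are hypothetical and not part of CatBoost.

```python
import math
from collections import defaultdict

# Largest finite float32 value; the border reported for feature 123 is its negative.
FLOAT32_MAX = 3.4028234663852886e38


def find_suspicious_borders(lines, rel_tol=1e-6):
    """Map feature index -> borders that are NaN, infinite, or at the float32 limit.

    `lines` is any iterable of text lines in the assumed borders format:
    feature index, border value, and an optional third column that is ignored.
    """
    suspicious = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        feature_index = int(fields[0])
        border = float(fields[1])
        # Infinite values also satisfy the magnitude check; NaN is caught by isfinite.
        at_limit = abs(border) >= FLOAT32_MAX * (1.0 - rel_tol)
        if not math.isfinite(border) or at_limit:
            suspicious[feature_index].append(border)
    return dict(suspicious)


# A few of the lines quoted from borders.tsv in this thread.
sample = [
    "122\t297.5",
    "123\t-3.402823466e+38\tMin",
    "125\t1.5",
    "126\t0.5",
]
print(find_suspicious_borders(sample))
```

To scan a real file, pass the open file object instead of the sample list, for example find_suspicious_borders(open("borders.tsv")). Note that an earlier comment reports the error even with finite borders, so a clean result here does not rule out the bug.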