Internal CatBoost Error (contact developers for assistance): Should be IDynamicBlockIteratorPtr<TCatValue>

Open timpiperseek opened this issue 1 year ago • 4 comments

Problem:

Calling get_feature_importance on a quantized pool fails with the following error.

CatBoostError: /src/catboost/catboost/private/libs/algo/features_data_helpers.h:118: Internal CatBoost Error (contact developers for assistance): Should be IDynamicBlockIteratorPtr<TCatValue>

Any idea what may be causing this? Here is the code I am running.

from catboost import Pool
train_pool = Pool('quantized://' + '/local_disk0/' + 'train_pool_1.pool')
feature_importance = model_c.get_feature_importance(train_pool)
feature_importance

catboost version: 1.2.7

timpiperseek avatar Sep 11 '24 04:09 timpiperseek

OK, it seems to be an issue with the quantized pool. If I recreate the pool, it works. For example, this works:

test_pool = Pool(
    data=X_test_pd,
    label=y_test,
    group_id=qid_test,
    cat_features=category_index,
)

feature_importance = model_c.get_feature_importance(test_pool)
feature_importance

timpiperseek avatar Sep 11 '24 04:09 timpiperseek

This is a bug that occurs when the quantized pool's borders contain -/+ inf. CatBoost predict cannot be called on the quantized training pool if the borders file contains inf for any feature. If you recreate the pool and do not call quantize, predict works. If you call predict on the same pool that was quantized before training and then used for training, you get this error.

antipisa avatar Sep 17 '24 21:09 antipisa

Actually, it seems to happen even with quantized pools that have finite borders. Recreating the pool without quantizing it works, but that defeats the purpose of quantizing the dataframe to save memory.

antipisa avatar Sep 18 '24 13:09 antipisa

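import catboost
from catboost import Pool

train_pool = Pool(
    data=X_train,
    label=y_train,
    group_id=qid_train,
    cat_features=category_index,
)
# Quantize the pool in place before training
train_pool.quantize(border_count=10, feature_border_type='UniformAndQuantiles', random_seed=1)
model = catboost.CatBoostClassifier()
model.fit(train_pool)
# Predicting on the same quantized pool raises the internal error
model.predict(train_pool)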

antipisa avatar Sep 18 '24 13:09 antipisa

I am also seeing this. Opening borders.tsv, I see lines like the following:

122	270
122	279.5
122	290.5
122	297.5
123	-3.402823466e+38	Min
125	1.5
126	0.5
126	1.5
126	2.5
126	3.5
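
To pick out lines like the one for feature 123 automatically, a rough sketch such as the following might help. It assumes each line holds a feature index, a border value, and an optional third column (shown as `Min` in the excerpt above), separated by whitespace. The function name and the 0.99 threshold factor are hypothetical choices for illustration, not part of CatBoost. The third column looks like a NaN-mode marker, so a flagged line may point to missing values rather than infinities in the data; that reading is an assumption.

```python
import math

# Largest finite float32 value; -3.402823466e+38 in the excerpt is its negative.
FLOAT32_MAX = 3.4028234663852886e38


def find_suspicious_borders(lines, threshold=FLOAT32_MAX * 0.99):
    """Return (feature_index, border, marker) for non-finite or extreme borders.

    The line layout is assumed from the excerpt, not taken from a spec.
    """
    suspicious = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        feature_index = int(parts[0])
        border = float(parts[1])
        marker = parts[2] if len(parts) > 2 else None
        if not math.isfinite(border) or abs(border) >= threshold:
            suspicious.append((feature_index, border, marker))
    return suspicious


sample = [
    "122\t297.5",
    "123\t-3.402823466e+38\tMin",
    "125\t1.5",
]
print(find_suspicious_borders(sample))
```

For a real file, the lines could be read with `open("borders.tsv")` and passed in directly.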

Will attempt to find suspicious values in the data (like inf) and report back.
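
A minimal sketch of that kind of check, assuming the feature columns are available as a plain dict of name to list of values (the function name and the input layout are hypothetical; with a pandas DataFrame the same idea would apply per column):

```python
import math


def count_non_finite(columns):
    """Map column name -> counts of NaN and inf values, for columns that have any."""
    report = {}
    for name, values in columns.items():
        # Only float values can be NaN or inf; other types are ignored.
        nan_count = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
        inf_count = sum(1 for v in values if isinstance(v, float) and math.isinf(v))
        if nan_count or inf_count:
            report[name] = {"nan": nan_count, "inf": inf_count}
    return report


columns = {
    "a": [1.0, float("nan"), 2.0],
    "b": [0.5, 1.5],
    "c": [float("inf"), float("-inf")],
}
print(count_non_finite(columns))
```

Columns that show up in the report are candidates to compare against the feature indices flagged in borders.tsv.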

8W9aG avatar Nov 29 '24 23:11 8W9aG