Inference with missing Data

Open PARODBE opened this issue 4 years ago • 7 comments

Hi!

I am using a model to run inference on data with missing values, so that I can predict the missing entries and fill them in using the previously trained model. However, I am running into some errors:

image

As you can see, I run a loop that, at the end, prepares the variables and evidence and then passes them to the model:

image

This is the error I get. I think it is because I am using a TAN model, in which I predefine the class_node ...

Could you help me please?

Thanks!

Pablo

PARODBE avatar Sep 16 '21 05:09 PARODBE

If you can post an example that reproduces the error, I can check whether this ValueError has something to do with bnlearn.

erdogant avatar Sep 17 '21 20:09 erdogant

I believe I am having a similar issue. This happens whenever I call inference.fit and include a variable that has more than two levels. Below is a snippet of the error output that is relevant to bnlearn:

~\.conda\envs\bayes\lib\site-packages\bnlearn\inference.py in fit(model, variables, evidence, to_df, verbose)
     92     query = model_infer.query(variables=variables, evidence=evidence, show_progress=(verbose>0))
     93     # Store also in dataframe
---> 94     query.df = bnlearn.query2df(query) if to_df else None
     95     if verbose>=3: print(query)
     96     # Return

~\.conda\envs\bayes\lib\site-packages\bnlearn\bnlearn.py in query2df(query)
    417     """
    418     df = pd.DataFrame(data = list(itertools.product([0, 1], repeat=len(query.variables))), columns=query.variables)
--> 419     df['p'] = query.values.flatten()
    420     return df
    421 

And the resulting error: ValueError: Length of values (4) does not match length of index (2) (in this case I passed one variable that had 4 levels).

I believe the issue is that when the dataframe is created in query2df, it assumes that all variables have exactly 2 levels (hence doing the product of just [0, 1] n times, where n is the number of variables).

The length of df.index is 2^n, but the length of the flattened query values is (correctly) the product of c_1, c_2, ..., c_n, where c_i is the cardinality of the i-th variable.

EDIT: A dataframe with an index of the correct length can be created in query2df in the following way:

df = pd.DataFrame(data = list(itertools.product(*[range(c) for c in query.cardinality])), columns=query.variables)

However, I am not 100% certain that the level combinations are created in the same order that the query values are returned...
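The ordering question can be checked on a small stand-in table. The sketch below does not call bnlearn or pgmpy; it assumes the query values are stored as an array with one axis per variable, in the same order as query.variables, and that flatten() uses NumPy's default C order. The names variables, cardinality and values are hypothetical placeholders for the corresponding query attributes.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical stand-ins for a query result: two variables with
# cardinalities 2 and 4, and a probability table of matching shape.
variables = ["A", "B"]
cardinality = [2, 4]
values = np.arange(8, dtype=float).reshape(cardinality) / 28.0  # entries sum to 1

# Binary-only construction from the traceback: always 2**n rows.
binary_rows = list(itertools.product([0, 1], repeat=len(variables)))

# Cardinality-aware construction: one row per state combination.
rows = list(itertools.product(*[range(c) for c in cardinality]))

df = pd.DataFrame(rows, columns=variables)
df["p"] = values.flatten()  # C order: the last axis varies fastest

# Each row's p should equal the table entry at that row's state indices.
order_matches = all(values[a, b] == p for a, b, p in df.itertuples(index=False))
print(len(binary_rows), len(rows), order_matches)
```

Under those assumptions the row order and the flattened values line up, because itertools.product also varies its last argument fastest. If the library stored the values in a different axis order, this check would fail, and the columns would need to be reordered first.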

prevay avatar Nov 02 '21 14:11 prevay

Which version are you using?

import bnlearn
bnlearn.__version__

erdogant avatar Nov 04 '21 06:11 erdogant

Your fix also seems to bring a solution. In an earlier version I incorporated this fix:

df = pd.DataFrame(data = list(itertools.product(np.arange(0, len(query.values)), repeat=len(query.variables))), columns=query.variables)

Let me know if it still causes issues. Otherwise, I will close this issue.

erdogant avatar Nov 19 '21 08:11 erdogant

Reopen if it still causes issues.

erdogant avatar Dec 18 '21 16:12 erdogant

Hi @erdogant, I am facing the same issue with the latest bnlearn version, 0.7.3, when trying to infer more than one variable and the variables are multiclass. Instead of my own code, I tested it with the titanic dataset and still hit the same issue:

import bnlearn as bn

# Load example DataFrame
df_as = bn.import_example('titanic')
dfhot, dfnum = bn.df2onehot(df_as)
# Train model
model_as = bn.structure_learning.fit(dfnum, methodtype='hc', scoretype='bic')
model_as_p = bn.parameter_learning.fit(model_as, dfnum, methodtype='bayes')
# Do the inference
query = bn.inference.fit(model_as_p, variables=['Sex', 'Parch'], evidence={'Survived':0, 'Pclass':1})

Error Message:

ValueError                                Traceback (most recent call last)
Input In [42], in <cell line: 8>()
      6 model_as_p = bn.parameter_learning.fit(model_as, dfnum, methodtype='bayes')
      7 # Do the inference
----> 8 query = bn.inference.fit(model_as_p, variables=['Sex', 'Parch'], evidence={'Survived':0, 'Pclass':1})

File ~/anaconda3/envs/env_bnlearn/lib/python3.8/site-packages/bnlearn/inference.py:98, in fit(model, variables, evidence, to_df, verbose)
     96 query = model_infer.query(variables=variables, evidence=evidence, show_progress=(verbose>0))
     97 # Store also in dataframe
---> 98 query.df = bnlearn.query2df(query, variables=variables) if to_df else None
     99 # Print table to screen
    100 if verbose>=3: print(tabulate(query.df.head(), tablefmt="grid", headers="keys"))

ValueError: Length of values (8) does not match length of index (4)
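One possible reading of these numbers, not confirmed anywhere in the thread: if query2df in 0.7.3 still built its index from the expression quoted earlier (np.arange(0, len(query.values)) repeated once per variable), then len() would return only the size of the first axis, so two variables with 2 and 4 states would give 2^2 = 4 rows against 8 values. The sketch below reproduces just that arithmetic with NumPy. The (2, 4) shape is an assumption chosen to match the error message, and nothing here calls bnlearn.

```python
import itertools

import numpy as np

# Hypothetical table: first variable with 2 states, second with 4 states.
values = np.zeros((2, 4))
n_variables = 2

# len() of a 2-D array is the size of its first axis, not the number of entries,
# so this yields len(values) ** n_variables combinations.
rows = list(itertools.product(np.arange(0, len(values)), repeat=n_variables))

n_rows = len(rows)                 # 2 ** 2
n_values = values.flatten().size   # 2 * 4
print(n_rows, n_values)
```

If this reading is right, the mismatch would only disappear when every queried variable has the same number of states as the first one, which would explain why single-variable queries worked.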

ahmadrana35 avatar Aug 02 '22 10:08 ahmadrana35

Thanks for reporting this again! I fixed it with a different solution where I manually walk through the object and extract the necessary information for the dataframe. It should be fixed now.

Update with: pip install -U bnlearn

Version should be >= v0.7.5

erdogant avatar Aug 05 '22 17:08 erdogant

Closing this one. Reopen if it still causes issues.

erdogant avatar Aug 27 '22 18:08 erdogant