Inference with missing data
Hi!
I am using a model to run inference on data with missing values, so that I can predict the missing entries and fill them in with the previously trained model. However, I am running into some errors:

As you can see, I run a loop that, at the end of each iteration, prepares the variables and evidence and then uses the model to query them ...:

This is the error I get. I think it is because I am using a TAN (tree-augmented naive Bayes) model in which I predefine the class_node ...
Could you help me please?
Thanks!
Pablo
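The loop described above can be sketched as follows. This is a minimal illustration, not the poster's actual code. It assumes a hypothetical `query_fn(variables, evidence)` that wraps the trained model and returns a table shaped like bnlearn's `query.df` (one column per queried variable plus a probability column `p`). For each record, the observed columns become the evidence, the missing columns are queried, and they are filled with the most probable joint state:

```python
import itertools

import pandas as pd


def impute_row(row, query_fn):
    """Fill the missing entries of one record with their jointly most probable states."""
    missing = [col for col in row.index if pd.isna(row[col])]
    if not missing:
        return row
    # Observed columns become the evidence for the query
    evidence = {col: row[col] for col in row.index if col not in missing}
    table = query_fn(missing, evidence)
    # Pick the row of the result table with the highest probability
    best = table.loc[table["p"].idxmax()]
    filled = row.copy()
    for col in missing:
        filled[col] = best[col]
    return filled


def impute(df, query_fn):
    """Apply impute_row to every record of the DataFrame."""
    return df.apply(lambda row: impute_row(row, query_fn), axis=1)


# Hypothetical stand-in for the model: it ignores the evidence and simply makes
# the last state combination the most probable, just to exercise the loop.
def demo_query(variables, evidence):
    combos = list(itertools.product([0, 1], repeat=len(variables)))
    table = pd.DataFrame(combos, columns=variables)
    weights = list(range(1, len(combos) + 1))
    table["p"] = [w / sum(weights) for w in weights]
    return table


data = pd.DataFrame({"A": [0, None], "B": [1, 1]})
completed = impute(data, demo_query)
print(completed)
```

Because pandas turns integer columns that contain NaN into floats, the evidence values may arrive as floats (1.0 rather than 1); a real `query_fn` may need to cast them back to the state values the model was trained on.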
If you can post an example that reproduces the error, I can check whether this ValueError has something to do with bnlearn.
I believe I am having a similar issue. This happens whenever I call inference.fit and include a variable that has more than two levels. Below is a snippet of the error output that is relevant to bnlearn:
~\.conda\envs\bayes\lib\site-packages\bnlearn\inference.py in fit(model, variables, evidence, to_df, verbose)
92 query = model_infer.query(variables=variables, evidence=evidence, show_progress=(verbose>0))
93 # Store also in dataframe
---> 94 query.df = bnlearn.query2df(query) if to_df else None
95 if verbose>=3: print(query)
96 # Return
~\.conda\envs\bayes\lib\site-packages\bnlearn\bnlearn.py in query2df(query)
417 """
418 df = pd.DataFrame(data = list(itertools.product([0, 1], repeat=len(query.variables))), columns=query.variables)
--> 419 df['p'] = query.values.flatten()
420 return df
421
And the resulting error: ValueError: Length of values (4) does not match length of index (2) (in this case I passed one variable that had 4 levels).
I believe the issue is that when the dataframe is created in query2df, it assumes that all variables have exactly 2 levels (hence doing the product of just [0, 1] n times, where n is the number of variables).
The length of df.index is 2^n, but the length of the flattened query values is (correctly) the product c_1 * c_2 * ... * c_n, where c_i is the cardinality of the i-th variable.
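The mismatch can be reproduced without bnlearn by comparing the two counts directly. The helper names below are illustrative only; `binary_rows` mimics the `itertools.product([0, 1], ...)` construction from the traceback, and `value_count` is the number of flattened query values:

```python
import itertools
import math


def binary_rows(n_vars):
    """Rows produced when every variable is assumed to have levels {0, 1}."""
    return len(list(itertools.product([0, 1], repeat=n_vars)))


def value_count(cardinalities):
    """Number of entries in the flattened query values: c_1 * c_2 * ... * c_n."""
    return math.prod(cardinalities)


# One variable with 4 levels, as in the report above
card = [4]
print(binary_rows(len(card)), value_count(card))  # 2 4
```

The two numbers agree only when every variable is binary.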
EDIT: A dataframe with an index of the correct length can be created in query2df in the following way:
df = pd.DataFrame(data = list(itertools.product(*[range(c) for c in query.cardinality])), columns=query.variables)
However, I am not 100% certain that the level combinations are created in the same order that the query values are returned...
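The ordering question can be checked with NumPy alone, assuming `query.values` is an array with one axis per query variable, in the same order as `query.variables` (which is how pgmpy factors store their values). `itertools.product` varies its last position fastest, which is the same order as NumPy's default row-major (C-order) `flatten()`. The cardinalities below are hypothetical:

```python
import itertools

import numpy as np

# Hypothetical cardinalities for two query variables (2 and 4 levels)
cardinality = [2, 4]
# Stand-in for query.values: one axis per variable, filled with 0..7
values = np.arange(np.prod(cardinality)).reshape(cardinality)

# Level combinations built as in the proposed fix
combos = list(itertools.product(*[range(c) for c in cardinality]))
flat = values.flatten()  # row-major (C) order by default

# Row i of the combinations should point at exactly the entry flat[i]
for i, combo in enumerate(combos):
    assert values[combo] == flat[i]
print(len(combos) == flat.size)  # True
```

Under that assumption, the level combinations and the flattened values line up row by row.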
Which version are you using?
import bnlearn
bnlearn.__version__
Your fix also looks like a valid solution. In an earlier version I incorporated this fix:
df = pd.DataFrame(data = list(itertools.product(np.arange(0, len(query.values)), repeat=len(query.variables))), columns=query.variables)
Let me know if it still causes issues. Otherwise, I will close this issue.
Reopen if it still causes issues.
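One caveat with the earlier fix: `len(query.values)` is only the size of the first axis of the values array, i.e. the cardinality of the first query variable (assuming `query.values` is a NumPy array with one axis per variable). That construction therefore yields c_1^n rows, which still differs from c_1 * c_2 * ... * c_n whenever the variables have different numbers of levels. A quick check with a stand-in array of hypothetical cardinalities 2 and 4:

```python
import itertools

import numpy as np

# Stand-in for query.values: two variables with 2 and 4 levels (hypothetical)
values = np.zeros((2, 4))
n_vars = values.ndim

# Earlier fix: len(values) is the size of the first axis only (2 here)
rows_earlier_fix = len(list(itertools.product(np.arange(0, len(values)), repeat=n_vars)))
# Cardinality-aware construction from the EDIT above
rows_per_cardinality = len(list(itertools.product(*[range(c) for c in values.shape])))

print(rows_earlier_fix, rows_per_cardinality, values.size)  # 4 8 8
```

The 4-versus-8 mismatch has the same form as the error reported next, although the thread does not show which construction version 0.7.3 actually used.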
Hi @erdogant, I am facing the same issue with the latest bnlearn version, 0.7.3, when trying to infer more than one variable and the variables are multiclass. Instead of my own code, I tested it with the titanic dataset and I still get the same issue:
import bnlearn as bn
# Load example DataFrame
df_as = bn.import_example('titanic')
dfhot, dfnum = bn.df2onehot(df_as)
# Train model
model_as = bn.structure_learning.fit(dfnum, methodtype='hc', scoretype='bic')
model_as_p = bn.parameter_learning.fit(model_as, dfnum, methodtype='bayes')
# Do the inference
query = bn.inference.fit(model_as_p, variables=['Sex', 'Parch'], evidence={'Survived':0, 'Pclass':1})
Error Message:
ValueError Traceback (most recent call last)
Input In [42], in <cell line: 8>()
6 model_as_p = bn.parameter_learning.fit(model_as, dfnum, methodtype='bayes')
7 # Do the inference
----> 8 query = bn.inference.fit(model_as_p, variables=['Sex', 'Parch'], evidence={'Survived':0, 'Pclass':1})
File ~/anaconda3/envs/env_bnlearn/lib/python3.8/site-packages/bnlearn/inference.py:98, in fit(model, variables, evidence, to_df, verbose)
96 query = model_infer.query(variables=variables, evidence=evidence, show_progress=(verbose>0))
97 # Store also in dataframe
---> 98 query.df = bnlearn.query2df(query, variables=variables) if to_df else None
99 # Print table to screen
100 if verbose>=3: print(tabulate(query.df.head(), tablefmt="grid", headers="keys"))
ValueError: Length of values (8) does not match length of index (4)
Thanks for reporting this again! I fixed it with a different solution where I manually walk through the object and extract the necessary information for the dataframe. It should be fixed now.
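The code of that fix is not shown here. As a hypothetical sketch of the same idea (walking through the factor and pairing every state combination with its probability), assuming a pgmpy-style factor that exposes `variables`, `cardinality`, `values` (one axis per variable) and `state_names` (a dict from variable to its list of states), a helper could look like this. The name `factor_to_df` is made up, and the example uses a stand-in object with invented numbers:

```python
import itertools
from types import SimpleNamespace

import numpy as np
import pandas as pd


def factor_to_df(factor):
    """Hypothetical helper: one DataFrame row per state combination, plus its probability."""
    rows = []
    # The last variable varies fastest, matching the C-order layout of factor.values
    for idx in itertools.product(*[range(c) for c in factor.cardinality]):
        # Map each level index back to its state name
        row = {var: factor.state_names[var][i] for var, i in zip(factor.variables, idx)}
        row["p"] = factor.values[idx]
        rows.append(row)
    return pd.DataFrame(rows, columns=list(factor.variables) + ["p"])


# Stand-in for a query result: a 2-level and a 3-level variable (invented numbers)
query = SimpleNamespace(
    variables=["A", "B"],
    cardinality=[2, 3],
    values=np.array([[0.2, 0.1, 0.1], [0.3, 0.2, 0.1]]),
    state_names={"A": [0, 1], "B": [0, 1, 2]},
)
df = factor_to_df(query)
print(df)
```

Reading the labels from `state_names` puts the model's own state values in the table instead of positional indices, and the row count follows each variable's cardinality automatically.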
Update with:
pip install -U bnlearn
Version should be >= v0.7.5
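To confirm the upgrade from a script, a small stdlib-only check can compare `bnlearn.__version__` against 0.7.5. The helper `meets_minimum` is hypothetical; `packaging.version.Version` is a more complete option if that package is installed:

```python
import re


def meets_minimum(version, minimum=(0, 7, 5)):
    """Return True if a version string such as '0.7.5' is at least `minimum`.

    Only the leading numeric components are compared; suffixes like 'rc1'
    or '.post1' are ignored.
    """
    numbers = re.match(r"\d+(?:\.\d+)*", version)
    if numbers is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in numbers.group(0).split("."))
    # Pad with zeros so '0.8' compares like '0.8.0'
    width = max(len(parts), len(minimum))
    parts += (0,) * (width - len(parts))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return parts >= minimum


print(meets_minimum("0.7.3"), meets_minimum("0.7.5"))  # False True
```

For example, `meets_minimum(bnlearn.__version__)` should return True once the upgrade has gone through.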
Closing this one. Reopen if still causes issues.