"ValueError: shape mismatch" when running difference test after setting own priors in BetaBinomialx
Hi,
Running spotify_confidence in an ipynb, the following error occurs when running the code below:
# Set df (data is the imported event-level frame; k is a variable holding the
# name of the numerator / success-count column)
df = data.groupby(['exp_var']).agg({'participants': 'count',
                                    k: 'sum'}).reset_index()
# Set priors
df['prior_alpha'] = 10000
df['prior_beta'] = 10000
## df:
# index | exp_var      | participants | k    | prior_alpha | prior_beta
# 0     | control      | 12345        | 1234 | 10000       | 10000
# 1     | intervention | 54321        | 4321 | 10000       | 10000
# Test using confidence library
test = spotify_confidence.BetaBinomial(data_frame=df,
                                       numerator_column=k,
                                       denominator_column='participants',
                                       categorical_group_columns=['exp_var'],
                                       prior_alpha_column='prior_alpha',
                                       prior_beta_column='prior_beta')
# Result
kpi_results = test.difference('control', 'intervention', absolute=False)
which raises the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
[***](***) in <cell line: 22>()
20
21 # Result
---> 22 kpi_results = test.difference('control', 'intervention', absolute=False)
7 frames
[***](***) in _sample_posterior(self, group_df, posterior_sample_size)
120 posterior_sample_size = self._monte_carlo_sample_size
121 posterior_alpha, posterior_beta = self._posterior_parameters(group_df)
--> 122 posterior_samples = np.random.beta(posterior_alpha, posterior_beta, size=posterior_sample_size)
123 return posterior_samples
124
numpy/random/mtrand.pyx in numpy.random.mtrand.RandomState.beta()
_common.pyx in numpy.random._common.cont()
_common.pyx in numpy.random._common.cont_broadcast_2()
__init__.cython-30.pxd in numpy.PyArray_MultiIterNew3()
ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (500000,) and arg 1 with shape (2,).
From what I gather, this is due to the way the priors are handled in the BetaBinomial class:
if prior_alpha_column is None or prior_beta_column is None:
    self._alpha_prior, self._beta_prior = (0.5, 0.5)
else:
    self._alpha_prior = data_frame[prior_alpha_column]
    self._beta_prior = data_frame[prior_beta_column]
When the difference test is run, the whole prior column is passed into the posterior sampling rather than the individual prior value for the group being sampled, so the Beta parameters (length 2, one per row of df) cannot be broadcast against the requested Monte Carlo sample size. This may be mitigated by setting self._alpha_prior = data_frame[prior_alpha_column][0] (and likewise for beta), or a similar per-group lookup; a minimal reproduction is sketched below.
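For what it's worth, here is a minimal reproduction of the broadcast failure outside the library. The conjugate update alpha0 + k, beta0 + n - k is my assumption of what _posterior_parameters computes; only the array shapes matter for the error:

import numpy as np
import pandas as pd

df = pd.DataFrame({'exp_var': ['control', 'intervention'],
                   'participants': [12345, 54321],
                   'k': [1234, 4321],
                   'prior_alpha': [10000, 10000],
                   'prior_beta': [10000, 10000]})

sample_size = 500_000
row = df[df['exp_var'] == 'control']

# Passing the whole prior column (length 2) as the Beta parameters, as the
# constructor snippet above ends up doing, cannot be broadcast against the
# requested sample size and reproduces the traceback:
alpha_bad = df['prior_alpha'] + row['k'].iloc[0]
beta_bad = df['prior_beta'] + row['participants'].iloc[0] - row['k'].iloc[0]
# np.random.beta(alpha_bad, beta_bad, size=sample_size)
# -> ValueError: shape mismatch: objects cannot be broadcast to a single shape

# Using the scalar prior belonging to the sampled row works as expected:
alpha_ok = row['prior_alpha'].iloc[0] + row['k'].iloc[0]
beta_ok = row['prior_beta'].iloc[0] + row['participants'].iloc[0] - row['k'].iloc[0]
samples = np.random.beta(alpha_ok, beta_ok, size=sample_size)
print(samples.shape)  # (500000,)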
Also, one may want to consider a setup where the user can supply a prior for only alpha or only beta, with the other parameter defaulted. Currently, if a prior column is supplied for only one of the two, the constructor silently reverts to the default (0.5, 0.5) for both parameters without informing the user; a sketch of a more explicit fallback follows.
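Something along these lines (a hypothetical resolve_priors helper, purely to illustrate the idea, not existing library code) would keep the current default behaviour while making the fallback visible:

import warnings

def resolve_priors(data_frame, prior_alpha_column=None, prior_beta_column=None):
    # Hypothetical helper: default both priors only when neither column is
    # given, and warn explicitly when just one of the two is supplied.
    if prior_alpha_column is not None and prior_beta_column is not None:
        return data_frame[prior_alpha_column], data_frame[prior_beta_column]
    if prior_alpha_column is not None or prior_beta_column is not None:
        warnings.warn("Only one of prior_alpha_column/prior_beta_column was "
                      "supplied; falling back to the default (0.5, 0.5) prior "
                      "for both parameters.")
    return 0.5, 0.5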
Last, and least, the sampling procedure in _sample_posterior is not MCMC as the comments suggest. It is Monte Carlo, but not Markov chain Monte Carlo. That is actually a good thing in this setting: the posterior is known in closed form, so i.i.d. samples can be drawn directly and no computational time needs to be spent on a Markov chain.
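For illustration, with the numbers from the df above and again assuming the standard conjugate update, the whole relative-difference computation can be done with plain i.i.d. draws:

import numpy as np

rng = np.random.default_rng(0)
n_samples = 500_000

# Posterior draws per group, assuming Beta(prior_alpha + k, prior_beta + n - k)
control = rng.beta(10000 + 1234, 10000 + 12345 - 1234, size=n_samples)
intervention = rng.beta(10000 + 4321, 10000 + 54321 - 4321, size=n_samples)

rel_diff = intervention / control - 1  # relative difference (cf. absolute=False above)
print(np.percentile(rel_diff, [2.5, 50, 97.5]))  # median and 95% credible interval
print((rel_diff > 0).mean())                     # posterior P(intervention > control)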
Best regards and many thanks in advance! Sam