hyppo
Index error in MGCX
My issue is about an IndexError raised by MGCX.test(). The error originates in scipy's multiscale_graphcorr (see the stack trace).
What surprises me is that it depends on the random number generation: it fails for some seeds but not for others. Increasing the number of replications (reps) seems to increase the probability of an error. With reps=1000, seed 16 fails as well.
This seed dependence makes me think I messed up somewhere, but I can't see where.
Reproducing code example:
import sys

import numpy as np
import pandas as pd
from hyppo.time_series import MGCX


def test(seed):
    print(f"Testing seed {seed}")
    reps = 100
    df = pd.DataFrame([[1, 1],
                       [2, 1],
                       [3, 1],
                       [4, 4],
                       [5, 5],
                       [6, 6]], columns=["a", "b"])
    i_test = MGCX()
    rstate = np.random.RandomState(seed)
    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
    print(f"stat: {stat}, pval: {pval}, d: {d}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        seed = int(sys.argv[1])
        test(seed)
    else:
        test(16)
        test(0)
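To get a rough idea of how often the failure occurs, the script above could be wrapped in a small seed scan. This is only a sketch: `failing_seeds` is a hypothetical helper, and `fake_test` is a stand-in that simulates failures so the block runs without hyppo installed.

```python
from typing import Callable, Iterable, List


def failing_seeds(run: Callable[[int], object], seeds: Iterable[int]) -> List[int]:
    """Return the seeds for which run(seed) raises IndexError."""
    failed = []
    for seed in seeds:
        try:
            run(seed)
        except IndexError:
            failed.append(seed)
    return failed


# Stand-in for the real test(seed): it fails on multiples of 3,
# purely for illustration. It does not reflect which seeds fail in hyppo.
def fake_test(seed: int) -> None:
    if seed % 3 == 0:
        raise IndexError("simulated failure")


result = failing_seeds(fake_test, range(10))
print(result)
```

With the real script, `failing_seeds(test, range(50))` would list the seeds in that range that trigger the IndexError.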
Error message
Testing seed 16
stat: 0.886004262777708, pval: 0.0297029702970297, d: {'opt_lag': 0, 'opt_scale': [6, 4]}
Testing seed 0
Traceback (most recent call last):
  File "/home/f/TRAVAIL/csod/misc/hyppo/problem.py", line 32, in <module>
AIL/csod/misc/hyppo/problem.py", line 22, in test
    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 194, in test
    stat, pvalue, stat_list = super(MGCX, self).test(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 130, in test
    Parallel(n_jobs=workers)(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 159, in _perm_stat
    perm_stat = calc_stat(distx, permy)[0]
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 106, in statistic
    stat, opt_lag = compute_stat(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/_utils.py", line 93, in compute_stat
    indep_test_stat = indep_test.statistic(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/independence/mgc.py", line 161, in statistic
    mgc = multiscale_graphcorr(distx, disty, compute_distance=None, reps=0)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6490, in multiscale_graphcorr
    stat, stat_dict = _mgc_stat(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6541, in _mgc_stat
    stat = stat_mgc_map[m - 1][n - 1]
IndexError: index 5 is out of bounds for axis 0 with size 1
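The last frame indexes `stat_mgc_map[m - 1][n - 1]`, and the message says index 5 was applied to an axis 0 of size 1. That is consistent with a map that has a single row while `m` equals 6. The sketch below reproduces the same message with plain NumPy. The shape `(1, 6)` and the values of `m` and `n` are assumptions inferred from the message, not values read from scipy.

```python
import numpy as np

# Assumed shape: one row, six columns (inferred from the error message).
stat_mgc_map = np.zeros((1, 6))

# Assumed values: m - 1 must be 5 to match the message; n = 1 is a guess.
m, n = 6, 1

try:
    stat_mgc_map[m - 1][n - 1]  # first index hits axis 0, which has size 1
except IndexError as err:
    message = str(err)
    print(message)

# Indexing the same array in (row, column) order stays in bounds.
value = stat_mgc_map[n - 1][m - 1]
```

If the map really has this shape for some permutations, that would explain why only certain seeds fail, but this has not been checked against the scipy source.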
Version information
- OS: Arch Linux 6.6.7-arch1-1 (64-bit)
- Python version: 3.10
- Package versions: hyppo==0.4.0, scipy==1.11.4, joblib==1.3.2
Sorry for the late response, this only just came onto my radar. I'll take a look.