Unit tests fail when setting svd_seed
While adding the ability to set an SVD seed for the randomized solver in select_svd, I ran into a number of unit tests that fail when None is passed as the seed to the randomized SVD.
By default, sklearn's randomized SVD sets random_state=0 (rather than None). When I passed in None instead, tests broke in the following places. (Tests actually broke everywhere; I was able to fix most of them, but a few need a closer look from an SME who won't break more things than they fix.)
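To illustrate why the default matters: an integer seed makes every call draw the same random numbers, while None does not pin the sequence. Below is a small sketch using NumPy's RandomState directly. The `resolve_seed` helper is hypothetical; it only mirrors the fallback expression quoted later in this issue.

```python
import numpy as np

def resolve_seed(svd_seed):
    # Hypothetical helper mirroring the fallback: None is silently replaced by 0.
    return svd_seed if svd_seed is not None else 0

# A fixed integer seed reproduces the same draws on every construction.
a = np.random.RandomState(resolve_seed(None)).standard_normal(4)
b = np.random.RandomState(0).standard_normal(4)
same_with_fallback = np.array_equal(a, b)

# Seeding with None uses fresh entropy, so two generators disagree.
c = np.random.RandomState(None).standard_normal(4)
d = np.random.RandomState(None).standard_normal(4)
same_without_seed = np.array_equal(c, d)
```

With the fallback in place, a test that passes once keeps passing. Without it, the same test sees a different random draw on every run.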
The proper fix is to make all of our unit tests actually test the behavior they target rather than rely on random_state=0 as the seed, but it's not clear exactly how to do that in the following cases.
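One possible direction, sketched under assumptions: assert on quantities that should not depend on the seed, and check them across several seeds including None. The `toy_randomized_svd` below is a simplified stand-in written for this example (a random range finder followed by an exact SVD), not sklearn's or graspologic's implementation.

```python
import numpy as np

def toy_randomized_svd(A, n_components, n_oversamples=10, random_state=None):
    """Toy randomized SVD: random range finder, then an exact SVD of the projection."""
    rng = np.random.default_rng(random_state)  # None draws fresh OS entropy
    omega = rng.standard_normal((A.shape[1], n_components + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)  # orthonormal basis for the sampled range
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_components], s[:n_components], Vt[:n_components]

# Build an exactly rank-3 matrix with a fixed seed so the *data* is reproducible.
data_rng = np.random.default_rng(0)
A = data_rng.standard_normal((60, 3)) @ data_rng.standard_normal((3, 40))
exact = np.linalg.svd(A, compute_uv=False)[:3]

# Assert on a seed-invariant quantity (the singular values) over several seeds,
# including None, instead of on exact coordinates produced by one particular seed.
for seed in (None, 0, 1, 9001):
    _, s, _ = toy_randomized_svd(A, 3, random_state=seed)
    np.testing.assert_allclose(s, exact, rtol=1e-6)
```

The same idea would apply to the failing tests only if each one has an invariant that holds for any seed. Whether it does is exactly the open question for the cases below.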
Results of commenting out line 303 in svd.py (svd_seed = svd_seed if svd_seed is not None else 0):
============================================================================= FAILURES ==============================================================================
_________________________________________________________________________ TestGMP.test_sim __________________________________________________________________________
self = <tests.test_match.TestGMP object at 0x1448ff640>
    def test_sim(self):
        n = 150
        rho = 0.9
        n_per_block = int(n / 3)
        n_blocks = 3
        block_members = np.array(n_blocks * [n_per_block])
        block_probs = np.array(
            [[0.2, 0.01, 0.01], [0.01, 0.1, 0.01], [0.01, 0.01, 0.2]]
        )
        directed = False
        loops = False
        A1, A2 = sbm_corr(
            block_members, block_probs, rho, directed=directed, loops=loops
        )
        ase = AdjacencySpectralEmbed(n_components=3, algorithm="truncated")
        x1 = ase.fit_transform(A1)
        x2 = ase.fit_transform(A2)
        xh1 = SignFlips().fit_transform(x1, x2)
        S = xh1 @ x2.T
        res = self.barygm.fit(A1, A2, S=S)
>       assert 0.7 <= (sum(res.perm_inds_ == np.arange(n)) / n)
E       assert 0.7 <= (78 / 150)
E        +  where 78 = sum(array([ 86, ...47, 148, 149]) == array([ 0, ...47, 148, 149])
E         Use -v to get the full diff)
tests/test_match.py:251: AssertionError
___________________________________________________________________ TestDCSBM.test_DCSBM_nparams ____________________________________________________________________
self = <tests.test_models.TestDCSBM object at 0x131e2c6d0>
    def test_DCSBM_nparams(self):
        n_verts = 3000
        n_class = 4
        graph = self.g
        labels = self.labels
        e = DCSBMEstimator(directed=True)
        e.fit(graph)
>       assert e._n_parameters() == (n_verts + n_class - 1 + n_class ** 2)
E       assert 3071 == (((3000 + 4) - 1) + (4 ** 2))
E        +  where 3071 = <bound method DCSBMEstimator._n_parameters of DCSBMEstimator()>()
E        +    where <bound method DCSBMEstimator._n_parameters of DCSBMEstimator()> = DCSBMEstimator()._n_parameters
tests/test_models.py:422: AssertionError
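A quick arithmetic check on the failure above, under one assumption: that the estimator's parameter count follows the same formula as the test's expected value, n_verts + k - 1 + k ** 2, where k is the number of blocks the estimator actually found. The traceback does not confirm this. Solving for k with the reported 3071:

```python
def n_params(n_verts, k):
    # Formula taken from the test's expected value; assumed to match the estimator.
    return n_verts + k - 1 + k ** 2

n_verts = 3000
expected = n_params(n_verts, 4)  # what the test asserts: 3019
observed = 3071                  # value reported in the traceback

# Search small k for a block count that reproduces the observed value.
candidates = [k for k in range(1, 50) if n_params(n_verts, k) == observed]
print(expected, candidates)  # prints: 3019 [8]
```

If the assumption holds, the estimator settled on 8 blocks rather than 4. That would suggest the unseeded SVD changes the estimated number of communities, not the parameter counting itself.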
__________________________________________________ TestAdjacencySpectralEmbed.test_transform_closeto_fit_transform __________________________________________________
self = <tests.test_spectral_embed.TestAdjacencySpectralEmbed testMethod=test_transform_closeto_fit_transform>
    def test_transform_closeto_fit_transform(self):
        atol = 0.15
        for diag_aug in [True, False]:
            for g, A in self.testgraphs.items():
                ase = AdjacencySpectralEmbed(
                    n_components=2, diag_aug=diag_aug, svd_seed=9001
                )
                ase.fit(A)
                Y = ase.fit_transform(A)
                if isinstance(Y, np.ndarray):
                    X = ase.transform(A)
                    np.testing.assert_allclose(X, Y, atol=atol)
                elif isinstance(Y, tuple):
                    with self.assertRaises(TypeError):
                        X = ase.transform(A)
                    X = ase.transform((A.T, A))
                    np.testing.assert_allclose(X[0], Y[0], atol=atol)
>                   np.testing.assert_allclose(X[1], Y[1], atol=atol)
E                   AssertionError:
E                   Not equal to tolerance rtol=1e-07, atol=0.15
E
E                   Mismatched elements: 1 / 40 (2.5%)
E                   Max absolute difference: 0.18669627
E                   Max relative difference: 5.41864066
E                    x: array([[ 1.523434, -0.9013  ],
E                          [ 1.624558, -0.937225],
E                          [ 0.753984, -0.933442],...
E                    y: array([[ 1.586404, -0.957384],
E                          [ 1.711405, -0.99416 ],
E                          [ 0.791389, -0.990951],...
tests/test_spectral_embed.py:178: AssertionError
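For reading the failure above: np.testing.assert_allclose passes only if every element satisfies |actual - desired| <= atol + rtol * |desired|. With rtol at its default of 1e-07, the atol=0.15 term dominates, so a single element that differs by 0.18669627 fails the whole comparison. Here is a minimal sketch with made-up values (the arrays are illustrative, not the test's data):

```python
import numpy as np

atol, rtol = 0.15, 1e-07  # tolerances shown in the traceback
desired = np.array([1.0, -1.0, 0.5])  # hypothetical values
actual = desired + np.array([0.06, -0.05, 0.18669627])  # one element off by more than atol

# Element-wise version of the documented check.
within = np.abs(actual - desired) <= atol + rtol * np.abs(desired)
n_bad = int((~within).sum())

try:
    np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)
    raised = False
except AssertionError:
    raised = True
```

Here one element out of three is outside tolerance and the assertion raises, which mirrors the "Mismatched elements: 1 / 40" line in the log. Whether the right response is a looser atol or a seed-independent comparison is a judgment call for whoever owns the test.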
=======================================================================
Originally posted by @daxpryce in https://github.com/microsoft/graspologic/pull/814#discussion_r679533656