Unit tests fail when setting svd_seed

Open daxpryce opened this issue 4 years ago • 0 comments

While adding the ability to set an SVD seed for the randomized SVD solver in select_svd, I ran across a number of unit tests that fail if we pass None as the seed to the randomized SVD.

By default, sklearn's randomized SVD sets random_state=0 (rather than None). When I passed in None instead, tests broke in the places listed below. (Actually, tests broke everywhere. I was able to fix most of them, but a few need a closer look from an SME who won't break more things than they fix.)

The proper fix is to update all of our unit tests so that they test the behavior itself and don't rely on random_state=0 as the seed, but it's not clear exactly how to do that in the following cases.
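One possible direction, as a rough sketch only: keep the None-to-0 fallback out of the tests' way by passing seeds explicitly, and assert on an aggregate over several seeds rather than on a single run. Here resolve_svd_seed mirrors the fallback expression from svd.py quoted in this issue, while noisy_accuracy and median_accuracy are hypothetical stand-ins (not graspologic code) for a seeded, randomized pipeline and a seed-sweeping check.

```python
import random
import statistics


def resolve_svd_seed(svd_seed=None):
    """Same expression as the fallback quoted in this issue: None becomes 0."""
    return svd_seed if svd_seed is not None else 0


def noisy_accuracy(seed):
    """Hypothetical stand-in for a randomized pipeline; NOT graspologic code.

    Returns a score in [0.65, 0.85] that depends only on the seed.
    """
    rng = random.Random(seed)
    return 0.75 + rng.uniform(-0.1, 0.1)


def median_accuracy(seeds):
    """Aggregate the score over several explicit seeds instead of trusting one."""
    return statistics.median(noisy_accuracy(resolve_svd_seed(s)) for s in seeds)


if __name__ == "__main__":
    # A test written this way checks a property that should hold for
    # typical seeds, not only for seed 0.
    print(median_accuracy(range(10)))
```

Whether a median over seeds is the right aggregate for any of the failing tests is an open question; the point is only that the assertion no longer hinges on one particular seed.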

Results of commenting out line 303 in svd.py (svd_seed = svd_seed if svd_seed is not None else 0):

============================================================================= FAILURES ==============================================================================
_________________________________________________________________________ TestGMP.test_sim __________________________________________________________________________

self = <tests.test_match.TestGMP object at 0x1448ff640>

    def test_sim(self):
        n = 150
        rho = 0.9
        n_per_block = int(n / 3)
        n_blocks = 3
        block_members = np.array(n_blocks * [n_per_block])
        block_probs = np.array(
            [[0.2, 0.01, 0.01], [0.01, 0.1, 0.01], [0.01, 0.01, 0.2]]
        )
        directed = False
        loops = False
        A1, A2 = sbm_corr(
            block_members, block_probs, rho, directed=directed, loops=loops
        )
        ase = AdjacencySpectralEmbed(n_components=3, algorithm="truncated")
        x1 = ase.fit_transform(A1)
        x2 = ase.fit_transform(A2)
        xh1 = SignFlips().fit_transform(x1, x2)
        S = xh1 @ x2.T
        res = self.barygm.fit(A1, A2, S=S)

>       assert 0.7 <= (sum(res.perm_inds_ == np.arange(n)) / n)
E       assert 0.7 <= (78 / 150)
E        +  where 78 = sum(array([ 86,  ...47, 148, 149]) == array([  0,  ...47, 148, 149])
E           Use -v to get the full diff)

tests/test_match.py:251: AssertionError
___________________________________________________________________ TestDCSBM.test_DCSBM_nparams ____________________________________________________________________

self = <tests.test_models.TestDCSBM object at 0x131e2c6d0>

    def test_DCSBM_nparams(self):
        n_verts = 3000
        n_class = 4
        graph = self.g
        labels = self.labels
        e = DCSBMEstimator(directed=True)
        e.fit(graph)
>       assert e._n_parameters() == (n_verts + n_class - 1 + n_class ** 2)
E       assert 3071 == (((3000 + 4) - 1) + (4 ** 2))
E        +  where 3071 = <bound method DCSBMEstimator._n_parameters of DCSBMEstimator()>()
E        +    where <bound method DCSBMEstimator._n_parameters of DCSBMEstimator()> = DCSBMEstimator()._n_parameters

tests/test_models.py:422: AssertionError

__________________________________________________ TestAdjacencySpectralEmbed.test_transform_closeto_fit_transform __________________________________________________

self = <tests.test_spectral_embed.TestAdjacencySpectralEmbed testMethod=test_transform_closeto_fit_transform>

    def test_transform_closeto_fit_transform(self):
        atol = 0.15
        for diag_aug in [True, False]:
            for g, A in self.testgraphs.items():
                ase = AdjacencySpectralEmbed(
                    n_components=2, diag_aug=diag_aug, svd_seed=9001
                )
                ase.fit(A)
                Y = ase.fit_transform(A)
                if isinstance(Y, np.ndarray):
                    X = ase.transform(A)
                    np.testing.assert_allclose(X, Y, atol=atol)
                elif isinstance(Y, tuple):
                    with self.assertRaises(TypeError):
                        X = ase.transform(A)
                    X = ase.transform((A.T, A))
                    np.testing.assert_allclose(X[0], Y[0], atol=atol)
>                   np.testing.assert_allclose(X[1], Y[1], atol=atol)
E                   AssertionError:
E                   Not equal to tolerance rtol=1e-07, atol=0.15
E
E                   Mismatched elements: 1 / 40 (2.5%)
E                   Max absolute difference: 0.18669627
E                   Max relative difference: 5.41864066
E                    x: array([[ 1.523434, -0.9013  ],
E                          [ 1.624558, -0.937225],
E                          [ 0.753984, -0.933442],...
E                    y: array([[ 1.586404, -0.957384],
E                          [ 1.711405, -0.99416 ],
E                          [ 0.791389, -0.990951],...

tests/test_spectral_embed.py:178: AssertionError
=======================================================================
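A quick note on the TestDCSBM failure, in case it helps whoever picks this up. The test expects n_verts + n_class - 1 + n_class ** 2 parameters, which is 3019 for 4 classes, but the estimator reported 3071. Assuming DCSBMEstimator._n_parameters uses the same formula with its own estimated number of classes (an assumption, not confirmed from the source), the small sketch below searches for the class count that would produce 3071. The helper names are made up for illustration.

```python
def expected_n_params(n_verts, n_class):
    """Parameter count formula copied from the test's assertion."""
    return n_verts + n_class - 1 + n_class ** 2


def implied_n_class(n_verts, observed, max_class=50):
    """Return the class count that reproduces `observed`, or None if none does.

    Hypothetical helper: brute-force search over 1..max_class.
    """
    for k in range(1, max_class + 1):
        if expected_n_params(n_verts, k) == observed:
            return k
    return None


if __name__ == "__main__":
    print(expected_n_params(3000, 4))     # value the test expects
    print(implied_n_class(3000, 3071))    # class count implied by the failure
```

Under that assumption, 3071 corresponds to 8 classes rather than 4, which would be consistent with the unseeded embedding leading the estimator to a different number of communities.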

Originally posted by @daxpryce in https://github.com/microsoft/graspologic/pull/814#discussion_r679533656

Aug 02 '21 21:08 daxpryce