iris icon indicating copy to clipboard operation
iris copied to clipboard

pytest gallery tests

Open rcomer opened this issue 3 years ago • 16 comments

🚀 Pull Request

Description

Uses pytest parametrization to remove the duplication from the gallery tests. We now only need one test file instead of 24 🙂. To get it working, I pulled the check_graphic method off the GraphicsTest class so we can call it as a stand-alone function.

Needs https://github.com/SciTools/iris-test-data/pull/79

~In draft for now because~

  • ~Probably the context managers could be replaced with pytest fixtures.~ done
  • ~I have inserted some deliberate fails into two examples so we can see that the tests work as expected.~ reverted

Consult Iris pull request check list

rcomer avatar Jun 10 '22 15:06 rcomer

Relevant test output
[gw2] [  4%] FAILED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_hovmoller] 
[gw1] [  8%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_wind_barbs] 
[gw11] [ 12%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_polynomial_fit] 
[gw13] [ 16%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_lineplot_with_legend] 
[gw16] [ 20%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_anomaly_log_colouring] 
[gw20] [ 25%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_polar_stereo] 
[gw21] [ 29%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_cross_section] 
[gw8] [ 33%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_load_nemo] 
[gw6] [ 37%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_wind_speed] 
[gw0] [ 41%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_COP_1d] 
[gw23] [ 45%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_custom_aggregation] 
[gw18] [ 50%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_projections_and_annotations] 
[gw22] [ 54%] FAILED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_inset] 
[gw19] [ 58%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_SOI_filtering] 
[gw9] [ 62%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_atlantic_profiles] 
[gw15] [ 66%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_rotated_pole_mapping] 
[gw5] [ 70%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_TEC] 
[gw17] [ 75%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_global_map] 
[gw14] [ 79%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_custom_file_loading] 
[gw3] [ 83%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_COP_maps] 
[gw7] [ 87%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_deriving_phenomena] 
[gw10] [ 91%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_orca_projection] 
[gw12] [ 95%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_coriolis] 
[gw4] [100%] PASSED docs/gallery_tests/test_gallery_examples.py::test_plot_example[plot_lagged_ensemble] 
=================================== FAILURES ===================================
______________________ test_plot_example[plot_hovmoller] _______________________
[gw2] linux -- Python 3.8.13 /tmp/cirrus-ci-build/.nox/doctest/bin/python
example_code = 'plot_hovmoller'
    @pytest.mark.parametrize("example_code", gallery_examples())
    def test_plot_example(example_code):
        with fail_any_deprecation_warnings():
            with add_gallery_to_path():
                module = importlib.import_module(example_code)
                print(module.__file__)
            with show_replaced_by_check_graphic(
                f"gallery_tests.test_{example_code}"
            ):
>               module.main()
docs/gallery_tests/test_gallery_examples.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
docs/gallery_code/meteorology/plot_hovmoller.py:22: in main
    iris.util.approx_equal(3, 4)
lib/iris/util.py:413: in approx_equal
    warn_deprecated(wmsg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
msg = 'iris.util.approx_equal has been deprecated and will be removed, please use math.isclose instead.'
stacklevel = 2
    def warn_deprecated(msg, stacklevel=2):
        """
        Issue an Iris deprecation warning.
        Calls :func:`warnings.warn', to emit the message 'msg' as a
        :class:`warnings.warning`, of the subclass :class:`IrisDeprecationWarning`.
        The 'stacklevel' keyword is passed through to warnings.warn.  However by
        default this is set to 2, which ensures that the identified code line is in
        the caller, rather than in this routine.
        See :mod:`warnings` module documentation.
        For example::
            >>> from iris._deprecation import warn_deprecated
            >>> def arrgh():
            ...    warn_deprecated('"arrgh" is deprecated since version 3.5.')
            ...    return 1
            ...
            >>> arrgh()
            __main__:2: IrisDeprecation: "arrgh" is deprecated since version 3.5.
            1
            >>> arrgh()
            1
            >>>
        """
>       warnings.warn(msg, IrisDeprecation, stacklevel=stacklevel)
E       iris._deprecation.IrisDeprecation: iris.util.approx_equal has been deprecated and will be removed, please use math.isclose instead.
lib/iris/_deprecation.py:47: IrisDeprecation
----------------------------- Captured stdout call -----------------------------
/tmp/cirrus-ci-build/docs/gallery_code/meteorology/plot_hovmoller.py
________________________ test_plot_example[plot_inset] _________________________
[gw22] linux -- Python 3.8.13 /tmp/cirrus-ci-build/.nox/doctest/bin/python
example_code = 'plot_inset'
    @pytest.mark.parametrize("example_code", gallery_examples())
    def test_plot_example(example_code):
        with fail_any_deprecation_warnings():
            with add_gallery_to_path():
                module = importlib.import_module(example_code)
                print(module.__file__)
            with show_replaced_by_check_graphic(
                f"gallery_tests.test_{example_code}"
            ):
>               module.main()
docs/gallery_tests/test_gallery_examples.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
docs/gallery_code/general/plot_inset.py:65: in main
    qplt.show()
docs/gallery_tests/gallerytest_util.py:66: in replacement_show
    check_graphic(unique_id)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
unique_id = 'gallery_tests.test_plot_inset.0'
    def check_graphic(unique_id):
        """
        Check the hash of the current matplotlib figure matches the expected
        image hash for the current graphic test.
        To create missing image test results, set the IRIS_TEST_CREATE_MISSING
        environment variable before running the tests. This will result in new
        and appropriately "<hash>.png" image files being generated in the image
        output directory, and the imagerepo.json file being updated.
        """
        from PIL import Image
        import imagehash
        dev_mode = os.environ.get("IRIS_TEST_CREATE_MISSING")
        repo_fname = os.path.join(_RESULT_PATH, "imagerepo.json")
        with open(repo_fname, "rb") as fi:
            repo: Dict[str, List[str]] = json.load(codecs.getreader("utf-8")(fi))
        try:
            #: The path where the images generated by the tests should go.
            image_output_directory = os.path.join(
                os.path.dirname(__file__), "result_image_comparison"
            )
            if not os.access(image_output_directory, os.W_OK):
                if not os.access(os.getcwd(), os.W_OK):
                    raise IOError(
                        "Write access to a local disk is required "
                        "to run image tests.  Run the tests from a "
                        "current working directory you have write "
                        "access to to avoid this issue."
                    )
                else:
                    image_output_directory = os.path.join(
                        os.getcwd(), "iris_image_test_output"
                    )
            result_fname = os.path.join(
                image_output_directory, "result-" + unique_id + ".png"
            )
            if not os.path.isdir(image_output_directory):
                # Handle race-condition where the directories are
                # created sometime between the check above and the
                # creation attempt below.
                try:
                    os.makedirs(image_output_directory)
                except OSError as err:
                    # Don't care about "File exists"
                    if err.errno != 17:
                        raise
            def _create_missing():
                fname = "{}.png".format(phash)
                base_uri = (
                    "https://scitools.github.io/test-iris-imagehash/"
                    "images/v4/{}"
                )
                uri = base_uri.format(fname)
                hash_fname = os.path.join(image_output_directory, fname)
                uris = repo.setdefault(unique_id, [])
                uris.append(uri)
                print("Creating image file: {}".format(hash_fname))
                figure.savefig(hash_fname)
                msg = "Creating imagerepo entry: {} -> {}"
                print(msg.format(unique_id, uri))
                lock = filelock.FileLock(
                    os.path.join(_RESULT_PATH, "imagerepo.lock")
                )
                # The imagerepo.json file is a critical resource, so ensure
                # thread safe read/write behaviour via platform independent
                # file locking.
                with lock.acquire(timeout=600):
                    with open(repo_fname, "wb") as fo:
                        json.dump(
                            repo,
                            codecs.getwriter("utf-8")(fo),
                            indent=4,
                            sort_keys=True,
                        )
            # Calculate the test result perceptual image hash.
            buffer = io.BytesIO()
            figure = plt.gcf()
            figure.savefig(buffer, format="png")
            buffer.seek(0)
            phash = imagehash.phash(Image.open(buffer), hash_size=_HASH_SIZE)
            if unique_id not in repo:
                # The unique id might not be fully qualified, e.g.
                # expects iris.tests.test_quickplot.TestLabels.test_contour.0,
                # but got test_quickplot.TestLabels.test_contour.0
                # if we find single partial match from end of the key
                # then use that, else fall back to the unknown id state.
                matches = [key for key in repo if key.endswith(unique_id)]
                if len(matches) == 1:
                    unique_id = matches[0]
            if unique_id in repo:
                uris = repo[unique_id]
                # Extract the hex basename strings from the uris.
                hexes = [
                    os.path.splitext(os.path.basename(uri))[0] for uri in uris
                ]
                # Create the expected perceptual image hashes from the uris.
                to_hash = imagehash.hex_to_hash
                expected = [to_hash(uri_hex) for uri_hex in hexes]
                # Calculate hamming distance vector for the result hash.
                distances = [e - phash for e in expected]
                if np.all([hd > _HAMMING_DISTANCE for hd in distances]):
                    if dev_mode:
                        _create_missing()
                    else:
                        figure.savefig(result_fname)
                        msg = (
                            "Bad phash {} with hamming distance {} " "for test {}."
                        )
                        msg = msg.format(phash, distances, unique_id)
                        if _DISPLAY_FIGURES:
                            emsg = "Image comparison would have failed: {}"
                            print(emsg.format(msg))
                        else:
                            emsg = "Image comparison failed: {}"
>                           raise AssertionError(emsg.format(msg))
E                           AssertionError: Image comparison failed: Bad phash ebff6992b50096ad9267dac4d644943294924cdbc95d4b699d29952dcea46ad4 with hamming distance [20, 18, 6] for test gallery_tests.test_plot_inset.0.
lib/iris/tests/__init__.py:343: AssertionError
----------------------------- Captured stdout call -----------------------------
/tmp/cirrus-ci-build/docs/gallery_code/general/plot_inset.py

rcomer avatar Jun 10 '22 16:06 rcomer

~Note:~

  • ~this will not work so should either update or remove.~
  • ~this advice may need updating (currently the Makefile in docs/src does not have gallerytest).~

Done.

rcomer avatar Jun 10 '22 16:06 rcomer

~Possibly should have some version of the setUp and tearDown here.~

~[https://github.com/SciTools/iris/blob/4fa3f06caf2175b8654fdb4a2bcd83080b7a1418/lib/iris/tests/init.py#L1220-L1236]~

Done

rcomer avatar Jun 10 '22 16:06 rcomer

I pulled the check_graphic method off the GraphicsTest class so we can call it as a stand-alone function.

Huh. I see @wjbenfold already did that at #4759.

rcomer avatar Jun 10 '22 19:06 rcomer

I had a rethink on the test strategy: previously we created each figure and called check_graphic on it before creating the next figure. Current code creates all the figures for a particular example script and then loops through them to call check_graphic.

The new method is less efficient as we are holding more figures in memory and if the first one fails we have created the others unnecessarily. However, I think the new method is simpler/clearer (relevant commit is https://github.com/SciTools/iris/pull/4792/commits/63cdd66b9419b981284284b78bca3fa7841456ff).

rcomer avatar Jun 17 '22 10:06 rcomer

Ready for review I think. For examples of how the failure would look if an image changed or if functionality was deprecated, see the CI for the penultimate commit.

rcomer avatar Jun 17 '22 12:06 rcomer

@SciTools/iris-devs is this one likely to be reviewed before pillow is unpinned? I'm just wondering whether to fix the imagerep.json conflicts now or wait for it to change again.

rcomer avatar Jun 23 '22 10:06 rcomer

@SciTools/iris-devs is this one likely to be reviewed before pillow is unpinned? I'm just wondering whether to fix the imagerep.json conflicts now or wait for it to change again.

We're planning to unpin pillow pretty soon(tm), so I'd hang fire :)

wjbenfold avatar Jun 23 '22 12:06 wjbenfold

Rebased following #4759 and #4826. recreate_imagerepo.py came in very handy :sunglasses:

Needs https://github.com/SciTools/iris-test-data/pull/79

rcomer avatar Jun 28 '22 10:06 rcomer

Hi @wjbenfold great to “see” you again 😊

Thanks for the review. I didn’t see any errors when I checked locally but maybe I missed something. I suspect that #4871 needs prioritising so will wait for that to go in, rebase both branches, and check again.

rcomer avatar Jul 20 '22 21:07 rcomer

OK I checked and yes, I do get failures for those examples. They match the gallery test failures in #4846. So once I rebase and pick up #4847, all should be well.

I'm kinda surprised the plot_lagged_ensemble didn't fall over in the pp-loading though :confused:

rcomer avatar Jul 21 '22 08:07 rcomer

#4871 is abandoned for now, so I've gone ahead and rebased. Both gallery tests and main tests are passing for me locally. I've also added a whatsnew entry and bumped the IRIS_TEST_DATA_VERSION in anticipation of https://github.com/SciTools/iris-test-data/pull/79.

rcomer avatar Jul 29 '22 09:07 rcomer

@bjlittle you're assigned to this: is this on your list before holidays or should someone else take a look?

trexfeathers avatar Aug 10 '22 09:08 trexfeathers

When I run the tests I now get

lib/iris/__init__.py:104: in <module>
    from ._version import version as __version__  # noqa: F401
E   ModuleNotFoundError: No module named 'iris._version'

I'm sufficiently out of the loop to not understand what's broken here - I can't see a file named _version?

wjbenfold avatar Aug 11 '22 07:08 wjbenfold

When I run the tests I now get

lib/iris/__init__.py:104: in <module>
    from ._version import version as __version__  # noqa: F401
E   ModuleNotFoundError: No module named 'iris._version'

I'm sufficiently out of the loop to not understand what's broken here - I can't see a file named _version?

@wjbenfold this branch needs #4892

trexfeathers avatar Aug 11 '22 08:08 trexfeathers

Thanks chaps. I have rebased again. Gallery tests all pass for me locally. In the main tests I see two failures

  • the regridding one discussed at #4893
  • ~something to do with a dot file, which I think I've seen before and isn't generally reproducible.~ Never mind, I think I've found the cause of that.

So I think CI should pass once https://github.com/SciTools/iris-test-data/pull/79 is merged.

Edit: I hadn't registered the _version thing either!

rcomer avatar Aug 11 '22 08:08 rcomer

Rebased again. Tests all passing for me locally.

rcomer avatar Aug 12 '22 10:08 rcomer

Thanks @wjbenfold! Is there more to do here or can this be merged?

rcomer avatar Aug 13 '22 11:08 rcomer

Thanks @wjbenfold! Is there more to do here or can this be merged?

I was just checking through the conversations to check they were resolved. It's now in!

wjbenfold avatar Aug 13 '22 15:08 wjbenfold

Thanks @wjbenfold! :tada:

rcomer avatar Aug 13 '22 16:08 rcomer

Thanks so much @rcomer and @wjbenfold. We've been pretty thin on the ground during these summer weeks so your contribution is greatly appreciated 🥰

trexfeathers avatar Aug 15 '22 08:08 trexfeathers