MaterialX Improve render test compare performance

Proposed Changes

Use OpenCV to to compare performance. It's about 3x or more faster based on local tests of OSL vs GLSL RTS. ( ~2.3 vs 7.5 seconds ).

OpenCV is C++ based so is generally faster. Also addition utilities could be added as desired using this library.

Feb 11 '25 16:02 kwokcb

Does fellow ASWF project OpenImageIO perform poorly compared to those two?

Feb 11 '25 16:02 lgritz

That's a good question. I did a small experiment and OIIO is also about 3x slower for performing the compare at least. About 7 seconds so better than PIL + utils (slowest). This with the caveat I'm no OIIO or OpenCV expert :).

This can all be updated with different options depending on image size or format (e.g. OIIO can handle EXRs -- which is not the current output format as the RTS avoids dependencies on OpenEXR by default).

import OpenImageIO as oiio
import numpy as np
import os

def computeDiff(image1Path, image2Path, imageDiffPath):
    try:
        # Remove the diff image if it already exists
        if os.path.exists(imageDiffPath):
            os.remove(imageDiffPath)

        # Check if input images exist
        if not os.path.exists(image1Path):
            print("Image diff input missing: " + image1Path)
            return

        if not os.path.exists(image2Path):
            print("Image diff input missing: " + image2Path)
            return

        # Load images using OpenImageIO (no reshaping, directly as NumPy arrays)
        image1 = oiio.ImageInput.open(image1Path)
        image2 = oiio.ImageInput.open(image2Path)

        if not image1 or not image2:
            print("Failed to load images.")
            return

        # Read the image data in a single call (as float)
        pixels1 = image1.read_image(format=oiio.FLOAT)
        pixels2 = image2.read_image(format=oiio.FLOAT)

        image1.close()
        image2.close()

        # Directly access the image specifications
        spec1 = image1.spec()
        spec2 = image2.spec()

        # Ensure images have the same dimensions
        if spec1.width != spec2.width or spec1.height != spec2.height:
            print("Image dimensions do not match.")
            return

        # Slice the arrays to ignore the alpha channel (keep only RGB)
        pixels1_rgb = pixels1[:, :, :3]  # Take the first 3 channels (RGB)
        pixels2_rgb = pixels2[:, :, :3]  # Take the first 3 channels (RGB)

        # Compute the absolute difference (using vectorized NumPy operations)
        diff = np.abs(pixels1_rgb - pixels2_rgb)

        # Save the difference image
        out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, oiio.FLOAT)
        out_image = oiio.ImageOutput.create(imageDiffPath)
        if out_image:
            out_image.open(imageDiffPath, out_spec)
            out_image.write_image(diff)
            out_image.close()
        else:
            print("Failed to write difference image.")

        # Compute normalized RMS error 
        normalized_rms = np.sqrt(np.mean(diff ** 2)) / (3.0 * 1.0)

        return normalized_rms

    except Exception as e:
        print(f"Error during image comparison: {e}")
        return

Feb 11 '25 17:02 kwokcb

I'll try a couple experiments on my end to see where the time is going. What's the resolution, format, and number of channels of the images you're comparing?

Feb 11 '25 23:02 lgritz

@kwokcb @lgritz From my perspective, as long as OpenImageIO generates the same results as PIL, it's purely a bonus if it's also faster, since we'd be replacing an external dependency with an ASWF dependency!

Maybe other teams would feel differently, but after spending over an hour to render out reference-quality GLSL and OSL images, the additional 7.5 seconds to generate difference images isn't at the top of my mind. :D

Feb 12 '25 00:02 jstone-lucasfilm

Are there a couple of these images I can try on my end (without having to build all of MaterialX and run a full testsuite) that I can use as a proxy to investigate performance? I find it hard to believe I can't speed it up further if I have a concrete example.

Feb 12 '25 02:02 lgritz

@jstone-lucasfilm, @lgritz

I can replace the OpenCV code with OIIO and it can be reviewed here.
As for tests, I can place a sample here of a few pairs of files and the number of pairs tested.

Side notes:

I'm guessing the number of comparison pairs is a important factor esp if you run the full RTS vs the default subset.
4 channel RGB for GLSL vs 3 channel RGB for OSL could affect performance (minimally it's inconsistent).

Feb 12 '25 14:02 kwokcb

Here are 2 pairs. The subset has 121 pairs to compare. standard_surface_chess_set.zip

Feb 12 '25 14:02 kwokcb

Replace OpenCV with OIIO.

Leaving OpenCV implementation for comparison / posterity

def computeDiff(image1Path, image2Path, imageDiffPath):
    try:
        # Remove the diff image if it already exists
        if os.path.exists(imageDiffPath):
            os.remove(imageDiffPath)

        # Check if input images exist
        if not os.path.exists(image1Path):
            print("Image diff input missing: " + image1Path)
            return

        if not os.path.exists(image2Path):
            print("Image diff input missing: " + image2Path)
            return

        # Load images using OpenCV
        image1 = cv2.imread(image1Path)
        image2 = cv2.imread(image2Path)

        # Ensure images were loaded successfully
        if image1 is None or image2 is None:
            print("Failed to load images.")
            return

        # Convert images to RGB (OpenCV loads images in BGR by default)
        image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2RGB)
        image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2RGB)

        # Compute the absolute difference between the two images
        diff = cv2.absdiff(image1, image2)

        # Save the difference image
        cv2.imwrite(imageDiffPath, cv2.cvtColor(diff, cv2.COLOR_RGB2BGR))

        # Compute normalized RMS error
        normalized_rms = np.sqrt(np.mean(diff**2))  / (3.0 * 255.0)

        return normalized_rms

    except Exception as e:
        # Clean up and print error message
        if os.path.exists(imageDiffPath):
            os.remove(imageDiffPath)
        print("Failed to create image diff between: " + image1Path + ", " + image2Path)
        print(f"Error: {e}")

Feb 12 '25 15:02 kwokcb

@kwokcb have you seen Flip? https://github.com/NVlabs/flip We use this for our image diffs using YardVFX RenderBenchmarks and Rez.

Feb 12 '25 20:02 fpliu

So you're reading in two PNG images? What are you writing? Also PNG?

Feb 12 '25 23:02 lgritz

Please don't merge this yet, I will have some important review comments...

Feb 13 '25 00:02 lgritz

OK, one part of why OIIO is around 3.5x slower is that you're inadvertently asking it to do a lot of extra copies and data conversions. The input files are 8 bit channels, they're staying that way in your OpenCV code (don't know about the original PIL, but I suspect so), but for OIIO, you were saying:

    pixels1 = image1.read_image(format=oiio.FLOAT)
    pixels2 = image2.read_image(format=oiio.FLOAT)

you could instead just let it stay in the native format:

    pixels1 = image1.read_image(format=oiio.UNKNOWN)
    pixels2 = image2.read_image(format=oiio.UNKNOWN)

and that saves some time. (Note, because now the values you pull into NumPy will be uint8, you need to change the normalization factor back to 3*255.0 like you did with OpenCV, not 3*1.0, if you catch my drift.)

If you comment out the image write entirely, just to see how the reads compare, after the above changes, OIIO reading from PNG is within 1.2x the speed of OpenCV. So most of the remaining difference is specifically the writing of PNG, which honestly is not something we've spent much time optimizing in OIIO land, as it is distinctly not a "VFX format" (don't get me started).

The first thing I spotted is that when you were writing, you said

    out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, oiio.FLOAT)

which is requesting that you write a float file, but PNG doesn't have float, so it will pick the next best thing, which is uint16, but again that is not apples-to-apples with the OpenCV or PIL code which is writing 8 bit PNG files. So instead I would just advise outputting the diff image in the same data type that you read from the original input:

    out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, spec1.format)

At this point, with the writing enabled again, OIIO takes 2.5x the time of OpenCV (versus ~3.5x when we started), and we know that most of that difference is the PNG write. Like I said, PNG write isn't the most optimized path we had, so just for fun, I changed the name of the output file to end in ".tif" instead of ".png". Can we write TIFF files (same 8 bit data) faster than PNG?

Boy howdy, can we!

Outputting .tif instead makes the OIIO version run in 0.94x the time that it takes OpenCV.

Is that just an inherent difference between TIFF and PNG, and now I'm hobbling OpenCV by having it write PNG while OIIO writes TIFF? Nope, change the OpenCV version of the script to output a TIFF file and OIIO's overall edge improves even more to 0.92x the time of OpenCV for all the same work.

YMMV on the exact timing. I'm using OIIO main (which doesn't currently differ in any important way from the supported 3.0 release) and running on a 2020 era 8 core Intel-based MacBookPro. I was reading your "Queen" sample files, and just doing your comparison subroutine 100 times in a loop.

Feb 13 '25 00:02 lgritz

But wait, there's more.

This may be more complicated than it's worth, but for these particular files, the GLSL ones anyway are 4-channel PNG files. But alpha = 1.0 everywhere. I happen to know that since PNG files are by specification unassociated alpha, and OIIO is automatically converting them to associated alpha on the way in, for the particular case of alpha=1.0 everywhere, that's a no-op, but OIIO's PNG reader has no way to know this ahead of time. I betcha that OpenCV doesn't know anything about associated vs unassociated alpha and isn't doing this extra conversion that OIIO is doing (needlessly, in this case).

So... there's a special trick to tell OIIO's PNG reader not to convert to associated alpha:

    configspec = oiio.ImageSpec()
    configspec.attribute("oiio:UnassociatedAlpha", 1)

    # Load images using OpenImageIO (no reshaping, directly as NumPy arrays)
    image1 = oiio.ImageInput.open(image1Path, config=configspec)
    image2 = oiio.ImageInput.open(image2Path, config=configspec)

And with this change, now OIIO does the whole read-diff-write cycle (to a TIFF output) in 0.86x the time it takes OpenCV to do the same.

So this full set of changes moves us from an original 3.5x slower to 15% faster. (Or, if we really insist on outputting PNG instead of TIFF, OIIO is still 2.5x slower until we actually crack open OIIO's PNG writing code and see what is wasteful there.)

Feb 13 '25 00:02 lgritz

One more! Back to PNG, but just checking if maybe we're not using the best compression choice. (I have no idea what OpenCV is using.) Instead of using the default compression, we use "pngfast", which is just a different speed setting on the underlying zlib calls,

    out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, spec1.format)
    out_spec.attribute("compression", "pngfast")

that closes the gap to 1.35x even when outputting PNG files. This is just further fooling around, I don't know if switching to this faster compression has any important change in the file sizes or anything else that will matter, and the degree to which it matters might depend on the content of the files. But hey, another set of tradeoffs to consider.

Feb 13 '25 01:02 lgritz

Thanks for the deep analysis, @lgritz!

For this use case, I'm guessing PNG was selected because the source and difference images are being organized into an HTML page for sharing/publication, and PNG is a lossless image format that browsers natively understand. We're not married to that choice, though, and any other lossless image format that browsers can handle would work in its place.

I'll pause on integrating this change until @kwokcb has had a chance to take a look at the optimizations that you've suggested, and my hope is that OpenImageIO can be the path forward for this workflow.

Feb 13 '25 03:02 jstone-lucasfilm

Yes, for web display and wide browser support, PNG makes total sense. It's a little weird as an image comparison format, simply because it's LDR, so the clipping of any values > 1 could mask differences between the rendered images if you have hot highlights or something like that.

Anyway, to summarize what I've found, you can get from about 3.5x longer than OpenCV to only 1.35x longer with the combination of:

Read as 8 bit, without needless conversion to float.
Write as 8 bit, without needlessly saving it as uint16 (via a request to save as float, which PNG can't even do).
Hint input to skip OIIO's usual automatic conversion to associated alpha -- but only if you think that's safe (circumstantial evidence: you weren't converting to associated with OpenCV or PIL, and these images are A=1.0 everywhere anyway, but I don't know if that's true of all your images).
Hint output to use the "faster" PNG compression -- but only if, after checking, you verify that it's not having a hugely negative effect on file size in practice.

If you're moving in this direction and 1.35x isn't close enough, I can also look at OIIO's PNG writer and see if there's some fat to squeeze out.

Feb 13 '25 05:02 lgritz

For comparison, as @jstone-lucasfilm notes it's not a huge difference so we can stick with OIIO to reduce non-ASWF dependencies as suggested and incorporate your changes @lgritz. (@fpliu I didn't try Flip and I'm guessing we won't add this dependency for this internal tool).

Random (2-cent) thoughts on topic separation:

For the image format, I think we can spin off a separate issue for pros / cons of format depending on workflow. From what I recall png was chosen as it's a common format (that is web friendly) and supported by the default image loader (stb_image). Also I think we were using fixed-point hardware buffers and image saves are still just buffer grabs (Metal/OpenGL). It will be interesting to see what is available in a GPU runner for CI as well. Probably "low hanging fruit" is to allow a codec option on captured / rendered images.
There has been a request make OIIO more accessible from the apps in the distribution as another issue. IMO we should rework this to be via the usage of image handlers dynamically registered via Python / Javascript (runtime vs compile time). We could look at a plugin compare as well but I don't think we want to over engineer this thing until we have more clarity on the direction of the RTS, and viewer / editor apps.

Feb 13 '25 15:02 kwokcb

There actually is somebody allegedly working on FLIP comparison within OIIO.

Feb 13 '25 18:02 lgritz