ReferenceTests.jl icon indicating copy to clipboard operation
ReferenceTests.jl copied to clipboard

Clarification on psnr_equality(threshold)

Open juliohm opened this issue 4 years ago • 11 comments

From the docstring it seems that the threshold t has a counter-intuitive definition. It is not the usual tolerance (a - b) <= t, but instead a measure of agreement (a - b) >= t?

The higher the threshold the more strict is the test? Could you please confirm?

juliohm avatar Feb 09 '21 15:02 juliohm

Correct, this psnr_equality(t) serves as an alternative to ; the closer the two images are, the lower ||a - b|| is, and the higher the PSNR is.

johnnychen94 avatar Feb 10 '21 05:02 johnnychen94

Perhaps you could consider flipping the definition? It would be more consistent with the Base approach (e.g. isapprox) where equality and thresholds have a standard meaning. The word threshold for me always refers to a form of tolerance in a comparison.

In either case, do you have a general rule of thumb for the actual value used in the threshold? I see that the default value is 25.

juliohm avatar Feb 10 '21 10:02 juliohm

Perhaps you could consider flipping the definition?

Higher PSNR means lower tolerance, how could we flip the definition? We need a true to indicate a test pass here. If you need a custom reference equality test metric, it's very easy to define one, see ImageBinarization for an example.

In either case, do you have a general rule of thumb for the actual value used in the threshold? I see that the default value is 25.

No, I don't have one, it's chosen by experiences to replace the previous test_approx_eq_sigma_eps macro used in Images.jl

johnnychen94 avatar Feb 10 '21 10:02 johnnychen94

Feel free to open PR(s) to add more "equality" test functors by using metrics from Distances.jl if you're not satisfied with the current PSNR equality.

johnnychen94 avatar Feb 10 '21 10:02 johnnychen94

Higher PSNR means lower tolerance, how could we flip the definition?

I think the issue is more about the naming? When I read psnr_equality(threshold) I immediately mix the notion of peak-signal-to-noise-ratio and equality, and consequently I think of threshold as something that is an upper bound, not a lower bound, makes sense?

Going even further, I think in most practical cases no one will want to customize the metrics of comparison assuming that someone in ReferenceTests.jl has thought about this issue carefully and will provide a good metric by default. I would expect a more intuitive option like:

@test_reference "foo.png" img tol=1e-2

juliohm avatar Feb 10 '21 10:02 juliohm

Would it be reasonable to add an option tol besides by with a canonical threshold semantic? I dont usually need to customize the metric, but I need to customize the tolerance of the test.

juliohm avatar Feb 10 '21 10:02 juliohm

A missing euclidean_equality(threshold) is all you need, I guess.

johnnychen94 avatar Feb 10 '21 10:02 johnnychen94

There's a plan to deprecate the current API @test_reference in favor of match_referece which returns a Bool value.

- @test_reference "foo.png" img
+ @test match_reference("foo.png", img)

so I don't want to add more keywords to the current API. I will definitely consider a better default equality option for image types when it's in progress.

johnnychen94 avatar Feb 10 '21 10:02 johnnychen94

I think the issue is more profound. It is about considering a consistent and simple-to-use approach to comparing two objects according to a tolerance. Somewhat a generalization of isapprox where people in ReferenceTests.jl spent time figuring out a good default for the behavior of each type. If PSNR is a good default for images or Euclidean distance, that is fine. Do you think we can always express the comparison in terms of (a-b) < t ? For example, in the PSNR case I would simply flip the sign of the inequality by multiplying the result of PSNR by minus one (a-b) >= t -----> -(a-b) <= -t This way both Euclidean and PSNR follow the same notion of "tolerance" as an upper bound.

juliohm avatar Feb 10 '21 10:02 juliohm

I think the point of this issue is to say that the threshold may confuse new users of the package. I got confused coming from VisualRegressionTests.jl for example.

juliohm avatar Feb 10 '21 10:02 juliohm

I'm not standing in the way of adding an euclidean_equality or whatever equality to replace the default psnr_equality here for image types, as long as it makes sense.

Unlike VisualRegressionTests.jl, ReferenceTests.jl does not provide a strong assumption on the input type and reference type, and this is why we expose the by keyword instead of tol. This package is designed for package developers so flexibility and faithfulness are more important than convenience. Said that, I don't think @test_reference "foo.png" img tol=1e-2 is more convenient than @test_reference "foo.png" img by=euclidean_equality(1e-2), except that it's two words less.

I'll update the docstring of psnr_equality to clarify the meaning of tolerance. (#75)

johnnychen94 avatar Feb 10 '21 11:02 johnnychen94

Thank you for clarifying all issues @johnnychen94. Closing the discussion.

juliohm avatar Jul 23 '23 02:07 juliohm