masktools Replace mt_merge blending formula

As mentioned here, mt_merge uses a slightly incorrect formula. Test script:

src = mt_lutspa(expr="x 255 *")
mask = mt_lut(y=-255)
mt_merge(mt_lut(y=-0), src, mask)
mt_lutxy(last, src, "x y - abs 75 *").grayscale()

The output clip should be all zero, but it is not.

Jun 23 '13 12:06 tp7

VapourSynth's formula

dstp[x] = srcp1[x] + (((srcp2[x] - srcp1[x]) * (maskp[x] > 2 ? maskp[x] + 1 : maskp[x]) + 128) >> 8);

seems quite hard, if possible at all, to get right in SIMD using 2 bytes/pixel. At this point I'm tempted to say that the masktools approximation is reasonable.
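
For reference, here is a minimal scalar sketch of the expression quoted above, for a single 8-bit pixel. The function name merge_vs_style is made up for this post and is not part of VapourSynth or masktools. The floor division is written out by hand so the sketch does not depend on how >> treats negative values.

```cpp
#include <cstdint>

// Scalar model of the quoted expression for one 8-bit pixel.
// merge_vs_style is a hypothetical name, not a VapourSynth or masktools function.
constexpr std::uint8_t merge_vs_style(std::uint8_t src1, std::uint8_t src2, std::uint8_t mask) {
    // Mask values above 2 are bumped by one, so 255 becomes a weight of 256.
    const int weight = mask > 2 ? mask + 1 : mask;
    const int diff = static_cast<int>(src2) - static_cast<int>(src1);
    const int scaled = diff * weight + 128;
    // Floor division by 256, spelled out for the negative case.
    const int step = scaled >= 0 ? scaled / 256 : -((-scaled + 255) / 256);
    return static_cast<std::uint8_t>(src1 + step);
}
```

With this remapping, mask=255 returns src2 exactly and mask=0 returns src1 exactly, which is the property the test script at the top checks for.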

Jun 29 '13 03:06 tp7

It's the same old formula, but with the mask taking the values 0, 1, 2, 4, 5, 6, ..., 256 instead of 0-255.

Oct 19 '13 14:10 innocenat

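#include <emmintrin.h>  // SSE2 intrinsics

__forceinline static __m128i overlay_blend_sse2_core(const __m128i& p1, const __m128i& p2, const __m128i& mask, const __m128i& v128, const __m128i& v257) {
  // (p2 - p1) * mask, keeping the low 16 bits of each lane
  __m128i tmp1 = _mm_mullo_epi16(_mm_sub_epi16(p2, p1), mask);
  // high 16 bits of (tmp1 + 128) * 257, i.e. a rounded division by 255
  __m128i tmp2 = _mm_mulhi_epu16(_mm_add_epi16(tmp1, v128), v257);
  return _mm_add_epi16(p1, tmp2);
}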

Just a note on a reasonably correct implementation. It passes the test above, but I did not test any more than that.

The idea is that dividing by 255 can be done by multiplying by 2^16/255 (approximately 257) and shifting right by 16, hence mulhi_epu16(x, 257).
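
To make the divide-by-255 idea concrete, here is a rough scalar sketch of a single 16-bit lane. The names mulhi_u16, div255_round, div255_matches_everywhere and blend_lane are invented for illustration. The lane model only looks at the p2 >= p1 case; when p2 < p1 the 16-bit product wraps around, and that case is not examined here.

```cpp
#include <cstdint>

// Scalar stand-in for one lane of _mm_mulhi_epu16: the high 16 bits of an
// unsigned 16x16-bit product. mulhi_u16 is a helper name chosen for this sketch.
constexpr std::uint16_t mulhi_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(a) * b) >> 16);
}

// Rounded division by 255, intended for products of two 8-bit values (0..65025).
constexpr std::uint16_t div255_round(std::uint16_t x) {
    return mulhi_u16(static_cast<std::uint16_t>(x + 128), 257);
}

// Compares the trick against plain rounded division for every x in 0..65025.
inline bool div255_matches_everywhere() {
    for (std::uint32_t x = 0; x <= 255u * 255u; ++x) {
        if (div255_round(static_cast<std::uint16_t>(x)) != (x + 127) / 255) return false;
    }
    return true;
}

// One lane of the blend for p2 >= p1 only (assumption: all inputs are 0..255).
constexpr std::uint16_t blend_lane(std::uint16_t p1, std::uint16_t p2, std::uint16_t mask) {
    const std::uint16_t tmp1 = static_cast<std::uint16_t>((p2 - p1) * mask);
    return static_cast<std::uint16_t>(p1 + div255_round(tmp1));
}
```

Calling div255_matches_everywhere() at runtime is a cheap way to check the whole input range before trusting the constant 257.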

Mar 21 '14 11:03 innocenat

The problem is actually quite bad.

According to the merge formula resolved for the mask=255 case, result = ((ovr << 8) + main - ovr + 128) >> 8, so the result is ovr+1 when the main-ovr difference is larger than 127 and ovr-1 when it is less than -128. In other words, half of the possible difference values.

It gets progressively worse: when mask=254, result = ((ovr << 8) + 2*(main-ovr) + 128) >> 8, which means the thresholds are approx. 64 and -64, so 75% of outcomes are wrong.

Culminating in the case when mask=127 or 129: result = ((ovr << 8) + 129*(main-ovr) + 128) >> 8, where any change in relative luma >= 2 borks the result by 1 (99% of outcomes).
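
The thresholds are easy to poke at with a small scalar model. This is only a sketch: the general weight (256 - mask) is inferred from the three cases above (1 for mask=255, 2 for mask=254, 129 for mask=127) and is not copied from the masktools source, and old_merge is a made-up name.

```cpp
// Scalar model of the blend as the cases above describe it, for 8-bit inputs.
// The (256 - mask) weight is an assumption inferred from those cases.
constexpr int old_merge(int main_px, int ovr, int mask) {
    // The sum equals ovr*mask + main_px*(256 - mask) + 128, which is never
    // negative for inputs in 0..255, so >> 8 is a plain floor division by 256.
    return ((ovr << 8) + (256 - mask) * (main_px - ovr) + 128) >> 8;
}
```

For example, with mask=255 and ovr=100 the result stays at 100 for main=227 (difference 127) and becomes 101 for main=228 (difference 128).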

For example, when a static image of any color is overlaid on a dynamic video, the overlaid image will change its colors by 1 whenever the underlying video differs from the overlaid picture by more than the threshold values mentioned above. Depending on the video, such a ±1 change can make the overlay image/clip flicker or otherwise get noticeably ugly.

A reliable test for the new merging formula would be overlaying a full-range horizontal gradient on a full-range vertical gradient:

blankclip(256, 1024, 1024, "yv12")
horiz_gradient = mt_lutspa(expr="x 255 *",u=-128,v=-128)
vert_gradient = mt_lutspa(expr="y 255 *",u=-128,v=-128)
mt_merge(vert_gradient, horiz_gradient, mt_lut(y=-255), true)

To make the artifact more obvious and annoying in a realistic way, we can make the main video change its luminance randomly and play it back:

blankclip(256, 1024, 1024, "yv12")
horiz_gradient = mt_lutspa(expr="x 255 *",u=-128,v=-128)
vert_gradient = mt_lutspa(expr="y 255 *",u=-128,v=-128).scriptclip("""tweak(bright=rand(255))""")
mt_merge(vert_gradient, horiz_gradient, mt_lut(y=-255), true)

Sep 04 '16 22:09 tophf