This repository has been archived by the owner on Jul 12, 2024. It is now read-only.

Replace mt_merge blending formula #12

Open
tp7 opened this issue Jun 23, 2013 · 4 comments

Comments

@tp7
Owner

tp7 commented Jun 23, 2013

As mentioned here, mt_merge uses a slightly incorrect formula. Test script:

src = mt_lutspa(expr="x 255 *")
mask = mt_lut(y=-255)
mt_merge(mt_lut(y=-0), src, mask)
mt_lutxy(last, src, "x y - abs 75 *").grayscale()

The output clip should be all zero, but it is not.

@ghost ghost assigned tp7 Jun 23, 2013
@tp7
Owner Author

tp7 commented Jun 29, 2013

VapourSynth's formula

dstp[x] = srcp1[x] + (((srcp2[x] - srcp1[x]) * (maskp[x] > 2 ? maskp[x] + 1 : maskp[x]) + 128) >> 8);

seems quite hard, if possible at all, to get correct in SIMD using 2 bytes/pixel.
At this point I'm tempted to say that masktools' approximation is reasonable.

@innocenat

It's the same old formula, but with the mask taking the values 0, 1, 2, 4, 5, 6, ..., 256 instead of 0-255.
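To make the remapping concrete, here is a minimal scalar sketch of the quoted VapourSynth formula. The names `remap_mask`, `merge_remapped` and `count_endpoint_errors` are made up for illustration, and the blend is written as a weighted sum, which gives the same value as `srcp1 + (((srcp2 - srcp1) * m + 128) >> 8)` but never shifts a negative number.

```cpp
#include <cstdint>

// Mask remap implied by the quoted formula: 0, 1, 2 stay as they are,
// 3..255 become 4..256, so a full mask (255) maps to a weight of exactly 256.
inline int remap_mask(std::uint8_t mask) {
    return mask > 2 ? mask + 1 : mask;
}

// Hypothetical scalar version of the quoted formula.
// src1 * (256 - m) + src2 * m equals src1 * 256 + (src2 - src1) * m,
// so the result matches the original expression without a negative shift.
inline std::uint8_t merge_remapped(std::uint8_t src1, std::uint8_t src2,
                                   std::uint8_t mask) {
    const int m = remap_mask(mask);
    return static_cast<std::uint8_t>((src1 * (256 - m) + src2 * m + 128) >> 8);
}

// Counts (src1, src2) pairs where mask=255 does not return src2
// or mask=0 does not return src1.
inline int count_endpoint_errors() {
    int bad = 0;
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            const auto s1 = static_cast<std::uint8_t>(a);
            const auto s2 = static_cast<std::uint8_t>(b);
            if (merge_remapped(s1, s2, 255) != s2) ++bad;
            if (merge_remapped(s1, s2, 0) != s1) ++bad;
        }
    }
    return bad;
}
```

With this remap both endpoints are exact, so `count_endpoint_errors()` should come out as zero; the price is that mask value 3 is weighted the same as 4/256 rather than 3/255.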

@innocenat

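__forceinline static __m128i overlay_blend_sse2_core(const __m128i& p1, const __m128i& p2, const __m128i& mask, const __m128i& v128, const __m128i& v257) {
  // Pixels and mask are assumed to be widened to 16-bit lanes; v128 and v257 hold those constants in every lane.
  // (p2 - p1) * mask, low 16 bits. The next step reads this as unsigned, so lanes with p2 < p1 are not handled.
  __m128i tmp1 = _mm_mullo_epi16(_mm_sub_epi16(p2, p1), mask);
  // Rounded division by 255: ((x + 128) * 257) >> 16.
  __m128i tmp2 = _mm_mulhi_epu16(_mm_add_epi16(tmp1, v128), v257);
  return _mm_add_epi16(p1, tmp2);
}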

Just a note on a reasonably correct implementation. It passes the test above, but I have not tested it beyond that.

The idea is that dividing by 255 can be done by multiplying by 2^16/255 (approximately 257) and shifting right by 16, hence mulhi_epu16(x, 257).
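The division trick can be checked exhaustively in scalar code. The sketch below is only an illustration with made-up names (`div255_round`, `div255_reference`, `blend255`, `count_div255_mismatches`), not the routine from this thread. It blends with a non-negative weighted sum `p1 * (255 - mask) + p2 * mask` rather than a signed difference, because the product has to be a valid unsigned 16-bit value for the multiply-high step to make sense.

```cpp
#include <cassert>
#include <cstdint>

// Rounded x / 255 via multiply by 257 and shift right by 16.
// Intended for 0 <= x <= 255 * 255, so that x + 128 still fits in an
// unsigned 16-bit lane when the same steps are done with SIMD.
inline std::uint32_t div255_round(std::uint32_t x) {
    assert(x <= 255u * 255u);
    return ((x + 128u) * 257u) >> 16;
}

// Reference: round-to-nearest of x / 255 using plain integer division.
inline std::uint32_t div255_reference(std::uint32_t x) {
    return (2u * x + 255u) / 510u;
}

// Hypothetical scalar blend with weights (255 - mask)/255 and mask/255.
inline std::uint8_t blend255(std::uint8_t p1, std::uint8_t p2,
                             std::uint8_t mask) {
    const std::uint32_t sum = p1 * (255u - mask) + p2 * std::uint32_t(mask);
    return static_cast<std::uint8_t>(div255_round(sum));
}

// Number of inputs in [0, 255 * 255] where the trick and the reference differ.
inline int count_div255_mismatches() {
    int bad = 0;
    for (std::uint32_t x = 0; x <= 255u * 255u; ++x) {
        if (div255_round(x) != div255_reference(x)) ++bad;
    }
    return bad;
}
```

If `count_div255_mismatches()` returns zero, the multiply-high step is an exact rounded division over the whole range a weighted sum of two 8-bit pixels can take, and `blend255(a, b, 255)` returns `b` regardless of which of the two pixels is larger.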

@tophf

tophf commented Sep 4, 2016

The problem is actually quite bad.

Resolving the merge formula for the mask=255 case gives result = ((ovr<<8) + main - ovr + 128) >> 8, so the result becomes ovr+1 when the main-ovr difference is larger than 127 and ovr-1 when it is less than -128. In other words, half the possible outcomes.

It gets progressively worse: when mask=254, result = ((ovr<<8) + 2*(main-ovr) + 128) >> 8, which means the thresholds are approx. 64 and -64, so 75% of outcomes are wrong.

This culminates in the mask=127 or 129 case, where result = ((ovr<<8) + 129*(main-ovr) + 128) >> 8: any relative luma difference >= 2 borks the result by 1 (99% of outcomes).
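The thresholds can be checked by brute force. The sketch below takes the formula exactly as written out in this comment (assumed here to mirror what masktools does) and compares it with an exact mask/255 weighting over every 8-bit pixel pair; `merge_approx`, `merge_exact` and `count_mismatches` are names invented for the example.

```cpp
#include <cstdint>

// Approximation as written out in the comment above, in the equivalent
// non-negative form: ovr * 256 + (256 - mask) * (main - ovr)
//                  = ovr * mask + main * (256 - mask).
inline int merge_approx(int main_px, int ovr, int mask) {
    return (ovr * mask + main_px * (256 - mask) + 128) >> 8;
}

// Reference: exact mask/255 weighting, rounded to nearest.
inline int merge_exact(int main_px, int ovr, int mask) {
    return (2 * (ovr * mask + main_px * (255 - mask)) + 255) / 510;
}

// Number of (main, ovr) pixel pairs, out of 65536, where the two disagree.
inline int count_mismatches(int mask) {
    int bad = 0;
    for (int main_px = 0; main_px < 256; ++main_px) {
        for (int ovr = 0; ovr < 256; ++ovr) {
            if (merge_approx(main_px, ovr, mask) != merge_exact(main_px, ovr, mask)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

For mask=255 the exact result is simply `ovr`, and `count_mismatches(255)` comes to 16384 of the 65536 pixel pairs (25%). Counted by difference instead, that is 255 of the 511 possible values of main-ovr, which matches the "half" figure above. Other mask values can be probed the same way.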

For example, when a static image of any color is overlaid on a dynamic video, the overlaid image will change its colors by 1 whenever the underlying video differs from the overlaid picture by more than the threshold values mentioned above. Depending on the video, such a ±1 change can make the overlay image/clip flicker or otherwise get noticeably ugly.

A reliable test for the new merging formula would be overlaying a full-range horizontal gradient on a full-range vertical gradient:

blankclip(256, 1024, 1024, "yv12")
horiz_gradient = mt_lutspa(expr="x 255 *",u=-128,v=-128)
vert_gradient = mt_lutspa(expr="y 255 *",u=-128,v=-128)
mt_merge(vert_gradient, horiz_gradient, mt_lut(y=-255), true)

To make the artifact more realistically obvious and annoying we can make the main video change its luminance randomly and play it back:

blankclip(256, 1024, 1024, "yv12")
horiz_gradient = mt_lutspa(expr="x 255 *",u=-128,v=-128)
vert_gradient = mt_lutspa(expr="y 255 *",u=-128,v=-128).scriptclip("""tweak(bright=rand(255))""")
mt_merge(vert_gradient, horiz_gradient, mt_lut(y=-255), true)

3 participants