Write image diff to disk even if test passed #234
I have been forced to set my image diff threshold pretty high:

customDiffConfig: {
  threshold: 0.3,
},
failureThreshold: 0.1,
failureThresholdType: "percent",

The thing I'm snapshotting is text rendered by puppeteer. The snapshots are created on Mac, but CI runs on Linux. Small changes in font rendering (especially font width) add up across the width of the image. I tried ssim and its various modes, but they required me to set the threshold even higher, 20-40%.

As a result of the high threshold, I'd like to have the option to manually audit CI builds by looking at the image snapshots as artifacts, even if the tests fell below the 10% threshold needed to fail. Maybe an option like dumpDiffToDiskEvenOnPass=true.
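For context, these are the options jest-image-snapshot's toMatchImageSnapshot matcher accepts. A minimal sketch of the surrounding test (the URL and test name here are made up) might look like:

```ts
import puppeteer from "puppeteer";
import { toMatchImageSnapshot } from "jest-image-snapshot";

expect.extend({ toMatchImageSnapshot });

it("matches the rendered page", async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://localhost:3000"); // hypothetical app URL
  const image = await page.screenshot();
  expect(image).toMatchImageSnapshot({
    customDiffConfig: { threshold: 0.3 },
    failureThreshold: 0.1,
    failureThresholdType: "percent",
  });
  await browser.close();
});
```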
You're testing apples and oranges. At the very least, if I were you I'd set up a docker container on your Mac for the snapshot generation. That's really your best bet if you want these snapshots to be meaningful. We'll review the request, though, to see if it's a worthwhile feature.
This describes my situation exactly, including my experience with ssim.
I wonder about an image diff algorithm like this: [original illustration/link not preserved]

Obviously that's a huge feature request, and I'll be honest that it's definitely not going to make the top of my todo list. But a screenshot comparison which is robust to minor font changes but rejects content changes would be super useful :D. You need something which can drift horizontally across the page, which the algorithm above can do. It would fail on text reflow though :( A container is definitely easier if you don't mind the complexity in the dev workflow.
I think there's some general confusion about what pixelmatch and SSIM do, and why you're not achieving the desired results. Both metrics are designed to tell how different one image is from a reference image. They fundamentally treat the reference image as a pure signal (think signal-to-noise ratio) and calculate degradation.

This degradation can happen on what seem like identical platforms (Linux Chrome vs. Linux Chrome). For instance, say one of the chips is a brand-new AMD Ryzen and the other is an old Intel Xeon. Because the vectorized instructions selected at runtime don't match, the two machines produce imperceptibly different output, yet the pixel values differ significantly because of how they were calculated. An alternative case, still on the same operating system, is when Chrome offloads rendering to the GPU.

In these cases, SSIM is a far superior metric to pixelmatch: the images are no longer apples to apples, because the filtering and transformation of the pixels produce different output. SSIM achieves excellent results here because it is a metric derived from the mean, variance, and covariance over pixel windows (say, 11x11 squares) around each pixel. As a result, SSIM can compare two identical images produced through different transformations in terms of the pixels' relationships to each other, restoring the apples-to-apples comparison it should be.

Now compare this to the case you're describing. You're trying to determine not whether the two images match, but whether the outputs are acceptable to the user. This sits somewhere between functional equivalence and a computer vision problem. In an ideal world, you'd use something like a naive Bayes classifier (think spam filtering) to do a fuzzy match analysis. But how do you do that at scale? It requires extensive training for the algorithm to know whether all of the information is communicated equivalently. For this particular case, you might benefit from an OCR-derived comparison to ensure all of the characters are extracted and the extractions are equal, but that's outside the scope of a pure image comparison function.
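To make that concrete: the standard SSIM index for two aligned windows is computed from exactly those statistics (means, variances, covariance) plus two stabilizing constants. A minimal sketch over flattened grayscale windows; the K1/K2/L defaults below are the usual ones from the SSIM paper, not anything specific to this library:

```ts
// SSIM for two aligned grayscale windows (e.g. flattened 11x11 tiles).
// ssim = ((2*muX*muY + C1) * (2*covXY + C2)) /
//        ((muX^2 + muY^2 + C1) * (varX + varY + C2))
function ssim(x: number[], y: number[], L = 255, K1 = 0.01, K2 = 0.03): number {
  const n = x.length;
  const muX = x.reduce((s, v) => s + v, 0) / n;
  const muY = y.reduce((s, v) => s + v, 0) / n;
  let varX = 0, varY = 0, covXY = 0;
  for (let i = 0; i < n; i++) {
    varX += (x[i] - muX) ** 2;
    varY += (y[i] - muY) ** 2;
    covXY += (x[i] - muX) * (y[i] - muY);
  }
  varX /= n; varY /= n; covXY /= n;
  const C1 = (K1 * L) ** 2; // stabilizers so near-zero denominators don't blow up
  const C2 = (K2 * L) ** 2;
  return ((2 * muX * muY + C1) * (2 * covXY + C2)) /
         ((muX * muX + muY * muY + C1) * (varX + varY + C2));
}
```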
If I understand correctly, you're looking to edge-detect subimages inside images and then compare them against what you expect to be subimages inside another image, is that right?
Exactly - to take the very hard problem of OCR and the semantic meaning of the screenshot, and turn it into an image processing problem. SSIM doesn't need neural-net object detection to identify and ignore compression artifacts, and I don't think OCR is required to ignore changes in font spacing. The reason SSIM works badly on minor font-spacing changes is that it assumes there's no drift, only local artifacts. It works quite well for the first few tiles of text, but past ~100px the minor change in font spacing has caused the two images to become completely uncorrelated.

If you draw a horizontal scanline, find the median RGB and define it as zero, and then count how many times the scanline crosses that zero, that count alone will be a very good signature for the content of the text. It would be hard to add, remove, or change a letter without changing that metric, but changing the spacing or weight of the font would not affect it at all. The problem with the "zero-crossing" approach is that it's hard to reconcile back into "% pixels different", which is why the weighted-dot-product approach is probably a more natural fit.
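A rough sketch of that zero-crossing signature (a hypothetical helper, not part of jest-image-snapshot or ssim.js), counting sign changes of a grayscale scanline around its median:

```ts
// Count how many times a grayscale scanline crosses its own median.
// Font spacing/weight shifts move the crossings but not their count;
// adding, removing, or changing a letter changes the count.
function zeroCrossings(scanline: number[]): number {
  const sorted = [...scanline].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  let crossings = 0;
  let prevSign = 0;
  for (const v of scanline) {
    const sign = Math.sign(v - median);
    if (sign !== 0 && prevSign !== 0 && sign !== prevSign) crossings++;
    if (sign !== 0) prevSign = sign;
  }
  return crossings;
}
```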
I'm open to suggestions, and I don't particularly care about the percent or pixel threshold. If it needs to be adjusted or changed for circumstances, it's not a big deal. If you want to make a specific suggestion for how to implement this, please check out weberSsim.ts in ssim.js 3.2. It has my new implementation, which can calculate any individual variance, covariance, or mean in constant time, and it can do that over any size of square pixel window. If it's doable and it works, I'll implement it and post it here.
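For anyone following along: the classic way to get constant-time window means like that is a summed-area table (integral image), and variance and covariance follow by keeping extra tables of x^2, y^2, and x*y. A generic sketch of the idea (illustrative only, not the actual weberSsim.ts code):

```ts
// Summed-area table: sat[y][x] = sum of all pixels in the rectangle [0..y) x [0..x).
// Build once in O(w*h); any rectangular window sum is then four lookups.
function buildSAT(pixels: number[][], w: number, h: number): number[][] {
  const sat = Array.from({ length: h + 1 }, () => new Array<number>(w + 1).fill(0));
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      sat[y + 1][x + 1] = pixels[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x];
    }
  }
  return sat;
}

// Mean over the window with top-left (x0, y0) and bottom-right (x1, y1), exclusive.
function windowMean(sat: number[][], x0: number, y0: number, x1: number, y1: number): number {
  const sum = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0];
  return sum / ((x1 - x0) * (y1 - y0));
}
```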
This issue is stale because it has been open 30 days with no activity. |