-
The first one! It's brilliant! I'd like to rename/restructure test-iris-imagehash to make it clear that the file names no longer have anything to do with their imagehash values, with corresponding renaming of the list items in imagerepo.json. Then each graphical test:
Not entirely sure if this is what @wjbenfold meant, but I like it anyway. If it ends up taking longer - due to 're-hashing' all potential image matches every time - then I recommend creating a dedicated CI run.
-
Another question - other than the ones in the post - how do we provide backwards compatibility? Can someone trying to test an older version of Iris just use an older version of the image repo too, or do we need to support it more?
-
Proposal in a PR: #4602
-
I'll document the investigation I've done into this here, so if anyone else wants to pursue it while I'm away they can. Attached is a notebook for exploring imagehash. Based on reviewing the imagehash library, the algorithm for calculating the perceptual image hash of a single image goes something like this:
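A minimal sketch of that algorithm, assuming the defaults used by imagehash's phash (hash_size=8, highfreq_factor=4); the step numbers in the comments are just for cross-reference with the notes below, not the library's own code:

```python
import numpy as np
import scipy.fftpack
from PIL import Image


def phash_sketch(image: Image.Image, hash_size=8, highfreq_factor=4):
    # 1. Start from the rendered test output as a PIL image.
    img_size = hash_size * highfreq_factor
    # 2. GRAYSCALE: convert to a single luminance channel (PIL).
    # 3. RESIZE: shrink to img_size x img_size with the Lanczos filter (PIL).
    image = image.convert("L").resize((img_size, img_size), Image.LANCZOS)
    pixels = np.asarray(image)
    # 4. Two-dimensional DCT of the pixel array (scipy).
    dct = scipy.fftpack.dct(scipy.fftpack.dct(pixels, axis=0), axis=1)
    # 5. Keep only the low-frequency corner of the DCT.
    lowfreq = dct[:hash_size, :hash_size]
    # 6. One bit per coefficient: above or below the median (numpy).
    return lowfreq > np.median(lowfreq)
```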
The only steps that require PIL are 2 and 3; the rest rely on numpy and scipy. GRAYSCALE is easy: we can produce a bit-for-bit reproduction of that (see notebook). RESIZE is trickier. A general resize is straightforward using scipy instead of Pillow, but the imagehash algorithm that we used to generate all the hashes uses a specific resize algorithm, the Lanczos filter. I've had a couple of quick goes at reimplementing that in numpy - it's not impossible, but it's also not simple. Again, see the notebook. So if we wanted to drop the PIL/Pillow dependency, a couple of options:
-
This is motivated by the pinned dependency on Pillow that Iris currently has for the graphic tests.
Current process
When the graphic tests run, each test creates an image and then imagehashes it. This imagehash is tested against the known good results in imagerepo.json. If there's a close enough match then the test passes; if not, it fails. In the failure case, the developer then follows a process of checking their images and (if the failure should be a pass) adding updated success options to both imagerepo.json and the test-iris-imagehash repository.
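Roughly, the per-test check looks something like the following sketch (the helper name, the tolerance value, and the shape of the imagerepo mapping are assumptions here, not the actual Iris test code):

```python
import imagehash
import PIL.Image


def assert_graphic_matches(test_id, result_image_path, imagerepo, tolerance=2):
    """Pass if the fresh hash is close enough to any known good hash."""
    fresh = imagehash.phash(PIL.Image.open(result_image_path))
    known_good = [imagehash.hex_to_hash(h) for h in imagerepo[test_id]]
    # Subtracting two ImageHash objects gives their Hamming distance.
    if not any(fresh - known <= tolerance for known in known_good):
        raise AssertionError(f"{test_id}: no close imagehash match in imagerepo.json")
```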
Current issues
An update to Pillow or imagehash can break all of the imagehashes simultaneously. The last time this happened, we responded by pinning the version of Pillow.
The test-iris-imagehash repo only grows over time (it's currently ~45 MB).
Suggestions
Use imagehash to just compare images in the moment (rather than storing historical imagehashes)
imagerepo.json could store known good sha256 values of graphic test results, with tests failing every time the sha256 doesn't match an acceptable value. The process for fixing the tests could involve comparing the freshly computed imagehashes of the known good image(s) in test-iris-imagehash (these now being indexed by test) and of the test result. A successful pass here would add a sha256 to imagerepo.json; a failure that was deemed acceptable by the developer could lead to a new image being uploaded to test-iris-imagehash.
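A rough sketch of that flow, purely to illustrate the idea (the function name, the tolerance, and how the known good image is located are all invented here):

```python
import hashlib
from pathlib import Path

import imagehash
import PIL.Image


def check_graphic(test_id, result_path, accepted_sha256s, known_good_path, tolerance=2):
    """Suggestion 1: exact sha256 check first, imagehash only when updating."""
    digest = hashlib.sha256(Path(result_path).read_bytes()).hexdigest()
    if digest in accepted_sha256s:
        return True  # sha256 already recorded in imagerepo.json: test passes

    # Mismatch: compare perceptual hashes of the test result and the stored
    # known good image (now indexed by test name rather than by hash value).
    fresh = imagehash.phash(PIL.Image.open(result_path))
    known = imagehash.phash(PIL.Image.open(known_good_path))
    if fresh - known <= tolerance:
        accepted_sha256s.append(digest)  # i.e. add the new sha256 to imagerepo.json
        return True

    # Otherwise the developer decides: if the change is acceptable, upload the
    # new image to test-iris-imagehash and record its sha256.
    return False
```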
Good:
Bad:
imagerepo.json could end up with a lot of sha256 values in it for all of the permutations of cartopy / matplotlib / etc. versions
Keep using imagehash as we have, but make it easier to adapt to changes in the hash algorithm
Given that symbolic links take very little space and we could generate them programmatically, we could have a script in the test-iris-imagehash repo that generates a new folder of symbolic links to a centrally stored folder of images (sketched below); we only need the images to exist once. If we're worried about hash collisions, particularly between hashes generated with different algorithms, we could store some metadata in each folder specifying which versions of Pillow and imagehash it's good for. If we haven't got a certain combination covered then we could automatically generate it by pulling a known good version of Iris and running the tests (or, for each image in the central image store, knowing which lockfile and Iris commit it was generated with).
We would also need to update the values in imagerepo.json to match the new Pillow version. If we want to be able to specify Pillow and imagehash versions to make tests stricter then we'd also need a bit of a tweak to the infrastructure in Iris.
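A small sketch of the kind of script this implies (the folder layout, naming scheme, and metadata format are invented for illustration):

```python
"""Sketch: build a per-Pillow/imagehash-version folder of symbolic links into a
single central image store, so each known good image only exists once."""
import json
from pathlib import Path

CENTRAL_STORE = Path("images")  # assumed location of the single copy of each image


def build_link_folder(repo_root, pillow_version, imagehash_version, hash_to_image):
    """hash_to_image maps each newly computed hash to an image name in CENTRAL_STORE."""
    folder = Path(repo_root) / f"pillow-{pillow_version}_imagehash-{imagehash_version}"
    folder.mkdir(parents=True, exist_ok=True)
    for hash_hex, image_name in hash_to_image.items():
        link = folder / f"{hash_hex}.png"
        if not link.exists():
            link.symlink_to(CENTRAL_STORE / image_name)
    # Metadata guards against collisions between hashes from different algorithms:
    # it records which Pillow/imagehash versions these links are valid for.
    (folder / "metadata.json").write_text(
        json.dumps({"pillow": pillow_version, "imagehash": imagehash_version})
    )
```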
Good:
Bad:
Questions
Alternative approaches I've not gone into