Gilles Dubuc (@MonsieurPerf) is the engineering manager of the Performance Team at the Wikimedia Foundation.
Kornel Lesiński, of ImageOptim fame, gave an excellent talk at performance.now() about image optimisation. In it, he suggested that a single-frame AV1 video could already outperform a JPG or WebP image in terms of compression ratio, with the single-frame AV1 being half the size of the JPG. I decided to verify that claim.
Compression performance
First, I took a reference image from Wikimedia Commons and generated a “perfect thumbnail” for it, as a PNG. PNG being lossless, this gives us the reference to aim for. If the compression were lossless, the result would be 100% identical to that PNG. But by definition, with JPG, WebP and AV1, we’re talking about lossy compression. The main question when you use lossy compression is: how much visual quality do you lose?
The images embedded in this article are lossless PNGs converted from the files discussed, so that they can all be displayed here. The conversion adds no further loss, so the visual quality you see is that of each compressed file, which is the point of showing the images.
Reference PNG
Then I took the JPG and the WebP thumbnails Wikimedia currently generates in production for this image. The quality settings we use for our thumbnails have been determined by community consensus over the years, and whenever I consider introducing something new – like WebP was recently – I ensure that the new thumbnails match the visual quality of old ones. For other websites, the compression settings are always a tradeoff between size and quality, and it’s up to you to decide what’s best for the situation.
Production JPG | Production WebP |
In order to compare visual quality, I decided to use DSSIM. There are other methods to do this, but DSSIM is widely used and simple to run, and it’s what I used previously to determine Wikimedia’s WebP compression settings. I used the command line tool written by Kornel to check DSSIM scores. This tool compares 2 PNGs, and it’s easy enough to convert any lossy compressed image into a lossless PNG version of it for comparison’s sake.
$ convert 400px-President_Barack_Obama.webp 400px-President_Barack_Obama.webp.png
$ dssim 400px-President_Barack_Obama.png 400px-President_Barack_Obama.webp.png
0.006170 400px-President_Barack_Obama.webp.png
$ exiftool -v -v -v 400px-President_Barack_Obama.webp | grep RIFF | grep 'VP8 '
RIFF 'VP8 ' chunk (36426 bytes of data):
Finally, using the latest versions of ffmpeg and libaom from master, I set out to compress the reference PNG thumbnail with AV1, stored in an MKV container. While WebP and MKV as formats contain different amounts of metadata, using exiftool in verbose mode lets me find out what size the image data itself really is. Through a bisecting method, I tweaked the AV1 quality settings until I reached a DSSIM score that was as close as possible to our production WebP thumbnail.
$ ffmpeg -loglevel panic -i 400px-President_Barack_Obama.png -c:v libaom-av1 -crf 41 -b:v 0 -strict experimental -vf format=yuv420p 400px-President_Barack_Obama.av1.mkv
$ ffmpeg -loglevel panic -i 400px-President_Barack_Obama.av1.mkv 400px-President_Barack_Obama.av1.mkv.png
$ dssim 400px-President_Barack_Obama.png 400px-President_Barack_Obama.av1.mkv.png
0.005782 400px-President_Barack_Obama.av1.mkv.png
$ exiftool -v -v -v 400px-President_Barack_Obama.av1.mkv | grep SegmentHeader
+ [SegmentHeader directory, 23796 bytes]
Single-frame AV1
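The bisection over quality settings described above can be sketched in Python. This is only an illustration, not the script used for the article: the function names `dssim_for_crf` and `bisect_crf` are hypothetical, the temporary file names are made up, `-y` is added so ffmpeg overwrites earlier attempts, and the search assumes that the DSSIM score never decreases as the CRF goes up (libaom accepts CRF values from 0 to 63).

```python
import subprocess
from typing import Callable


def dssim_for_crf(reference_png: str, crf: int) -> float:
    """Encode reference_png as single-frame AV1 at `crf` and return its DSSIM.

    Mirrors the ffmpeg and dssim commands shown in the article. Assumes both
    tools are on PATH and that dssim prints the score as its first field.
    """
    mkv = f"{reference_png}.crf{crf}.mkv"  # hypothetical temp file name
    decoded = f"{mkv}.png"
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "panic", "-i", reference_png,
         "-c:v", "libaom-av1", "-crf", str(crf), "-b:v", "0",
         "-strict", "experimental", "-vf", "format=yuv420p", mkv],
        check=True,
    )
    subprocess.run(["ffmpeg", "-y", "-loglevel", "panic", "-i", mkv, decoded],
                   check=True)
    out = subprocess.run(["dssim", reference_png, decoded], check=True,
                         capture_output=True, text=True).stdout
    return float(out.split()[0])


def bisect_crf(score: Callable[[int], float], target: float,
               lo: int = 0, hi: int = 63) -> int:
    """Return the CRF in [lo, hi] whose score is closest to `target`.

    Assumes score(crf) is non-decreasing: a higher CRF means more loss.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if score(mid) < target:
            lo = mid  # quality still better than the target: compress harder
        else:
            hi = mid  # at or past the target: back off
    # lo and hi now bracket the target; keep whichever lands closer.
    return min((lo, hi), key=lambda crf: abs(score(crf) - target))
```

A real run would look like `bisect_crf(lambda crf: dssim_for_crf("400px-President_Barack_Obama.png", crf), 0.006170)`. Each probe costs a full encode, so the search needs about six encodes rather than 64.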
If you open these 4 images and flip between them, you’ll see that apart from the WebP one, which has a slight color shift in the dark blues, it’s very hard to tell them apart, which is confirmed by the DSSIM scores. Here are the results:
Image type | DSSIM | Image data weight (bytes)* | Gain compared to JPG |
---|---|---|---|
JPG | 0.005833 | 49314 | 0% |
WebP | 0.006170 | 36426 | 26.1% |
AV1 | 0.005782 | 23796 | 51.7% |
*we look at the image data only, to discount differences in container/metadata size
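The gain column follows directly from the byte counts reported by exiftool. A short Python check, using only the sizes from the table (the helper name `gain_vs_baseline` is made up for this example):

```python
# Image data sizes in bytes, as reported by exiftool in the article.
sizes = {"JPG": 49314, "WebP": 36426, "AV1": 23796}


def gain_vs_baseline(size: int, baseline: int) -> float:
    """Percentage of bytes saved relative to the baseline size."""
    return 100.0 * (baseline - size) / baseline


gains = {name: gain_vs_baseline(size, sizes["JPG"])
         for name, size in sizes.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}%")
```

This prints 0.0% for JPG, 26.1% for WebP and 51.7% for AV1, matching the table.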
The hype is indeed real, based on this test of the typical kind of photograph shown on Wikipedia articles. It would need to be performed on a large quantity of images to be confirmed, but this initial result is enough of a signal to confirm that yes, AV1 image compression is likely to be much better than JPG or WebP in terms of file size for a given visual quality.
AVIF
The Alliance for Open Media is already working on an image format based on AV1, called AVIF. In fact, just a few days ago, the first test AVIF files were released. AVIF aims to be a very feature-complete image format, supporting animations, live photos and more. It’s not trying to have a small feature set and remain lightweight like WebP. Nevertheless, it’s still based on AV1 and if you don’t shove a ton of metadata in it, it should give you all the benefits of the better AV1 compression. With AVIF, we should find the same file size savings as in my small test.
With all that information at hand, it should be obvious that soon browsers should support AVIF and with files half the size of JPG for the same quality, everyone should use it for everything, right?
Well, that’s where the full picture is necessary. File size is one thing, but a compressed file needs to be decompressed by the client. When discussing this with image codec engineers at Google, they told me that they observed that AV1 could require 10 to 15 times more processing power/time to decode than VP8. Lossy VP8 is what WebP is based on.
Does the browser spend a lot of time decoding images? What would be the impact of image decoding time taking 15 times longer than it currently does?
Image decoding time
With more efficient AV1 decoders coming out, like dav1d and its impressive performance improvements, 15x might be an overly pessimistic scenario to consider, but we’ll look into it for the sake of getting an idea of what that would mean.
Image decoding doesn’t happen on the main browser thread, which means that with all the multitasking a browser does, extra decoding time might happen on a CPU core that would have otherwise been idle. However, if you consider a single image above the fold, even if the browser has spare processing power, this image will get delayed no matter what by the extra image decoding time. For images below the fold, this is less of a problem, as there is probably time for the extra decoding to happen before the user scrolls to them.
To determine what the impact might be in a worst case scenario, I decided to have WebP decoding time tested on a real device: a “budget” phone, the Motorola G5. Using browsertime to collect the measurements in a HAR file, including some timings extracted from the Chrome Trace logs on the real device, I found that decoding a 37102-byte WebP took on average 561μs. That doesn’t sound like a lot, but let’s do some calculations.
If the savings found previously for AV1 in my small test hold true for AVIF, an AVIF version of that WebP file could potentially save 12864 bytes from being transmitted. If AVIF decoding is 15x slower, however, it would take an extra 7.854ms to decode (14 × 561μs). Ignoring potential extra TCP roundtrips, what bandwidth do you need to download 12864 bytes in 7.854ms? About 1.6 megabytes per second, or roughly 13 Mbps.
This means that on any internet connection faster than roughly 13 Mbps, users on such budget phones would be better off downloading the extra bytes of the WebP version rather than the smaller AVIF, and the image would appear faster.
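The break-even arithmetic can be reproduced with a few lines of Python. The inputs are the figures measured in this article. Scaling the 37102-byte WebP by the AV1/WebP size ratio from the earlier test is an assumption about how the saving carries over to AVIF, and the 15x slowdown is the pessimistic estimate discussed above. Note the units: the result is about 1.6 megabytes per second, which is roughly 13 megabits per second.

```python
# Figures taken from the article.
WEBP_PROD_BYTES = 36426   # production WebP image data
AV1_BYTES = 23796         # single-frame AV1 image data at matching DSSIM
WEBP_TEST_BYTES = 37102   # WebP decoded on the Motorola G5
WEBP_DECODE_US = 561      # average WebP decode time, in microseconds
SLOWDOWN = 15             # pessimistic AV1-vs-VP8 decode cost factor

# Bytes saved if AVIF shrinks this file by the same ratio as in the test.
saved_bytes = WEBP_TEST_BYTES * (1 - AV1_BYTES / WEBP_PROD_BYTES)

# Extra decode time: 15x total means 14x on top of the WebP decode.
extra_decode_s = WEBP_DECODE_US * (SLOWDOWN - 1) / 1e6

# Bandwidth at which transferring the saved bytes takes exactly that long.
break_even_bytes_per_s = saved_bytes / extra_decode_s
break_even_mbit_per_s = break_even_bytes_per_s * 8 / 1e6

print(f"bytes saved:       {saved_bytes:.0f}")
print(f"extra decode time: {extra_decode_s * 1000:.3f} ms")
print(f"break-even:        {break_even_bytes_per_s / 1e6:.2f} MB/s "
      f"= {break_even_mbit_per_s:.1f} Mbit/s")
```

Changing `SLOWDOWN` shows how sensitive the conclusion is to decoder speed: at 5x, for example, the break-even bandwidth is 3.5 times higher, so AVIF would win on many more connections.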
Even in this worst case, it’s not all bad. As Tim Vereecke showed in his earlier contribution to the calendar this year, a growing number of users are signalling via the Save-Data header that data savings are important to them. For users whose data is expensive, the AVIF savings are worth it, even if the images might take a few extra milliseconds to be decoded and displayed as a result. Furthermore, with APIs like Network Information, we have a way to know the effective connection type of users, and we can tell for which users the tradeoff is worth it. We could decide to serve AVIF images to users with effective type “slow-2g”, for example.
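On the server side, that decision could look something like the following Python sketch. It is an illustration under several assumptions, not a description of what Wikimedia does: the function `choose_image_format` is hypothetical, it assumes the browser advertises AVIF support as `image/avif` in its `Accept` header, and it assumes the effective connection type arrives as the `ECT` client hint request header, which only some browsers send.

```python
from typing import Iterable, Mapping


def choose_image_format(headers: Mapping[str, str],
                        slow_types: Iterable[str] = ("slow-2g",)) -> str:
    """Pick a thumbnail format from request headers.

    Serves AVIF only when the client supports it AND either asked for data
    savings (Save-Data: on) or reports a slow effective connection type.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    accept = h.get("accept", "")

    wants_savings = h.get("save-data") == "on"
    slow_connection = h.get("ect") in set(slow_types)

    if "image/avif" in accept and (wants_savings or slow_connection):
        return "avif"   # smaller download, costlier decode
    if "image/webp" in accept:
        return "webp"
    return "jpeg"       # universal fallback
```

For example, `choose_image_format({"Accept": "image/avif,image/webp,*/*", "ECT": "4g"})` returns `"webp"`, while the same request with `Save-Data: on` returns `"avif"`. A response that varies on these headers would also need a matching `Vary` header so that caches keep the variants apart.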
Conclusion
An AV1-based image format like AVIF isn’t a performance silver bullet yet, but could already be very useful right now in scenarios where data savings are the most important thing, or when the connectivity is bad. With such a big push for AV1 as a format for the future of video on the web, it’s also possible that future generations of devices will get hardware supporting AV1 decoding at higher performance, which AVIF could benefit from indirectly, potentially making the issue of extra decoding cost go away.
I would like to thank Peter Hedenskog, James Zern, Pascal Massimino and Jai Krishnan for help and discussions that led to this article.