Giving Your Images An Extra Squeeze

29th Dec 2012 by Yoav Weiss

ABOUT THE AUTHOR

Yoav Weiss (@yoavweiss) is a developer that likes to get his hands dirty fiddling with various layers of the Web platform stack. Constantly striving towards a faster Web, he's trying to make the world a better place, one Web performance issue at a time. You can follow his rants on Twitter or have a peek at his latest prototypes on Github.

According to the latest HTTP archive stats, the average Web page weighs 1286KB, and 60% of that is image data. That means that properly compressing image data is of utmost importance for the overall page content size and hence its loading time. It also has a significant impact on the data plan hit users incur when they browse the Web on their mobile devices.

Byte distribution per content type: Images 793KB, Scripts 211KB, Stylesheets 35KB, Flash 92KB, HTML 54KB, Other 101KB - Total 1286KB

Yet, when we look at the actual numbers “in the wild”, we see that few developers actually compress their images, and even for those that do, the results are not always ideal.

A few months ago, I downloaded 5.8 million images from Alexa’s top 200,000 sites. Using that image data, I’ll demonstrate how much data can be saved by properly compressing images.

Image Formats

I’m sure most of you know this by now, but here is a short overview of the image formats on the Web:

GIF – Best suited for computer generated images with a relatively small number of colors. It works by choosing a palette of up to 256 colors that best fits the image, creating a bitmap that represents the image using the palette’s color numbers, and then compressing that bitmap using a generic compression algorithm. The format supports animation and transparency, but not a full alpha channel.
PNG – Best suited for computer generated images, but can represent more than 256 colors. The format has several subtypes. The subtype usually referred to as PNG8 is very similar to GIF, but uses a different compression algorithm. It does not support animation, but does support a full alpha channel. The subtypes referred to as PNG24 and PNG24α can represent the full RGB color space, with the latter also supporting a full alpha channel. The downside is that both PNG24 subtypes are represented as bitmaps to which a generic compression algorithm is applied. This is usually not ideal in terms of byte size.
JPEG – Best suited for real life photos. It is not a bitmap based format, but represents the images by storing the frequency of color changes between different pixels, eliminating high frequencies that humans are likely not to notice anyway, and then compressing that. It is a lossy image format, which means a JPEG cannot be converted to the original bitmap image with perfect accuracy. For most uses on the Web, this is not a limitation.
WebP – Best suited for both real life photos and computer generated images, since it can employ both lossy and lossless techniques. Based on the VP8 video codec, the WebP format uses predictive coding to achieve its high lossy compression rates and the latest entropy coding techniques to achieve better lossless results. It also supports a full alpha channel and animation. What’s the catch, then? The main issue is that WebP is not really part of the Web platform’s “official” formats, since it is only supported by Chrome and Opera at present. The lack of simple fallback mechanisms (both client and server side) poses a high barrier to entry for developers who want to use WebP today.
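
The palette approach that GIF and PNG8 share can be sketched in a few lines. This is only an illustration under simplifying assumptions: the input already uses at most 256 distinct colors, and zlib's DEFLATE stands in for the "generic compression algorithm" (GIF actually uses LZW, and PNG applies per-row filters before DEFLATE). The function and variable names are made up for the example.

```python
import zlib

def to_indexed(pixels):
    """Split a list of (r, g, b) pixels into a palette and per-pixel palette indices."""
    palette = []
    index_of = {}
    indices = bytearray()
    for color in pixels:
        if color not in index_of:
            if len(palette) == 256:
                # A real encoder would quantize the colors first; this sketch gives up.
                raise ValueError("more than 256 colors")
            index_of[color] = len(palette)
            palette.append(color)
        indices.append(index_of[color])
    return palette, bytes(indices)

# A tiny synthetic "computer generated" image: 64x64 pixels, two flat colors.
WIDTH, HEIGHT = 64, 64
pixels = [(255, 0, 0) if x < WIDTH // 2 else (0, 0, 255)
          for y in range(HEIGHT) for x in range(WIDTH)]

palette, indices = to_indexed(pixels)
raw_rgb = bytes(channel for color in pixels for channel in color)
compressed = zlib.compress(indices, 9)  # DEFLATE as a stand-in for LZW

print(len(raw_rgb), "bytes as raw RGB")
print(len(indices), "bytes as palette indices")
print(len(compressed), "bytes after generic compression")
```

Storing one index byte per pixel instead of three color bytes already cuts the bitmap to a third, and the repetitive index stream then compresses well, which is why this family of formats suits flat, computer generated graphics.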

Here’s a look at the presence each format has on the Web today based on bytes.

Format Distribution

Image format	% in bytes
JPG	66.9%
Animated GIF	6.4%
Non-animated GIF	5.3%
PNG8	1.3%
PNG24	5.2%
PNG24α	14.3%
icons	0.4%
bitmaps	0.2%

Some of you may say: “You forgot SVG!”. I didn’t. SVG comprises only 0.001% of the overall image data, so it didn’t make it into the format distribution table. Sad, but true.

Lossless Optimization

In my quest for finding image optimization opportunities, I first sought to find the savings that could be achieved without any compromise on quality. I ran lossless optimizations on JPEG and PNG using the jpegtran and pngcrush utilities, as well as conversion to lossless WebP. The results are below.

Optimization	Data Reduction
JPEG EXIF removal	6.6%
JPEG EXIF removal, optimized Huffman	13.3%
JPEG EXIF removal, optimized Huffman, Convert to progressive	15.1%
PNG8 pngcrush	2.6%
PNG8 lossless WebP	23%
PNG24 pngcrush	11%
PNG24 lossless WebP	33.1%
PNG24α pngcrush	14.4%
PNG24α lossless WebP	42.5%
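
For readers who want to try these lossless passes, here is a minimal sketch of how the command lines might be assembled. The exact options used for the measurements above are not stated, so the switches below are an assumption: they are commonly used options of jpegtran, pngcrush and cwebp. The helper function names are hypothetical.

```python
def jpeg_lossless_cmd(src, dst, progressive=True):
    """jpegtran: drop extra markers (EXIF, comments), optimize Huffman tables."""
    cmd = ["jpegtran", "-copy", "none", "-optimize"]
    if progressive:
        cmd.append("-progressive")  # re-encode as progressive JPEG, still lossless
    cmd += ["-outfile", dst, src]
    return cmd

def png_lossless_cmd(src, dst):
    """pngcrush: remove ancillary chunks and reduce color type/bit depth losslessly."""
    return ["pngcrush", "-rem", "alla", "-reduce", src, dst]

def webp_lossless_cmd(src, dst):
    """cwebp: convert to lossless WebP."""
    return ["cwebp", "-lossless", src, "-o", dst]

for cmd in (jpeg_lossless_cmd("photo.jpg", "photo.opt.jpg"),
            png_lossless_cmd("logo.png", "logo.opt.png"),
            webp_lossless_cmd("logo.png", "logo.webp")):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Note that `-copy none` strips all extra markers, not only EXIF, so it should be skipped for images whose embedded metadata or color profile matters.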

Overall with these lossless optimization techniques about 12.75% of image data can be saved. That is 101KB for an average page! If we use the lossless variant of WebP, we can save 18.2% of overall image data for browsers that support it, which is 144KB.
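
The overall figures can be approximated by weighting each format's best lossless reduction by its share of image bytes from the format distribution table. That the numbers were derived exactly this way is an assumption, but the weighted sums land within rounding of the 12.75% and 18.2% quoted above. GIF, icons and bitmaps are left out because no optimization was measured for them.

```python
# Share of total image bytes per format (percent), from the format distribution table.
share = {"JPEG": 66.9, "PNG8": 1.3, "PNG24": 5.2, "PNG24α": 14.3}

# Best lossless reduction per format (percent), from the lossless optimization table.
jpegtran_pngcrush = {"JPEG": 15.1, "PNG8": 2.6, "PNG24": 11.0, "PNG24α": 14.4}
with_lossless_webp = {"JPEG": 15.1, "PNG8": 23.0, "PNG24": 33.1, "PNG24α": 42.5}

IMAGE_KB_PER_PAGE = 793  # image bytes on the average page

def overall_saving(reduction):
    """Weighted saving as a percentage of all image bytes."""
    return sum(share[fmt] * reduction[fmt] / 100 for fmt in share)

for label, reduction in (("jpegtran + pngcrush", jpegtran_pngcrush),
                         ("with lossless WebP", with_lossless_webp)):
    pct = overall_saving(reduction)
    print(f"{label}: {pct:.1f}% of image data, about {pct / 100 * IMAGE_KB_PER_PAGE:.0f}KB per page")
```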

Lossy Optimization

Now let’s see what happens when we are willing to (slightly) compromise quality for the sake of data savings. I used the SSIM index in order to get an objective idea of the trade-off we make between quality and byte size. Basically, an SSIM score of 100% means identical images. A lower SSIM score means a bigger difference between the images.
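
To give a feel for what an SSIM score measures, here is a deliberately simplified sketch. It evaluates the standard SSIM formula once over a whole grayscale image, whereas real SSIM tools (including the ones used for the measurements here) average the score over many small local windows, so the numbers will not match theirs. The function name and sample pixel values are made up for the example.

```python
def global_ssim(x, y, max_value=255):
    """SSIM formula evaluated once over two equal-length lists of grayscale values."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2))

original = [10, 50, 90, 130, 170, 210, 250, 30]
degraded = [12, 47, 93, 128, 172, 207, 252, 28]  # small errors, as lossy coding might add

print(f"identical: {global_ssim(original, original):.2%}")
print(f"degraded:  {global_ssim(original, degraded):.2%}")
```

The score compares means, variances and covariance rather than raw pixel differences, which is why it tracks perceived structure better than a plain error sum.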

JPEG

Using ImageMagick, I compressed JPEGs to several levels of quality. Then I applied the lossless optimizations that we saw above to them, in order to squeeze these images some more. I also compressed the images using imgmin, which is a utility that employs binary search to find the ideal quality level for each image. Finally, I ran JPEG to WebP conversion to see if the benefits match Google’s result of 30% data reduction.

Quality Level	Data Reduction	SSIM
75	50%	96.22%
50	64.6%	92.28%
30	73.3%	89.13%
imgmin	38.6%	97.52%
WebP 75	68%	95.28%

WebP gives us compression levels close to “quality 30” with “quality 75” image quality. Another way to look at this is that WebP files are 37% smaller than JPEGs of equivalent quality.
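
The binary search idea behind a per-image quality finder such as imgmin can be sketched as follows. This is not imgmin's code: its actual error metric and thresholds are not described here, so `error_at` is a hypothetical callback that would re-encode the image at a given quality and return some visual error, and `fake_error` is a made-up stand-in so the example runs on its own. The sketch assumes the error never increases as quality increases.

```python
def lowest_acceptable_quality(error_at, max_error, lo=1, hi=100):
    """Return the lowest quality in [lo, hi] whose error is within max_error.

    Falls back to the initial hi if no quality level is acceptable.
    """
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if error_at(mid) <= max_error:
            best = mid       # acceptable, so try an even lower quality
            hi = mid - 1
        else:
            lo = mid + 1     # too much error, so move to higher quality
    return best

def fake_error(quality):
    """Hypothetical stand-in for 're-encode at this quality and measure the error'."""
    return (100 - quality) / 20.0

print(lowest_acceptable_quality(fake_error, max_error=1.0))
```

With 100 quality levels the loop needs at most seven encodes per image, which is what makes a per-image search practical compared with trying every level.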

PNG24

I tried several lossy optimizations on these images: Kornel Lesiński’s improved pngquant, conversion to JPEG using ImageMagick+jpegtran, and conversion to WebP.

Method	Setting	Data Reduction	SSIM
pngquant	256	57.1%	99.8%
pngquant + lossless WebP	256	63.2%	99.8%
JPEG	75	77%	94.6%
WebP	75	84.7%	95.1%

I’m not sure what’s more impressive here, pngquant’s 57.1% data reduction with practically zero quality loss, or JPEG’s and WebP’s results. Here again, the WebP files were 33% smaller than JPEG. Lossless WebP gave an extra 14.2% compression when applied to PNGs after pngquant. Note: I avoided converting PNGs smaller than 500 bytes to JPEG since this usually resulted in larger file sizes.
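
The "smaller than" comparisons follow from the data reduction column: if two methods shrink the same originals by different percentages, the ratio of what remains gives their relative size. A short sketch of that arithmetic, using the PNG24 figures from the table above (the function names are made up for the example):

```python
def remaining(reduction_pct):
    """Fraction of the original bytes left after a given percentage reduction."""
    return 1 - reduction_pct / 100

def smaller_by(reduction_a_pct, reduction_b_pct):
    """How much smaller (in percent) method B's files are than method A's."""
    return (1 - remaining(reduction_b_pct) / remaining(reduction_a_pct)) * 100

# WebP (84.7% reduction) compared with JPEG (77% reduction)
print(round(smaller_by(77.0, 84.7), 1))
# pngquant + lossless WebP (63.2%) compared with pngquant alone (57.1%)
print(round(smaller_by(57.1, 63.2), 1))
```

The two results come out at roughly 33% and 14.2%, in line with the figures quoted in the text.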

PNG24α

For PNGs with an alpha channel, I couldn’t use the above conversion to JPEG, since JPEG doesn’t have an alpha channel. Also, because of problems the original SSIM utility I used had with a full alpha channel, I used Kornel’s dssim utility instead.

Method	Setting	Data Reduction	SSIM
pngquant	256	63.1%	99.8%
pngquant + lossless WebP	256	69%	99.8%
WebP	75	77.9%	94.8%

Again, pngquant’s results are extremely impressive, providing files that are almost 3 times smaller with negligible quality loss. Lossless WebP gave an extra 15.8% compression on these pngquant results. Lossy WebP provides even better compression results, with files that are 40% smaller than pngquant’s and almost 5 times smaller than the original PNGs, although it does that with a slight loss of visual quality.

Why Don’t Developers Compress Their Images?

While I have no evidence to support this theory, I suspect most developers don’t compress their images because there is no automated process in place. Depending on the workflow, there are a few options to automate image compression:

Build time – For static images, adding image compression utilities to the build process can make sure that no static uncompressed images make it through.
Upload time – For images that are dynamically added by the site’s users or administrators, the developers should find a way to add image compression utilities to the upload process. That may not always be easy (e.g. when working with legacy CMSs), but it is essential to avoid serving bloated images to users.
Serving time – If neither of the previous options is feasible, there’s always the possibility to apply image compression before the images are served to the user. The open source option here is mod_pagespeed’s image optimization filters. Otherwise, commercial options are also available.

Each developer should choose the optimization options that fit his workflow best, but everyone should automate image optimization. Otherwise, there’s a strong chance it will not happen.
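
As a rough sketch of the build time option, a build step could walk the static assets directory and decide which optimizer each image needs. Everything here is illustrative: the extension-to-tool mapping and the function name are assumptions, and the example only plans the work against a temporary directory rather than invoking any tool.

```python
import os
import tempfile

# Hypothetical mapping from file extension to the optimizer a build step might call.
OPTIMIZER_FOR = {".jpg": "jpegtran", ".jpeg": "jpegtran", ".png": "pngcrush"}

def plan_optimizations(root):
    """Walk root and return sorted (path, tool) pairs for every image with a known optimizer."""
    plan = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            tool = OPTIMIZER_FOR.get(ext)
            if tool:
                plan.append((os.path.join(folder, name), tool))
    return sorted(plan)

# Demonstration on a throwaway directory with two images and one non-image file.
with tempfile.TemporaryDirectory() as root:
    for name in ("logo.PNG", "photo.jpg", "readme.txt"):
        open(os.path.join(root, name), "w").close()
    plan = plan_optimizations(root)
    tools = [(os.path.basename(path), tool) for path, tool in plan]

print(tools)
```

A real build step would run the chosen tool on each path and fail the build if a tool exits with an error, so that unoptimized images cannot slip through unnoticed.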

Conclusions

Even though every Web developer knows that images must be properly compressed, very few actually do that optimally, as we can see from the extra compression we can squeeze out of the Web’s images, with no or little compromise in terms of quality.

Even if developers only choose the truly lossless path, image data can be reduced by 12.75% or 101KB per page. Using lossless WebP turns that into 18.2% or 144KB for supporting browsers.

If every Web developer employed both lossy and lossless techniques to compress his site’s images as far as possible with practically nonexistent visual impact (i.e. imgmin for JPEG, pngquant for PNG24), the image data of the current average page could be reduced by 37.8% or 300KB!

Willingness to apply more lossy techniques (but still maintain good visual quality) can result in 47.5% image data savings or 368KB.

Using WebP would increase the savings to 61% of image data or 483KB for browsers that support it.

That’s huge! Image compression is something that every one of us should add to our workflow, since it can save a large chunk of a site’s Web traffic. All the tools I used are free and open-source software. There are no excuses!

Web Performance Calendar