More than 99 human years are wasted every day because of uncompressed content.

Compressing your content using Gzip is a well-known best practice for achieving fast website performance. Unfortunately, a large percentage of websites are serving content uncompressed and many of the leading CDNs are part of the problem. As crazy as it sounds, most CDNs turn off Gzip by default. I decided to dig into the data we have available in the HTTP Archive to get a better look at the current state of Gzip on the web.

Background on Gzip

Gzip compression works by finding similar strings within a text file, and replacing those strings with a temporary binary representation to make the overall file size smaller. This form of compression is particularly well suited for the web because HTML, JavaScript and CSS files usually contain plenty of repeated strings, such as whitespace, tags, keywords and style definitions.
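
If you want to see this in action, here's a minimal sketch using Python's standard gzip module. The HTML snippet is a made-up example, but its repeated markup is exactly the kind of redundancy Gzip thrives on:

import gzip

# Made-up, highly repetitive HTML, typical of templated pages
html = ("<div class='item'><span class='label'>Example</span></div>\n" * 200).encode("utf-8")
compressed = gzip.compress(html)

print(len(html), "bytes uncompressed")
print(len(compressed), "bytes gzipped")  # a small fraction of the original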

Browsers are able to indicate support for compression by sending an Accept-Encoding header in each HTTP request. When a web server sees this header in the request, it knows that it can compress the response. It then notifies the web client that the content was compressed via a Content-Encoding header in the response. Gzip was developed by the GNU project and was standardized by RFC 1952.
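
Here's a rough sketch of that negotiation using nothing but Python's standard library. The URL is just a placeholder; any server with Gzip enabled will do:

import urllib.request

req = urllib.request.Request(
    "https://example.com/",
    headers={"Accept-Encoding": "gzip"},  # the client advertises Gzip support
)
with urllib.request.urlopen(req) as resp:
    # if the server compressed the body, it says so in Content-Encoding
    print("Content-Encoding:", resp.headers.get("Content-Encoding"))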

Since the HTTP Archive has resource-level data, it turns out to be a great way to see how many websites are serving uncompressed content. I looked at the data from the November 15th, 2012 run which crawled 292,999 websites. I then pulled out the hostnames to find the top offenders:

SELECT substring_index(urlShort, '/', 3) AS hostname, COUNT(*) AS num 
FROM requests 
  WHERE pageid >= 4147429 AND pageid <= 4463966 AND resp_content_encoding IS NULL 
  GROUP BY hostname 
  HAVING num > 1 
Original Hostname   # Ungzipped Requests
(the hostname values were not preserved in this copy)
236,628
161,684
154,596
115,270
90,560
78,123
74,270
64,714
56,887
51,832
51,539
45,946
45,332
41,289
41,110
38,302
37,926
37,908

Of course, the results in the previous table are a bit misleading. CDNs are usually implemented using a CNAME record, which allows them to be white-labeled by their customers. To get an accurate list, we need to look up each of the DNS records. Once we unroll the CNAME records, we get a very different list, as shown in the following table.

Not surprisingly, Akamai does more traffic than anyone. Interestingly, while only 40% of the traffic served from its top hostname is Gzipped, Akamai also appears in fourth place on the list with a second hostname serving 72.5% Gzipped. From what I understand, Akamai uses the first domain for their legacy customers while the second is used for their newer customers.

Unrolled Hostname   Total requests   # Gzipped   % Gzipped
(the hostname values were not preserved in this copy)
1,729,000   693,507   40.1%
1,160,989   738,854   63.6%
458,776   386,121   84.2%
454,605   329,810   72.5%
217,870   217,462   99.8%
210,126   26,271   12.5%
183,497   37,255   20.3%
152,074   41,779   27.5%
118,647   44,965   37.9%
113,428   79,458   70.1%
76,043   44,022   57.9%
73,642   16,111   21.9%
63,677   25,440   40.0%
61,281   4,253   6.9%
57,856   7,805   13.5%
57,216   18,096   31.6%
56,840   21,985   38.7%
56,737   27,223   48.0%
55,723   34,339   61.6%
54,044   7,104   13.1%
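
If you want to do this kind of unrolling yourself, here's a simplified sketch using Python's standard socket module, which returns the canonical name a hostname ultimately resolves to; for CDN-fronted sites that canonical name usually sits on the CDN's own domain. The hostname below is just a placeholder:

import socket

def canonical_host(hostname):
    # gethostbyname_ex returns (canonical name, alias list, address list)
    canonical, aliases, addresses = socket.gethostbyname_ex(hostname)
    return canonical

print(canonical_host("www.example.com"))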

Dealing with already compressed content

One flaw with the data so far is that we haven’t considered the type of content being served and whether it makes sense for that content to be Gzipped. While Gzip is great for compressing text formats like HTML, CSS and JavaScript, it shouldn’t necessarily be used for everything. Popular image formats used on the web, as well as videos, PDFs and other binary formats, are already compressed. This means Gzipping them won’t provide much additional benefit, and in some cases can actually make the files larger.

I ran a quick experiment using several hundred images of various sizes and types from around the web. The results show an average reduction of only 1% in size when these already-compressed files are Gzipped. Considering the extra CPU overhead, it's probably not worth doing. While the average was only 1%, I did find a handful of outlier images where Gzip actually made a significant difference. One such example is the logo for Microsoft Azure: the image Microsoft uses is 19.29 KB, and when Gzipped it drops to 12.03 KB (a 37% reduction).
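
For anyone who wants to repeat the experiment, a simple version of the measurement looks something like this; the file paths are whatever images, PDFs or other binaries you point it at:

import gzip
import sys

def gzip_savings(path):
    with open(path, "rb") as f:
        original = f.read()
    compressed = gzip.compress(original)
    saved = 1 - len(compressed) / len(original)  # negative means Gzip made it bigger
    return len(original), len(compressed), saved

for path in sys.argv[1:]:
    before, after, saved = gzip_savings(path)
    print(f"{path}: {before} -> {after} bytes ({saved:.1%} saved)")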

Ideally, the decision about whether to use Gzip should be made on a resource-by-resource basis. In practice, most people decide whether or not to Gzip a file based on its content-type, and for the majority of cases that's a perfectly reasonable decision.


Compressing and decompressing content saves bandwidth, but uses additional CPU. This is almost always a worthwhile tradeoff given the speed of compression and the huge cost of doing anything over the network.
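
If you want to put a rough number on that CPU cost, here's a quick sketch that times Gzip on about 100 KB of made-up, repetitive HTML. On modern hardware this typically finishes in a few milliseconds, far less than a single network round trip:

import gzip
import timeit

html = ("<li><a href='/item'>A fairly typical line of markup</a></li>\n" * 1800).encode("utf-8")

# average time to compress the payload once
seconds = timeit.timeit(lambda: gzip.compress(html), number=100) / 100
print(f"{len(html)} bytes compressed in {seconds * 1000:.2f} ms on average")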

Size matters

Another thing my quick experiment confirmed is that Gzip isn't great when dealing with really small files. Due to the overhead of compression and decompression, you should only Gzip files when it makes sense. Opinions vary on what the minimum size should be. Google recommends a minimum somewhere between 150 and 1,000 bytes. Akamai is more precise, claiming that below 860 bytes the overhead of compressing an object outweighs the performance gain. Steve Souders uses 1 KB as his lower limit, while Peter Cranstone, the co-inventor of mod_gzip, says 10 KB is the lowest practical limit. In practice, it probably doesn't matter much which of these numbers you pick as long as it's under 1 KB: a response that small will most likely fit in a single packet anyway, so compressing it gains very little.
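
Here's a small sketch that illustrates both points: Gzipping a tiny payload actually makes it bigger (the format adds roughly 20 bytes of overhead), and in practice the decision usually boils down to a content-type check plus a minimum-size check. The type list and 1 KB threshold below are illustrative, not a definitive rule:

import gzip

tiny = b"OK"
print(len(tiny), "->", len(gzip.compress(tiny)), "bytes")  # grows rather than shrinks

COMPRESSIBLE_TYPES = {
    "text/html", "text/css", "text/plain", "text/xml",
    "text/javascript", "application/javascript",
    "application/x-javascript", "application/json",
}

def should_gzip(content_type, size_in_bytes, minimum=1024):
    return content_type in COMPRESSIBLE_TYPES and size_in_bytes >= minimum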

Taking these factors into consideration, let's update our query: filter the results to common text-based MIME types (excluding images and other binary formats) and limit them to files larger than 1 KB.

SELECT substring_index(urlShort, '/', 3) AS hostname, COUNT(*) AS num 
FROM requests 
  WHERE pageid >= 4147429 
    AND pageid <= 4463966 
    AND resp_content_encoding IS NULL 
    AND mimeType IN (                    -- common text-based MIME types
      'text/html', 'text/css', 'text/plain', 'text/xml',
      'text/javascript', 'application/javascript',
      'application/x-javascript', 'application/json') 
    AND respSize > 1024 
  GROUP BY hostname 
  HAVING num > 1

Here are the results when you only consider text-based resources with a minimum size of 1 KB:

Hostname   # Ungzipped Requests
(the hostname values were not preserved in this copy)
14,188
5,226
4,916
4,565
4,331
4,079
3,938
3,617
3,136
3,118
3,001
2,832
2,703
2,598
2,433
2,279
2,165
2,030
1,945
1,894

I talked with someone on the Google Plus team and they were surprised to see their domain at the top of this list. They’re still not sure why so many requests are being served ungzipped but they are investigating the issue. I think it’s telling that even top-notch engineering companies like Google are still trying to get this right. To be fair, the only reason they are top of the list is because they use a single domain, as we’ll see when we roll up the hostnames.

Update 2/20/13: It turned out there was a bug in WebPagetest that was impacting the accuracy of this data. It appears that some headers were being hidden from the browser when loading over HTTPS. I have updated the data above, which now shows that Google Plus isn't the worst offender after all (they don't even make the list). Sorry about that.

Hostname   # Ungzipped Requests
(the hostname values were not preserved in this copy)
41,918
30,107
23,947
17,715
13,546
11,190
10,895
10,635
6,425
5,900
4,650
4,008
3,436
3,016
2,970
2,968
2,391
2,361
2,345
2,218


Doing this research was a great reminder to me of how lucky we are to have the HTTP Archive. It’s a great resource as it makes it easy to do quick analysis like this. Both the code and the data are open sourced so anyone can grab their own copy of the data to check my work, or do a deeper analysis.

The results themselves are pretty shocking. Gzip is one of the simplest optimizations for websites to employ. Turning it on requires very little effort and the performance gains can be huge. So what's going on? Why aren't CDNs doing more to enable compression for their customers? Sadly, as it often turns out, to find the answer you simply need to follow the money. CDNs sell themselves as a tool for improving performance, but they also charge by the byte. The larger the files you send, the more money your CDN makes. This puts their business goals directly at odds with their marketing, which says they want to help make your website fast. As a side note, caching headers are another place where this conflict of interest shows up: the shorter the cache life on your content, the more traffic your CDN gets to serve. Shorter TTLs increase their revenue while hurting your website performance.

As website owners, it’s important for us to understand these business dynamics and be proactive to make sure best practices are being followed on our sites. The good news is that with Real User Measurement (RUM) it’s easier than ever to measure the actual performance that your visitors are experiencing. Less than a year ago there wasn’t a good RUM solution available on the market. Today, hundreds of sites are using Torbit Insight or a similar RUM tool to measure their site speed and correlate their website performance to their business metrics.

RUM is a great way to measure the actual results your CDN is delivering. Perhaps you’ll discover, like Wayfair, that you aren’t getting the performance gains from your CDN that you expect. As I tell people all the time, the first step to improving your speed is making sure you have accurate measurement. The second step is making sure you have covered the basics like enabling Gzip.


Josh Fraser (@joshfraser) is the co-founder and CEO of Torbit, a company that offers next generation web performance with a free Real User Measurement tool that allows you to correlate how your speed impacts your revenue. Torbit also offers Dynamic Content Optimization which can double the speed of an average site. Josh has been coding since he was 10 and is passionate about making the internet faster.