Back in 2008 I took a look at all of the pages that had been tested on WebPagetest to get an idea of how things looked, both from a performance perspective and in terms of how well optimized pages were in general. It was pretty eye-opening and showed that there was a LOT of room for improvement. Here we are, two and a half years later, with a significantly increased focus on web performance. Have things gotten any better? I pulled the last 6 months of test data from WebPagetest for both IE7 and IE8 and also brought back the data from 2008 to compare.
Just a really quick word on what we’re looking at before we dive into the results. The data all comes from testing done at the Dulles, VA location (the default test location), and only one data point is used for any given unique URL (the most recent successful first-view test for that URL). There were roughly 100,000 tests (unique URLs) in the 2010 IE7 data set, 20,000 in the 2010 IE8 data set and 25,000 in the 2008 IE7 data set. No attempt was made to select only URLs that appear in all 3 data sets. This is also not a neutral set of pages (like a Top 500 list), so there should theoretically be a skew towards pages where people were explicitly looking at or working on the performance of the site. That said, the Top X pages are included in the data set as well, because during both time periods there were several bulk tests with controlled URL sets focused on Alexa, retail, etc.
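For anyone who wants to do a similar analysis on their own test history, the "one data point per URL" selection looks roughly like the sketch below. The `RunRecord` type and its field names are hypothetical (this is not WebPagetest’s actual data schema); the filtering rules are the ones described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RunRecord:
    # Hypothetical record layout; not WebPagetest's actual schema.
    url: str
    location: str       # test location, e.g. "Dulles"
    browser: str        # e.g. "IE7" or "IE8"
    first_view: bool    # True for the first (empty-cache) view
    successful: bool
    completed: datetime


def latest_per_url(results, location="Dulles", browser="IE7"):
    """Keep one data point per unique URL: its most recent successful first view."""
    latest = {}
    for r in results:
        if r.location != location or r.browser != browser:
            continue  # each data set covers a single location and browser
        if not (r.first_view and r.successful):
            continue  # skip repeat views and failed tests
        current = latest.get(r.url)
        if current is None or r.completed > current.completed:
            latest[r.url] = r
    return latest
```

Keying a dict on the URL is what guarantees a URL tested hundreds of times still only counts once.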
So, enough with the background and on to what you came here for…
Page Load Time
First up is the overall Page Load Time. This is a measurement of the time from when the user first tries to navigate to a page until the onLoad event fires (all of the content and images have finished loading).
Start Render Time
Next up is the start render time. This is the amount of time between when the user starts to navigate to a page and the first time they see something other than a completely white page. This doesn’t necessarily mean the content has loaded, as it can sometimes be just a CSS background color or image, but it does guarantee that no content was visible before this point.
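To make the definition concrete, here is a simplified illustration of "the first frame that isn’t completely white", assuming you have a captured sequence of timestamped frames. The frame format (a list of RGB tuples) is an assumption made for the sketch, and this is not necessarily how WebPagetest itself computes the metric.

```python
WHITE = (255, 255, 255)


def start_render_ms(frames):
    """Return the timestamp of the first frame that is not completely white.

    frames: iterable of (timestamp_ms, pixels) in capture order, where pixels
    is a sequence of (r, g, b) tuples. Returns None if every frame is blank.
    """
    for timestamp_ms, pixels in frames:
        # Any non-white pixel counts, even a light background color.
        if any(p != WHITE for p in pixels):
            return timestamp_ms
    return None
```

Note that a frame with nothing but a light-gray background already counts as "rendered", which is exactly the CSS-background caveat above.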
Page Size (bytes)
The Page Size measurement is the number of bytes downloaded over the Internet to load the page.
Pages got substantially heavier from 2008 to 2010 (around 150KB heavier at the median). The really scary part: 20% of pages are over 1 MB and 10% are even over 1.5 MB, YIKES! It is good to see that the IE7 and IE8 lines are basically right on top of each other, which shows that even with the different sample sizes the distribution of URLs is pretty similar between both data sets.
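Numbers like "20% of pages are over 1 MB" are just read off the cumulative distribution. A minimal sketch of those summary stats, assuming a plain list of page sizes in bytes and binary units (1 MB = 1024 × 1024 bytes):

```python
import statistics

KB = 1024
MB = 1024 * KB


def share_over(values, threshold):
    """Fraction of pages whose value is strictly above the threshold."""
    values = list(values)
    return sum(v > threshold for v in values) / len(values)


def summarize_page_sizes(sizes_bytes):
    """Median size plus the share of pages over 1 MB and 1.5 MB."""
    return {
        "median_kb": statistics.median(sizes_bytes) / KB,
        "over_1mb": share_over(sizes_bytes, 1 * MB),
        "over_1_5mb": share_over(sizes_bytes, 1.5 * MB),
    }
```

The same `share_over` helper works for request counts (e.g. the share of pages with over 100 requests).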
Number of Requests
Just like with the page weight, the pages also made significantly more requests in 2010 than they did in 2008 (10 additional requests at the median). As we add more and more third-party widgets to our pages this will continue to get worse and has a pretty big impact on the performance of pages. The 15-20% of pages with over 100 requests is also pretty scary.
So, now that we know how things are performing let’s see how well optimized the pages are. We’re going to look at what I consider to be the most critical optimizations (until you have implemented all of these you shouldn’t be looking at anything else and there is no reason in this day and age that sites shouldn’t be doing these).
Keep-Alives
Keep-alives are one of the most basic optimizations and the simplest to implement (it’s just a server configuration setting, no changes to code at all). Connection keep-alive re-uses a TCP connection for multiple requests to a given domain; without it, each request needs an additional round trip to establish a new connection (and round trips are what kill you for performance). Enabling keep-alives for a site that has them disabled can cut the load time almost in half, which is a HUGE return given the time it takes to fix the server configuration.
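Where does "almost in half" come from? Here is a back-of-the-envelope model, assuming requests to a single domain are made one after another, a TCP handshake costs one round trip and each request/response costs one more. It deliberately ignores bandwidth, server time, parallel connections and the extra round trips that SSL adds.

```python
def round_trips(requests: int, keep_alive: bool) -> int:
    """Round trips to fetch `requests` resources one after another from one domain.

    Simplified model: opening a TCP connection costs 1 round trip (the handshake)
    and each HTTP request/response costs 1 more. With keep-alive a single
    connection is opened and re-used; without it every request opens its own.
    """
    handshakes = min(1, requests) if keep_alive else requests
    return handshakes + requests


def load_time_ms(requests: int, rtt_ms: float, keep_alive: bool) -> float:
    """Latency-only estimate of the time spent waiting on round trips."""
    return round_trips(requests, keep_alive) * rtt_ms
```

For 50 requests at a 100 ms round-trip time that works out to 10,000 ms without keep-alives versus 5,100 ms with them: as the request count grows, the ratio approaches exactly half.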
The good news is that things got better from 2008 to 2010, with 62% of sites now leveraging keep-alives for all of their requests (and 85% of sites getting a passing grade). The bad news is that 12% of sites still get a failing grade on the most basic of optimizations. If you’re using a shared hosting provider that will not enable keep-alives, MOVE! If your CDN provider tries to convince you that you don’t need them, get a better provider! And if your ops team can’t figure out how to get it working, they should be fired! In a lot of cases the problem is unintentional: an old Apache SSL configuration breaks keep-alives for IE (it should target only IE4 and below but ends up targeting all versions of IE).
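The Apache problem comes down to an overly broad user-agent regex. Older mod_ssl configurations commonly shipped something like `SetEnvIf User-Agent ".*MSIE.*" nokeepalive ...`, which matches every version of IE. The sketch below uses Python’s `re` to show the difference; the narrower pattern is a hypothetical one written to match the "IE4 and below" intent described above, not a directive copied from any Apache release.

```python
import re

# Broad pattern of the kind older ssl.conf files used: matches every IE version.
OVERBROAD = re.compile(r".*MSIE.*")
# Hypothetical narrower pattern limited to IE 1-4, per the intent described above.
NARROW = re.compile(r"MSIE [1-4]\.")


def keep_alive_disabled(user_agent: str, pattern: re.Pattern) -> bool:
    """True if a SetEnvIf-style match on this pattern would set nokeepalive."""
    return pattern.search(user_agent) is not None
```

With the broad pattern, an IE7 user agent such as `Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)` loses keep-alives; with the narrow one it keeps them.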
Text Compression (GZip)
Again, we see an improvement going from 2008 to 2010, but fewer than 45% of pages get a passing grade (and "passing" is pretty liberal here, because a score of 80 means you could have saved 20% of the weight of your text resources by enabling compression). It’s hard to believe that for something so trivial to implement we aren’t seeing better scores (and that even sites as prominent as cnn.com are failing).
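The scoring relationship above (a score of 80 means 20% of the text bytes could have been saved) can be sketched with Python’s `gzip` module. This is a simplified scoring function written for illustration, assuming you have each text response body and know whether it was already served compressed; it is not WebPagetest’s exact implementation.

```python
import gzip


def gzip_score(text_resources):
    """Score 0-100 where (100 - score) is roughly the % of text bytes gzip would save.

    text_resources: list of (body_bytes, already_gzipped) pairs, where body_bytes
    is what was actually sent over the wire.
    """
    sent = 0
    optimized = 0
    for body, already_gzipped in text_resources:
        sent += len(body)
        if already_gzipped:
            optimized += len(body)  # nothing left to save
        else:
            # Never count compression as making a response bigger.
            optimized += min(len(body), len(gzip.compress(body)))
    if sent == 0:
        return 100
    return round(100 * optimized / sent)
```

Plain HTML, CSS and JavaScript typically compress very well, which is why an uncompressed page tends to land far below 80 rather than just under it.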
Image Compression
Similar to the GZip check, the Image Compression check looks at how much of the page size you could save by "properly" compressing images. This usually means being more aggressive with your JPEG images and compressing them so they are smaller (but still look the same). It is not unusual to find photos whose size can be cut in half by better compression. This does take a little more work, because you have to put the proper processes and guidelines in place in your publishing workflow, but it’s still relatively easy to implement and doesn’t involve any radical surgery on your site.
Breaking the trend, here we see things actually got worse going from 2008 to 2010 with a full 10% fewer pages getting a passing grade.
Cache Static Content
Caching static content just means letting the browser’s cache do its thing and store files that don’t change instead of requesting them every time the user comes back to your site. To do it right you need to set an expiration date on the resources that tells the browser to keep them and re-use the cached version (we check that it is at least 30 days, but longer is even better). This doesn’t mean letting the browser re-validate resources and serving 304s; that takes just about as long as downloading the content in the first place. This can have an ENORMOUS impact on making pages faster for someone coming back to your site or even navigating across multiple pages. It can be a bit tricky to implement because you usually need to add version numbers to your resources (so the URL changes when the content does), but it is well worth the time.
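Here is a minimal sketch of both halves: checking that a response’s freshness lifetime meets the 30-day bar, and generating versioned file names. It assumes response headers arrive as a dict with canonically cased names, and content hashing is just one common way to version resources (a build number works too). The function names are illustrative, not from any particular tool.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MIN_LIFETIME = timedelta(days=30)  # the 30-day threshold described above


def cache_lifetime(headers, now):
    """Freshness lifetime from Cache-Control max-age, falling back to Expires.

    headers: dict with canonically cased names (an assumption of this sketch).
    now: timezone-aware datetime. Returns None if no usable lifetime is set.
    """
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "no-cache" in cache_control:
        return timedelta(0)  # forces a re-validation (or no caching at all)
    match = re.search(r"\bmax-age=(\d+)", cache_control)
    if match:
        return timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires) - now
        except (TypeError, ValueError):
            return None  # unparseable date, or one without a timezone
    return None


def is_cacheable_long_enough(headers, now):
    lifetime = cache_lifetime(headers, now)
    return lifetime is not None and lifetime >= MIN_LIFETIME


def versioned_name(filename: str, content: bytes) -> str:
    """'app.js' -> 'app.<8 hex chars>.js'; the name changes whenever the content does."""
    stem, dot, ext = filename.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"
```

Because a versioned URL never changes its content, you can give it a very long lifetime (a year is common) and simply reference the new name when you deploy a change.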
Things got marginally better from 2008 to 2010, but caching is still pretty dismal. Only 15% of pages get a passing grade, and ~18% of sites fail completely (as in a score of 0, not just an F).
Here we find another case where things are getting worse going from 2008 to 2010. I expect this is largely because more and more widgets and third-party content are being added to sites, and because everyone is using the "easy" implementation of just adding the code to the head of the page. Google Analytics did a great job blazing the trail with non-blocking embed code; we just need all of the widget makers to pay attention and follow suit.
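If you want to audit your own pages for the "easy" head-of-page implementation, a rough check is to list external scripts in the head that are loaded without `async` or `defer`. The sketch below uses Python’s built-in `html.parser`. It is an assumption-laden heuristic: it ignores inline scripts and scripts injected dynamically from JavaScript (which is how the non-blocking Google Analytics snippet loads its code), and not every browser of this era honors `async`.

```python
from html.parser import HTMLParser


class BlockingScriptFinder(HTMLParser):
    """Collects external <script src> tags inside <head> that lack async/defer."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "script" and self.in_head:
            attr_map = dict(attrs)  # boolean attributes map to None
            if "src" in attr_map and "async" not in attr_map and "defer" not in attr_map:
                self.blocking.append(attr_map["src"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def blocking_head_scripts(html: str) -> list:
    finder = BlockingScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.blocking
```

Every entry in the returned list is a script the browser has to fetch and run before it can render anything below it.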
Content Distribution Network (CDN) Use
The final check is a more controversial one and is the only one that I consider optional, depending on the distribution of your customer base. If all of your customers are in the same region as your web server (say, within a few hundred miles), you can get by without using a CDN. If your customers are spread out any further than that (and particularly if you target the Internet at large), I highly recommend using a CDN. Fundamentally, once you have done a lot of optimizing on your site, your limiting factor will usually be the round-trip time from the user to your web server (which is largely limited by the speed of light and isn’t getting any faster). A CDN brings your static content closer to the users and reduces that time. It might not be something you notice if you are close to your server, but someone halfway around the world will see a huge improvement. I usually recommend saving this for last, after you have done all of your other optimizations (including things we don’t check for, like image sprites), because a CDN can hide a lot of sins by reducing the time for each request.
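To put the speed-of-light argument into numbers: light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), so distance alone sets a floor on every round trip. The sketch below computes that floor; real routes are longer than the straight-line distance and add routing and queuing delays, so actual round-trip times are higher.

```python
# Approximate speed of light in optical fiber (about two-thirds of c).
FIBER_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound on one round trip for a given one-way distance."""
    return 2 * distance_km * 1000 / FIBER_KM_PER_S


def latency_floor_ms(round_trips: int, distance_km: float) -> float:
    """Lower bound on time spent waiting on `round_trips` sequential round trips."""
    return round_trips * min_rtt_ms(distance_km)
```

A user 300 km from your server has a floor of about 3 ms per round trip; one 10,000 km away is at about 100 ms, so 20 sequential round trips cost that user at least 2 seconds no matter how fast their computer or connection is. A CDN edge node nearby shrinks that distance.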
In 2010 things look better at the low end, but for scores of 70 and higher the lines are right on top of each other. This is probably caused by the adoption of third-party widgets that are served from a CDN on pages that are not. It’s pretty sad that in this day and age of inexpensive (and even free) CDNs only 15% of pages get a passing grade.
I must say that I’m a little bit disappointed by the progress over the past two years. There have been some small incremental gains, but they have been offset by larger, more complex pages. There is only so much that the browsers themselves can do to make things faster, and with round-trip latency usually being the limiting factor, faster computers or faster connections won’t help much either. We are starting to see more solutions that transform pages on the server side to make them faster (as well as new protocols like SPDY). If we can get broad adoption on the server side, hopefully things will start to materially improve.