When Steve Souders released High Performance Web Sites in 2007, it was the first time anyone had spent the time to explain what the browser cache does and how to take advantage of it to improve your site’s performance. It may be hard to believe, but before that point the browser cache was a source of mystery for many web developers – some were even blissfully unaware that those images, stylesheets, and JavaScript files being requested would somehow end up on their disk drive.
It’s from this tome that we learned to set far-future Expires headers to give users the best chance of not going out over the network for every request. The web development discipline has been collectively tweaking this teaching for years without realizing that as the web has evolved, so has the role of the browser cache.
The way people use the web and the way we develop for the web have been constantly changing. However, there are three very specific changes that point to a need to re-evaluate how we think of the browser cache.
Tabbed browsing
Back in 2007, Internet Explorer was completely dominating the browser market. Internet Explorer 7, the first version of Internet Explorer with tabs, had been released in late 2006, and as there was no browser auto-updating, adoption was going slowly. The good news was that Firefox was finally starting to eat into Internet Explorer’s market share. All this meant that the majority of users were still using a non-tabbed browser when Souders was writing his book.
Tabbed browsing is so normal today that it seems strange to think of a time when that wasn’t the norm. Yet for years, with Internet Explorer 6’s stranglehold on the browser market, that was the way most people used the web. It was quite rare to see a non-developer with multiple browser windows open. Most people were used to visiting one site, then clicking on a bookmark to visit the next. Maybe they would hit the back button to get back to that first site at some point.
The value of the browser cache in that world was that files stayed on disk between page loads: a site you visited in the morning would load faster in the evening, and navigating from page to page within a site was faster too. If you visited the same site every day, the browser cache sped up that experience as well. The total number of sites you visited was probably relatively small. After all, who could keep track of a long browser history?
Tabbed browsing allowed users to embrace a new mode of using the web: leaving multiple tabs open all day. If you never leave the site, then the role of the browser cache becomes a bit muddy – how is it helping now?
Ajax and single page apps (SPAs)
Once there were tabs and people started leaving sites open for the whole day, it became important for those sites to automatically update themselves. After all, if you could remove the need to hit refresh, then you could just switch tabs to see new information all day long. Ajax allowed developers to pull in new information without reloading the entire page and that led to the creation of single page apps (SPAs).
Arguably the first SPA to make a big impact was Gmail. You could easily see new messages as they came in, even reply or create new messages, all without ever reloading the page. So you would just load up Gmail once and leave it open in a browser tab for hours or days, and it would happily keep working.
As SPAs have proliferated across the web, the role of the browser cache has gotten a bit muddier. If I’m always leaving these pages open, and they are always updating automatically, what is the browser cache doing for me? As I’m writing this, I have four different Gmail accounts open in separate tabs and two different Twitter accounts open in separate tabs, and those tabs have been open for at least the past week. Periodically I’ll be asked to log in again, and then I’ll go through a full page refresh, so I suppose the cache is helping there. Or is it? Are those files still in cache days and weeks after I initially loaded those tabs? And is it making a big enough difference given how I’m going to use them?
Continuous deployment
The last big change that really altered the role of the browser cache is continuous deployment. When I started at Yahoo in 2006, we released My Yahoo and the Yahoo homepage every two weeks (unless there were emergencies). That meant nearly every static file was guaranteed not to change for two weeks, which in turn meant that when those files ended up in the browser cache, they would be useful for a while. If you came to the page one day and then not again for a few days, you still got the benefit.
Today, continuous deployment is everywhere. Companies start by doing daily releases and some release dozens of times during the day. If you are pushing out changes to your JavaScript, CSS, and images with each push, the cacheability of those files becomes compromised. How long are they valid for? A day? A few hours? A few minutes?
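One common way to reconcile far-future caching with frequent releases is to put a content hash in each file's name, so a file's URL changes only when its contents do. The following is a minimal sketch of that idea, not a description of any particular company's setup; the helper names `fingerprint` and `cacheControlFor` are hypothetical, and the one-year `max-age` is just a conventional choice for "far future."

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: embed a short content hash in the file name so that a
// deploy which changes the file also changes its URL.
function fingerprint(fileName: string, contents: string): string {
  const hash = createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const dot = fileName.lastIndexOf(".");
  // Insert the hash before the extension, or append it if there is none.
  return dot === -1
    ? `${fileName}.${hash}`
    : `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}

// Hypothetical policy: fingerprinted files can be cached for a year because
// their URL never points at different contents; everything else (such as the
// HTML page that references them) is revalidated on each use.
function cacheControlFor(isFingerprinted: boolean): string {
  return isFingerprinted ? "public, max-age=31536000, immutable" : "no-cache";
}
```

With this approach, a file that did not change between two deploys keeps the same name and stays usable from cache, while a changed file gets a new name and is fetched once. How much that helps still depends on how often each file actually changes.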
In the world of SPAs, where the full page is only being loaded once every other week or so, this creates a synchronization problem: how can you be sure the user has the most up-to-date version? At what point are there just too many changes and you need to force a full-page reload to get the new files?
After talking with some companies who do continuous deployment, I heard a common thread: they just don’t worry about the browser cache all that much anymore. Everyone is still setting far-future Expires headers, but few stress over the cache hit rates in their modern SPAs.
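To make the "too many changes" question concrete, here is one possible way a long-lived page could decide when to reload. This is only a sketch under assumptions that are mine: builds are identified by increasing integers, the page knows which build it was loaded from, and it can learn the latest deployed build somehow (for example, by polling the server). The names `decideReload` and `maxDrift` are hypothetical.

```typescript
type ReloadDecision = "current" | "stale" | "force-reload";

// Hypothetical policy: tolerate being a few builds behind, but force a
// full-page reload once the open page has drifted too far from what is deployed.
function decideReload(
  loadedBuild: number, // build the open tab was loaded from (assumed integer)
  latestBuild: number, // latest deployed build, as reported by the server
  maxDrift: number     // how many builds behind the page is allowed to fall
): ReloadDecision {
  const drift = latestBuild - loadedBuild;
  if (drift <= 0) return "current";
  return drift > maxDrift ? "force-reload" : "stale";
}
```

A page might call something like this each time it learns the latest build number, continuing to run on "stale" and reloading on "force-reload". Where to set the threshold is a product decision rather than something the browser cache can answer.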
The browser cache today
If I were asked to succinctly explain the role of the browser cache today, I’m not sure I’d be able to do it. There are so many files changing at such a rapid pace, and so many SPAs in use on a daily basis, that I don’t know that there’s an easy answer. What I do know is that the browser cache’s role today is pretty different from its role in 2007, and it will continue to evolve as technologies like service workers are implemented.
Given the changes, both past and future, it seems like a good time to start questioning our best practices around caching once again. The tabbed, SPA, continuous deployment world is looking for answers.