Clearing cache in the browser

9th Dec 2017 by Andrew Betts

ABOUT THE AUTHOR

Andrew Betts (@triblondon) is a web developer and principal developer advocate for Fastly, working with developers across the world to help make the web faster, more secure, more reliable and easier to work with. He founded a web consultancy which was ultimately acquired by the Financial Times, led the team that created the FT's pioneering HTML5 web app, and founded the FT's Labs division. He is also an elected member of the W3C Technical Architecture Group, a committee of nine people who guide the development of the World Wide Web.

Caching assets in the browser is the most common and most obvious way to improve front end performance. But at some point every developer accidentally makes a bad release of an asset with a long cache lifetime. There is a way back! Here’s how to throw the kill-switch.

If you’re a web developer, you, like me, have probably reached that moment in your career when you accidentally shipped a bad release of a front-end asset. And you gave it a cache lifetime of 30 years. Bad news. Your users are screwed until they manually clear their cache. Or are they?

I had breakfast recently with Steve Souders, and this being Steve, the breakfast lasted 3 hours and I left with brainache. One of the things Steve prompted me to think about was this problem: how do we invalidate an object in the browser cache before its freshness lifetime expires? His worry is less about shipping the wrong cache lifetime by accident and more about being able to update his SpeedCurve LUX script quickly without having to give it a short TTL. Turns out there are lots of ways of doing this.

In all of these solutions, I’m assuming you know the URL of the asset you want to purge, and that your app is still making at least some kind of request to your server for something in which we can embed executable JavaScript, so either a script or an HTML page.

location.reload(true)

Our first solution is one Steve and Stoyan came up with in 2012. It takes advantage of the fact that the reload() method of the location object takes a forcedReload boolean param, which MDN describes like this:

Is a Boolean flag, which, when it is true, causes the page to always be reloaded from the server

Will it load all the page’s resources from the server regardless of whether they are currently in cache, or just the top document?

Since we won’t want to interfere with what the user is doing by visibly reloading the top level document, we’ll want to use an iframe for this. So, in a piece of script in the top level document, we can do:

const ifr = document.createElement('iframe');
ifr.src = "/forcereload?path=/thing/stuck/in/cache";
ifr.classList.add("hidden-iframe");
document.body.appendChild(ifr);

Then, in the /forcereload response:

<iframe src="/thing/stuck/in/cache"></iframe>
<script>
if (!location.hash) {
  location.hash = "#reloading";
  location.reload(true);
} else {
  location.hash = "#reloaded";
}
</script>

So, to make this work we have to create an iframe, load an HTML document unrelated to the thing we want to invalidate, then load it again, along with also twice loading the thing to invalidate (although the first of those will be from cache). This is pretty bad. Added to all that, you’re left with an iframe attached to the document that you’ll want to clean up somehow, probably with a postMessage from the frame up to the parent to tell it that it can now remove the frame. And as Philip Tellis points out, an ancient but non-auto-updating version of Firefox will go into an infinite reload loop.
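The cleanup step could be sketched roughly as follows. This assumes the /forcereload page sends a message to its parent once it has finished loading in the "#reloaded" state. The message text and the removeFrameWhenReloaded helper name are invented for illustration, and the window and frame are passed in as arguments so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: removes the reloader iframe once it reports completion.
// Assumes the framed /forcereload page, in its "#reloaded" branch, runs:
//   window.addEventListener('load', () =>
//     parent.postMessage('reloaded', location.origin));
// Waiting for 'load' gives the nested frame time to finish its own fetch.
// The 'reloaded' message text is an assumption of this sketch.
function removeFrameWhenReloaded(win, frame, expectedOrigin) {
  function onMessage(event) {
    // Ignore messages from other frames or other origins.
    if (event.source !== frame.contentWindow) return;
    if (event.origin !== expectedOrigin) return;
    if (event.data !== 'reloaded') return;
    win.removeEventListener('message', onMessage);
    frame.remove();
  }
  win.addEventListener('message', onMessage);
}

// In the top level document, after appending the iframe:
//   removeFrameWhenReloaded(window, ifr, location.origin);
```

This only tidies up the DOM. It does nothing about the extra history entries or the double load.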

Turns out, this doesn’t even behave the way we think it does anyway. The forcedReload argument, whilst documented by MDN, isn’t technically part of the spec for the location interface, and no browser changes whether it performs a network fetch (at least in relation to subresources) based on the value of that argument. However, browsers do vary their behaviour for reload() itself. Chrome always loads the subresource from cache. Firefox, Edge and Safari always load it from the network.

The only effects the forcedReload argument has seem to be:

In relation to the document itself (the ‘reloader’ iframe in our technique), forcedReload prompts this to be fetched over the network in Firefox if it would otherwise be fetched from cache. All other browsers always reload the document from the network.
In relation to subresources (like the script we’re trying to update), if the browser makes a network request for the reload (all except Chrome), then setting forcedReload will prevent conditional requests being made if any of the resources being reloaded have ETag or Last-Modified headers. In Chrome, there’s no impact of forcedReload here – either way, no network fetch is made.

Another disadvantage of this technique is that there’s realistically no way of preventing spurious entries being added to the browser history.

This is the solution Steve uses, and the test case he created for it in 2012 doesn’t work today in Chrome, confirming what I found in my testing. It seems we can put this down to a change in Chrome’s behaviour. Since this argument is not in the spec it’s not technically a bug but I can imagine people might have implementations of this technique in the wild and it’s a shame it no longer works.

Vary + fetch

OK, let’s move on to a potentially better option. I’m a bit obsessed with the Vary header, and I think we can use it here. All browsers implement it, and they use it as a validator, not as a cache key, which means that if a varied header value changes, the existing cached object will be invalid for the new request, and any new object downloaded will replace the object already in cache (this behaviour differs from CDNs and other ‘shared caches’, which will store multiple variants of the same URL).

So let’s set a Vary header on all responses from the server, varying on something that doesn’t exist:

Vary: Forced-Revalidate

This will have no effect because browsers don’t send a Forced-Revalidate header. But fetch can:

await fetch("/thing/stuck/in/cache", {
  headers: { "Forced-Revalidate": 1 },
  credentials: "include"
});

So, what is happening here?

We make a request for /thing/stuck/in/cache, and it finds a hit in the cache, but the cached object is varying by Forced-Revalidate with a key of “” (empty string). Our new request carries a Forced-Revalidate value of 1, so it doesn’t match. We also include credentials with the request to ensure that the response can be used for a normal navigation request.
The request is sent to the network. The server returns the new version of the file and still includes Vary: Forced-Revalidate
The browser overwrites the existing cache item with the new one, which is now only valid for requests that have a Forced-Revalidate: 1 header.

But wait. Now the item in the cache will only match future requests that have a Forced-Revalidate header. The next time the browser has an ordinary reason to load this file, as a navigation or a subresource, it won’t send the special header, and we’ll miss the cache again. However, this time, the downloaded response will have a vary key of “” (empty string) and is back to being useful.
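To make that sequence easier to follow, here is a toy model of a cache that keeps a single entry per URL and treats Vary as a validator. It is a sketch for reasoning about the steps above, not a description of how any browser is implemented. The SingleVariantCache class and runDemo function are invented names, and the "server" is just a function.

```javascript
// Toy model of an HTTP cache that keeps ONE entry per URL and uses Vary as a
// validator rather than as part of the cache key. Illustrative only: real
// browser caches also track freshness, multiple Vary headers, and more.
class SingleVariantCache {
  constructor() {
    this.entries = new Map(); // url -> { body, varyName, varyValue }
  }

  // `origin(url)` stands in for the server and returns { body, vary },
  // where `vary` is the single header name listed in the Vary header.
  request(url, headers, origin) {
    const entry = this.entries.get(url);
    if (entry && (headers[entry.varyName] ?? '') === entry.varyValue) {
      return { body: entry.body, fromCache: true };
    }
    // Miss, or the varied header no longer matches: go to the network and
    // overwrite whatever was stored for this URL.
    const response = origin(url);
    this.entries.set(url, {
      body: response.body,
      varyName: response.vary,
      varyValue: headers[response.vary] ?? '',
    });
    return { body: response.body, fromCache: false };
  }
}

function runDemo() {
  let version = 'bad';
  const server = () => ({ body: version, vary: 'Forced-Revalidate' });
  const cache = new SingleVariantCache();
  const url = '/thing/stuck/in/cache';
  const log = [];
  log.push(cache.request(url, {}, server)); // network: caches the bad release
  version = 'good';                         // fixed file deployed
  log.push(cache.request(url, {}, server)); // cache hit: still the bad release
  log.push(cache.request(url, { 'Forced-Revalidate': '1' }, server)); // mismatch: network
  log.push(cache.request(url, {}, server)); // mismatch again: network
  log.push(cache.request(url, {}, server)); // cache hit: the good release
  return log;
}
```

The fourth request in runDemo is the second download mentioned above: the entry written by the special fetch only matches requests carrying the special header, so the first ordinary request after it has to go back to the network.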

This is better, with Edge, Chrome, Firefox and Safari all behaving correctly here, for same-origin resources. Firefox splits the cache for cross-origin fetches vs navigations, so it won’t clear the navigation cache. And it’s possible that in future, browsers will start to store multiple variants, making this technique ineffective. Still, one line of JavaScript, a slightly weird bit of HTTP metadata, and you do still end up having to load the item twice, but there’s no iframe and this code is pretty maintainable.

Of course, ideally there’d be something you could put instead of headers: { "Forced-Revalidate": 1 } to just tell fetch to skip the cache directly…

fetch + cache:reload

Which brings us to the cache property of the Fetch API’s Request object. This is easily the simplest and most “correct” way to solve the problem:

await fetch(
  '/thing/stuck/in/cache', 
  {cache: 'reload', credentials: 'include'}
);

The 'reload' cache mode tells fetch to ignore the cache and go directly to the network, but to save any new response into the cache. As before, we include credentials so that the fetch is (supposedly) treated the same as a normal navigation for caching purposes. The new response is immediately usable for any future requests, and you don’t need any crazy headers or iframes or anything.
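If more than one asset is stuck, the same call can be wrapped in a small helper. The sketch below is one possible shape for that, under a few assumptions: refreshCachedUrls is a made-up name, the fetch implementation is injectable only so the function can be tried without a network, and "refreshed" simply means the response came back with a 2xx status.

```javascript
// Hypothetical helper: re-downloads each URL with cache mode 'reload' so that
// the fresh response replaces whatever the HTTP cache currently holds.
// `fetchImpl` defaults to the global fetch and exists only to allow testing.
async function refreshCachedUrls(urls, fetchImpl = fetch) {
  const results = await Promise.allSettled(
    urls.map((url) =>
      fetchImpl(url, { cache: 'reload', credentials: 'include' })
    )
  );
  // Report, per URL, whether the refetch succeeded with a 2xx status.
  return urls.map((url, i) => ({
    url,
    refreshed: results[i].status === 'fulfilled' && results[i].value.ok === true,
  }));
}

// Example call from a page script:
//   refreshCachedUrls(['/thing/stuck/in/cache']).then(console.log);
```

Promise.allSettled is used so that one failing URL does not hide the outcome of the others.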

Sounds perfect! Well, right now this works in Edge, Firefox and Safari, and Chrome is nearly there (works perfectly in Canary, but hasn’t made it to stable yet). Support for this for same-origin resources is much better than I expected, actually, and MDN’s support table was out of date, so this has probably landed in Safari and Edge very recently.

And yet. In Safari, this will only clear the fetch cache, and while navigations can populate fetch cache, the reverse is not true. Also, Edge is the only browser to support this cross-domain.

fetch + POST

Time to roll out some bigger guns. POST requests invalidate cached content for that URL:

A cache MUST invalidate the effective Request URI (Section 5.5 of [RFC7230]) as well as the URI(s) in the Location and Content-Location response header fields (if present) when a non-error status code is received in response to an unsafe request method.
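Read as a rule, that requirement could be sketched like this. The function name and the shape of its arguments are assumptions made for illustration. "Non-error" is taken to mean a 2xx or 3xx status, as the RFC defines it, and the same section of the RFC also forbids invalidating a Location or Content-Location URI whose host differs from the request's, which the sketch approximates by comparing hostnames.

```javascript
// Safe methods as defined by HTTP: these never trigger invalidation.
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'TRACE']);

// Returns the absolute URLs a cache should invalidate after this exchange.
// `responseHeaders` is a plain object keyed by lower-case header name
// (an assumption of this sketch, not a real browser API).
function urlsToInvalidate(method, requestUrl, status, responseHeaders = {}) {
  const unsafe = !SAFE_METHODS.has(method.toUpperCase());
  const nonError = status >= 200 && status < 400; // 2xx or 3xx
  if (!unsafe || !nonError) return [];

  const request = new URL(requestUrl);
  const urls = [request.href];
  for (const name of ['location', 'content-location']) {
    const value = responseHeaders[name];
    if (!value) continue;
    const target = new URL(value, request); // resolves relative references
    // URIs on a different host must not be invalidated.
    if (target.hostname !== request.hostname) continue;
    if (!urls.includes(target.href)) urls.push(target.href);
  }
  return urls;
}
```

For the stuck asset, a POST that returns 200 yields a one-element list containing the asset's own URL, which is exactly the invalidation we are hoping the browser performs.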

The question is, do browsers honour this, and does the browser cache the response? Let’s see, using fetch to generate a programmatic POST request for the stuck URL.

await fetch(
  '/thing/stuck/in/cache', 
  {method:'POST', credentials:'include'}
);

We’ll have to live with a preflight request, because it’s an unsafe method and we’re including credentials. It also turns out that no browser caches the result of the POST, even though the response advertises itself as cacheable (or if they do, they don’t use it to satisfy a subsequent GET). So even if we do see an invalidation, it’s going to take a minimum of 3 requests to repopulate the cache.

With that caveat, Chrome and Edge do well here, with their single view of the cache producing an invalidation for both same and cross origin content, both for fetch and navigations. Firefox and Safari follow the same pattern we’ve seen before, of splitting navigations and fetches into separate caches, so the POST clears the fetch cache, but if your stuck object is a subresource, you’re out of luck.

POST in an iframe

Oh well, in for a penny, in for a pound, so let’s throw a FORM into an IFRAME and do a POST in there. I know, I’m sorry. Desperate times.

const ifr = document.createElement('iframe');
ifr.name = ifr.id = 'ifr_'+Date.now();
document.body.appendChild(ifr);
const form = document.createElement('form');
form.method = "POST";
form.target = ifr.name;
form.action = '/thing/stuck/in/cache';
document.body.appendChild(form);
form.submit();

Obvious side effects: this will create a browser history entry, and is subject to the same issues of non-caching of the response. But it escapes the preflight requirements that exist for fetch, and since it’s a navigation, browsers that split caches will be clearing the right one.

This one almost nails it. Firefox will hold on to the stuck object for cross-origin resources but only for subsequent fetches. Every browser will invalidate the navigation cache for the object, both for same and cross origin resources.

Clear-Site-Data

We started ugly, found perfection, and then discovered perfection wasn’t all it was cracked up to be, and ended up ugly again. So it seems apt to end our story with an option that could be subtitled ‘nuke it from orbit’. Meet Clear-Site-Data, the new web developer’s weapon of mass destruction.

No matter what URL you want to purge, you can simply return this response header in response to ANY request on the target origin:

Clear-Site-Data: "cache"

And bang, your cache is gone. And not just the thing you wanted to purge either. The entire cache for your origin is toast. Which might just save your bacon in a pinch.

Another advantage of this method is that you don’t need to be in a position to run any client side JavaScript, so you can even send this in response to an image or stylesheet request. It’s glorious in its lack of sophistication and brutal efficacy.
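On the server, the kill-switch could be as small as a wrapper around an existing request handler. The sketch below targets Node's built-in http module. The withClearSiteData name and the KILL_SWITCH environment variable are invented for illustration, and the wrapper is deliberately independent of any framework.

```javascript
// Hypothetical wrapper: adds the Clear-Site-Data header to every response
// while the kill-switch is on, then delegates to the real handler.
// `isEnabled` is any function returning true while the purge should run.
function withClearSiteData(handler, isEnabled) {
  return (req, res) => {
    if (isEnabled()) {
      // The directive is a quoted string, so the double quotes are part of
      // the header value.
      res.setHeader('Clear-Site-Data', '"cache"');
    }
    return handler(req, res);
  };
}

// Possible wiring (KILL_SWITCH is a made-up toggle, not a Node feature):
//   const http = require('http');
//   const app = withClearSiteData(myHandler, () => process.env.KILL_SWITCH === '1');
//   http.createServer(app).listen(8080);
```

Because every response carries the header while the switch is on, it is worth turning it off again once clients have had a chance to pick it up, otherwise the origin's cache is wiped on every request.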

Discussions of this feature go back several years but it’s just now starting to appear in Chrome, though at time of writing, it’s been temporarily disabled due to… reasons. So it doesn’t work in any browser right now. Boo.

Conclusion

OK, so in summary, in what situations do browsers make network requests that invalidate the cache used by subresources?

Technique	Scenario	Firefox	Safari	Edge	Chrome
location.reload	doc, forcedReload, same-origin	Yes	Yes	Yes	Yes
	doc, normal, same-origin	No	Yes	Yes	Yes
	doc, forcedReload, cross-origin	Yes	Yes	Yes	Yes
	doc, normal, cross-origin	No	Yes	Yes	Yes
	resource, forcedReload, same-origin	Yes	Yes	Yes	No
	resource, normal, same-origin	Varies [1]	Yes	Yes	No
	resource, forcedReload, cross-origin	Yes	Yes	Yes	No
	resource, normal, cross-origin	Varies [1]	Yes	Yes	No
Vary + fetch	same-origin	Yes	Yes [3]	Yes	Yes
Vary + fetch	cross-origin	No [2]	Yes [3]	Yes	Yes
cache:reload	same-origin	Yes	No [4]	Yes	Yes [5]
cache:reload	cross-origin	No [2]	No [4]	Yes	Yes [5]
Fetch + POST	same-origin	Yes	No [4]	Yes	Yes
Fetch + POST	cross-origin	No [2]	No [4]	Yes	Yes
Iframe + POST	same-origin	Yes	Yes	Yes	Yes
Iframe + POST	cross-origin	Yes [6]	Yes	Yes	Yes
Clear-Site-Data		No	No	No	No

[1] Hits network unless resource has Cache-Control: immutable
[2] Splits fetch/navigation caches for foreign origins, so will not clear the navigation cache
[3] The fetch will invalidate both navigation and fetch caches but a subsequent fetch will not re-populate the navigation cache.
[4] Does not clear the navigation cache, only the fetch cache
[5] Supported in Chrome Canary today
[6] Does not clear fetch cache

There are other caches and storage capabilities in the browser which we don’t address here, such as the Service worker Cache API, but I focused here on dealing with the cache that we target with Cache-Control HTTP headers. Clearing other kinds of storage merits another post for another day!

So, in conclusion, if you want to invalidate a script or other subresource, I would use the Iframe + POST technique today, which works in all browsers for both same-origin and cross-origin.

The “correct” way is really cache:reload, so hopefully Safari and Firefox will change their behaviour in future to allow that technique to be more practically useful.

Web Performance Calendar