Web Performance Calendar

The speed geek's favorite time of year
2015 Edition
ABOUT THE AUTHOR

Yoav Weiss (@yoavweiss) does not get discouraged easily and is not afraid of code. He is a Web performance and browser internals specialist, especially interested in the intersection between Responsive Web Design and Web performance.

He has implemented the various responsive images features in Blink and WebKit as part of the Responsive Images Community Group, and is currently working at Akamai, focused on making the Web platform faster. You can follow his rants on Twitter or take a peek at his latest prototypes on GitHub.

When he's not writing code, he's probably slapping his bass, mowing the lawn in the French country-side or playing board games with the kids.

Summary for the impatient

  • Third parties cause performance and security issues.
  • Some of the impact can be mitigated today.
  • Lack of control over third party content is damaging the ecosystem.
    • Users gain control by installing ad blockers.
    • Embedders gain control by moving to formats such as Instant Articles, Apple News and AMP.
    • Browsers need to enable content providers to take back control over their users’ experience.

Once upon a time

In the world of Web performance, third party components have been notorious for a long while. They may add single points of failure, introduce security risks and tend to increase page weight.

But things have changed in Web performance land, and those changes have made the impact of third party components even worse than it previously was.

HTTP/1.x was a pretty bad network protocol: it severely limited request parallelism by serializing requests over each connection, while browsers only opened a limited number of connections per host. As I’m sure you know, one of the best practices to get around that was domain sharding – rewriting your resource URLs to spread them over multiple hosts, so that the browser would open more connections and send the requests in parallel. While there are certainly downsides to doing that, in a way that performance hurdle masked the cost of third parties. Third party resources were fetched from alternative hosts anyway, so they opened TCP connections of their own and could be as fast as, or sometimes even faster than, first party resources that were waiting in the queue to be sent out.

Then HTTP/2 came along

Fast forward to today and HTTP/2, and the world is different. HTTP/2 fixes this fundamental protocol issue, and enables true parallelism and multiplexing. The browser can now just send all the requests it has on a single connection, and the server can then send the responses back on this warm TCP connection according to their priority. Ideal.

But third party resources don’t play nice with that scheme. They have to be fetched over TCP connections of their own (since they are hosted elsewhere), incurring an extra DNS lookup, a TCP handshake and TCP slow start. Furthermore, since browsers require HTTPS for HTTP/2, there’s an added TLS handshake. All that means that third parties are now an even bigger performance burden than they once were.

So, why include these third parties?

Well, as much as all developers know that third parties are often bad news in terms of performance and security, quite often the business folks don’t care. Third party components are often used for analytics, monetization (read: user tracking and ads) and for other activities that don’t necessarily help the users immediately, but are important for the bottom line and for keeping everyone employed, fed and with a roof over their heads.

That means that when it comes to third party components, we’re in a “can’t live with them, can’t live without them” kind of situation.

So, what are the problems?

Let’s expand some more on the problems that publishers encounter today with third party content.

The first point, which we already touched upon, is that third party content cannot be fetched on the same warm HTTP/2 connection that other resources are fetched on. Instead, the browser has to send out a DNS request to figure out the third party server’s address, and then establish TCP and TLS connections to that server, which can typically take up to 3 round-trip times (one for the TCP handshake, two more for TLS), or ~1200ms on 3G. But even once that is over, we’re not done yet. This brand new TCP connection will suffer from a couple of factors that put the download of the third party resource at a disadvantage. It will have to go through TCP’s slow-start phase, which exists so that the server doesn’t flood the network. On top of that, since the server sending the site’s content and the server sending the third party resource are different, these resources are likely to contend for bandwidth on the network. That also means that HTTP/2 prioritization only applies among the first party resources.

Beyond that, when embedding a third party resource, the first party is delegating control to the third party. There’s no middle ground: the first party’s decision is binary, and once the third party is embedded, it can do pretty much anything. Security-wise, in many cases it can run scripts in the context of the parent document, even if in theory that can be prevented and limited by embedding these resources in sandboxed iframes. But bandwidth-wise, there is no sandbox. Third party scripts can download large images and videos that will contend for bandwidth with the site’s content, with the content provider having no say in the matter.

A third party script can also monopolize the CPU on the main thread, delaying script execution and janking the UI of the entire site. That means that the site’s choice of third parties can severely impact the user experience on the site, as well as the user’s monthly bill and battery life.

Third parties often also trigger downloads of other third party resources, which can then trigger downloads of even more third party resources. Quite the third party party! All that can create staircase waterfall patterns that are completely out of the content provider’s control.

Another point we touched upon is that third parties can often be a “single point of failure” (SPOF), rendering the entire site unusable. That often happens with blocking scripts and styles, as well as third party hosted fonts in some browsers.

What can we do about it?

What techniques exist today to try and mitigate that damage?

Async

The most well-known and broadly supported way to mitigate the damage of third parties is to load them in an asynchronous manner. If your third parties are not critical to your content’s rendering, you can even delay their loading past the onload event, minimizing their performance impact further.

The catch here is that the later you load them, the later they run (duh!), which means that some users will leave your page before the third parties ever execute. That may be acceptable from a business perspective in some scenarios, but less so in others (e.g. when a decrease in ad impressions results in a decrease in revenue).
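As a minimal sketch, both flavors might look something like this (the third party URLs are illustrative assumptions):

```html
<!-- Load a non-critical third party script without blocking parsing or rendering -->
<script async src="https://thirdparty.example.com/analytics.js"></script>

<!-- Or go one step further, and delay the load until after the onload event -->
<script>
  window.addEventListener('load', function () {
    var script = document.createElement('script');
    script.src = 'https://thirdparty.example.com/ads.js';
    document.body.appendChild(script); // injected scripts are async by default
  });
</script>
```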

Preconnect

A recent addition to the world of Web performance is <link rel=preconnect>. It enables the browser to know ahead of time about third party hosts (and further hosts that they would require), and establish the connection to those hosts in an out-of-band manner, minimizing the impact they have on the critical path. It doesn’t solve the issues that third parties raise, but it can mitigate some of the negative effects, specifically, the connection establishment part.
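As a minimal sketch, hinting at two hypothetical third party hosts would look like this:

```html
<!-- Warm up the connection (DNS + TCP + TLS) before any resource is requested -->
<link rel="preconnect" href="https://thirdparty.example.com">
<!-- Add the crossorigin attribute when the resources will be fetched with CORS
     (e.g. fonts), since those are fetched over a separate connection -->
<link rel="preconnect" href="https://fonts.example.com" crossorigin>
```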

This technique is currently supported in Chromium and Firefox, and hopefully more browsers will join soon.

Preload

Taking that mitigation concept a step further, when the third parties are known in advance and are critical for the page’s rendering, we can make sure that they are discovered earlier by using <link rel=preload>. This further mitigates the impact of TCP slow start, on top of connection establishment. On the other hand, it could make the “bandwidth contention” part worse, by having your third parties contend with more critical page resources. Therefore, it should be used carefully. Make sure to measure the impact over various network conditions!
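As a minimal sketch, declaring a hypothetical render-critical third party script up front would look like this:

```html
<!-- Start fetching the script early, at script priority, without executing it -->
<link rel="preload" href="https://thirdparty.example.com/critical-widget.js" as="script">
```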

This will be supported in Chromium soon (I’m working on it!), and hopefully other browsers will adopt it not long after.

Timeout enforcement with Service Workers

As we’ve seen in a previous entry here, Service Workers can be used to ensure that third parties don’t take too long to download, and therefore remove the SPOF threat, even for blocking resources.
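As a minimal sketch, assuming the worker is registered from the page with navigator.serviceWorker.register('/sw.js') and picking an illustrative 3 second budget, the idea looks something like this:

```js
// sw.js — race third party fetches against a timeout
self.addEventListener('fetch', function (event) {
  var url = new URL(event.request.url);
  // Let first party requests go through untouched
  if (url.hostname === self.location.hostname) {
    return;
  }
  // If the third party doesn't answer within the budget, resolve with an
  // empty synthetic response, so a slow blocking script or style cannot
  // SPOF the entire page
  event.respondWith(Promise.race([
    fetch(event.request),
    new Promise(function (resolve) {
      setTimeout(function () {
        resolve(new Response('', { status: 408, statusText: 'Request Timeout' }));
      }, 3000);
    })
  ]));
});
```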

CSP and frames

Up until now we discussed how the performance implications of third parties can be mitigated. But what about the security concerns?

In order to avoid third parties running in the context of your page, you need to load them inside an iframe. That makes sure that they run inside their own browsing context and cannot directly access the main page. You can further limit them by using the sandbox attribute, making sure they cannot perform various operations that can be disruptive to the user experience, or even malicious.

An emerging standard will also enable pages to enforce a certain CSP policy on third party iframes, making sure that third parties are secure from XSS attacks and don’t expose our users to further risks.
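As a minimal sketch, such an embed could look like this (the widget URL, the sandbox grants and the policy are illustrative assumptions, and the csp attribute comes from that emerging proposal, so browser support cannot be taken for granted):

```html
<!-- The sandbox attribute re-enables scripts but keeps everything else locked down;
     the csp attribute asks the embedded document to commit to the given policy -->
<iframe src="https://thirdparty.example.com/widget.html"
        sandbox="allow-scripts"
        csp="script-src 'self'; style-src 'self'">
</iframe>
```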

The main problem here is that many of the third parties out there are not necessarily amenable to being iframed and sandboxed, and don’t necessarily use CSP. That means that using this approach in practice can be challenging.

Third parties gone wild

In recent years, we’ve been seeing publishers add more and more third party resources to their sites, up to a point where, in some cases, the published content is a minority among ad and tracker content. Things have reached a point where users are paying more, in bandwidth costs, for mobile ads than publishers are making from them.

Not unexpectedly, users have started fighting back.


[A graph showing the ad blocker install base growing exponentially]

Desktop-based ad blockers are on the rise, and with iOS 9’s content blockers, they have now also reached mobile and the mainstream. If in the past the percentage of your audience that used ad blockers was fairly limited, chances are that today this is no longer the case.

The third parties have also realized that, and they are now repenting their sins.

The other part of that equation is that content embedders have also started to push back, for obvious reasons. If you are using their fast and smooth app, but every link you follow is slowed down by tons of junk, you’re going to have an overall bad time, which hurts their brand through no fault of their own. That has led to Facebook’s Instant Articles and Apple News.

The semi-open equivalent of that is Google’s AMP project, which tries to enforce similar constraints on content, making it faster and, more than that, making sure it has predictable performance. While that project does many great things, if you’re reading this, you most probably know that your site doesn’t have to be AMP in order to have great performance!

Browsers have also paid attention to this trend and are doing more to protect their users from abuse. A recently proposed “user agent intervention” in Chromium suggests preventing, under some conditions, the loading of render-blocking resources injected through document.write(), as a heuristic against badly-written third parties that delay rendering of the entire page. My guess is that more such “interventions” are to come.

Therefore, I believe we need better enforcement mechanisms that will enable the browser to guarantee given limits on what third parties can and cannot do.

CSP may give us some of that, by enabling us to bind iframes to a certain policy, limiting which domains the browser can then connect to.

While that’s great, it is hardly enough. I believe we need more policy enforcement mechanisms that would enable us to:

  • Dictate a limit on the number of domains that third parties can contact, even if we don’t know them beforehand.
  • Dictate “content allowance”, providing download quotas for render-blocking content, as well as for non-blocking content.
  • Allow or disallow media download and playback (e.g. video/audio ads).
  • Enforce limits on processing time of iframes.
  • Probably more!

All in all, we need to enable third parties and content providers that want to Do The Right Thing™ to do so and, more importantly, to declare that this is what they are doing, so that content embedders and ad blockers know it and can give them credit for it.

Smarter ad blockers could use that to determine which third parties they allow and which they do not. Smarter embedders could use that mechanism to determine which links they display more prominently, since they know the user experience, as far as performance goes, is guaranteed to be good.

Such a mechanism could also be beneficial for real-time enforcement of performance budgets, enabling developers to make sure they stay in line with their site’s desired performance over time.

I’m afraid that without such mechanisms in place, we will see more and more users take performance into their own hands, regardless of their favorite content provider’s business model.

Summary

Third party content presents many performance challenges, more so today than it did a few years back. But at the same time, it’s an important part of the Web’s ecosystem.

We need to provide developers with the tools to take back control over their users’ experience, so that third party content is not the antithesis of performance. Otherwise, the users will fix it for us, by blocking third party content altogether.