Who's Afraid of the Big Bad Preloader?

22nd Dec 2013 by Yoav Weiss

ABOUT THE AUTHOR

Yoav Weiss is a Web performance and browser internals specialist, working on responsive design Web performance, image compression and more. He recently implemented the srcset attribute in Blink.

He is an RICG technical lead, a Blink committer, a WebKit contributor and a bass player.

You can follow his rants on Twitter or have a peek at his latest prototypes on GitHub.

It goes by many names – some call it the preload scanner, others the speculative parser or the look-ahead downloader. Its tales are often passed from senior Web developers to junior ones when the day turns to night and their Web site's resource loading order makes very little sense:

"Have you heard of the preloader? It has sharp teeth all made of regex and it fetches everything it lays its eyes on. I once knew a guy whose cousin had his whole site fetched by the preloader before the server even sent the HTML's response headers!"

And so the legend is passed on from one generation of developers to the next.

But, like many creatures in these kinds of stories, the preloader is far from being evil. It is simply often misunderstood.

Let's Go Back to the Beginning

To really understand the preloader's story, you have to go back in time to 2007. In those ancient, simpler times, browsers fetched resources just like today, but in many cases, when a certain element's parsing and execution could have affected the resources that follow it, the browser simply stopped and waited for that resource to finish downloading before fetching the next ones in line. In practice, that meant that if a page contained multiple JavaScript resources, or <script> elements preceded by external CSS resources, the parser would halt and wait for each of the external resources to finish downloading before fetching the next one. As pages became more and more JavaScript heavy, the impact on performance was significant, even devastating.

Browsers had been toying with ideas for solving this issue for a while. Then, at the beginning of 2008, it finally happened. The IE team decided to do something about it, and implemented something they called "the lookahead downloader". Its job was to keep the network busy while the real, grown-up parser was blocked on executing some JavaScript resource, and to keep adding the page's resources to the download queue so that the queue wouldn't dry up. At more or less the same time, a similar mechanism was added to WebKit and showed significant performance improvements, and Gecko didn't lag far behind, adding something similar a few months later.

So What Does It Do?

When the HTML parser creates the DOM and encounters a synchronous script, it has to stop the DOM creation and run the script. As a bonus point, if there are pending external CSS requests, there's a good chance that the script running would have to wait until these external CSS resources arrive, at least if the script itself needs CSS related info that depends on these resources. Therefore, if resource fetching is done only by the HTML parser when it creates DOM elements, the network will be idle when synchronous scripts are involved. That's especially true for external scripts, but internal scripts can cause a delay as well under some circumstances.

That's where the preloader comes in. It uses the results of an early parsing phase (called "tokenization") in order to look into the various tags that comprise the HTML document, find the ones that may contain resources, and accumulate the URLs of these resources. The tokenization output is then sent to the "real" HTML parser, and the URLs are sent to the "fetcher" along with the type of resource that initiated the download in the first place. That enables the "fetcher" to attach priorities to the various URLs and download them according to their impact on the page's loading speed.

This is what happens in Blink and WebKit, but the process is fairly similar in the other engines as well.
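To make that flow a bit more tangible, here is a toy sketch of a preload scanner. It is not how any real engine is written: the names `scanForResources`, `prioritize` and `attr`, the `PRIORITY` table and the sample URLs are all made up for illustration, and a regex stands in for the real tokenizer (which, among other things, knows not to look for tags inside script text).

```javascript
// Toy preload scanner. A simplified illustration, not real engine code.

// Hypothetical priority table (lower = fetched earlier); real engines use their own heuristics.
const PRIORITY = { stylesheet: 0, script: 1, image: 2 };

// Read a quoted attribute value out of a raw tag string, or return null.
function attr(tag, name) {
  const m = new RegExp(`\\s${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
  return m ? m[1] : null;
}

// Walk over the tags we care about and collect { url, type } pairs in document order.
function scanForResources(html) {
  const found = [];
  const tagPattern = /<(script|link|img)\b[^>]*>/gi;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const tag = match[0];
    const name = match[1].toLowerCase();
    if (name === "script" && attr(tag, "src")) {
      found.push({ url: attr(tag, "src"), type: "script" });
    } else if (name === "img" && attr(tag, "src")) {
      found.push({ url: attr(tag, "src"), type: "image" });
    } else if (
      name === "link" &&
      (attr(tag, "rel") || "").toLowerCase() === "stylesheet" &&
      attr(tag, "href")
    ) {
      found.push({ url: attr(tag, "href"), type: "stylesheet" });
    }
  }
  return found;
}

// The "fetcher" side: order the discovered URLs by the type that requested them.
function prioritize(resources) {
  return [...resources].sort((a, b) => PRIORITY[a.type] - PRIORITY[b.type]);
}

const html = `
<html><head>
<script src="app.js"></script>
<link rel="stylesheet" href="main.css">
</head><body>
<img src="hero.jpg">
</body></html>`;

// Order: main.css (stylesheet), app.js (script), hero.jpg (image)
console.log(prioritize(scanForResources(html)));
```

The point of the sketch is the split of responsibilities: the scanner only discovers URLs and remembers what kind of tag asked for them, while the decision about what to download first is made separately.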

Which Resources Get Preloaded?

The preloaded tags and resource types actually vary between browsers. A common set seems to be scripts, external CSS and images from the <img> tag.

Blink and WebKit also preload inlined @import rules, while Gecko also preloads the 'poster' attribute of video elements. IE may have its own quirks.

Many other resources are not preloaded. The list includes iframes, background images (whether inlined or external), Web fonts (whether inlined or external), external @import rules (and internal ones in Gecko), image resources from <input>, <object> (and <video poster> in Blink/WebKit) and media resources from the <video> and <audio> tags.

Is There a "Preloader" Specification?

Not really. The preloader is not a standard feature, and each browser is free to do whatever it likes regarding preloading. In theory, some browsers may avoid preloading altogether, while others can use extremely aggressive measures on preloading resources. In practice, most modern browsers are behaving more-or-less in the same way, because preloading shows significant performance benefits.

The preloader's performance benefits vary from site to site, but the consensus is that, on average, it improves page load time by around 20%.

Preloader vs. Responsive Images???

There are some common myths in the Web development community around the preloader and the responsive images problem that keep coming up, and need some debunking.

It prevents Media Query based resource loading

That used to be true, but it is no longer the case. The preloader in today's Blink and WebKit already evaluates media queries for external CSS (/me sheds a single tear: My first Blink commit). I also have a running prototype showing the preloader loading <picture>'s sources according to Media Queries.

It is possible that future, highly dynamic Media Queries won't be able to run on preloaded resources, but all the ones that matter today are "preloadable".
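As a rough illustration of what "preloadable" means here, the sketch below evaluates a media attribute using nothing but the viewport width, which is information that is available before any layout happens. It is a toy and not Blink's or WebKit's implementation: `matchesMedia` is a made-up name, only the `all`, `screen` and `print` media types and `min-width`/`max-width` in px joined by `and` are understood, and treating anything unrecognized as matching is an assumption chosen to keep the toy conservative.

```javascript
// Toy media attribute evaluation. Assumption: a tiny subset of Media Queries is enough for the demo.
function matchesMedia(media, viewportWidth) {
  const query = (media || "").trim().toLowerCase();
  if (query === "") return true; // no media attribute: always applies
  return query.split(/\s+and\s+/).every((part) => {
    if (part === "all" || part === "screen") return true;
    if (part === "print") return false;
    const m = /^\(\s*(min|max)-width\s*:\s*(\d+)px\s*\)$/.exec(part);
    if (!m) return true; // unknown feature: assume it matches rather than skip a needed resource
    const limit = Number(m[2]);
    return m[1] === "min" ? viewportWidth >= limit : viewportWidth <= limit;
  });
}

console.log(matchesMedia("(min-width: 600px)", 320)); // false
console.log(matchesMedia("screen and (max-width: 480px)", 320)); // true
```

A query that can be answered this way, from the viewport alone, can be evaluated as soon as the tag is seen. A query that depended on layout results could not.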

It prevents Element Query based resource loading

Element Queries are the concept of applying certain CSS rules according to a certain element's dimensions, unlike Media Queries which often refer to general browser attributes or to the viewport's dimensions. That's a cool concept, especially if we want modular responsive design to be feasible, but it is not the preloader that makes that concept hard to implement.

To demonstrate that this is the case, let's imagine a world without a preloader, with element queries, and a responsive images solution that depends on these element queries. Now, in this imaginary world, you are the rendering engine! (Congrats on the promotion)

You have your HTML parser happily parsing the document, when all of a sudden it bumps into an image tag where the resource it needs to download relies on the dimensions of the image element itself. You now need to decide which resource to download according to the current render tree. But you haven't yet created the render tree, since in order to do that you're still waiting on an external CSS to arrive through the pipes.

What's worse is that you know that the final page's layout may depend on other CSS resources, which you're not even aware of, and even on images down the page that your parser hasn't encountered yet. This practically means you can't safely download images until you've downloaded all the page's CSS and images (!!!)

So, what do you do now? Which resource do you download? WHICH ONE???

Well, these are the questions we'd need to answer before element queries can become a reality, and as Tab Atkins wrote, this probably involves making such elements independent of their surrounding layout in a way that would avoid loops.

That is a problem that needs to be resolved regardless of the preloader, if we want Element Queries to be a reality. Such a solution would need to take preloaders into account and probably provide some declarative hints regarding the element's dimensions, but that's only one aspect of the problem.

It prevents me from modifying `<img src>` attributes before they are downloaded

That is something that isn't possible, even without the preloader. The image element, when created, immediately checks to see if it has an src attribute, and if it does, it immediately starts to download it, even before the element is added to the DOM.

Those of you who tried old-school prefetching of an image resource using JS may have noticed that there's no need to add the image element to the DOM in order to trigger a download. It is enough to set the src attribute.

The same happens internally. Once the element is created and the src attribute is added to it, a download is triggered.
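The old-school prefetching trick looks roughly like this. It is a minimal sketch: `prefetchImage` and the example URL are made up for illustration, and the `Image` constructor is passed in as a parameter only so that the snippet can also be run outside a browser.

```javascript
// Start an image download without ever touching the DOM.
// `ImageCtor` is expected to be the browser's global `Image` constructor.
function prefetchImage(url, ImageCtor) {
  const img = new ImageCtor();
  img.src = url; // in a browser, this assignment alone is what kicks off the request
  return img; // note: the element is never appended to the document
}

// Only runs where a real `Image` exists (i.e. in a browser).
if (typeof Image !== "undefined") {
  prefetchImage("/images/hero.jpg", Image);
}
```

By the time a script gets a chance to change the src, the request for the original value is already on its way.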

The preloader prevents me from doing dirty hacks

This one may be true, depending on which dirty hacks you're trying to pull off. Specifically, the preloader makes it impossible to rely on setting cookies with JavaScript and expecting to get them on requests for the page's images. It makes it equally impossible to modify the page's <base> tag and expect the resources not to be downloaded twice. But these are things that were never guaranteed to work in the first place.

Preloader, Shmeloader. Why Do I Need to Know About It?

As a Web developer, being aware of the preloader can help you in several ways.

Avoid hiding resources from the preloader

Resources that are loaded by JavaScript will not be parsed and loaded by the preloader, which means that they're likely to be "last in line" in the resource fetching queue. If you have critical path resources that are being loaded by scripts, you probably want to either move them to HTML tags, or include the loading script inline and early on in the document (so that the script loading them would be the first script that runs).

Hide resources from the preloader on purpose

If you have resources which you want to make sure are not loaded before all the critical path resources are loaded, you can hide them from the preloader by loading them with JavaScript later on. The Resource Priorities specification will provide a standard, declarative way of getting the same impact, but it's still in its early stages and not yet ready for implementation.
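One possible way to do that is to wait for the page's load event and only then create the element. The sketch below is just that, a sketch: `loadAfterOnload` and the example URL are made-up names, and `win` and `doc` stand for the browser's `window` and `document`, passed in as parameters so the snippet can also be run outside a browser.

```javascript
// Keep a low-priority image out of the preloader's sight by creating it from script,
// and only after the window's load event has fired.
function loadAfterOnload(win, doc, url) {
  win.addEventListener("load", () => {
    const img = doc.createElement("img");
    img.src = url; // the URL never appears in the HTML, so the preloader cannot find it
    doc.body.appendChild(img);
  });
}

// Only runs in a browser, where `window` and `document` exist.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  loadAfterOnload(window, document, "/images/below-the-fold.jpg");
}
```

The trade-off is that the resource is now guaranteed to start late, so this only makes sense for things that really are not on the critical path.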

Not be surprised when resources are loaded before your JS ran

If you're trying to influence resource loading using scripts, you're likely to hit race conditions, which may present themselves with subtle differences in each one of the browsers. Knowing that in advance would probably save you a lot of frustration.

Avoid external `@viewport` rules

The preloader's ability to evaluate Media Queries can get significantly degraded if @viewport rules are added in external stylesheets and discovered late in the page's loading process. While preloaders are not taking @viewport into account today in their Media Query evaluation, adding @viewport rules only in their inlined form and preferably before any scripts have run would enable future implementations to take them into account.

Use preloaded tags when applicable

If you have a choice of equivalent HTML tags to load the same resources (e.g. <img> vs. <object>), you'd want to use the one that the preloader recognizes (so, <img> in the example above).

What's In the Future?

Like everything Web-related, there are many aspects of the preloader that can be further optimized in the future.

Better Media Query evaluation support

Today's Media Query evaluation in Blink and WebKit passes the media attribute to the main thread and performs the evaluation there. True Media Query evaluation on the preloader thread would simplify the implementation of media-based resource selection algorithms (e.g. <picture>'s) and would make them more robust.

Other rendering engines' preloaders currently have no Media Query evaluation abilities. Adding these capabilities can enable getting irrelevant external CSS resources out of the critical path, and enable a Media Query based responsive images solution.

Improved CSS-declared resource preloading

There are many resources that are declared in CSS, and currently are not handled by the preloader. This is an area that can show lots of improvements.

@import rules are currently preloaded only in Blink and WebKit when these rules are inlined. This can be expanded to other browsers as well. Beyond that, external @import rules are not preloaded at all. As Steve Souders pointed out, since @import rules have to be the CSS's first rules, and CSS cannot be evaluated before it has finished downloading in its entirety, there could be a performance win from starting the download of external @import rules as soon as the CSS's top is downloaded and parsed.

Background images are slightly more complicated since, for both inlined and external CSS, you have to make sure that the url() is not inside an irrelevant media query. That would require the preloader to perform CSS parsing, which it currently doesn't do.

Web fonts are a critical asset that is often delayed and currently isn't preloaded at all. The tough part about that is that Web fonts must not be downloaded if they're not being used. That means that in order to preload them, the preloader must perform full CSS parsing and keep track of the classes and ids of tags that it encounters, so that it'd be able to make an educated guess whether a certain font is used or not.

Summary

The preloader is probably the best performance optimization feature in the history of browsers. It ensures that resource downloads go smoothly and don't block, so the network isn't left idle while there are still resources to download.

While it has suffered from a bad rep in some circles, knowing how it works can help you master your site's resource loading and make sure it's optimal. So remember kids, the preloader is your friend. 🙂

Web Performance Calendar