The Dangers of data URIs

22nd Dec 2020 by Andy Davies

ABOUT THE AUTHOR

Andy Davies (@andydavies) is a Freelance Web Performance Consultant and has helped some of the UK's leading retailers, newspapers and financial services companies make their sites faster.

He's co-author of 'Using WebPageTest' published by O'Reilly and author of 'The Pocket Guide to Web Performance'.

Andy is also one of the organisers of the London Web Performance Meetup.

In the December 2018 edition of the Performance Advent Calendar, Doug Sillars reminded us that base64 encoding is a performance anti-pattern.

Yet, anecdotally, over the last few years I’ve noticed that the number of sites with badly implemented data URIs seems to be increasing rather than decreasing.

And most of the increase seems to be driven by build tools, perhaps webpack in particular.

A Quick Recap on data URIs

data URIs enable us to embed one type of resource within another by replacing the address of the resource with the actual byte stream representation of the resource.

For example, replacing the URL for a red square:

<img src="redsquare.png">

With the actual bytes of the png image, base64-encoded:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABkAAAAZAQMAAAD+JxcgAAAAA1BMVEX/AAAZ4gk3AAAAC0lEQVR4AWMYSAAAAH0AAVFwgb4AAAAASUVORK5CYII=">

Embedding the resource directly removes the latency that’s incurred when the browser makes a separate request for the resource, so for critical resources it can make pages render faster.
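As a minimal sketch of what that encoding step involves, the Node.js snippet below builds a base64 data URI from raw bytes. The helper name `toDataUri` is hypothetical (it isn't part of any library), and the sample input is just the eight-byte signature that starts every PNG file rather than a complete image.

```javascript
// Minimal sketch: build a base64 data URI from raw bytes (Node.js).
// `toDataUri` is a hypothetical helper name, not a library function.
function toDataUri(bytes, mimeType) {
  // Buffer.from accepts a Buffer, a Uint8Array or an array of byte values.
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// The eight-byte signature found at the start of every PNG file.
const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

console.log(toDataUri(pngSignature, 'image/png'));
// prints "data:image/png;base64,iVBORw0KGgo="
```

The `iVBORw0KGgo` prefix is the same one that opens the red square example above, which is a handy way to spot an embedded PNG.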

But data URIs come with quite a few trade-offs too.

Trade-offs

Prioritisation

Embedding a resource within another overrides its download priority – the embedded resource gets downloaded at the same priority as the one that contains it.

That might be the outcome we desire – embedding a logo directly in the HTML document might improve rendering times.

But embedding a low-priority resource, say a favicon, means its bytes will actually delay our higher priority content.

Caching

The embedded resource cannot be cached separately and will get the same caching lifetime as its container.

It also means that a resource, for example a logo, might be cached multiple times e.g. for every page it’s included in, rather than once and then reused.

Sizes

Base64 represents every three bytes as four characters, so converting binary resources to base64 increases their size by about a third, and we need gzip or brotli compression of the containing document, stylesheet, or script to mitigate that increase.
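To put a rough number on that increase before compression, here is a small sketch of the arithmetic: base64 emits four characters for every three input bytes, padding the final group. The helper name `base64Length` and the 10,000-byte buffer are purely illustrative.

```javascript
// Sketch of base64's size overhead: 4 output characters per 3 input bytes,
// with the final group padded. `base64Length` is a hypothetical helper name.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// 10,000 zero bytes standing in for a binary resource such as an image.
const original = Buffer.alloc(10000);
const encoded = original.toString('base64');

console.log(encoded.length === base64Length(original.length)); // true
console.log(base64Length(10000)); // 13336, roughly a third larger
```

How much of that overhead gzip or brotli claws back depends on the resource, so it's worth measuring the compressed size of the real container rather than assuming.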

There’s a risk that by inlining an image or a font into a stylesheet, we’re making a render-blocking resource larger, and so slowing down our overall page.

Redundancy

A data URI is going to be downloaded whether it is used or not. Where bundlers naively inline multiple images, the browser downloads bytes that it would normally delay (or never download at all when the image isn’t used).

Opaqueness

data URIs are opaque: it’s hard to tell what resource is being encoded. Often the quickest way to discover what the data URI represents is to paste it into a browser’s URL bar.
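When pasting into the URL bar isn't practical, a few lines of script can at least summarise what a data URI holds. This is a sketch only: `describeDataUri` is a hypothetical helper, and it handles the common `data:<type>;base64,<payload>` and percent-encoded forms rather than every edge case.

```javascript
// Sketch: summarise what a data URI contains without opening it in a browser.
// `describeDataUri` is a hypothetical helper name, not a built-in API.
function describeDataUri(uri) {
  const comma = uri.indexOf(',');
  if (!uri.startsWith('data:') || comma === -1) {
    throw new Error('Not a data URI');
  }
  // The header looks like "image/png;base64" or "text/plain;charset=utf-8".
  const [mimeType, ...params] = uri.slice(5, comma).split(';');
  const isBase64 = params.includes('base64');
  const payload = uri.slice(comma + 1);
  const bytes = isBase64
    ? Buffer.from(payload, 'base64')
    : Buffer.from(decodeURIComponent(payload));
  return {
    mimeType: mimeType || 'text/plain', // the default when no type is given
    isBase64,
    encodedLength: uri.length,
    decodedBytes: bytes.length,
    // The leading bytes often identify the format, e.g. 89504e47 for PNG.
    leadingBytesHex: bytes.subarray(0, 8).toString('hex'),
  };
}

console.log(describeDataUri('data:image/png;base64,iVBORw0KGgo='));
```

Comparing `encodedLength` with `decodedBytes` also shows how much the encoding has inflated the resource.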

Things to make you go Hmm…

There are many ways in which data URIs can hinder rather than help performance. To give you some concrete examples, here are some cases I’ve come across over the last few years.

Seven Copies of a Logo

A few years ago, I was auditing a retailer’s site and discovered seven copies of their logo embedded in their main stylesheet along with several other duplicated images.

As data URIs are opaque, it can be difficult to recognise when there are duplicates.
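Duplicates are much easier for a script to spot than for a person. The sketch below counts identical base64 data URIs in a stylesheet's text; `findDuplicateDataUris` is a hypothetical helper, the sample CSS is made up, and the regular expression only covers base64-encoded URIs.

```javascript
// Sketch: report base64 data URIs that appear more than once in some text.
// `findDuplicateDataUris` is a hypothetical helper name.
function findDuplicateDataUris(text) {
  const pattern =
    /data:[a-z]+\/[a-z0-9.+-]+(?:;[a-z0-9=-]+)*;base64,[A-Za-z0-9+/]+=*/gi;
  const counts = new Map();
  for (const [uri] of text.matchAll(pattern)) {
    counts.set(uri, (counts.get(uri) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([uri, count]) => ({
      preview: uri.slice(0, 40), // enough to recognise the type
      length: uri.length,
      count,
    }));
}

// Made-up stylesheet with the same image embedded twice.
const sampleCss = `
  .header { background: url(data:image/png;base64,AAAA) }
  .footer { background: url(data:image/png;base64,AAAA) }
  .badge  { background: url(data:image/gif;base64,BBBB) }
`;

console.log(findDuplicateDataUris(sampleCss));
```

For a real audit the text would come from the built stylesheet or bundle rather than an inline string.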

Embedding a logo in a stylesheet is an example of reasonable data URI usage: the logo will be discovered relatively quickly and, as the stylesheet is shared between pages, only one copy of the logo will be cached.

I’d probably choose to embed the logo in the HTML response rather than making the stylesheet larger though.

Covid Dashboard’s Favicons

Early on in the pandemic, the UK Government introduced a public dashboard to track Covid cases and it was built by an external agency in React.

Using DevTools to search the app for data URIs finds 17KB of embedded Apple Touch Icons – four PNGs of different dimensions, base64-encoded into the script bundle.

govuk-apple-touch-icon-152x152.png
module.exports = "data:image/png;base64,iVBOR...

govuk-apple-touch-icon-167x167.png
module.exports = "data:image/png;base64,iVBOR...

govuk-apple-touch-icon-180x180.png
module.exports = "data:image/png;base64,iVBOR...

govuk-apple-touch-icon.png
module.exports = "data:image/png;base64,iVBOR...

The icons are being shipped to everyone, even if the visitor isn’t using an iOS device. When a visitor is using an iOS device, Safari is only going to use one of the icons so the other three are redundant. Touch icons are one of the lowest priority resources but as they’re embedded in a script bundle they’ll be fetched with a high priority.

This is a prime example of where normal images rather than a data URI should be used!

Redundant Image Downloads

Another example of redundant images can be found in peopledatalabs.com’s main script bundle.

It appears references to images that are smaller than 10KB are replaced with data URIs.

So in this case, multiple small images are embedded, adding about 30KB to the bundle size.

If there were more small images, they would also be embedded, increasing the bundle size further.

, function(e, a, t) {
    e.exports = t.p + "static/media/3d-quering-data.00893ade.png"
}
, function(e, a) {
    e.exports = "data:image/png;base64,iVBOR..."
}
, function(e, a) {
    e.exports = "data:image/png;base64,iVBOR..."
}
, function(e, a) {
    e.exports = "data:image/png;base64,iVBOR..."
}
, function(e, a, t) {
    e.exports = t.p + "static/media/Sales & Marketing Tech.06eed599.png"
}

I couldn’t find where these images were used on the site and the browser has no choice but to download them as they’re contained in a script.

If they had remained as URLs the browser wouldn’t download them unless they were actually used.
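The site's actual build setup isn't known, so treat the following as a sketch that assumes webpack 5 and its built-in asset modules. With `type: 'asset/resource'` matching files are always emitted as separate files and referenced by URL, whereas `type: 'asset'` inlines files below a size threshold. The file extensions and the `imageConfig` variable name are illustrative.

```javascript
// Sketch of a webpack 5 configuration fragment (an assumption: the site's
// real build tooling and settings are not known).
const imageConfig = {
  module: {
    rules: [
      {
        // Illustrative list of image extensions.
        test: /\.(png|jpe?g|gif|webp)$/i,
        // 'asset/resource' always emits a separate file and exports its URL,
        // so the browser only fetches an image when it is actually used.
        type: 'asset/resource',
      },
    ],
  },
};

module.exports = imageConfig;
```

Other bundlers and older webpack loaders have equivalent settings, so the thing to check is what size threshold, if any, triggers inlining.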

Visual WebSite Optimizer’s Experiments

I’ve had parts of this post in draft for a few months but it was a client’s recent challenges with Visual WebSite Optimizer (VWO), the A/B Testing tool, that finally prompted me to finish it.

My client had noticed their Lighthouse scores were slowly getting worse but they weren’t quite sure why.

After some investigation we identified VWO was inlining images used in different experiments, and as the client added more experiments VWO’s script grew larger, with the knock-on effects of the script taking longer to download and the anti-flicker snippet hiding the page for longer.

For experiments with multiple variations, base64 images were included in the code for each variation. Sometimes they were the same image duplicated; other times they were different images.

The fix was to force VWO to reference the images using a URL but it’s an example of how ‘optimisations’ in 3rd-party tools can cause performance degradations.

If you’d like to see an example of this behaviour, open https://www.suumocounter.jp/ in a browser and use DevTools to search the VWO scripts for base64.

Feefo’s Fonts

Most retailers include reviews on their product pages. Some use their own service; others use a 3rd-party service such as BazaarVoice, TrustPilot, Feefo etc.

I stumbled across Feefo’s base64-encoded fonts while working with a retailer last year.

Feefo split their tag up into several chunks. One of those chunks injects a 500KB style element into the page and approximately half of that style element is base64-encoded fonts.

240KB of FontAwesome glyphs:

@font-face {
font-family: FWFontAwesome;
src: url('data:application/vnd.ms-fontobject;base64,...');
src: url('data:application/vnd.ms-fontobject;base64,...') format("embedded-opentype"),
     url('data:font/woff2;base64,...') format("woff2"),
     url('data:application/font-woff;base64,...') format("woff"),
     url('data:font/ttf;base64,...') format("truetype"),
     url('data:image/svg+xml;base64,...') format("svg")
}

31KB of Fontello glyphs:

@font-face {
font-family: fontello;
src: url('data:application/vnd.ms-fontobject;base64,...');
src: url('data:application/vnd.ms-fontobject;base64,...') format("embedded-opentype"),
     url('data:font/woff2;base64,...') format("woff2"),
     url('data:application/font-woff;base64,...') format("woff"),
     url('data:font/ttf;base64,...') format("truetype"),
     url('data:image/svg+xml;base64,...') format("svg")
}

Each @font-face declaration includes the same font six times: five different formats, with the embedded-opentype (eot) version appearing twice!

Our visitor’s browser is only going to select one of these – woff2 in modern browsers, woff in slightly older browsers and perhaps eot, or ttf in very old browsers – so the other five copies are wasted bytes.

Those wasted bytes increase Feefo’s CDN costs, consume visitors’ bandwidth and potentially affect the user experience of the sites that use them.

This issue was flagged up to Feefo over a year ago, but unfortunately they’ve still not fixed it (and they’re not the only 3rd-party bundling redundant base64-encoded fonts).

My guess is Feefo’s build process is naively inlining all the external resources referenced via CSS’ url() function (there are a few base64-encoded images too).

Ideally Feefo would replace the icon fonts with SVG, but even if they can’t they should switch to embedding just woff format fonts and subset them to remove any glyphs that aren’t used.

Emarsys’ Source Maps

Last year a client was considering starting a loyalty program using one of Emarsys’ products, and asked me to review the implications for site speed.

As I was reviewing the tag’s impact and ways we could minimise it, I noticed the scripts had embedded source maps which added 600KB to the uncompressed size of the bundles.

Fortunately Emarsys have since fixed the issue, and even reduced the size of their scripts further still.

In this case, the root cause appeared to be that the bundle was being produced using development rather than production settings.

Looking at other sites, it’s not uncommon to see embedded source maps. Sometimes it’s for whole bundles; other times it’s for individual components that have been imported from npm etc.

Shipping embedded base64-encoded source maps may make life easier when debugging, but ultimately it makes our visitors’ experience slower and more expensive.

Avoiding Others’ Mistakes

Anecdotal examples of the many ways we can get data URIs wrong and harm our sites’ performance might be interesting and generate a few wry smiles.

But, what practical steps can we take to avoid our sites and components becoming examples in a future edition of Performance Advent Calendar?

Check the Output of Build Processes

I’m sure that the majority of the large / redundant data URIs I find are the result of configuration issues with build processes, or naive choices by build tools.

Automated build processes are great until they ‘go wrong’ so check what they’re actually producing and what’s actually being included in the final bundles.

Tools like source-map-explorer or Webpack Bundle Analyzer can help with this, but all too often I find myself grepping build artefacts. What we really need is better audit capabilities in Lighthouse and similar tools, for example checks for inline source maps and excessive data URI sizes.
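Until those audits exist, a small script can stand in for the grepping. The sketch below flags inline source maps and any base64 data URI longer than a threshold in a build artefact's text. The function name `auditBundle`, the 4096-character default and the shape of the findings are all assumptions for illustration, not an existing tool's API.

```javascript
// Sketch: flag inline source maps and oversized base64 data URIs in the text
// of a build artefact. `auditBundle` and its defaults are hypothetical.
function auditBundle(source, maxLength = 4096) {
  const pattern =
    /(sourceMappingURL=)?data:([a-z]+\/[a-z0-9.+-]+)(?:;[a-z0-9=-]+)*;base64,[A-Za-z0-9+/]+=*/gi;
  const findings = [];
  for (const match of source.matchAll(pattern)) {
    const [text, mapPrefix, mimeType] = match;
    // Length of the data URI itself, excluding any sourceMappingURL= prefix.
    const length = text.length - (mapPrefix ? mapPrefix.length : 0);
    if (mapPrefix) {
      // Any inline source map is reported, whatever its size.
      findings.push({ kind: 'inline-source-map', mimeType, length });
    } else if (length > maxLength) {
      findings.push({ kind: 'large-data-uri', mimeType, length });
    }
  }
  return findings;
}

// Made-up bundle: one large embedded image, one small one, one inline map.
const sampleBundle =
  'var a="data:image/png;base64,' + 'A'.repeat(5000) + '";' +
  'var b="data:image/gif;base64,AAAA";\n' +
  '//# sourceMappingURL=data:application/json;charset=utf-8;base64,e30=';

console.log(auditBundle(sampleBundle));
```

In a build pipeline the `source` argument would be the contents of each emitted file, and a non-empty result could fail the build or simply be logged for review.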

A 3rd-party service like DebugBear, or one of the other performance monitoring tools, can help police the size of the bundles generated by build tools.

If you work on a build tool, or any other tool that can generate data URIs, adopt an approach of safety first and make it hard to create redundant data URIs, large ones or indeed many small ones.

Be Selective About Resources to Inline

As the earlier examples illustrated, there are many ways data URIs can hinder rather than help performance.

Determining which resources should be embedded is about how they are used and the role they play in the rendering process and making our pages and apps feel fast.

So when choosing which resources to embed keep in mind that it’s not about the resource type, or their size but where they fit into our user experience.

It might be acceptable to embed a large hero image on a landing page, but it’s probably never acceptable to embed a favicon no matter how small it is.

Test the Impact of Embedding Resources

The whole point of embedding resources is to make our sites and apps faster, so use tools like Lighthouse, WebPageTest or their commercial equivalents to check that they’ve actually made pages faster, and don’t forget to test on multiple devices!

Also consider the alternatives, for example might preloading the resource be a better choice than embedding it?

Embedding a resource means we can only send one version of it and, for images for example, that means one size and format. We lose the ability to send different sizes to different viewports, or different formats based on the visitor’s browser, and we also lose the capabilities of any CDN-based image optimisations we’re using.

Preloading increases the download priority without altering a resource’s caching characteristics, and still enables the browser to choose the most appropriate image, use a CDN to optimise it etc.

Optimise Resources before Encoding them

Once we start using data URIs we lose many of the default features that come with browsers, and our delivery infrastructure.

We become responsible for choosing the most appropriate format for a resource. That choice might be informed by the contents of the resource e.g. jpg vs png vs SVG etc. for images, or it might be informed by the range of browsers we want to support e.g. woff vs woff2 for fonts.

It’s also up to us to control the size of the resource via image optimisation, or font subsetting so that it is as small as possible.

Ship Source Maps as External Files

Inline source maps might make development easy and avoid the extra work of uploading them to error tracking tools but as the Emarsys example showed they bloat files massively – I actually found a 1.7MB source map while writing this post.

Configure your build to produce external source maps, and set up the process that sends them to Sentry etc., early in your development process rather than leaving it as something to do later.
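As one hedged illustration, assuming the build uses webpack, the `devtool` option controls where source maps end up. The fragment below is a sketch rather than a recommended setup for every project, and the `sourceMapConfig` variable name is illustrative.

```javascript
// Sketch of a webpack configuration fragment (assumes webpack is the bundler).
const sourceMapConfig = {
  mode: 'production',
  // 'hidden-source-map' writes a separate .map file without adding a
  // reference comment to the bundle, which suits uploading maps to an
  // error tracking service. 'source-map' also writes a separate file but
  // keeps the reference comment. 'inline-source-map' embeds the map in the
  // bundle as a base64 data URI, which is what we want to avoid shipping.
  devtool: 'hidden-source-map',
};

module.exports = sourceMapConfig;
```

Whichever bundler is in use, the check is the same: the shipped bundle shouldn't contain a `sourceMappingURL=data:` comment.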

Final Thoughts

In case you hadn’t already guessed I’m not a big fan of embedding resources via data URIs.

The approach has many drawbacks which can often lead to larger download size and slower visitor experiences.

When I do use them I tend to restrict data URIs to resources that need to be loaded quickly for first render e.g. logos, hero images etc., and limit myself to embedding them in documents rather than stylesheets or scripts as it’s easy for clients to manage once I’ve left.

There are other use cases beyond this but if you’re going to use them remember to put the processes in place to ensure your data URIs are being used appropriately and their overall size doesn’t get out of control.

Web Performance Calendar