Reducing Filesizes with Compression Dictionaries and Delta Compression

11th Dec 2023 by Alex Hamer

ABOUT THE AUTHOR

Alex Hamer is a software engineer at Tesco, with a passion for building high-performance, accessible web applications for the many.

I recently attended the performance.now conference and was lucky enough to listen to Patrick Meenan’s talk on Compression Dictionaries, which really got me excited.

Since then I’ve been interested in exploring how compression dictionaries could be applied to the web applications I’m involved with.

What are compression dictionaries?

I will try and be brief as there is a great explainer available here.

They allow compression algorithms such as Brotli to replace patterns that appear in the dictionary with shorter codes, reducing the overall size of the compressed data. Provided the receiver holds the same dictionary, it can then decode the compressed content correctly at the other end.

Delta Compression

One part of the work that Patrick Meenan and team are championing is utilising previous responses to HTTP requests as a dictionary for future requests. This is known as delta compression.

A basic example of this would be:

I visit www.mywebsite.com on Monday
As part of the page I download a JS resource, app.v1.js
I visit www.mywebsite.com on the Tuesday
Since my last visit there has been a release deployed that changes the app.js file meaning I now download app.v2.js.
Rather than downloading the entire app.v2.js file, we use the cached app.v1.js file as a compression dictionary so we only need to fetch the delta between the 2 versions, i.e. just the part that has changed.
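
To get a feel for the idea before touching the browser, here is a minimal sketch using Node's built-in zlib module. Note the swap: it uses deflate with a preset dictionary rather than Brotli, simply because that option is readily available in Node, and the v1 and v2 contents are made-up stand-ins rather than real application code. The principle is the same: the old version is handed to the compressor as a dictionary, and the receiver needs that same old version to decode the result.

```javascript
const zlib = require("zlib");

// Hypothetical stand-ins for app.v1.js and app.v2.js: identical apart from the
// version number in the final log statement.
const sharedCode = Array.from(
  { length: 40 },
  (_, i) => `function handler${i}(event) { return event.target.value + ${i}; }`
).join("\n");
const v1 = Buffer.from(`${sharedCode}\nconsole.log('Running app.js version 1')\n`);
const v2 = Buffer.from(`${sharedCode}\nconsole.log('Running app.js version 2')\n`);

// Plain compression: the whole of v2 has to be encoded.
const full = zlib.deflateSync(v2);

// Delta compression: v1 is the dictionary, so v2 is mostly back-references to it.
const delta = zlib.deflateSync(v2, { dictionary: v1 });

// The receiver needs the same dictionary (its cached v1) to rebuild v2.
const restored = zlib.inflateSync(delta, { dictionary: v1 });

console.log(`full: ${full.length} bytes, delta: ${delta.length} bytes`);
console.log(`round trip ok: ${restored.equals(v2)}`);
```

The delta output should come out noticeably smaller than the plain output, although the exact byte counts will depend on the input and the zlib build.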

That sounds pretty beneficial for performance, right!?

How do we implement this?

Firstly, Compression Dictionary Transport is only available in Chrome and Edge 119 and above. To test locally in Chrome, you can enable it via chrome://flags. The associated experimental features to enable are listed below:

There is a supporting origin trial that you can register with to test against a registered domain.

Let’s now create a basic node application to try this out. It has a single dynamic route that can be used to simulate version changes. It simply serves some basic HTML that references a single JavaScript file.

const express = require("express");
const path = require("path"); // used below to build paths to the compressed assets
const app = express();
const port = 1982;

app.get('/version/:version', function (req, res) {
  const version = req.params.version

  res.send(`
    <html lang="en">
    <head>
      <script src="/app.v${version}.js"></script>
    </head>
    <body>
      <h1 id="version_version">Version ${version}</h1>
    </body>
    </html>
  `);
});

app.listen(port, function () {
  console.log(`Application running on ${port}!`);
});

This JavaScript file is going to contain a lump of code to simulate a typical bloated JS file. I’ve added code from one of our live applications to provide a real example. We’ll add a console.log statement into it to check that it’s executing and to report the version of the JS file running.

// ... lots of JavaScript code!
console.log('Running app.js version 1')

We then compress this file using Brotli to create a file that is 32kb in size.

To compress with Brotli locally I cloned the brotli repo and then built with Bazel.

git clone https://github.com/google/brotli.git

cd brotli

bazel build brotli

cd research/

bazel build dictionary_generator

brotli/bazel-bin/brotli app.v1.js -o app.v1.js.br
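
If building the CLI is more than you need for the plain Brotli step, Node's built-in zlib module can also produce Brotli output. This is only a sketch of an alternative to the command above, not what I used for the figures in this post, and the source buffer is a hypothetical stand-in for the real file contents.

```javascript
const zlib = require("zlib");

// Hypothetical stand-in for the contents of app.v1.js; in the real setup you
// would read the file from disk with fs.readFileSync instead.
const source = Buffer.from(
  "// ... lots of JavaScript code!\nconsole.log('Running app.js version 1')\n".repeat(50)
);

const compressed = zlib.brotliCompressSync(source, {
  params: {
    // Quality 11 is Brotli's maximum setting: slowest, but smallest output.
    [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
    // Telling the encoder the input size up front is an optional hint.
    [zlib.constants.BROTLI_PARAM_SIZE_HINT]: source.length,
  },
});

// Decompress again to confirm the output is valid Brotli.
const restored = zlib.brotliDecompressSync(compressed);
console.log(`${source.length} bytes -> ${compressed.length} bytes`);
```

Writing `compressed` to public/app.v1.js.br with fs.writeFileSync would give a file the route handler below could serve.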

Let’s serve this JS file from our application as Brotli encoded.

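app.get('/app.v*.js', function (req, res) {
  // Pull the version number out of the requested path, e.g. "1" from /app.v1.js
  const version = req.path.match(/app\.v(\d+)\.js/)[1];

  // Tell the browser the body is Brotli compressed so it decodes it before executing
  res.setHeader("Content-Encoding", "br");

  res.sendFile(path.join(__dirname, `public/app.v${version}.js.br`));
});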

Now let’s introduce the concept of a second version, to simulate the code update of a newly deployed release. app.v2.js will simply include an update to the version that is logged.

// ... lots of JavaScript code!
console.log('Running app.js version 2')

When we hit the /version/2 route we download the app.v2.js file (Brotli compressed). Despite it being a one-line change, we have to download the full 32kb of JavaScript.

Not to fear, let’s improve this by utilising compression dictionaries and delta compression. Once a user downloads app.v1.js it will be used as a dictionary when requesting app.v2.js, allowing the user to download only the changes.

use-as-dictionary response header

First we need to inform the browser that once it has fetched app.v1.js it can be utilised as a dictionary for future requests.

We utilise the use-as-dictionary response header. This is a structured field containing values for:

match: URL-matching pattern for requests where the dictionary can be used.
ttl (time to live): time in seconds the dictionary is valid for. This is independent of the cache lifespan of the actual resource being used as a dictionary. It defaults to a year.
type: file format of the dictionary. By default this is set to ‘raw’, which is a format suitable for all compression schemes.
hashes: a list of supported hashes. The default is sha-256, and right now this is the only one supported.

In our example we want to match any version of the app.js file. Each version of this file will create a unique filename so we can keep the default TTL expiry of 1 year.

Our response header for our app.js resource can be set as:

res.setHeader("use-as-dictionary", 'match="/app.*.js"');

When the browser receives this response header it knows that it can store the resource as a dictionary. It saves the URL pattern and a SHA-256 hash of the resource.

As a result in Chrome you can access chrome://net-internals/#sharedDictionary and validate that the dictionary is stored correctly. You should see an entry along these lines:

Note that only one dictionary is stored for a URL matching pattern with older ones being replaced by the most recently fetched.
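
As a rough mental model of that replacement behaviour, the sketch below keeps dictionaries in a Map keyed by match pattern. It is not how Chrome implements its store. The function names, the placeholder hash strings and the assumption that `*` simply stands for any run of characters are all mine, for illustration only.

```javascript
// Rough mental model only: this is NOT how Chrome stores dictionaries.
// Assumption: "*" in a match pattern stands for any run of characters.
function patternToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Hypothetical store keyed by match pattern, so a newer dictionary for the
// same pattern overwrites the older one.
const dictionaries = new Map();

function storeDictionary(pattern, url, hash) {
  dictionaries.set(pattern, { url, hash, matcher: patternToRegExp(pattern) });
}

// Returns the stored entry whose pattern matches the path, if any.
function findDictionary(path) {
  for (const entry of dictionaries.values()) {
    if (entry.matcher.test(path)) return entry;
  }
  return undefined;
}

// Two fetches under the same pattern: only the second entry survives.
storeDictionary("/app.*.js", "/app.v1.js", "hash-of-v1");
storeDictionary("/app.*.js", "/app.v2.js", "hash-of-v2");
```

After both calls, a lookup for /app.v3.js finds the entry created from app.v2.js, which mirrors the "most recently fetched wins" behaviour described above.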

Sec-Available-Dictionary

With the dictionary stored, when the browser requests a resource that matches the pattern /app.*.js it will include sbr in the list of formats defined in the Accept-Encoding request header. It will also send an additional request header, sec-available-dictionary.

This request header allows the browser to tell the server it has an available dictionary for the resource which can be used for compression. The header consists of a hash of the contents of the dictionary.
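
On the server side you need to know that hash ahead of time so you can recognise it. A minimal sketch of computing it in Node follows. It assumes a hex-encoded SHA-256 digest, which is the shape of the 64-character key used in the route handler later in this post. The helper name is hypothetical, and the exact encoding of the header value is worth verifying against the browser version you are testing with.

```javascript
const crypto = require("crypto");

// Returns the SHA-256 digest of a dictionary's bytes as lowercase hex.
// Assumption: hex is the encoding the server-side lookup expects.
function dictionaryHash(contents) {
  return crypto.createHash("sha256").update(contents).digest("hex");
}

// In the real setup the input would be the uncompressed dictionary file,
// e.g. fs.readFileSync("public/app.v1.js").
console.log(dictionaryHash(Buffer.from("abc")));
// prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```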

So in our example, having already accessed app.v1.js and stored it as a dictionary, when we then access app.v2.js Chrome adds the sec-available-dictionary header to the request. This header looks something like:

Great, so now we can tell the browser to store the resource as a dictionary. This then enables the browser to inform the server that it has an available dictionary for future requests that match the given URL pattern.

Handling a request with a dictionary

Before handling the request we need a delta-compressed version of our app.v2.js resource, using app.v1.js as its dictionary. Again we’ll use the brotli binary we built with Bazel, this time with the output being the difference between v1 and v2.

We need to output this delta in Shared Brotli format.

brotli/bazel-bin/brotli app.v2.js -D app.v1.js -o v1-v2.js.sbr

Next we update our route handler for app.js files to handle the following scenarios:

Request arrives without a supported sec-available-dictionary header: respond with the full app.v2.js.br file.
Request arrives with a supported sec-available-dictionary header: respond with the delta version, v1-v2.js.sbr.

In a very simplified way we can set something up like:

const ASSET_MAP = {
  "93478f557349ac4f76ae381c897e60d4e4cb962e9f183fa009bcaece4f776f50": 'v1-v2'
};

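app.get('/app.v*.js', function (req, res) {
  // Pull the version number out of the requested path, e.g. "2" from /app.v2.js
  const version = req.path.match(/app\.v(\d+)\.js/)[1];
  // Hash of the dictionary the browser already holds (undefined if it has none)
  const dictionaryHash = req.get('Sec-Available-Dictionary');

  if (ASSET_MAP[dictionaryHash]) {
    // We have a delta built against this dictionary, so send only the difference
    res.setHeader("Content-Encoding", "sbr");
    res.sendFile(path.join(__dirname, `public/${ASSET_MAP[dictionaryHash]}.js.sbr`));
  } else {
    // No usable dictionary: send the full file and offer it as a future dictionary
    res.setHeader("Content-Encoding", "br");
    res.setHeader("use-as-dictionary", 'match="/app.*.js"');

    res.sendFile(path.join(__dirname, `public/app.v${version}.js.br`));
  }
});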

This checks for a sec-available-dictionary header and that it has provided a hash that matches a delta compressed version of the file.
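
The handler above is deliberately simple. If you wanted to be a little stricter, the decision could be pulled into a plain function that also checks the client advertised sbr and that the delta actually produces the version being requested. The sketch below is one hypothetical way to do that. The DELTAS structure and the chooseAsset name are my own, and only the hash and file names come from the example above.

```javascript
// Hypothetical lookup: dictionary hash -> the version the delta produces and its file.
const DELTAS = {
  "93478f557349ac4f76ae381c897e60d4e4cb962e9f183fa009bcaece4f776f50": {
    target: "2",
    file: "v1-v2.js.sbr",
  },
};

// Decides what to send for a requested version, given the request headers
// (lower-cased names, as Node exposes them on req.headers).
function chooseAsset(version, headers) {
  const acceptsSbr = (headers["accept-encoding"] || "")
    .split(",")
    .map((token) => token.trim().split(";")[0])
    .includes("sbr");
  const delta = DELTAS[headers["sec-available-dictionary"]];

  // Only use the delta when the client can decode it and it produces the
  // version that was actually requested.
  if (acceptsSbr && delta && delta.target === version) {
    return { encoding: "sbr", file: delta.file };
  }
  // Otherwise fall back to the full Brotli-compressed file.
  return { encoding: "br", file: `app.v${version}.js.br` };
}
```

Inside the Express route this could be called as chooseAsset(version, req.headers), with the returned encoding and file fed into res.setHeader and res.sendFile. Keeping it free of Express makes the branching easy to unit test.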

Testing it out

Now we can step through a simulated code change:

We access /version/1 route and download the full app.v1.js file at 32kb
A use-as-dictionary response header is added to the response and the browser stores app.v1.js as a dictionary for future use

A fictitious release is deployed and our app.js file is bumped to v2
We access /version/2 and request app.v2.js. The browser has an available dictionary and informs the server via the sec-available-dictionary request header.
Server returns app.v2.js as the delta of v1 and v2.

The file size falls from 32kb to a mere 358 bytes! Quite a saving!
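
To put a number on that, here is the arithmetic as a quick sketch. It assumes the rounded "32kb" figure means 32 * 1024 bytes, so treat the result as approximate.

```javascript
// Assumption: "32kb" is read as 32 * 1024 bytes; the post gives only the rounded figure.
const fullBytes = 32 * 1024;
const deltaBytes = 358;

const savedPercent = (1 - deltaBytes / fullBytes) * 100;
const ratio = fullBytes / deltaBytes;

console.log(`${savedPercent.toFixed(1)}% smaller, about ${Math.round(ratio)}x`);
// prints 98.9% smaller, about 92x
```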

Wrapping up

It’s early days for the Compression Dictionary Transport spec and I will be super interested in the feedback received from the current origin trial. I should also mention the second use case, Shared Dictionary, which you can read more about here.

The simple example demoed highlights the benefits that could be achieved from delta compression. In an environment where you are pushing small increments of your applications, multiple times a day, this approach could have a big impact on the amount of code your users are downloading. Less code means less time waiting for resources, culminating in better performance for the end user.

I’m also interested in how CDN providers could utilise this at the edge in the future. Rather than going to the origin, it could be better to run this logic closer to the end user. There’s a requirement for build-time resource generation, and I wonder if CDNs could be well placed to take on some of that work.

Anyhow, this is a promising spec for web performance and I will be following it closely!

Web Performance Calendar