Sharing is caring: What's new with comparing HAR files?

7thDec 2018 by Peter Hedenskog

ABOUT THE AUTHOR

Peter Hedenskog

Peter Hedenskog (@soulislove) works in the performance team at the Wikimedia Foundation. Peter is one of the creators of the Open Source sitespeed.io web performance tools.

A couple of years ago I had a summer project: I wanted to build an easy way to compare HAR files. When I was finished it became https://compare.sitespeed.io. It’s Open Source and supports all HAR files (with some extra love for HARs from WebPageTest/sitespeed.io/Browsertime). Today I want to share some of the new features that have been added lately.

But first of all: thank you Patrick Meenan for the HAR comparison in WebPageTest (that has inspired me), and thank you Michael Mrowetz, because without his project PerfCascade I could not have built it. Thank you very much for all the cool work you both have done!

Let’s talk about the latest changes, most of them driven by the performance community!

Gist and configuration support

Some time ago Matt Hobbs created a feature request to add Gist integration with more configuration. His main use case was to be able to add labels to each HAR so it’s easier to communicate between team members about changes. That would also make it easier to use the result in presentations. We discussed it back and forth for a while and we ended up with some pretty cool functionality!

In the old version there were two ways of comparing HAR files:

Go to the site and upload them.
Add request parameters linking to the HAR files that you want to compare: ?har1=FULL_URL1&har2=FULL_URL2&compare=1 (that’s what we use here). The HAR files are then fetched automatically.

That works well, but it is not as flexible as Matt’s suggestion, which makes it possible to configure more things in the future. With the new version, you can also use a configuration file. The simplest format looks like this:
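
As a minimal sketch, the request parameters could be put together like this in Python. The helper name build_compare_url and the HAR URLs are made up for illustration. Only the har1, har2 and compare parameter names come from the description above, and the sketch assumes that percent-encoded HAR URLs are accepted.

```python
from urllib.parse import urlencode

COMPARE_BASE = "https://compare.sitespeed.io/"


def build_compare_url(har1_url: str, har2_url: str) -> str:
    """Return a compare link that points at two HAR files (hypothetical helper)."""
    # urlencode percent-encodes the HAR URLs so they are safe as parameter values.
    query = urlencode({"har1": har1_url, "har2": har2_url, "compare": 1})
    return f"{COMPARE_BASE}?{query}"


# Placeholder URLs: replace them with the location of your own HAR files.
print(build_compare_url("https://www.url.com/page1.har", "https://www.url.com/page2.har"))
```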

{
  "har1": {
    "url": "https://www.url.com/page1.har"
  },
  "har2": {
    "url": "https://www.url.com/page2.har"
  }
}

Use the full URL to each HAR file. You can put your configuration in a gist and then either paste the gist id directly into your browser window or add the id as a GET parameter: ?gist=GIST_ID. Want to see what it looks like in all its glory? Try out https://compare.sitespeed.io/?gist=94e4d997a78e03b32b939fcea63eae8e.

There are more things you can do with the configuration: you can choose which run/page in the HAR should be shown by default, give each HAR a label and add some extra text in the viewer. You can also choose which requests you want to categorise as first/third party. A full configuration file looks like this:

{
  "har1": {
    "url": "https://www.url.com/page1.har",
    "label": "Before change",
    "run": 1
  },
  "har2": {
    "url": "https://www.url.com/page2.har",
    "label": "After change",
    "run": 2
  },
  "title": "The page title used in the title bar",
  "firstParty" : "RegEx that defines what's a first party request",
  "comments": {
    "intro": "Extra information put at the top of the page",
    "waterfall": "Text displayed at top of the waterfall",
    "visualProgress": "Text displayed at the top of visual progress",
    "domains": "Text displayed at the top of domains",
    "requestDiff": "Text displayed at the top of request/response difference",
    "firstParty": "Text displayed at the top of first/third party"
  }
}

You do not need to host the file as a Gist: you can also host it yourself and use another GET parameter: ?config=https://URL_TO_THE_CONFIG_FILE. Or you could just copy/paste the configuration file into the browser window. That works too.

I think the configuration file is a hidden treasure waiting to be found: as long as the tool you use can generate a HAR file you can use it. Set up your continuous integration to generate the configuration file comparing the latest run with earlier runs. That way you as a developer have a super easy way to know what has changed.

And if you don’t want to use our publicly available compare.sitespeed.io, you can just deploy your own version (it’s Open Source you know) and use that. Just remember to give us cred 🙂
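
Here is a rough sketch of what such a continuous integration step could look like in Python. It only uses keys from the full configuration example above. The function names, the labels and the ci.example.com URLs are assumptions for illustration, and where you store the HAR and configuration files depends on your own setup.

```python
import json
from urllib.parse import urlencode


def build_config(previous_har_url: str, latest_har_url: str, title: str) -> dict:
    """Build a compare configuration (hypothetical helper) for two runs."""
    return {
        "har1": {"url": previous_har_url, "label": "Previous run"},
        "har2": {"url": latest_har_url, "label": "Latest run"},
        "title": title,
    }


def config_link(config_url: str) -> str:
    """Return a link that loads a self-hosted configuration file."""
    return "https://compare.sitespeed.io/?" + urlencode({"config": config_url})


# Placeholder URLs: a real CI job would fill these in from its own artifacts.
config = build_config(
    "https://ci.example.com/artifacts/41/page.har",
    "https://ci.example.com/artifacts/42/page.har",
    "Build 41 vs build 42",
)
# The JSON printed here is what you would publish as the configuration file.
print(json.dumps(config, indent=2))
print(config_link("https://ci.example.com/artifacts/42/compare-config.json"))
```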

Difference by size

A couple of days ago Devrim Tufan added another cool feature request to make it easier to find the difference between two HAR files. Say that you see a regression in your timing metrics and you want to know what content has changed since the last time you ran your test. Devrim’s idea was that we should list all the requests that differ between runs.

The tool will look for requests made in only one of the HARs and check for size changes in responses that exist in both HARs. It looks like this:

And you also get a summary of what changed:

Devrim took the time to really explain how he thought it would be useful and I implemented it. I really like this because it makes it simpler to find what has actually changed.
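
To make the idea concrete, here is a small Python sketch of that kind of comparison. It is not the code used in compare.sitespeed.io. It assumes requests are matched by their full URL and that the size is taken from the HAR field response.bodySize, falling back to response.content.size when bodySize is reported as -1. The function names and the example data are made up.

```python
def sizes_by_url(har: dict) -> dict:
    """Map each request URL in a HAR to its response size in bytes."""
    sizes = {}
    for entry in har["log"]["entries"]:
        response = entry["response"]
        size = response.get("bodySize", -1)
        if size < 0:  # HAR uses -1 when the body size is not available
            size = response.get("content", {}).get("size", 0)
        # Simplification: if a URL is requested more than once, the last entry wins.
        sizes[entry["request"]["url"]] = size
    return sizes


def diff_hars(har1: dict, har2: dict) -> dict:
    """List requests found in only one HAR and size changes for shared requests."""
    first, second = sizes_by_url(har1), sizes_by_url(har2)
    return {
        "only_in_first": sorted(first.keys() - second.keys()),
        "only_in_second": sorted(second.keys() - first.keys()),
        "size_changed": {
            url: second[url] - first[url]
            for url in sorted(first.keys() & second.keys())
            if first[url] != second[url]
        },
    }


def _entry(url: str, size: int) -> dict:
    """Build a tiny HAR entry for the example below."""
    return {"request": {"url": url}, "response": {"bodySize": size}}


before = {"log": {"entries": [_entry("https://example.com/", 1000),
                              _entry("https://example.com/old.js", 500)]}}
after = {"log": {"entries": [_entry("https://example.com/", 1200),
                             _entry("https://example.com/new.js", 700)]}}
print(diff_hars(before, after))
```

Matching on the full URL is the simplest choice, but it treats a request with a changed query string or version hash as one removed and one added request.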

Offline support

Matt Hobbs took the time to add offline support! You can compare your HAR files anywhere and anytime you need. That’s pretty sweet.

Supporting first/third party requests for everyone

In older versions you needed to specify a regex at runtime defining which requests are first party, and that only worked for HAR files produced by sitespeed.io.

But now all HAR files (independent of tool) can generate the first/third party metrics! By default requests are grouped by first/third party by generating a first party regex from the base domain. For example, if you test https://www.wikipedia.org, the regex will be .*wikipedia.*.

If you want to configure your own first party regex, you can do that with the configuration file.
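
A minimal Python sketch of how a default first party regex could be derived and applied is shown below. This is an assumption about the approach, not the actual implementation: it naively treats the label just before the top-level domain as the base domain, which would give the wrong result for domains like example.co.uk. Both function names are hypothetical.

```python
import re
from urllib.parse import urlparse


def first_party_regex(page_url: str) -> str:
    """Derive a first party regex from the tested URL (naive sketch)."""
    hostname = urlparse(page_url).hostname or ""
    labels = hostname.split(".")
    # Naive assumption: the base domain is the label just before the TLD.
    base = labels[-2] if len(labels) >= 2 else hostname
    return f".*{re.escape(base)}.*"


def split_by_party(urls: list, pattern: str) -> dict:
    """Group request URLs into first and third party using the regex."""
    regex = re.compile(pattern)
    groups = {"first": [], "third": []}
    for url in urls:
        groups["first" if regex.match(url) else "third"].append(url)
    return groups


pattern = first_party_regex("https://www.wikipedia.org")
print(pattern)
print(split_by_party(
    ["https://www.wikipedia.org/", "https://upload.wikimedia.org/logo.png"],
    pattern,
))
```

A custom pattern, like the firstParty value in the configuration file, can be passed to split_by_party in the same way.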

The highest level will give first/third parties grouped by requests and size:

It’s quite interesting to look at how many requests some sites load from third parties.

What’s next?

What more can we do to make an even better tool? Create an issue on GitHub and let’s discuss it there!

Web Performance Calendar