Web Performance Calendar

The speed geek's favorite time of year
2011 Edition
ABOUT THE AUTHOR
Tony Gentilcore photo

Tony Gentilcore (@tonygentilcore) is a software engineer at Google. He enjoys making the web faster and has recently added support for Web Timing and async scripts to Google Chrome/WebKit.

Rich developer tools are available for all modern web browsers. They are typically easy to use and can provide all of the information necessary to optimize web pages. It is rare to need to go beyond the unified networking/scripting/rendering view of the Web Inspector’s Timeline panel.

But they aren’t always perfect: a tool may be missing information, may disagree with another tool, or may just be incorrect. For instance, a recent bug occasionally caused two Navigation Timing metrics to be incorrect in Chrome (and the Inspector).

When these rare situations arise, great engineers are able to go beyond a browser’s developer tools to find out exactly what the browser is telling the operating system to do. On Linux, this source of ultimate truth can be found using strace. This tool can trace each system call made by a browser. Since every network and file access entails a system call, and this is where browsers spend a lot of their time, it is perfect for debugging many types of browser performance issues.

What about other platforms?

In this post, I introduce strace because the syntax is clean and no setup is required. But most systems have an equivalent tool for tracing system calls. Mobile developers will be happy to hear that strace is fully supported by Android. OS X users will find dtrace offers more powerful functionality at the expense of less intuitive syntax (unfortunately not ported to iOS). Finally, Event Tracing for Windows (ETW), while harder to setup, supports a friendly GUI.

Getting started

To use it: open a terminal and invoke strace at the command prompt. This invocation prints all system calls while starting Google Chrome to google.com:

$ strace -f -ttt -T google-chrome http://www.google.com/

I’ve added -f to follow forks, -ttt to print the timestamp of each call and -T to print the duration of each call.

Zeroing in

If you run the command above, you’ll probably be overwhelmed by the amount of stuff going on in a modern web browser. To filter down to something interesting, try using the -e argument. For examining only file or network access, try -e trace=file or -e trace=network. The man page has many more examples.

An example: local storage

As a concrete example, let’s trace local storage performance in Chrome. First I opened a local storage quota test page. Then I retrieved the Chrome browser processes’ ID from Chrome’s task manager (Wrench > Tools > Task Manager) and attached strace to that process using the -p switch.

$ strace -f -T -p <process id> -e trace=open,read,write

The output shows the timestamps, arguments and return value of every open, read, and write system call. The man page for each call explains the arguments and return values. The first call of interest to us is this open:

open("/home/tonyg/.config/google-chrome/Default/Local Storage/http_arty.name_0.localstorage-journal", O_RDWR|O_CREAT, 0640) = 114 <0.000391>

This shows us that Chrome has opened this file for reading and writing (and possibly created it). The name of the file is a big clue that this is where local storage is saved for arty’s web page. The return value, 114, is the file descriptor which will identify it in later reads and writes. Now we can look for read and write calls which operate on fd 114, for example:

write(114, "\0\0\00020\0001\0002\0003\0004\0005\0006\0007\0008\0009\0000\0001\0002\0003\0"..., 1024 <unfinished ...>
<... write resumed> ) = 1024 <0.425476>

These two lines show a 1,024 byte write of the data beginning with the string above to the local storage file (114). This write happened to take 425ms. Note that the call is split into two lines with possibly others in between because another thread preempted it. This is common for slower calls like this.

We’ve only scratched the surface

There are options for dumping the full data read/written from the network or filesystem. Running with -c displays aggregate statistic about the time spent in the most common calls. I’ve also found that some practical python scripting can quickly parse these traces into a variety of useful formats.

This brief introduction hardly does this tool justice. I merely hope it provides the courage to explore deeper into the stack the next time you run into a tricky performance problem.