The craft of web performance has come a long way from YSlow and the first few performance best practices. Engineers the web over have made the web faster, and we now have newer tools, many more guidelines, much better browser support and a few dirty tricks to make our users’ browsing experience as smooth and fast as possible. So how much further can we go?

Physics

The speed of light has a fixed upper limit, and depending on the medium it passes through, the actual speed is lower. In fibre, this is about 200,000 km/s, and it’s about the same for electricity through copper. This means that a signal sent over a cable that runs 7,000 km from New York to the UK takes about 35 ms to get through. Channel capacity (the bit rate) is also limited by physics, and it’s Shannon’s law that comes into play here.
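
As a quick back-of-envelope check, here’s a minimal sketch of that propagation delay, assuming roughly 200,000 km/s in fibre and ignoring routing and equipment delays along the way:

```python
# Rough numbers behind the paragraph above: propagation delay over fibre.
# Assumes ~200,000 km/s in fibre; real cables add routing and equipment delay.

distance_km = 7_000         # New York to the UK, roughly
speed_km_per_s = 200_000    # light in fibre (about 2/3 of c)

one_way_ms = distance_km / speed_km_per_s * 1000
print(f"one-way: {one_way_ms:.0f} ms, round trip: {2 * one_way_ms:.0f} ms")
# one-way: 35 ms, round trip: 70 ms
```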

This, of course, is the physical layer of our network. We have a few layers above that before we get to TCP, which does most of the dirty work to make sure that all the data that your application sends out actually gets to your client in the right order. And this is where it gets interesting.

TCP

Éric Daspet’s article on latency includes an excellent discussion of how slow start and congestion control affect the throughput of a network connection, which is why Google has been experimenting with an increased TCP initial window size and wants to turn it into a standard. Each network roundtrip is limited by how long it takes photons or electrons to get through, and anything we can do to reduce the number of roundtrips should reduce total page download time, right? Well, it may not be that simple. We only really care about roundtrips that run end-to-end; those that run in parallel need to be paid for only once.
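
To see why the initial window matters, here’s a rough sketch that counts the round trips slow start needs to deliver a single resource, assuming no packet loss, no receive-window limit and a 1460-byte segment size; the 3-segment and 10-segment initial windows below are the classic value and the larger one Google has been pushing:

```python
import math

def roundtrips(payload_bytes, initcwnd_segments, mss=1460):
    """Count round trips to deliver a payload under idealised slow start."""
    segments = math.ceil(payload_bytes / mss)
    cwnd, sent, rtts = initcwnd_segments, 0, 0
    while sent < segments:
        sent += cwnd    # one flight of packets per round trip
        cwnd *= 2       # slow start roughly doubles the window each RTT
        rtts += 1
    return rtts

for size in (20_000, 100_000):    # a 20 KB and a 100 KB resource
    for iw in (3, 10):            # classic vs. proposed initial window
        print(f"{size // 1000} KB, initcwnd={iw}: {roundtrips(size, iw)} RTTs")
# 20 KB: 3 RTTs vs 2 RTTs; 100 KB: 5 RTTs vs 3 RTTs
```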

When thinking about latency, we should remember that this is not a problem that showed up in the last 4 or 5 years, or even with the creation of the Internet. Latency has been a problem whenever signals have had to be transmitted over a distance. Whether it is a rider on a horse, a signal fire (which incidentally has lower latency than light through fibre[1]), a carrier pigeon or electrons running through metal, each has had its own latency problems, and these are problems that have been solved before.

C-P-P

There are three primary ways to mitigate latency: cache, parallelise and predict[2]. Caching reduces latency by bringing data as close as possible to where it’s needed. We have multiple levels of cache, including the browser’s cache, ISP caches, CDNs and front-facing reverse proxies, and anyone interested in web performance already makes good use of these. Prediction is something that’s gaining popularity, and Stoyan has written a lot about it: by pre-fetching expected content, we mitigate the effect of latency by paying for it in advance. Parallelism is what I’m interested in at the moment.

Multi-lane highways

Mike Belshe’s research shows that bandwidth doesn’t matter much, but what interests me most is that we aren’t exploiting all of this unused channel capacity. Newer browsers do a pretty good job of downloading resources in parallel, and with a few exceptions (I’m looking at you, Opera), can download all kinds of resources in parallel with each other. This is a huge change from just 4 years ago. However, are we, as web page developers, building pages that can take advantage of this parallelism? Is it possible for us to determine the best combination of resources on our page to reduce the effects of network latency? We’ve spent a lot of time, and done a good job, combining our JavaScript, CSS and decorative images into individual files, but is that really the best solution for all kinds of browsers and network connections? Can we mathematically determine the best page layout for a given browser and set of network characteristics[3]?

Splitting the combinative

HTTP pipelining could improve throughput, but given that most HTTP proxies have broken support for pipelining, it could also result in broken user experiences. Can we parallelise by using the network the way it works today? For a high-capacity network channel with low throughput due to latency, perhaps it makes better sense to open multiple TCP connections and download more resources in parallel. For example, consider these two pages I’ve created using Cuzillion:

  1. A single JavaScript file that takes 8 seconds to load
  2. 4 JavaScript files that take between 1 and 3 seconds each to load, for a combined 8-second load time.

Have a look at the page downloads using Firebug’s Net Panel to see what’s actually happening. In all modern browsers other than Opera, the second page should load faster, whereas in older browsers and in Opera 10, the first page should load faster.
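
To make the comparison concrete, here’s a toy model of the two pages; the per-file delays below are hypothetical, chosen only so that the split files add up to the same 8 seconds:

```python
# Toy model: why the split page wins when scripts download in parallel.

split_scripts = [3, 2, 1, 2]    # hypothetical per-file delays (seconds), total 8 s
combined = [8]                  # one file carrying the same total delay

parallel = max    # modern browsers fetch the scripts concurrently
serial = sum      # older browsers and Opera 10 fetch them one after another

print("combined file:", parallel(combined), "s")                    # 8 s either way
print("split, parallel downloads:", parallel(split_scripts), "s")   # 3 s
print("split, serial downloads:", serial(split_scripts), "s")       # 8 s
```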

Instead of combining JavaScript and CSS, split them into multiple files. How many depends on the browser and network characteristics. The number of parallel connections could start off based on the ratio of capacity to throughput, and would reduce as network utilisation improved through larger window sizes over persistent connections. We’re still using only one domain name, so no additional DNS lookups need to be done. The only unknown is the channel capacity, but based on the source IP address and a geo lookup[4] or a subnet-to-ISP map, we could make a good guess. Boomerang already measures the latency and throughput of a network connection, and the data gathered can be used to make statistically sound guesses.
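
Here’s a minimal sketch of that heuristic; the function name, the per-host cap of six connections and the example numbers are my own assumptions rather than anything boomerang exposes today:

```python
def suggested_connections(capacity_kbps, throughput_kbps, max_per_host=6):
    """Pick an initial number of parallel connections from the ratio of
    estimated channel capacity to measured per-connection throughput."""
    if throughput_kbps <= 0:
        return 1
    ratio = capacity_kbps / throughput_kbps
    # never exceed the browser's per-host connection limit
    return max(1, min(max_per_host, round(ratio)))

# e.g. a 1 Mbps channel where one connection only sustains about 250 kbps
print(suggested_connections(1000, 250))   # -> 4
```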

I’m not sure if there will be any improvements or if the work required to determine the optimal page organisation will be worth it, but I do think it’s worth more study. What do you think?

Footnotes

  1. Signal fires (or even smoke signals) travel at the speed of light in air, which is faster than light through fibre; however, the switching time for signal fires is far slower, and you’re limited to line of sight.
  2. David A. Patterson. 2004. Latency lags bandwidth [PDF]. Commun. ACM 47, 10 (October 2004), 71–75.
  3. I’ve previously written about my preliminary thoughts on the mathematical model.
  4. CableMap.info has good data on the capacity and latency of various backbone cables.
ABOUT THE AUTHOR

Philip Tellis (@bluesmoon) is a geek living and working in California's Silicon Valley. He works with Yahoo!'s performance and security groups on measuring and improving the performance and security of Yahoo!'s websites. Philip writes code because it's fun to do. He loves making computers do his work for him and spends a lot of time trying to get the most out of available hardware. He is equally comfortable with the back end and the front end and his tech blog has articles ranging from Operating System level hacks to Accessibility in Rich Internet Applications.

16 Responses to “Thoughts on Performance”

  1. Tweets that mention Performance Calendar » Thoughts on Performance -- Topsy.com

    [...] This post was mentioned on Twitter by Stoyan Stefanov and others. Stoyan Stefanov said: performance calendar day #22: @bluesmoon's thoughts on performance http://perfplanet.com/201022 [...]

  2. Philip Tellis

    Looks like I copied the wrong cuzillion link for the first example. Use this one instead: http://stevesouders.com/cuzillion/?c0=hj1hfff8_0_f&t=1293009893007

  3. Schepp

    Very good article, Philip! Always a pleasure to read you :)

    For any interested reader I’d like to point to a PHP-library of mine, the CSS-JS-Booster, that I released a year ago and that evolved quite a bit since then. It combines multiple files and does dataURIfication and all that perf stuff.

    But one feature sets it apart from other libraries: by default it re-splits the combined CSS back into multiple even-sized parts. Default is 2 parts (which leaves request headroom for other page resources) but the number can be configured freely (it can be turned off completely as well).

    I first thought of implementing this feature for JavaScript, too, but back then I decided against it, as most browsers seemed to belong to the “blocking” fraction, which would not profit from parallel script downloads.

    Currently I am again thinking about a JS-parser and a splitting-option, also in regards to what Steve Souders did with postponed execution and all that. But it won’t come that quickly, I think. Lot of work and not as much time as I wish… *sigh*

    Anyway, this does not diminish the value of the already implemented features, and I can also see people combining it with browser sniffing / Geo-IP lookups to regulate the CSS split count dynamically according to the visitor’s metrics.

    Regards

    Schepp

  4. Patrick Meenan

    I’m wondering if the test case with cuzillion is really representative of a high-bandwidth, latency-constrained connection. Theoretically you’d pay the same latency for the combined request as you would for EACH of the split requests, so having 4 files whose added-up latency matches the latency of the single request seems to be the wrong way to do it.

    Even worse, each one of the new connections in the split path will be going through its own slow-start, so individually they will pay more round trips than if they were combined (granted, you’ll probably want those connections later in the page so you’ll have to pay the price at some point, but paying for it with your render-blocking resources is probably not optimal).

    Isn’t that fundamentally why SPDY is working to reduce the number of connections and do as much in parallel over a single TCP connection?

  5. Kyle Simpson

    I’m thrilled someone else is finally suggesting what I’ve been saying for over a year.

    We all listened to Yahoo a few years back when they said “take all your 10 .js files and combine into one file. less http requests = faster page”.

    But we neglected to ask: what if 2 chunks are better than 1? Is there some happy medium?

    FWIW, I’ve been conducting some tests in this exact area. I’ve been testing whether a page loads faster with a single 100k JS file, or two 50k chunks loaded in parallel. My research has almost universally shown that on average the two chunks load faster, even with all the overhead of a second TCP connection, etc.

    Here’s one small scale test:

    http://test.getify.com/concat-benchmark/test-1.html

    I have other tests that are more robust and test more accurately, but they’re much more bandwidth-greedy, so I’m still trying to get good hosting before I make them public.

    Thanks again for calling attention to this issue.

  6. stoyan

    Cuzillion link updated, thanks Philip

  7. Paula Hunt

    I might be wrong, but it seems to me that if the main problem with HTTP is the RTT, then splitting up your CSS/JS into multiple pieces will not change anything besides adding the extra TCP connections overhead. Aren’t those two things orthogonal?

  8. Philip Tellis

    @Paula

    If channel capacity is 1Mbps, but the throughput of a single TCP connection is 100kbps, then a single 200kb JavaScript file would take 2 seconds to load, but two 100kb JavaScript files downloading in parallel would take 1 second. This is of course an oversimplification since slow start and latency play a part, but it isn’t far from what actually happens.

  9. Latency and the Web | leskowsky.net

    [...] post by Philip Tellis (link) on latency and how to minimize it in general. The mnemonic he uses is one I’ve not seen [...]

  10. Hash URIs « Microformats & the semanantic web

    [...] Can different portions of the page be requested in parallel? These days, making many small requests may lead to better performance than one large one. [...]

  14. Hash URIs | Technical Architecture Group

    [...] parts as state changes. Can different portions of the page be requested in parallel? These days, making many small requests may lead to better performance than one large one. * Can the different portions of the page be cached locally or in a CDN? You can make best use of [...]
