Nowadays, websites ranging from a craftsman’s business site to large portals embed third-party content. This content can be obvious or completely invisible, but in every case it can take your site down. Let’s look at three short examples of third-party content on a website:

Example #1: Facebook Like button

Probably the best-known third-party embed is the Facebook Like button. If you have a blog or an image gallery you might want to allow quick sharing of your content on Facebook. To implement the button you can include the following snippet on your website:

<!-- Include the Facebook SDK -->
<div id="fb-root"></div>
<script>
    (function(d, s, id) {      
          var js, fjs = d.getElementsByTagName(s)[0];      
          if (d.getElementById(id)) return;      
          js = d.createElement(s);    
          js.id = id;      
          js.src = "//connect.facebook.net/de_DE/all.js#xfbml=1";      
          fjs.parentNode.insertBefore(js, fjs);  
      }(document, 'script', 'facebook-jssdk'));</script>
<!-- Include the button being displayed on the website -->
<div class="fb-like" data-href="http://www.xing.com" 
  data-send="true" data-width="450" data-show-faces="true"></div>

After doing this, every page load will also request the assets for the Like button. In this case they are loaded asynchronously. Nevertheless, this postpones the page load event and makes it dependent on the response time of the Facebook servers: window:load will only fire once the button has been loaded and interpreted completely. A delayed window:load event can be nasty. The browser may still show loading indicators, and code triggered by that event (tracking, UI components or monitoring) will be delayed as well.
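One way to keep such a widget from postponing window:load is to inject its script only after the load event has fired. The following is a minimal sketch of that idea, not Facebook's recommended snippet. The function name injectAfterLoad is hypothetical, win and doc are passed in (normally window and document), and it assumes a browser that supports addEventListener.

```javascript
// Sketch: inject a third-party script only after window:load has fired,
// so a slow third-party server cannot postpone the load event.
// `injectAfterLoad` is a hypothetical helper, not part of any SDK.
function injectAfterLoad(win, doc, src, id) {
    function inject() {
        if (doc.getElementById(id)) return; // already injected, do nothing
        var js = doc.createElement('script');
        js.id = id;
        js.async = true;
        js.src = src;
        var first = doc.getElementsByTagName('script')[0];
        first.parentNode.insertBefore(js, first);
    }
    if (doc.readyState === 'complete') {
        inject(); // load has already fired
    } else {
        win.addEventListener('load', inject, false);
    }
}
```

It could be called as injectAfterLoad(window, document, '//connect.facebook.net/de_DE/all.js#xfbml=1', 'facebook-jssdk'). The trade-off is that the button shows up later, because its download only starts once everything else has finished loading.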

Example #2: advertising

At XING we offer advertising and our best placements appear on the most frequented pages. These are the logged-in homepage, the logout-success page and our hub pages, including jobs and events. We integrate these ads using our own JS wrapper:

<script>
    new xing.controls.RenderFrame({ 
        src: URL_TO_AD_SERVER,     
        target: DIV_CONTAINER,     
        height: HEIGHT,          
        width: WIDTH   
    });
</script>

This function generates an iframe and uses the src parameter as the iframe’s source URL. This way our ads run in a safe, sandboxed environment and are loaded asynchronously. Nevertheless, this technique still delays the page load event, as the browser waits for the iframe’s own onload event to finish.

Example #3: tracking

Tracking scripts can be quite complex. Every commercial website uses some form of tracking, ranging from Google Analytics to SiteCatalyst or some kind of home-brewed solution. If we take a look at this example on Webmetrics, such a tracking system will most likely look a bit like the following snippet:

<!-- Load the JS Lib for your tracking suite -->
<script src="http://.../tracking.js"></script>
<!-- Gather information about the stuff you want to track -->
<script>
    x.site="Test Page";      
    x.server="http://tracking.example.com/beacon.url";     
    //to be continued
</script>
<!-- Initiate the transfer of gathered information to the server -->
<script>
    var x_code = collectTrackingInfo();      
    if(x_code) document.write(x_code);
</script>

Everything here happens completely synchronously, blocking all subsequent code and therefore also delaying window:load.
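For comparison, tracking data can also be handed over without document.write, for example with an image request. The sketch below assumes a tracking server that accepts its data as GET parameters. The names buildBeaconUrl and sendImageBeacon are hypothetical and not part of any tracking suite. Note that a request started before window:load may still delay that event, so this only removes the blocking of subsequent code.

```javascript
// Sketch: transfer tracking data via an image request instead of
// document.write. Both helpers are hypothetical illustrations.
function buildBeaconUrl(endpoint, data) {
    var pairs = [];
    for (var key in data) {
        if (Object.prototype.hasOwnProperty.call(data, key)) {
            pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
        }
    }
    if (pairs.length === 0) return endpoint;
    // Append to an existing query string if the endpoint already has one.
    var separator = endpoint.indexOf('?') === -1 ? '?' : '&';
    return endpoint + separator + pairs.join('&');
}

// `ImageCtor` is passed in (normally window.Image).
function sendImageBeacon(ImageCtor, endpoint, data) {
    var img = new ImageCtor();
    img.src = buildBeaconUrl(endpoint, data); // request starts here, nothing waits for it
    return img;
}
```

In a page this could be used as sendImageBeacon(window.Image, 'http://tracking.example.com/beacon.url', { site: 'Test Page' }).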

So we have seen two ways of loading third-party JavaScript: synchronously (#3 Tracking) and asynchronously (#1 Like Button), with the window:load event being postponed in each case. There is a third way, which is probably the best solution and which I will touch upon at the end of this article. But first I want to explain how things can go wrong when embedding third-party content.

SPOF

SPOF stands for single point of failure. Nowadays, with all the cross-site communication that goes on due to sharing, embedding and tracking, developers have to be very mindful when using third-party snippets. Any one of those snippets can be such a single point of failure. Whenever you embed third-party content, which may be any of the above solutions, you have to be aware of two well-known facts:

  1. Unless marked with “async” or “defer”, script tags load and execute synchronously and thus in a blocking way. The subsequent content in the DOM always waits for the preceding JS block to finish loading and executing. The same goes for the DOMContentLoaded event. Tracking and ads wait for this event before submitting their data. If this event does not occur, you will lose whatever data you collected. The event itself always depends on the externally referenced JavaScript. If that takes long to load, the subsequent code will be delayed too. If the script is served by a third-party server and this server is not available, the subsequent code will be delayed until the request times out.
  2. Any server, be it your own or a third party’s, will have issues. Even Facebook or Google servers may be unavailable or respond with long delays. Going by Murphy’s Law, this is always bound to occur when your own site runs just fine.
    It’s important to remember that third-party servers can and will crash no matter how well you’ve done your homework or how well you’ve accounted for failures in your own data center.
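The two points above can be combined into a small loader that neither blocks the parser nor leaves dependent code waiting forever. This is a rough sketch under stated assumptions: loadScriptAsync is a hypothetical name, doc is normally document, and the timeout only tells your own code to move on. It does not abort the pending request.

```javascript
// Sketch: load a script without blocking the parser and report the
// outcome exactly once: 'loaded', 'error' or 'timeout'.
// `loadScriptAsync` is a hypothetical helper.
function loadScriptAsync(doc, src, timeoutMs, onDone) {
    var finished = false;
    var timer;
    function finish(status) {
        if (finished) return; // only the first outcome counts
        finished = true;
        clearTimeout(timer);
        onDone(status);
    }
    var js = doc.createElement('script');
    js.async = true;
    js.onload = function () { finish('loaded'); };
    js.onerror = function () { finish('error'); };
    // Give up waiting after timeoutMs; the request itself keeps running.
    timer = setTimeout(function () { finish('timeout'); }, timeoutMs);
    js.src = src;
    var first = doc.getElementsByTagName('script')[0];
    first.parentNode.insertBefore(js, first);
    return js;
}
```

Code that depends on the third-party script goes into the onDone callback and can degrade gracefully when the status is not 'loaded'.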

At this point I’d like to provide a simple but very effective example to check for a SPOF:

Example of third-party content on your website

<script src="//www.example.com/thirdparty/sharebutton.js"></script>

Let’s do some basic measurements here:

<script>
    var startTime = new Date().getTime();
</script>
<script src="//www.example.com/thirdparty/sharebutton.js"></script>
<script>
    alert((new Date().getTime() - startTime) +
        "ms used to download and execute the sharebutton code");
</script>

To simulate a failure we just replace the URL of the third-party plugin with http://blackhole.webpagetest.org. This is a URL provided by @PatMeenan at webpagetest.org. The cool thing about this URL is that requests to it always end in a timeout of at least 30 seconds. This is a great way to simulate a potential failure of a third-party content provider.

<script>
    var startTime = new Date().getTime();
</script>
<script src="http://blackhole.webpagetest.org"></script>
<script>
    alert((new Date().getTime() - startTime) + 
        "ms used to download and execute the sharebutton code");
</script>

You will immediately see and understand the difference once you have tried this out.

The “freeze incident”

Now that you know about possibilities of failure and how to simulate them I’d like to share with you one of the weirdest issues we’ve come across in a long time.

From October 17 onwards, users reported that http://www.xing.com froze their WebKit browser. As usual we tried to reproduce it in any way possible, but it simply didn’t happen for us. Our tracking JS is delivered from our own servers, so we didn’t see any risk there. Only the request to transfer the tracking data went directly to our tracking provider’s server, which didn’t look harmful to us. Ads were rendered in iframes, so there was no chance they could freeze the whole page.

So we reported back to our User Care department to put out the usual statement saying “Please check your browser, plugins and system. We can’t reproduce this error”. But as reports kept coming in we kept on looking. After about the 100th reload I noticed a request to our external tracking domain that had been running for a long time. While waiting for the DOMContentLoaded event and not hitting the reload button I noticed that the page had frozen and was in fact unresponsive. What a true WTF?! moment!

This is when I remembered a talk about SPOFs that Pat Meenan gave at a web performance meetup in Hamburg. I immediately gave the black hole technique a try by changing my /etc/hosts and mapping our tracking domain to the address of http://blackhole.webpagetest.org. I then checked our production site and was astonished. The whole page froze, exactly as the users had reported. Even after the 30-second timeout the page stayed frozen, and it was reproducible. It was another WTF?! moment. We don’t block on any third-party content, so what happened?

We spotted two potential pitfalls:

  • a tracking server that won’t respond to a request
  • our newly updated tracking library

The newly updated tracking library, which is served by our own (working) servers, implemented some magic called “link tracking” in WebKit browsers. There was only a tiny entry about it in the library’s update notes.

With this update the tracking JS put an observer on every link of the rendered page. That observer was in turn waiting for the tracking server’s response to release them again. During this time clicks on these links were captured by the tracking script which would be waiting for the response of the tracking server. When the server took more than 500ms to respond the observers would never be released. They would catch every click event and thus make the page unresponsive.

With the above setup, after packaging the tracking information and submitting it to the server, the page was not responsive at all while it waited for the server’s response.

But as the tracking server was not within our control and we couldn’t simply turn off tracking altogether, we solved this by manually overriding the new link tracking feature. With this change the tracking lib would never disable any links by catching all clicks, even if the tracking server was not reachable. What made this issue so hard to track down was that none of us would have suspected that anything was waiting for the request to the tracking server to complete.
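To make the failure mode more tangible, here is a simplified model of a click guard that fails open. It is not the vendor’s code and not the override we applied. The name createClickGuard and its methods are hypothetical. The point is that anything holding back clicks while a request is pending needs a deadline after which it lets go, whether or not a response has arrived.

```javascript
// Sketch: a click guard that holds link clicks while a tracking request
// is pending, but always lets go after maxWaitMs, even without a response.
// Hypothetical model, not taken from any tracking library.
function createClickGuard(maxWaitMs, now) {
    // `now` can be replaced by a fake clock; defaults to the system time.
    now = now || function () { return new Date().getTime(); };
    var pendingSince = null;
    return {
        requestStarted: function () { pendingSince = now(); },
        responseReceived: function () { pendingSince = null; },
        // Returns true if a click should be swallowed, false if it may pass.
        shouldBlock: function () {
            if (pendingSince === null) return false;
            if (now() - pendingSince >= maxWaitMs) {
                pendingSince = null; // fail open: the server is too slow
                return false;
            }
            return true;
        }
    };
}
```

A click handler would then only call event.preventDefault() while guard.shouldBlock() returns true, so an unreachable tracking server can delay a click by at most maxWaitMs.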

Tools and Tricks to the rescue

There are many tools out there that can make your life easier, especially with regard to spotting possible SPOFs. As mentioned above it is very wise to regularly check your 3rd party content against http://blackhole.webpagetest.org. Luckily there is a Chrome plugin that does the trick for you: SPOF-O-MATIC.

That plugin shows a warning whenever there is content in a document that can block or take down your site. It can even simulate a down-time of 3rd party servers.

Another option is to use webpagetest.org. Its tests include SPOFs in their measurements.

It is a great idea to take Pat Meenan’s advice on testing frontend SPOFs. It is always worth surfing your own websites with an adjusted /etc/hosts. Just point some popular 3rd party domains to the address of http://blackhole.webpagetest.org. This is how we confirmed that “the curious case” was actually a SPOF.

In the end, you should always load your 3rd party scripts completely asynchronously and without affecting the DOMContentLoaded event.

To summarize, there are essentially three ways of loading your 3rd party code.

Method                                                | synchronous/asynchronous | delays window:load
Example #1: Facebook Like Button                      | asynchronous             | yes
Example #3: Tracking                                  | synchronous              | oh yes
Load JS in iframe (acc. to Stoyan/Meebo’s approach)   | asynchronous             | no :-)

Method 1 is actually fine but it still influences window:load. It already decouples the loading completely from the 3rd party server. If these servers have performance issues or downtime you won’t be affected. This in itself is already very valuable.

After our recent experiences we now plan to migrate our tracking and advertising wrappers according to the third method and adjust our monitoring to properly report such incidents.
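The third method can be sketched roughly as follows. This is a simplified illustration of the iframe-based loading idea, not the exact Meebo or Stoyan code. The names loadScriptInIframe and escapeAttr are hypothetical, and doc is normally document. The script is requested from inside a dynamically created iframe, so a slow response only holds up that iframe’s own load event.

```javascript
// Sketch: load a third-party script inside a dynamically created iframe
// so that it cannot delay the parent page's window:load.
// Both function names are hypothetical.
function escapeAttr(text) {
    return text.replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/</g, '&lt;');
}

function loadScriptInIframe(doc, url) {
    var iframe = doc.createElement('iframe');
    iframe.style.cssText = 'width:0;height:0;border:0';
    var first = doc.getElementsByTagName('script')[0];
    first.parentNode.insertBefore(iframe, first);

    // This code runs inside the iframe once its (empty) body has loaded.
    var js = 'var s=document.createElement("script");' +
             's.src=' + JSON.stringify(url) + ';' +
             'document.body.appendChild(s);';

    var frameDoc = iframe.contentWindow.document;
    frameDoc.open();
    frameDoc.write('<body onload="' + escapeAttr(js) + '">');
    frameDoc.close();
    return iframe;
}
```

One consequence of this design is that the loaded script runs in the iframe’s window rather than in the page itself. It has to reach the page via window.parent, so third-party code usually needs to be written or adapted with this in mind.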

Given these adjustments and our new awareness of this topic, we are confident that we can prevent such failures in the future.

To recap, the following things can be recommended:

  • Replace the URL of any of your third-party references with, e.g., http://blackhole.webpagetest.org to check for SPOFs
  • Use tools like SPOF-O-MATIC to be constantly aware of SPOFs (even when surfing privately)
  • Most importantly: do this with content you embed, as well as with things you send somewhere else. Be aware that something can be waiting for a proper response even for a simple transfer of tracking data
  • Be aware of updates to third-party scripts, be it GA, Omniture or others. Make sure to read the update notes carefully. They can easily contain surprises that your product guys love but that all of a sudden place observers on every link on your website
ABOUT THE AUTHOR

Björn Kaiser (@BjoernKaiser) is Principal Frontend at XING and keen on everything regarding webperformance and frontend. He is one of the initiators of the biggest German performance meetup in Hamburg and speaker at German performance meetups (Hamburg, Berlin, Karlsruhe)

13 Responses to “SPOF: How we fixed a weird bug causing random users’ browsers to freeze”

  1. Performance Calendar » The non-blocking script loader pattern

    [...] It turns out that in most browsers, any resource that has started downloading before onload, will block onload. This means that if the script we loaded asynchronously was slow, or timed out, our onload event would incur a significant delay. If your site does important tasks in the onload event (like load advertisements), these might be delayed, and might never execute if the user leaves the page before that happens, causing a loss in revenue. Every script added, whether directly or dynamically is a SPOF. [...]

  3. IT Operations News Roundup — Dec 10th to 16th | Web Performance Monitoring and Optimization

    [...] How To Troubleshoot For SPOF Third party tags can introduce availability issues to a webpage — make sure you aren’t at risk by testing for SPOF. [...]

  4. Performance Calendar » SPOFCheck – Fighting Frontend SPOF at its root

    [...] SPOF has also increased tremendously among engineers, thanks to some of the recent blogs and articles emphasizing the importance of [...]

  5. Early Detection of Frontend Single Points of Failure — eBay Tech Blog

    [...] on this topic, as well as recent articles and blogs emphasizing its importance; see, for example, Bjorn Kaiser’s post about third-party content causing SPOFs. Numerous utilities and plugins exist that can detect [...]

  7. Allie

    Thank you! I have been going crazy trying to figure out why a client was having issues with his website but I could never see the problem on my end. I tried EVERYTHING to replicate the issue and couldn’t. I just tried your SPOF blackhole test and found the issue is the tracking server. Thank you VERY much for your post!

  13. Facebook Bug Redirects the Web Through Javascript Widget Error Stratusclear | Stratusclear - Making the Cloud Clear

    […] Kaiser wrote a great blog post last year about the risks that embedded Javascript widgets can create, and how their failure can create a single point of failure (SPOF) on your site. In the post, he […]
