<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Performance Calendar</title>
	<atom:link href="http://calendar.perfplanet.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://calendar.perfplanet.com</link>
	<description>The speed geek&#039;s favorite time of the year</description>
	<lastBuildDate>Tue, 26 Feb 2013 01:24:02 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Creating a Performance Culture</title>
		<link>http://calendar.perfplanet.com/2012/creating-a-performance-culture/</link>
		<comments>http://calendar.perfplanet.com/2012/creating-a-performance-culture/#comments</comments>
		<pubDate>Mon, 31 Dec 2012 17:00:20 +0000</pubDate>
		<dc:creator>editor</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1558</guid>
		<description><![CDATA[The performance community is growing. With 17K members across 46 meetup groups it&#8217;s pretty easy to find someone else who cares about performance. But what if your company is new to the world of high performance websites? How can you make performance a priority within your organization? I don&#8217;t have a guaranteed recipe, but here are [...]]]></description>
				<content:encoded><![CDATA[<p>The performance community is growing. With <a title="Web Performance Meetups" href="http://web-performance.meetup.com/all/">17K members across 46 meetup groups</a> it&#8217;s pretty easy to find someone else who cares about performance. But what if your company is new to the world of high performance websites? How can you make performance a priority within your organization? I don&#8217;t have a guaranteed recipe, but here are some key ingredients for creating a culture of performance where you work.</p>
<dl style="margin-left: 0.5em;">
<dt style="margin-top: 0.5em;">Get Support from the Top</dt>
<dd>If you&#8217;re lucky like me, your CEO is already on the web performance bus. It might even have been their idea to focus on performance, and you&#8217;ve been recruited to lead the charge. If this isn&#8217;t your situation, you have to start your evangelism at the top. You might start with the CEO, or perhaps COO or an SVP. The key is it has to be someone who is a leader across the different organizations within your company. The culture shift to focus on performance doesn&#8217;t happen in engineering alone. It has to happen across product management, marketing, sales, and all other parts of the company. You need to identify who the key leader or leadership team is and get them excited about web performance, and make them believe in the benefits it delivers.</dd>
<dt style="margin-top: 0.5em;">Speak the Right Vocabulary</dt>
<dd>As an engineer you probably know how to sell to other engineers. &#8220;Optimization&#8221; makes a developer&#8217;s ears perk up. Speaking in terms of reduced regressions and fewer outages wins over folks in devops. But you also need to know how to speak across the organization both horizontally and vertically. The UX team likes hearing about better user metrics (longer sessions, more sessions per month). The folks in finance want to hear about reduced operating costs in terms of hardware, power consumption, and data center bandwidth. Marketing and sales will light up hearing case studies about doubling unique users from search engine marketing as a result of a faster website. Make sure to use terms that resonate with your audience.</dd>
<dd style="margin-top: 0.3em;">A key skill in evangelizing to upper management is knowing how to speak hierarchically &#8211; start with the high-level stats and drill down into the details if the need arises. I see many engineers who start with the details, which many folks don&#8217;t have the time or background to follow. Start by showing a median and save the logarithmic scale charts for the &#8220;more slides&#8221; section.</dd>
<dt style="margin-top: 0.5em;">Pick the Right Product</dt>
<dd>If you&#8217;ve convinced the senior execs to focus on performance, your next step is to pick a product to focus on. You want to pick a high visibility product, so that the wins are significant. But you don&#8217;t want to pick the company&#8217;s flagship product. It&#8217;s possible you might hit a few bumps on your first forays into adopting web performance. Also, you might have to alter the release cycle as you roll out metrics and start A/B testing. That&#8217;s harder to do with a product that&#8217;s the company&#8217;s cash cow. Start with a product that&#8217;s in the top 5 or 10, but not #1.</dd>
<dt style="margin-top: 0.5em;">Pick the Right Team</dt>
<dd>The team you choose to work with is even more important than which product you choose. It all comes down to people, and if the team is too busy, has other priorities, or simply doesn&#8217;t believe in WPO (<a href="http://www.stevesouders.com/blog/2010/05/07/wpo-web-performance-optimization/">Web Performance Optimization</a>) then you should move on to another team. You can always come back and revisit this team in the future.</dd>
<dd style="margin-top: 0.3em;">I always have a kickoff meeting with a team that&#8217;s interested in working on performance, and I ask how many people they can dedicate to work on performance. I&#8217;m usually looking for at least two people full-time for 3 months. Sometimes teams think it&#8217;s sufficient to have someone spend 20% of their time working on performance, but this usually doesn&#8217;t have a positive outcome. If this is the company&#8217;s first engagement with web performance, you want to make sure to pick a team that has the mindset and resources to focus on the work ahead.</dd>
<dt style="margin-top: 0.5em;">Pick the Right Task</dt>
<dd>It&#8217;s critical that the first performance optimization deployed has a significant impact. There&#8217;s nothing more frustrating than getting folks excited about WPO only to have their work show no improvement. For most websites it&#8217;s fairly straightforward to pick an optimization that will have a big impact. I&#8217;m reminded of <a href="http://ismailelshareef.com/">Ismail Elshareef</a>&#8216;s case study about <a href="http://www.youtube.com/watch?v=5_-YukDEDBE&amp;list=PL60DCE94FE519EB10#t=9m17s">Edmunds.com getting 80% faster</a>. He talks about how the first task they picked was making resources cacheable. After just a day of work they pushed the fix and cut their CDN traffic by 34%! This is the type of win you want to have right out of the gate &#8211; something that takes a small amount of work and makes a big improvement.</dd>
<dt style="margin-top: 0.5em;">Start with Metrics</dt>
<dd>I&#8217;ve had several engagements where teams got so excited about the optimizations, they started deploying fixes before the metrics were in place. This is bad for two reasons. Without metrics you&#8217;re flying blind so you don&#8217;t know the actual impact of any fixes. But more importantly, it&#8217;s likely that the first fixes you deploy will have the biggest impact. If the metrics aren&#8217;t in place then you miss out on quantifying your best work! It&#8217;s best to establish the baseline when the site is at its worst. Occasionally teams don&#8217;t want to do this because they&#8217;re embarrassed by the slowness of the site. Just remind them how happy the execs will be to see a chart showing the site getting twice as fast.</dd>
<dt style="margin-top: 0.5em;">Identify Your Replacement</dt>
<dd>Within the chosen team there needs to be someone who is aligned with you to take over your role. This is the person who keeps the team focused on performance after you&#8217;ve moved on to help the next team. This person tracks the dashboards, identifies the changes that were deployed, analyzes the A/B test results, and prioritizes the next optimizations to work on.</dd>
<dd style="margin-top: 0.3em;">It&#8217;s not scalable for you to be the only performance expert in the company. You want to build a virtual performance team that spans all the products in the company. At Yahoo! we called this team the SpeedFreaks. We had regular gatherings, a mailing list, etc. It was a great way to share lessons learned across the different teams and re-energize our excitement about making things faster.</dd>
<dt style="margin-top: 0.5em;">Get Everyone on Board</dt>
<dd>Making and keeping the website fast requires everyone to be thinking about performance. It&#8217;s important to keep the entire company involved. There are several ways of doing this. One technique I see often is having realtime dashboards deployed at large gathering spots in the company. The Wall of Fame is another good chart. Eventually teams that are always at the bottom will start wondering what they have to do to get to the top. Getting time during the company all-hands meetings to review current performance and highlight some wins is good. Adding performance (speed) to the annual performance (HR) review form makes everyone think about their contributions in the past and plan on how they can contribute in the future.</dd>
<dt style="margin-top: 0.5em;">Use Carrot over Stick</dt>
<dd>If you follow these tips it&#8217;s likely that you&#8217;ll start off having successful engagements evangelizing and deploying performance best practices at your company. After working with the choicest teams, however, it&#8217;s also likely that you&#8217;ll run into a team that just isn&#8217;t drinking the WPO kool-aid. This is more likely to happen at larger companies where it&#8217;s more challenging to create a cultural shift. If you can&#8217;t convince this team to apply the right amount of focus, one possible reaction is to bring in a senior exec to command them to make performance a priority and do the work. This might work in the short term, but will fail in the long term and might even set you further back from where you started.</dd>
<dd style="margin-top: 0.3em;">Performance is a way of thinking. It requires vigilance. Anyone who has it forced upon them will likely not value performance and will instead look upon it as a nuisance that took them away from their desired focus. This person is now even harder to win over. It&#8217;s better to avoid taking the &#8220;stick&#8221; approach and instead use the &#8220;carrot&#8221; as motivation &#8211; t-shirts, bonuses, executive praise, shout outs at company all-hands, etc. No one enjoys the stick approach &#8211; the team doesn&#8217;t enjoy it and neither will you. Everyone comes away with negative memories. The carrot approach might not work in the short term, but it leaves the door open for a more positive re-engagement in the future.</dd>
<dt style="margin-top: 0.5em;">Be Passionate</dt>
<dd>It&#8217;s likely you&#8217;re the &#8220;performance lead&#8221; within the company, or at least the one who cares the most about making performance a high priority. It&#8217;s not going to be easy to get everyone else on board. You can&#8217;t go about this halfheartedly. You have to be passionate about it. John Rauser spoke (passionately) about this in <a href="http://www.youtube.com/watch?feature=player_embedded&amp;v=UL2WDcNu_3A">Creating Cultural Change</a> at Velocity 2010. He says you have to be excited and relentless. I agree.</dd>
</dl>
<p>Creating a culture of performance at your company is about creating a culture of quality. This is especially true because best (and worst) practices propagate quickly at web companies. Code written for product A is reused by product B. And folks who worked on team A transfer over to team C. If product A is built in a high performance way, those best practices are carried forward by the code and team members. Unfortunately, bad practices spread just as easily.</p>
<p>Companies like <a title="Google's Ten things we know to be true" href="http://www.google.com/about/company/philosophy/">Google</a>, <a title="Etsy October 2012 Site Performance Report" href="http://codeascraft.etsy.com/2012/11/09/october-2012-site-performance-report/">Etsy</a>, and <a title="Betfair's Customer Commitment" href="https://promotions.betfair.com/customer-commitment/">Betfair</a> have gone so far as to publish their commitment to performance. This is a win for their customers and for their brand. It&#8217;s also a win for the performance community because these companies are more likely to share their best practices and case studies. If your company is focused on performance, please help the community by sharing your lessons learned. If your company doesn&#8217;t have a focus on performance, I hope these tips help you establish that WPO focus to create a website that has a better user experience, more traffic, greater revenue, and reduced operating expenses.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/creating-a-performance-culture/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Speed Up Your Site Using Prefetching</title>
		<link>http://calendar.perfplanet.com/2012/speed-up-your-site-using-prefetching/</link>
		<comments>http://calendar.perfplanet.com/2012/speed-up-your-site-using-prefetching/#comments</comments>
		<pubDate>Sun, 30 Dec 2012 19:20:42 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1556</guid>
		<description><![CDATA[The concept of prefetching is pretty simple. We often know about resources the browser is likely to need before the browser does. Prefetching involves either giving the browser hints of pages or resources it is likely to need so that it can download them ahead of time, or actually downloading resources into the browser cache [...]]]></description>
				<content:encoded><![CDATA[<p>The concept of prefetching is pretty simple. We often know about resources the browser is likely to need before the browser does. Prefetching involves either giving the browser hints of pages or resources it is likely to need so that it can download them ahead of time, or actually downloading resources into the browser cache before needed so that the overhead of requesting and downloading the object can be preemptively handled or done in a non-blocking way.</p>
<p>There are many ways to prefetch content, but here are 3 simple options.</p>
<h2>DNS Prefetching</h2>
<p>DNS is the protocol that converts human readable domains (mysite.com) into computer readable IPs (123.123.123.123). DNS resolution is generally pretty fast and measured in hundreds of milliseconds, but because it must happen before any request to the server can be made it can cause a cascade effect that has a real impact on the overall load time of a page. Often we know about several other domains that will need to be loaded for resources later in the page or user session, such as subdomains for static content (images.mydomain.com) or domains for 3rd party content. Some browsers support a link tag that identifies these domains so the browser can resolve them ahead of time. The tag to do this is pretty straightforward:</p>
<div class="hl-main">
<pre><span class="hl-brackets">&lt;</span><span class="hl-reserved">link</span><span class="hl-code"> </span><span class="hl-var">href</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">//my.domain.com</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-var">rel</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">dns-prefetch</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-brackets">/&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;</span><span class="hl-reserved">link</span><span class="hl-code"> </span><span class="hl-var">href</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">http://my.domain.com/</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-var">rel</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">prefetch</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-brackets">/&gt;</span><span class="hl-code"> </span><span class="hl-brackets">&lt;</span><span class="hl-code">!-- </span><span class="hl-var">IE9</span><span class="hl-code">+ --</span><span class="hl-brackets">&gt;</span></pre>
</div>
<p>Adding this tag causes the browser to do the DNS resolution ahead of time, instead of waiting until a resource requires it later. This technique is probably most valuable to preload DNS for content on other pages on your site that visitors are likely to go to. This feature is supported in Chrome, Firefox, and IE9+.</p>
<p>Although shaving a few hundred milliseconds might seem trivial, in aggregate this can be a measurable gain. It&#8217;s also a safe optimization and easy to implement. I was curious to see how often this technique is used, so I crawled the top 100K Alexa sites. It turns out only 552 sites (0.55%) are currently using DNS prefetching. This is a cheap win, and something more sites should leverage.</p>
<h2>Resource Prefetching</h2>
<p>Images make up a large portion of the overall bytes of many major websites today. Often the overhead of making the requests and downloading images can have a significant performance impact. In many cases, though, the site developer knows when an image will be needed that won&#8217;t be detected early by the browser, such as an image loaded from an ajax request or other user action on the page. Resource prefetching is when you load an image, script, stylesheet, or other resource into the browser preemptively. This is most often done with images, but can be done with any type of resource that can be cached in the browser.</p>
<p>Of the three techniques I&#8217;m covering here, this is by far the oldest and the most used. Unfortunately I can&#8217;t give a concrete number about adoption because there are too many ways to implement this to detect in my Alexa crawl. Still, many sites don&#8217;t properly leverage this technique and even just preloading a few images can make a huge difference for the user experience.</p>
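<p>As one of the many possible implementations, here is a minimal sketch in TypeScript. The helper name <code>prefetchImages</code>, the <code>ImageLike</code> interface, and the example URL below are illustrative assumptions, not part of any library. The sketch relies on the long-standing browser behavior that assigning a URL to the <code>src</code> of an image object starts a download, so the response can later be served from cache (provided its caching headers allow it).</p>

```typescript
// Minimal shape of the object we need: anything with a writable `src`.
// In a browser, an HTMLImageElement created with `new Image()` fits.
interface ImageLike {
  src: string;
}

// Hypothetical helper (not from the article): start downloading each URL
// by assigning it to the `src` of a freshly created image object.
// The factory is passed in so the function has no hard dependency on the DOM.
function prefetchImages(
  urls: string[],
  createImage: () => ImageLike
): ImageLike[] {
  return urls.map((url) => {
    const img = createImage();
    img.src = url; // in a browser, setting src on an Image triggers the request
    return img; // returned so the caller can hold a reference while it loads
  });
}
```

<p>In a browser you might call <code>prefetchImages(["/img/step2.png"], () =&gt; new Image())</code> once the current page has finished loading, so the prefetch does not compete with resources the visitor needs right away.</p>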
<h2>Page Prefetching / Prerendering</h2>
<p>Page prefetching is very similar to resource prefetching, except that we actually load the new page itself preemptively. This was first made available in Firefox. You can hint to the browser that a page (or an individual resource) should be prefetched by including the following tag:</p>
<div class="hl-main">
<pre><span class="hl-brackets">&lt;</span><span class="hl-reserved">link</span><span class="hl-code"> </span><span class="hl-var">rel</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">prefetch</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-var">href</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">/my-next-page.htm</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span></pre>
</div>
<p>In the case of prerendering, the browser not only downloads the page, but also the necessary resources for that page. It also begins to render the page in memory (not visible to the user) so that when the request for the page is made it can appear nearly instantaneous to the user. Prerendering was first added in Chrome. You can hint that a page should be prerendered by including the following tag:</p>
<div class="hl-main">
<pre><span class="hl-brackets">&lt;</span><span class="hl-reserved">link</span><span class="hl-code"> </span><span class="hl-var">rel</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">prerender</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-var">href</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">http://mydomain.com/my-next-page.htm</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span></pre>
</div>
<p>This technique is by far the most controversial and the riskiest of the three. Prerendering a page should only be done when there is a high confidence that the user will go to that page next. The most well known example of this is Google Search, which will prerender the first result of the page if the confidence is high enough. I found only 95 examples of this in my crawl of the Alexa Top 100k sites. Although this technique is clearly not for every use case, I think many more sites could leverage this to improve the user experience.</p>
<h2>The Downsides</h2>
<p>Prefetching in general is often a controversial topic. Many people argue that it is inefficient and wastes bandwidth. It also uses client resources unnecessarily (most notably on mobile devices). Also worth mentioning is that in some cases prefetching or prerendering of pages can have adverse effects on analytics and log tracking, since there is no obvious way to distinguish a user actually visiting the page (and seeing it) from the browser prerendering it without the user&#8217;s knowledge.</p>
<h2>Conclusion</h2>
<p>Despite all of these cautions, prefetching can be a huge win. The fastest request is always the one we never have to make and getting as much into the cache as possible is the best way to make that happen. By making these expensive requests when the user is not waiting on them, we can greatly improve the perceived performance of even the slowest sites on the slowest networks. If you&#8217;re not already doing so, it&#8217;s worth trying these techniques on your site. The results will vary, so be sure to use Real User Measurement (e.g. <a href="http://torbit.com">Torbit</a>) to find out how much of an improvement prefetching makes for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/speed-up-your-site-using-prefetching/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Optimizing Your Network Stack for Optimal Mobile Web Performance</title>
		<link>http://calendar.perfplanet.com/2012/optimizing-your-network-stack-for-optimal-mobile-web-performance/</link>
		<comments>http://calendar.perfplanet.com/2012/optimizing-your-network-stack-for-optimal-mobile-web-performance/#comments</comments>
		<pubDate>Sun, 30 Dec 2012 19:19:51 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1549</guid>
		<description><![CDATA[We spend a lot of time at CloudFlare thinking about how to make the Internet fast on mobile devices. Currently there are over 1.2 billion active mobile users and that number is growing rapidly. Earlier this year mobile Internet access passed fixed Internet access in India and that&#8217;s likely to be repeated the world over. [...]]]></description>
				<content:encoded><![CDATA[<p>We spend a lot of time at CloudFlare thinking about how to make the Internet fast on mobile devices. Currently there are over 1.2 billion active mobile users and that number is growing rapidly. Earlier this year mobile Internet access passed fixed Internet access in India and that&#8217;s likely to be repeated the world over. So, mobile network performance will only become more and more important.</p>
<p>Most of the focus today on improving mobile performance is on Layer 7 with front end optimizations (FEO). At CloudFlare, we&#8217;ve done significant work in this area with front end optimization technologies like <a href="https://www.cloudflare.com/features-optimizer">Rocket Loader, Mirage, and Polish</a> that dynamically modify web content to make it load quickly whatever device is being used. However, while FEO is important to make mobile fast, the unique characteristics of mobile networks also mean we have to pay attention to the underlying performance of the technologies down at Layer 4 of the network stack.</p>
<p>This article is about the challenges mobile devices present, how the default TCP configuration is ill-suited for optimal mobile performance, and what you can do to improve performance for visitors connecting via mobile networks. Before diving into the details, a quick technical note. At CloudFlare, we&#8217;ve built most of our systems on top of a custom version of Linux so, while the underlying technologies can apply to other operating systems, the examples I&#8217;ll use are from Linux.</p>
<h3>TCP Congestion Control</h3>
<p>To understand the challenges of mobile network performance at Layer 4 of the networking stack you need to understand TCP Congestion Control. TCP Congestion Control is the gatekeeper that determines how to control the flow of packets from your server to your clients. Its goal is to prevent Internet congestion by detecting when congestion occurs and slowing down the rate at which data is transmitted. This helps ensure that the Internet is available to everyone, but can cause problems on mobile networks when TCP mistakes mobile network problems for congestion.</p>
<p>TCP Congestion Control holds back the floodgates if it detects congestion (i.e. packet loss) on the remote end. A network is, inherently, a shared resource. The purpose of TCP Congestion Control is to ensure that every device on the network cooperates so as not to overwhelm that shared resource. On a wired network, if packet loss is detected it is a fairly reliable indicator that a port along the connection is overburdened. What is typically going on in these cases is that a memory buffer in a switch somewhere has filled beyond its capacity because packets are coming in faster than they can be sent out, and data is being discarded. TCP Congestion Control on clients and servers is set up to &#8220;back off&#8221; in these cases in order to ensure that the network remains available for all its users.</p>
<p>But figuring out what packet loss means on a mobile network is a different matter. Radio networks are inherently susceptible to interference which results in packet loss. If packets are being dropped does that mean a switch is overburdened, like we can infer on a wired network? Or did someone travel from an under-subscribed wireless cell to an oversubscribed one? Or did someone just turn on a microwave? Or maybe it was just a random solar flare? Since it&#8217;s not as clear what packet loss means on a mobile network, it&#8217;s not clear what action a TCP Congestion Control algorithm should take.</p>
<h3>A Series of Leaky Tubes</h3>
<p>To optimize for lossy networks like mobile networks, it&#8217;s important to understand exactly how TCP Congestion Control algorithms are designed. While the high level concept makes sense, the details of TCP Congestion Control are not widely understood by most people working in the web performance industry. That said, it is an important core part of what makes the Internet reliable and the subject of very active research and development.</p>
<p>To understand how TCP Congestion Control algorithms work, imagine the following analogy. Think of your web server as your local water utility plant. You&#8217;ve built a large network of pipes in your hometown and you need to guarantee that each pipe is as pressurized as possible for delivery, but you don&#8217;t want to burst the pipes. (Note: I recognize the late Senator Ted Stevens got a lot of flak for describing the Internet as a &#8220;series of tubes,&#8221; but the metaphor is surprisingly accurate.)</p>
<p>Your client, Crazy Arty, runs a local water bottling plant that connects to your pipe network. Crazy Arty&#8217;s infrastructure is built on old pipes that are leaky and brittle. For you to get water to him without bursting his pipes, you need to infer the capability of Crazy Arty&#8217;s system. If you don&#8217;t know it in advance then you do a test: you send a known amount of water down the line and then measure the pressure. If the pressure is suddenly lost then you can infer that you broke a pipe. If not, then that level is likely safe and you can add more water pressure and repeat the test. You can iterate this test until you burst a pipe, see the drop in pressure, write down the maximum water volume, and going forward ensure you never exceed it.</p>
<p>Imagine, however, that there&#8217;s some exogenous factor that could decrease the pressure in the pipe without actually indicating a pipe had burst. What if, for example, Crazy Arty ran a pump that he only turned on randomly from time to time and that you didn&#8217;t know about? If the only signal you have is observing a loss in pressure, you&#8217;d have no way of knowing whether you&#8217;d burst a pipe or if Crazy Arty had just plugged in the pump. The effect would be that you&#8217;d likely record a pressure level much lower than the pipes could actually withstand, leaving all your customers on the network with potentially lower water pressure than they should have.</p>
<h3>Optimizing for Congestion or Loss</h3>
<p>If you&#8217;ve been following up to this point then you already know more about TCP Congestion Control than you would guess. The initial amount of water we talked about is known in TCP as the Initial Congestion Window (initcwnd); it is the initial number of packets in flight across the network. The congestion window (cwnd) either shrinks, grows, or stays the same depending on how many packets are acknowledged and how fast the acknowledgments (in ACK trains) return after the initial burst. In essence, TCP Congestion Control is just like the water utility: it measures the pressure a network can withstand and then adjusts the volume in an attempt to maximize flow without bursting any pipes.</p>
<p>When a TCP connection is first established it attempts to ramp up the cwnd quickly. This phase of the connection, where TCP grows the cwnd rapidly, is called Slow Start. That&#8217;s a bit of a misnomer since it is generally an exponential growth function, which is quite fast and aggressive. Just as the water utility in the example above turns down the volume of water when it detects a drop in pressure, TCP reduces the size of the cwnd when it detects lost packets and waits longer before delivering another burst. The time the sender waits for an acknowledgment before retransmitting is known as the Retransmission Timeout (RTO). The algorithm within TCP that controls these processes is called the Congestion Control Algorithm. There are many congestion control algorithms, and clients and servers can use different strategies based on the characteristics of their networks. Most Congestion Control Algorithms focus on optimizing for one type of network loss or another: congestive loss (like you see on wired networks) or random loss (like you see on mobile networks).</p>
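<p>To make the effect of the initial window concrete, here is a deliberately simplified model of Slow Start, sketched in TypeScript. It assumes no packet loss, a cwnd that doubles every round trip, and a maximum segment size of 1460 bytes. The function name and the payload size are illustrative assumptions, and real TCP stacks are considerably more nuanced.</p>

```typescript
// Assumed maximum segment size in bytes (typical for Ethernet paths).
const MSS_BYTES = 1460;

// Simplified Slow Start model: no loss, cwnd doubles every round trip.
// Returns how many round trips it takes to send `payloadBytes` when the
// connection starts with `initcwnd` segments in flight.
function roundTripsToSend(payloadBytes: number, initcwnd: number): number {
  let cwnd = initcwnd; // segments we may send in the current round trip
  let remaining = Math.ceil(payloadBytes / MSS_BYTES); // segments left to send
  let roundTrips = 0;
  while (remaining > 0) {
    remaining -= cwnd; // send a full window (or whatever is left)
    cwnd *= 2; // exponential growth while in Slow Start
    roundTrips += 1;
  }
  return roundTrips;
}

console.log(roundTripsToSend(14000, 3)); // 3
console.log(roundTripsToSend(14000, 10)); // 1
```

<p>Under these assumptions a 14,000-byte response needs three round trips with an initial window of 3 segments but only one with an initial window of 10, which is why the starting size of the window matters so much on high latency connections.</p>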
<p>In the water utility example, a pipe bursting would be an indication of congestive loss. There was a physical limit to the pipes, it was exceeded, and the appropriate response is to back off. On the other hand, Crazy Arty&#8217;s pump is analogous to random loss. The capacity is still available on the network and only a temporary disturbance causes the water utility to see the pipes as overfull. The Internet started as a network of wired devices, and, as its name suggests, congestion control was largely designed to optimize for congestive loss. As a result, the default Congestion Control Algorithm in many operating systems is good for communicating with wired networks but not as good for communicating with mobile networks.</p>
<p>A few Congestion Control algorithms try to bridge the gap by measuring how long the &#8220;pressure increase&#8221; takes to reach the &#8220;expected capacity&#8221; and using that delay to figure out the cause of the loss. These are known as bandwidth estimation algorithms, and examples include <a href="http://en.wikipedia.org/wiki/TCP_Vegas">Vegas</a>, <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.5736">Veno</a> and <a href="http://en.wikipedia.org/wiki/TCP_Westwood_plus">Westwood+</a>. Unfortunately, all of these methods are reactive and reuse no information across two similar streams.</p>
<p>At companies that see a significant amount of network traffic, like CloudFlare or Google, it is possible to map the characteristics of the Internet&#8217;s networks and choose a specific congestion control algorithm in order to maximize performance for each network. Unfortunately, unless you see as much traffic as we do and can record data on network performance, instrumenting your congestion control or building a &#8220;weather forecast&#8221; is usually impossible. Fortunately, there are still several things you can do to make your server more responsive to visitors even when they&#8217;re coming from lossy, mobile devices.</p>
<h3>Compelling Reasons to Upgrade Your Kernel</h3>
<p>The Linux network stack has been under extensive development to bring about some sensible defaults and mechanisms for dealing with the network topology of 2012. A mixed network of high bandwidth, low latency connections and high bandwidth, high latency, lossy connections was never fully anticipated by the kernel developers of 2009, and if you check your server&#8217;s kernel version, chances are it&#8217;s running a 2.6.32.x kernel from that era.</p>
<p><code>uname -a</code></p>
<p>If you&#8217;re running an old kernel on your web server and want to increase web performance, especially for mobile devices, there are a number of reasons to investigate upgrading. To begin, Linux 2.6.38 bumps the default initcwnd and initrwnd (initial receive window) from <a href="http://www.ietf.org/rfc/rfc3390.txt">3 to 10</a>. This is an easy, big win. It allows for 14.2KB (vs 5.7KB) of data to be sent or received in the initial round trip before slow start grows the cwnd further. This is important for HTTP and SSL because it gives you more room to fit the header in the initial set of packets. If you are running an older kernel you may be able to run the following command in a bash shell (use caution) to set all of your routes&#8217; initcwnd and initrwnd to 10. On average, this small change can be one of the biggest boosts when you&#8217;re trying to maximize web performance.</p>
<p><code>ip route | while read p; do ip route change $p initcwnd 10 initrwnd 10; done</code></p>
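<p>As a quick sanity check on those sizes, here is a small Python sketch that converts an initial window in segments into bytes. It assumes a Maximum Segment Size of 1460 bytes (a 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers); that value and the function name are assumptions for illustration. Under that assumption, 10 segments come to roughly 14KB, and the 5.7KB figure corresponds to 4 full-size segments.</p>

```python
MSS_BYTES = 1460  # assumed: 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers


def initial_window_kib(segments, mss=MSS_BYTES):
    """Payload that fits in the initial congestion window, in KiB."""
    return segments * mss / 1024


for segments in (3, 4, 10):
    print(f"initcwnd={segments}: {initial_window_kib(segments):.1f} KiB")
```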
<p>Linux kernel 3.2 implements <a href="http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01">Proportional Rate Reduction (PRR)</a>. PRR decreases the time it takes for a lossy connection to recover its full speed, potentially improving HTTP response times by 3-10%. The benefits of PRR are significant for mobile networks. To understand why, it&#8217;s worth diving back into the details of how previous congestion control strategies interacted with loss.</p>
<p>Many congestion control algorithms halve the cwnd when a loss is detected. When multiple losses occur this can result in a case where the cwnd is lower than the slow start threshold. Unfortunately, the connection never goes through slow start again. The result is that a few network interruptions can result in TCP slowing to a crawl for all the connections in the session.</p>
<p>This is even more deadly when combined with the tcp_no_metrics_save=0 sysctl setting on unpatched kernels before 3.2. This setting will save data on connections and attempt to use it to optimize the network. Unfortunately, this can actually make performance worse because TCP will apply the exception case to every new connection from a client within a window of a few minutes. In other words, in some cases, one person surfing your site from a mobile phone who has some random packet loss can reduce your server&#8217;s performance for this visitor even when their temporary loss has cleared.</p>
<p>If you expect your visitors to be coming from mobile, lossy connections and you cannot upgrade or patch your kernel I recommend setting tcp_no_metrics_save=1. If you&#8217;re comfortable doing some hacking, you can <a href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a262f0cdf1f2916ea918dc329492abb5323d9a6c">patch older kernels.</a></p>
<p>The good news is that Linux 3.2 implements PRR, which decreases the amount of time that a lossy connection will impact TCP performance. If you can upgrade, it may be one of the most significant things you can do in order to increase your web performance.</p>
<h3>More Improvements Ahead</h3>
<p>Linux 3.2 also has another important improvement with RFC2988bis (since published as RFC 6298). The initial Retransmission Timeout (initRTO) has been changed to 1s from 3s. If loss happens after sending the initcwnd, two seconds of waiting time are saved when trying to resend the data. With TCP streams being so short, this can be a very noticeable improvement if a connection experiences loss at the beginning of the stream. Like the PRR patch, this can also be applied (with modification) to older kernels if for some reason you cannot upgrade (<a href="http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=9ad7c049f0f79c418e293b1b68cf10d68f54fcdb">here&#8217;s the patch</a>).</p>
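<p>To put the initRTO change in numbers, here is a minimal Python sketch of how long a sender waits before each retransmission when nothing comes back. It assumes the standard behavior of doubling the timeout after every expiry; real stacks also cap the timeout and recompute it from measured round trip times once they have samples, which this sketch ignores. The function name is hypothetical.</p>

```python
def retransmit_times(initial_rto, attempts):
    """Seconds elapsed since the first send at each retransmission,
    assuming the timeout doubles after every expiry and no ACK arrives."""
    elapsed = 0.0
    rto = initial_rto
    times = []
    for _ in range(attempts):
        elapsed += rto
        times.append(elapsed)
        rto *= 2  # exponential backoff
    return times


print(retransmit_times(3, 3))  # initRTO of 3s
print(retransmit_times(1, 3))  # initRTO of 1s
```

<p>With a 3s initRTO the retransmissions go out 3, 9 and 21 seconds after the first send; with 1s they go out after 1, 3 and 7 seconds. The first retry is two seconds earlier, and the gap widens with every further loss.</p>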
<p>Looking forward, Linux 3.3 adds Byte Queue Limits, which, when teamed with CoDel (controlled delay) in the 3.5 kernel, help fight the long-standing issue of <a href="http://www.bufferbloat.net/projects/bloat/wiki/Introduction">Bufferbloat</a> by intelligently managing packet queues. Bufferbloat is when the queues in the network stack become backed up because they are littered with stale data. Linux 3.3 has features to automatically prioritize (QoS) important packets (SYN, DNS, ARP, etc.) and keep buffer queues short, thereby reducing bufferbloat and improving latency on loaded servers.</p>
<p>Linux 3.5 implements <a href="http://tools.ietf.org/html/rfc5827">TCP Early Retransmit</a> with some safeguards for connections that have a small amount of packet reordering. This allows connections, under certain conditions, to trigger fast retransmit and bypass the costly Retransmission Timeout (RTO) mentioned earlier. By default it is enabled in the failsafe mode tcp_early_retrans=2. If for some reason you are sure your clients have loss but no reordering then you could set tcp_early_retrans=1 to save one quarter of an RTT on recovery.</p>
<p>One of the most extensive changes in 3.6 that hasn&#8217;t got much press is the removal of the IPv4 routing cache. In a nutshell it was an extraneous caching layer in the kernel that mapped interfaces to routes to IPs and saved a lookup to the Forwarding Information Base (FIB). The FIB is a routing table within the network stack. The IPv4 routing cache was intended to eliminate a FIB lookup and increase performance. While a good idea in principle, unfortunately it provided a very small performance boost in less than 10% of connections. In the 3.2.x-3.5.x kernels it was extremely vulnerable to certain DDoS techniques so it has been removed.</p>
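<p>If you want a quick summary of what a given server is missing, the versions mentioned above can be turned into a small lookup. This is a sketch only: the feature table simply restates the kernel versions from this article, and the function names are hypothetical. Distribution kernels often backport patches, so treat the output as a hint rather than a verdict.</p>

```python
import platform
import re

# Kernel versions and features as listed in this article.
FEATURES = [
    ((2, 6, 38), "default initcwnd/initrwnd of 10"),
    ((3, 2, 0), "Proportional Rate Reduction, initRTO of 1s"),
    ((3, 3, 0), "Byte Queue Limits"),
    ((3, 5, 0), "CoDel, TCP Early Retransmit"),
    ((3, 6, 0), "IPv4 routing cache removed"),
]


def parse_release(release):
    """Turn a release string such as '2.6.32-5-amd64' into (2, 6, 32)."""
    numbers = [int(n) for n in re.findall(r"\d+", release.split("-")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def missing_features(release):
    """List the features whose minimum kernel version is newer than this release."""
    version = parse_release(release)
    return [name for minimum, name in FEATURES if minimum > version]


if __name__ == "__main__":
    # platform.release() returns the same release string that uname -r prints.
    for name in missing_features(platform.release()):
        print("missing:", name)
```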
<p>Finally, one important setting you should check, regardless of the Linux kernel you are running: tcp_slow_start_after_idle. If you&#8217;re concerned about web performance, it has been proclaimed sysctl setting of the year. It can be enabled in almost any kernel. By default this is set to 1 which will aggressively reduce cwnd on idle connections and negatively impact any long lived connections such as SSL. The following command will set it to 0 and can significantly improve performance:</p>
<p><code>sysctl -w net.ipv4.tcp_slow_start_after_idle=0</code></p>
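<p>Before changing anything it helps to know what a server is currently set to. The sketch below reads the three settings discussed in this article from <code>/proc/sys/net/ipv4</code>, which is where Linux exposes the <code>net.ipv4.*</code> sysctls. The function name and the <code>base</code> parameter are assumptions for illustration, and a value of <code>None</code> just means the running kernel does not expose that setting.</p>

```python
from pathlib import Path

# Settings discussed in this article; all live under net.ipv4.
SETTINGS = (
    "tcp_slow_start_after_idle",
    "tcp_no_metrics_save",
    "tcp_early_retrans",
)


def read_tcp_settings(base="/proc/sys/net/ipv4"):
    """Return {setting: value or None}; None means the kernel does not expose it."""
    values = {}
    for name in SETTINGS:
        path = Path(base) / name
        try:
            values[name] = path.read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_settings().items():
        print(f"net.ipv4.{name} = {value}")
```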
<h3>The Missing Congestion Control Algorithm</h3>
<p>You may be curious as to why I haven&#8217;t made a recommendation as far as a quick and easy change of congestion control algorithms. Since Linux 2.6.19, the default congestion control algorithm in the Linux kernel is CUBIC, which is time based and optimized for high speed and high latency networks. Its killer feature, known as Hybrid Slow Start (HyStart), allows it to safely exit slow start by measuring the ACK trains and not overshoot the cwnd. It can improve startup throughput by up to 200-300%.</p>
<p>While other Congestion Control Algorithms (e.g., TCP Westwood+ or Hybla) may seem like performance wins on connections experiencing high amounts of loss (&gt;0.1%), unfortunately these algorithms don&#8217;t include HyStart. The net effect is that, in our tests, they underperform CUBIC for general network performance. Unless a majority of your clients are on lossy connections, I recommend staying with CUBIC.</p>
<p>Of course the real answer here is to dynamically swap out congestion control algorithms based on historical data to better serve these edge cases. Unfortunately, that is difficult for the average web server unless you&#8217;re seeing a very high volume of traffic and are able to record and analyze network characteristics across multiple connections. The good news is that loss predictors and hybrid congestion control algorithms are continuing to mature, so maybe we will have an answer in an upcoming kernel.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/optimizing-your-network-stack-for-optimal-mobile-web-performance/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Giving Your Images An Extra Squeeze</title>
		<link>http://calendar.perfplanet.com/2012/giving-your-images-an-extra-squeeze/</link>
		<comments>http://calendar.perfplanet.com/2012/giving-your-images-an-extra-squeeze/#comments</comments>
		<pubDate>Sat, 29 Dec 2012 22:53:04 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[images]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1560</guid>
		<description><![CDATA[According to the latest HTTP archive stats, the average Web page weighs 1286KB, and 60% of that is image data. That means that properly compressing image data is of utmost importance for the overall page content size and hence its loading time. It also has a significant impact on the data plan hit users incur [...]]]></description>
				<content:encoded><![CDATA[<p>According to the latest HTTP archive stats, the average Web page weighs 1286KB, and 60% of that is image data. That means that properly compressing image data is of utmost importance for the overall page content size and hence its loading time. It also has a significant impact on the data plan hit users incur when they browse the Web on their mobile devices.</p>
<p><img title="" alt="Byte distribution per content type: Images 793KB, Scripts 211KB, Stylesheets 35KB, Flash 92KB, HTML 54KB, Other 101KB - Total 1286KB" src="http://chart.apis.google.com/chart?chs=400x225&amp;cht=p&amp;chco=007099&amp;chd=t:54,793,211,35,92,101&amp;chds=0,793&amp;chdlp=b&amp;chdl=total%201286%20kB&amp;chl=HTML+-+54+kB%7CImages+-+793+kB%7CScripts+-+211+kB%7CStylesheets+-+35+kB%7CFlash+-+92+kB%7COther+-+101+kB&amp;chma=|5&amp;chtt=Average+Bytes+per+Page+by+Content+Type" /></p>
<p>Yet, when we look at the actual numbers &#8220;in the wild&#8221;, we see that few developers actually compress their images, and even for those that do, the results are not always ideal.</p>
<p>A few months ago, I downloaded 5.8 million images from Alexa&#8217;s top 200,000 sites. Using that image data, I&#8217;ll demonstrate how much data can be saved by properly compressing images.</p>
<h2>Image Formats</h2>
<p>I&#8217;m sure most of you know this by now, but here is a short overview of the image formats on the Web:</p>
<ul>
<li><strong>GIF &#8211; </strong>Best suited for computer generated images with a relatively small number of colors. It works by choosing a palette of up to 256 colors that best fits the image, creating a bitmap that represents the image using the palette&#8217;s color numbers, and then compressing that bitmap using a generic compression algorithm. The format supports animation and transparency, but not a full alpha channel.</li>
<li><strong>PNG &#8211; </strong>Best suited for computer generated images, but can represent more than 256 colors. The format has several subtypes. The subtype usually referred to as PNG8 is very similar to GIF, but uses a different compression algorithm. It does not support animation, but does support a full alpha channel. The subtypes referred to as PNG24 and PNG24α can represent the full RGB color space, with the latter also supporting a full alpha channel. The downside is that both PNG24 subtypes are represented as bitmaps to which a generic compression algorithm is applied. This is usually not ideal in terms of byte size.</li>
<li><strong>JPEG &#8211; </strong>Best suited for real life photos. It is not a bitmap based format, but represents the images by storing the frequency of color changes between different pixels, eliminating high frequencies that humans are likely not to notice anyway, and then compressing that. It is a <em>lossy</em> image format, which means a JPEG cannot be converted to the original bitmap image with perfect accuracy. For most uses on the Web, this is not a limitation.</li>
<li><strong>WebP &#8211; </strong>Best suited for <em>both</em> real life photos and computer generated images, since it can employ both lossy and lossless techniques. Based on the VP8 video codec, the WebP format uses predictive coding to achieve its high lossy compression rates and the latest entropy coding techniques to achieve better lossless results. It also supports a full alpha channel and animation. What&#8217;s the catch, then? The main issue is that WebP is not really part of the Web platform&#8217;s &#8220;official&#8221; formats since it is only supported by Chrome and Opera at present. The lack of simple fallback mechanisms (both <a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=20214">client</a> and <a href="http://www.igvita.com/2012/12/18/deploying-new-image-formats-on-the-web/">server</a> side) poses a high barrier of entry for developers that want to use WebP today.</li>
</ul>
<p>Here&#8217;s a look at the presence each format has on the Web today based on bytes.</p>
<h2>Format Distribution</h2>
<table>
<thead>
<tr class="imagemenot">
<th>Image format</th>
<th>% in bytes</th>
</tr>
</thead>
<tbody>
<tr>
<td>JPG</td>
<td>66.9%</td>
</tr>
<tr>
<td>Animated GIF</td>
<td>6.4%</td>
</tr>
<tr>
<td>Non-animated GIF</td>
<td>5.3%</td>
</tr>
<tr>
<td>PNG8</td>
<td>1.3%</td>
</tr>
<tr>
<td>PNG24</td>
<td>5.2%</td>
</tr>
<tr>
<td>PNG24α</td>
<td>14.3%</td>
</tr>
<tr>
<td>icons</td>
<td>0.4%</td>
</tr>
<tr>
<td>bitmaps</td>
<td>0.2%</td>
</tr>
</tbody>
</table>
<p>Some of you may say: &#8220;You forgot SVG!&#8221;. I didn&#8217;t. SVG comprises only 0.001% of the overall image data, so it didn&#8217;t make it into the format distribution table. Sad, but true.</p>
<h2>Lossless Optimization</h2>
<p>In my quest for finding image optimization opportunities, I first sought to find the savings that could be achieved without any compromise on quality. I ran lossless optimizations on JPEG and PNG using the <code>jpegtran</code> and <code>pngcrush</code> utilities, as well as conversion to lossless WebP. The results are below.</p>
<table>
<thead>
<tr class="imagemenot">
<th>Optimization</th>
<th>Data Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>JPEG EXIF removal</td>
<td>6.6%</td>
</tr>
<tr>
<td>JPEG EXIF removal, optimized Huffman</td>
<td>13.3%</td>
</tr>
<tr>
<td>JPEG EXIF removal, optimized Huffman, Convert to progressive</td>
<td>15.1%</td>
</tr>
<tr>
<td>PNG8 pngcrush</td>
<td>2.6%</td>
</tr>
<tr>
<td>PNG8 lossless WebP</td>
<td>23%</td>
</tr>
<tr>
<td>PNG24 pngcrush</td>
<td>11%</td>
</tr>
<tr>
<td>PNG24 lossless WebP</td>
<td>33.1%</td>
</tr>
<tr>
<td>PNG24α pngcrush</td>
<td>14.4%</td>
</tr>
<tr>
<td>PNG24α lossless WebP</td>
<td>42.5%</td>
</tr>
</tbody>
</table>
<p>Overall with these lossless optimization techniques about 12.75% of image data can be saved. That is 101KB for an average page! If we use the lossless variant of WebP, we can save 18.2% of overall image data for browsers that support it, which is 144KB.</p>
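<p>The per-page numbers follow directly from the 793KB of image data on the average page. A minimal Python sketch of that arithmetic (the constant and function names are just for illustration):</p>

```python
AVERAGE_IMAGE_KB = 793  # image bytes on the average page, from the chart above


def saved_kb(percent, image_kb=AVERAGE_IMAGE_KB):
    """Kilobytes saved on an average page for a given percentage reduction."""
    return image_kb * percent / 100


print(round(saved_kb(12.75)))  # lossless JPEG and PNG optimization: 101
print(round(saved_kb(18.2)))   # with lossless WebP: 144
```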
<h2>Lossy Optimization</h2>
<p>Now let&#8217;s see what happens when we are willing to (slightly) compromise quality for the sake of data savings. I used the <a href="http://en.wikipedia.org/wiki/Structural_similarity">SSIM</a> index in order to get an objective idea of the trade-off we make between quality and byte size. Basically, an SSIM score of 100% means identical images. A lower SSIM score means a bigger difference between the images.</p>
<h3>JPEG</h3>
<p>Using <code>ImageMagick</code> I compressed JPEGs at several levels of quality. Then I applied the lossless optimizations that we saw above to them, in order to squeeze these images some more. I also compressed the images using <a href="https://github.com/rflynn/imgmin">imgmin</a>, which is a utility that deploys binary search to find the ideal quality level for each image. Finally, I ran JPEG to WebP conversion to see if the benefits match Google&#8217;s result of 30% data reduction.</p>
<table>
<thead>
<tr class="imagemenot">
<th>Quality Level</th>
<th>Data Reduction</th>
<th>SSIM</th>
</tr>
</thead>
<tbody>
<tr>
<td>75</td>
<td>50%</td>
<td>96.22%</td>
</tr>
<tr>
<td>50</td>
<td>64.6%</td>
<td>92.28%</td>
</tr>
<tr>
<td>30</td>
<td>73.3%</td>
<td>89.13%</td>
</tr>
<tr>
<td>imgmin</td>
<td>38.6%</td>
<td>97.52%</td>
</tr>
<tr>
<td>WebP 75</td>
<td>68%</td>
<td>95.28%</td>
</tr>
</tbody>
</table>
<p>WebP gives us compression levels close to &#8220;quality 30&#8221; with &#8220;quality 75&#8221; image quality. Another way to look at this is that WebP files are 37% smaller than JPEGs of equivalent quality.</p>
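<p>The binary search idea behind a tool like imgmin can be sketched in a few lines of Python. This is not imgmin&#8217;s actual code, and its exact metric and stopping rule are not described here. The <code>similarity_at</code> callback is a hypothetical function you would supply: it compresses the image at a given quality and returns a similarity score such as SSIM. The sketch also assumes the score never drops as quality goes up.</p>

```python
def lowest_acceptable_quality(similarity_at, threshold, low=1, high=100):
    """Binary-search the lowest quality whose similarity meets the threshold.

    similarity_at(quality) is a caller-supplied (hypothetical) function that
    compresses the image at that quality and returns a score such as SSIM.
    Assumes the score does not decrease as quality increases. If no quality
    meets the threshold, the initial value of high is returned.
    """
    best = high
    while high >= low:
        mid = (low + high) // 2
        if similarity_at(mid) >= threshold:
            best = mid        # good enough, try a lower quality
            high = mid - 1
        else:
            low = mid + 1     # too degraded, need a higher quality
    return best
```

<p>Each image needs only about seven trial compressions to cover the whole 1-100 quality range, which is what makes a per-image search practical across a large set of files.</p>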
<h3>PNG24</h3>
<p>I tried several lossy optimizations on these images: <a href="https://twitter.com/pornelski">Kornel Lesiński</a>&#8216;s <a href="https://github.com/pornel/improved-pngquant">improved pngquant</a>, conversion to JPEG using ImageMagick+jpegtran and conversion to WebP.</p>
<table>
<thead>
<tr class="imagemenot">
<th>Method</th>
<th>Setting</th>
<th>Data Reduction</th>
<th>SSIM</th>
</tr>
</thead>
<tbody>
<tr>
<td>pngquant</td>
<td>256</td>
<td>57.1%</td>
<td>99.8%</td>
</tr>
<tr>
<td>pngquant + lossless WebP</td>
<td>256</td>
<td>63.2%</td>
<td>99.8%</td>
</tr>
<tr>
<td>JPEG</td>
<td>75</td>
<td>77%</td>
<td>94.6%</td>
</tr>
<tr>
<td>WebP</td>
<td>75</td>
<td>84.7%</td>
<td>95.1%</td>
</tr>
</tbody>
</table>
<p>I&#8217;m not sure what&#8217;s more impressive here, pngquant&#8217;s 57.1% data reduction with practically zero quality loss, or JPEG&#8217;s and WebP&#8217;s results. Here again, the WebP files were 33% smaller than JPEG. Lossless WebP gave an extra 14.2% compression when applied to PNGs after pngquant. Note: I avoided converting PNGs smaller than 500 bytes to JPEG since this usually resulted in larger file sizes.</p>
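<p>The &#8220;smaller than&#8221; comparisons are derived from the data reduction columns: compare what is left of each file rather than what was removed. A minimal Python sketch of that calculation (the function name is an assumption for illustration):</p>

```python
def relative_saving(reduction_a, reduction_b):
    """How much smaller (in %) files from method B are than files from method A.

    Both arguments are data reductions in percent, measured against the
    same original files.
    """
    remaining_a = 100 - reduction_a
    remaining_b = 100 - reduction_b
    return (1 - remaining_b / remaining_a) * 100


print(f"{relative_saving(57.1, 63.2):.1f}")  # pngquant vs pngquant + lossless WebP
print(f"{relative_saving(77, 84.7):.1f}")    # JPEG 75 vs WebP 75
```

<p>This prints 14.2 and 33.5, which lines up with the extra 14.2% from lossless WebP and the roughly 33% advantage of WebP over JPEG.</p>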
<h3>PNG24α</h3>
<p>For PNGs with an alpha channel, I couldn&#8217;t use the above conversion to JPEG, since JPEG doesn&#8217;t have an alpha channel. Also, because of problems the original <a href="http://mehdi.rabah.free.fr/SSIM/">SSIM</a> utility I used had with a full alpha channel, I used Kornel&#8217;s <a href="https://github.com/pornel/dssim">dssim</a> utility instead.</p>
<table>
<thead>
<tr class="imagemenot">
<th>Method</th>
<th>Setting</th>
<th>Data Reduction</th>
<th>SSIM</th>
</tr>
</thead>
<tbody>
<tr>
<td>pngquant</td>
<td>256</td>
<td>63.1%</td>
<td>99.8%</td>
</tr>
<tr>
<td>pngquant + lossless WebP</td>
<td>256</td>
<td>69%</td>
<td>99.8%</td>
</tr>
<tr>
<td>WebP</td>
<td>75</td>
<td>77.9%</td>
<td>94.8%</td>
</tr>
</tbody>
</table>
<p>Again, pngquant&#8217;s results are extremely impressive, providing files that are almost 3 times smaller with negligible quality loss. Lossless WebP gave an extra 15.8% compression on these pngquant results. Lossy WebP provides even better compression results with files that are 40% smaller than pngquant and almost 5 times smaller than the original PNGs, although it does that with slight visual quality loss.</p>
<h2>Why Don&#8217;t Developers Compress Their Images?</h2>
<p>While I have no evidence to support that theory, I suspect most developers don&#8217;t compress their images since there is no automated process in place. Depending on the workflow, there are a few options to automate image compression:</p>
<ul>
<li><strong>Build time &#8211; </strong>For static images, adding image compression utilities to the build process can make sure that no static uncompressed images make it through.</li>
<li><strong>Upload time &#8211; </strong>For images that are dynamically added by the site&#8217;s users or administrators, the developers should find a way to add image compression utilities to the upload process. That may not always be easy (e.g. when working with legacy CMSs), but it is essential to avoid serving bloated images to users.</li>
<li><strong>Serving time &#8211; </strong>If neither of the previous options is feasible, there&#8217;s always the possibility to apply image compression before the images are served to the user. The open source option here is <a href="http://code.google.com/p/modpagespeed/">mod page speed</a>&#8216;s <a href="https://developers.google.com/speed/docs/mod_pagespeed/filter-image-optimize">image optimization filters</a>. Otherwise, commercial options are also available.</li>
</ul>
<p>Each developer should choose the optimization options that fit his workflow best, but <em>everyone</em> should automate image optimization, otherwise there&#8217;s a strong chance it will not happen.</p>
<h2>Conclusions</h2>
<p>Even though every Web developer knows that images must be properly compressed, very few actually do that optimally, as we can see from the extra compression we can squeeze out of the Web&#8217;s images, with no or little compromise in terms of quality.</p>
<p>Even if developers only choose the truly lossless path, image data can be reduced by 12.75% or 101KB per page. Using lossless WebP turns that into 18.2% or 144KB for supporting browsers.</p>
<p>If every Web developer employed lossy and lossless techniques to compress his site&#8217;s images to the maximal extent with practically nonexistent visual impact (i.e. imgmin for JPEG, pngquant for PNG24), the image data on the current average page could be reduced by 37.8% or 300KB!</p>
<p>Willingness to apply more lossy techniques (but still maintain good visual quality), can result in 47.5% image data savings or 368KB.</p>
<p>Using WebP would increase the savings to 61% of image data or 483KB for browsers that support it.</p>
<p>That&#8217;s huge! Image compression is something that every one of us should add into our workflow, since it can save a large chunk of your site&#8217;s Web traffic. All the tools I used are free and open-source software. There are no excuses!</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/giving-your-images-an-extra-squeeze/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Progressive jpegs: a new best practice</title>
		<link>http://calendar.perfplanet.com/2012/progressive-jpegs-a-new-best-practice/</link>
		<comments>http://calendar.perfplanet.com/2012/progressive-jpegs-a-new-best-practice/#comments</comments>
		<pubDate>Fri, 28 Dec 2012 23:42:25 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1557</guid>
		<description><![CDATA[Bandwidth-wise, images are hogs. They are the largest average web site payload (62%), and they are most often the content bottleneck. When images arrive, they come tripping onto the page, pushing other elements around and triggering a clumsy repaint. They come “chop chop chop chop chop down” or you get nothing until suddenly “boom!” out [...]]]></description>
				<content:encoded><![CDATA[<style><!--
.positiveCell {background-color: #c0e9b1;} .negativeCell {background-color: #fbbfbc;} .version {font-size: smaller;color: #666;}
--></style>
<p>Bandwidth-wise, images are hogs. They are the largest average web site payload (<a href="http://httparchive.org/interesting.php">62%</a>), and they are most often the content bottleneck. When images arrive, they come tripping onto the page, pushing other elements around and triggering a clumsy repaint. They come “chop chop chop chop chop down” or you get nothing until suddenly “boom!” out of nowhere there it is. We all know what I’m talking about when I say “chop chop down” and “boom” and it makes us a little bit sick, because we sense how much time we’ve lost of our precious, short lives, waiting for pictures to download.</p>
<h2>A missed opportunity</h2>
<p>Photos are the main culprit when it comes to slow rendering. They are the <a href="http://httparchive.org/interesting.php">most common type of image requested</a> and <a href="http://httparchive.org/interesting.php">on average weigh more</a>. They contain millions of colors, and pixel depth is increasing. They are beautiful, and we don&#8217;t want to compromise on quality.</p>
<p>Web-optimized photos are jpegs, and jpegs come in two flavors: baseline and progressive. A baseline jpeg is a full-resolution top-to-bottom scan of the image, and a progressive jpeg is a series of scans of increasing quality. And that&#8217;s how they render; baseline jpegs paint top to bottom (&#8220;chop chop chop&#8230;&#8221;), and progressive jpegs quickly stake out their territory and refine (or at least that&#8217;s the idea).</p>
<p>Progressive jpegs are better because they are faster. Appearing faster is being faster, and <b>perceived speed is more important than actual speed</b>. Even if we are being greedy about what we are trying to deliver, progressive jpegs give us as much as possible as soon as possible. They assist us in our challenge of delivering big beautiful photos today.</p>
<p>Experimenting locally with a throttled bandwidth, an 80K progressive jpeg beats a 5K baseline jpeg (the same image, downsized) to the page in Firefox on Windows. This should blow your mind. Sure, the progressive jpeg&#8217;s first pass is low-resolution, but it contains as much information, or more, as the small image. And if you are zoomed out, perhaps on a mobile device, you will not notice it&#8217;s low-res. That&#8217;s responsive images working for us right now!</p>
<p><img alt="Progressive jpeg example" src="http://annrobson.com/img/kickass-pjpeg.jpg" /></p>
<p>Basically, progressive jpegs are better. So what&#8217;s the most common type of jpeg online? You guessed it: <b>baseline</b>, and by a very wide margin. In a thousand-image sample, 92.6% are baseline.</p>
<p>No worries, we just need to declare progressive jpegs a best practice and get the rest of the world on-board with us. But in order to declare progressive jpeg a best practice, we need to be confident that it is. And to do so we need to first understand what browser support for this type of jpeg looks like today.</p>
<h2>Reality Check #1</h2>
<p>Progressive jpegs are displayed in all browsers, that&#8217;s not a worry. Our concern is how they render.</p>
<h3>Behavior of progressive jpegs across browsers</h3>
<table>
<tbody>
<tr class="imagemenot">
<th><strong>Browser <span class="version">(specific version tested)</span></strong></th>
<th><strong>Foreground progressive jpeg renders</strong></th>
<th><strong>Background progressive jpeg renders</strong></th>
</tr>
<tr>
<td>Chrome <span class="version">(v 25.0.1323.1 dev Mac, 23.0.1271.97 m Win)</span></td>
<td class="positiveCell">progressively (superfast!)</td>
<td class="positiveCell">progressively (superfast!)</td>
</tr>
<tr>
<td>Firefox <span class="version">(v 15.0.1 Mac, 12.0 Win)</span></td>
<td class="positiveCell">progressively (superfast!)</td>
<td class="negativeCell">instantly after file download (slow)</td>
</tr>
<tr>
<td>Internet Explorer 8</td>
<td class="negativeCell">instantly after file download (slow)</td>
<td class="negativeCell">instantly after file download (slow)</td>
</tr>
<tr>
<td>Internet Explorer 9</td>
<td class="positiveCell">progressively (superfast!)</td>
<td class="negativeCell">instantly after file download (slow)</td>
</tr>
<tr>
<td>Safari <span class="version">(v 6.0 Desktop, v 6.0 Mobile)</span></td>
<td class="negativeCell">instantly after file download (slow)</td>
<td class="negativeCell">instantly after file download (slow)</td>
</tr>
<tr>
<td>Opera <span class="version">(v 11.60)</span></td>
<td class="negativeCell">instantly after file download (slow)</td>
<td class="negativeCell">instantly after file download (slow)</td>
</tr>
</tbody>
</table>
<p>These are disappointing results, but overall, market share and progressive rendering for progressive jpegs are trending upward. Support is currently at about 65% (Chrome + Firefox + IE9).</p>
<p>Unfortunately, the browsers that do not render progressive jpegs progressively render them all at once after download is complete, which makes them less progressive and slower than baseline jpegs. While baseline rendering is not as immediate and smooth as progressive rendering, at least it&#8217;s something while we wait, and the &#8220;chop chop&#8221; is a kind of progress indicator (a good thing). We can&#8217;t underestimate the reassurance we give users when they see something is happening.</p>
<p>By choosing progressive jpegs we are giving a majority of users an excellent experience and a minority — but a significant minority — a worse experience. But if we select baseline jpegs because it is a less poor experience in a minority of views, that&#8217;s a terrible compromise. We need to offer the best experience to our users, and look ahead.</p>
<h2>Reality Check #2</h2>
<p>You might ask &#8220;Aren&#8217;t progressive jpegs bigger than regular jpegs? Don&#8217;t we pay for the &#8216;layers&#8217;?&#8221; This is true for other types of interlaced images, but not jpegs. A progressive jpeg is usually a few kilobytes smaller than its baseline version. Plotting the savings of <a href="http://www.bookofspeed.com/chapter5.html">10000 random baseline jpegs converted to progressive</a>, Stoyan Stefanov discovered a valuable rule of thumb: files that are over 10K will generally be smaller using the progressive option.</p>
<p>It would be an easier sell if we could say progressive jpegs are always smaller, so always make progressive jpegs. Stoyan helps us out here. He says &#8220;One observation about the 10K rule is that when baseline is smaller, it&#8217;s smaller by a small margin. When progressive is smaller it&#8217;s usually a lot smaller. So it&#8217;s ok to say go 100% progressive and you&#8217;ll do better.&#8221;</p>
<p>That&#8217;s exactly what I wanted to hear! For all the baseline jpegs we&#8217;ve been serving, we&#8217;ve been missing opportunities in file size and perceived speed. Choosing the progressive option is win-win, and should always be the default. Then, after all jpegs are progressive, if we want to optimize further, it&#8217;s just a few bytes we&#8217;ll save and only on our smallest images.</p>
<p>The reason baseline jpegs are most common online is, no doubt, that image-optimization tools make them by default. However, all of the ones I looked at (Photoshop, Fireworks, ImageMagick, jpegtran) have a progressive option. So to serve progressive jpegs you&#8217;ll need to consciously modify your image optimization process.</p>
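<p>As one way to wire the progressive option into a build step, here is a minimal Python sketch that shells out to jpegtran. It assumes jpegtran is installed and on the PATH, and that you are happy to drop metadata (<code>-copy none</code>); the function names are hypothetical. The conversion is lossless: jpegtran rewrites the file without decoding and re-encoding the pixels.</p>

```python
import subprocess


def jpegtran_command(source, destination):
    """Build a jpegtran command for a lossless baseline-to-progressive rewrite."""
    return [
        "jpegtran",
        "-copy", "none",     # assumption: EXIF and other metadata are not needed
        "-optimize",         # optimized Huffman tables
        "-progressive",      # write a progressive file
        "-outfile", destination,
        source,
    ]


def to_progressive(source, destination):
    # Requires jpegtran (part of libjpeg) to be installed and on the PATH.
    subprocess.run(jpegtran_command(source, destination), check=True)
```

<p>Calling <code>to_progressive("photo.jpg", "photo-progressive.jpg")</code> on every jpeg in a build directory is enough to change the default for a whole site.</p>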
<p>I&#8217;d expect <a href="http://www.smushit.com/ysmush.it/">Smushit</a> to translate baseline jpegs to progressive, and <a href="http://developer.yahoo.com/yslow/smushit/faq.html">sure enough it does</a>. (Smushit, btw, can be run from the command line and integrated into your image optimization process.)</p>
<p>How do you know if your jpegs are progressive? Here are a few ways to identify jpeg type:</p>
<ol>
<li><b>ImageMagick</b>: On the command line run <code>identify -verbose mystery.jpg | grep Interlace</code>. The output will be either &#8220;Interlace: JPEG&#8221; or &#8220;Interlace: None&#8221;.</li>
<li><b>Photoshop</b>: Open the file and select File -&gt; Save for Web &amp; Devices. If it&#8217;s a progressive jpeg, the Progressive checkbox will be selected.</li>
<li><b>Any browser</b>: Baseline jpegs will load top to bottom, and progressive jpegs will do something else. If the file loads too fast you may need to add bandwidth throttling. I use ipfw on my Mac.</li>
</ol>
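<p>A fourth way is to look inside the file itself. The sketch below walks the JPEG segment headers and reports which start-of-frame marker it finds first: <code>0xFFC0</code> for baseline, <code>0xFFC2</code> for progressive. The function name <code>jpegType</code> is made up for this illustration, and the code assumes the whole file is already in memory as a byte array; it is not a full JPEG parser.</p>

```javascript
// Hypothetical helper: report the frame type of a JPEG held in memory.
// Assumption: `bytes` is a Uint8Array (or Node Buffer) with the whole file.
function jpegType(bytes) {
  // Every JPEG starts with the SOI marker 0xFFD8.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    return "not-jpeg";
  }
  let pos = 2;
  while (pos + 3 < bytes.length) {
    if (bytes[pos] !== 0xff) return "unknown"; // lost sync with the marker stream
    const marker = bytes[pos + 1];
    if (marker === 0xff) { pos += 1; continue; } // fill byte before a marker
    if (marker === 0xc0) return "baseline";      // SOF0: baseline DCT
    if (marker === 0xc2) return "progressive";   // SOF2: progressive DCT
    if (marker === 0xda) return "unknown";       // image data began without SOF0/SOF2
    // Other segments carry a 2-byte big-endian length that includes itself.
    const length = bytes[pos + 2] * 256 + bytes[pos + 3];
    pos += 2 + length;
  }
  return "unknown";
}
```

<p>In Node.js this could be called as <code>jpegType(require("fs").readFileSync("mystery.jpg"))</code>. Less common frame types, such as extended sequential, are deliberately reported as &#8220;unknown&#8221;.</p>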
<h3>Reality Check #3</h3>
<p>According to <a href="http://www.faqs.org/faqs/jpeg-faq/part1/section-11.html">this progressive jpeg FAQ</a>, each progressive scan requires about the same amount of CPU as the entire baseline jpeg would take to render. This is not a concern for desktops but possibly for mobile devices.</p>
<p>The extra computation is a disadvantage but not a deal breaker. Delivering photos on small hardware is a challenge regardless. I know this because I&#8217;m writing a photo gallery application with infinite scrolling and it crashes on iPad. If you are handling a lot of images, you will have challenges on mobile anyway &#8212; different challenges.</p>
<p>As we&#8217;ve seen in the chart, Mobile Safari does not render progressive jpegs progressively anyway, probably because they tax the CPU. But this is not a new image file format. Therefore, it wasn&#8217;t an option for browsers, even mobile browsers, to choose not to support progressive jpegs. Hopefully mobile browsers will soon leverage progressive rendering, but it makes sense that they currently don&#8217;t. It&#8217;s also a crying shame; we could really use the speed and file size savings progressive jpegs give us for mobile. When I said they are a kind of solution for responsive images right now, well, they would be, but aren&#8217;t yet.</p>
<h2>Onward</h2>
<p>In the last few days, Google got on board with their <a href="https://code.google.com/p/modpagespeed/">Mod_Pagespeed</a> service, making convert_jpeg_to_progressive a <a href="http://googledevelopers.blogspot.com/2012/12/new-modpagespeed-cache-advances.html">core filter</a>. <a href="http://www.chromium.org/spdy/spdy-whitepaper">SPDY</a> does as well, <a href="https://developers.google.com/speed/docs/mod_pagespeed/filter-image-optimize#progressive">translating jpegs that are over 10K to progressive by default</a>, following Stoyan&#8217;s rule of thumb. This will make browsers that support incremental display seem much faster. As you can see in the chart above, that includes Google Chrome, so it makes sense that Google would make this choice. I&#8217;m not going to say that because &#8220;do-no-evil-make-the-web-faster&#8221; Google has selected progressive jpeg to be a best practice, so should we. But it&#8217;s more data and validation. Most importantly, it shows that progressive jpeg &#8212; a format that has been in a kind of deep freeze for a decade &#8212; has staged a comeback.</p>
<p>And even though not all current browsers make use of progressive jpeg&#8217;s progressive rendering, the ones that do really benefit, and we get file size savings across the board. It&#8217;s our best option today and we should use it. Progressive jpegs are the future, not the past.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/progressive-jpegs-a-new-best-practice/feed/</wfw:commentRss>
		<slash:comments>101</slash:comments>
		</item>
		<item>
		<title>SPOFCheck &#8211; Fighting Frontend SPOF at its root</title>
		<link>http://calendar.perfplanet.com/2012/spofcheck-fighting-frontend-spof-at-its-root/</link>
		<comments>http://calendar.perfplanet.com/2012/spofcheck-fighting-frontend-spof-at-its-root/#comments</comments>
		<pubDate>Thu, 27 Dec 2012 19:21:02 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1553</guid>
		<description><![CDATA[With the increase in 3rd party widgets and modernization of web applications, Frontend Single Point Of Failure (SPOF) has become a critical focus point. Thanks to Steve Souders for his initial research on this topic, we now have a list of common patterns which causes SPOF. The awareness of Frontend SPOF has also increased tremendously [...]]]></description>
				<content:encoded><![CDATA[<p>With the increase in 3rd party widgets and modernization of web applications, Frontend Single Point Of Failure (<a href="http://en.wikipedia.org/wiki/Single_point_of_failure">SPOF</a>) has become a critical focus point. Thanks to <a href="https://twitter.com/souders">Steve Souders</a> for his <a href="http://www.stevesouders.com/blog/2010/06/01/frontend-spof/">initial research</a> on this topic, we now have a list of common patterns which cause SPOF. The awareness of Frontend SPOF has also increased tremendously among engineers, thanks to some of the recent blogs and <a href="http://calendar.perfplanet.com/2012/spof-bug/">articles</a> emphasizing the importance of it.</p>
<p>There are already a bunch of utilities and plugins out there which can detect possible SPOF vulnerabilities in a web application. The most notable are <a href="http://blog.patrickmeenan.com/2011/10/testing-for-frontend-spof.html">webpagetest.org</a>, the <a href="https://chrome.google.com/webstore/detail/spof-o-matic/plikhggfbplemddobondkeogomgoodeg?hl=en-US">SPOF-O-Matic</a> Chrome plugin and the <a href="http://www.phpied.com/3po/">YSlow 3PO</a> extension. At eBay we wanted to detect SPOF at a very early stage, during the development cycle itself. This means an additional hook in our automated testing pipeline. The solution resulted in creating a simple tool which works on our test URLs and produces SPOF alerts. The tool is <strong><a href="http://senthilp.github.com/spofcheck/">SPOFCheck</a></strong>.</p>
<p>SPOFCheck is a <a href="http://en.wikipedia.org/wiki/Command-line_interface">Command Line Interface</a> (CLI) built in Node.js to detect possible Frontend SPOF for web pages. The output is generated in an XML format that can be consumed and reported by <a href="http://en.wikipedia.org/wiki/Continuous_integration">CI</a> jobs. The tool is integrated with our secondary jobs, which run daily automation on a testing server where a development branch is deployed. In case of a SPOF alert, engineers are notified and they act on it accordingly. This process ensures that SPOFs are contained within the development cycle and do not sneak into staging or production.</p>
<h2>The command line interface</h2>
<p>SPOFCheck provides a simple command line interface and runs on Node.js.</p>
<p>To install SPOFCheck, run the following:</p>
<pre>$ npm install -g spofcheck</pre>
<p>To run SPOFCheck, use the following format</p>
<div class="hl-main">
<pre>spofcheck [options]* [urls]*

Options
--help | -h                                 Displays this information.
--format=&lt;format&gt; | -f &lt;format&gt;             Indicate which format [junit-xml | spof-xml | text] to use for output.
--outputdir=&lt;dir&gt; | -o &lt;dir&gt;                Outputs the spof results to this directory.
--rules=&lt;rule[,rule]+&gt; | -r &lt;rule[,rule]+&gt;  Indicate which rules to include.
--print | -p                                Outputs the results in console,
                                            instead of saving to a file.
--quiet | -q                                Keeps the console clear from logging.</pre>
</div>
<p>Example</p>
<pre>$ spofcheck -f junit-xml -o /tests www.ebay.com www.amazon.com</pre>
<h2>Rules</h2>
<p>SPOFCheck by default runs with 5 rules (checks). The rules are maintained in the <a href="https://github.com/senthilp/spofcheck/blob/master/lib/rules.js">rules.js</a> file. New rules are easily added by pushing entries to the <a href="https://github.com/senthilp/spofcheck/blob/master/lib/rules.js#L6">rules</a> array or calling the spof API <a href="https://github.com/senthilp/spofcheck/blob/master/lib/engine.js#L142">registerRules</a>. The default rules come from Souders’s original <a href="http://www.stevesouders.com/blog/2010/06/01/frontend-spof/">list</a> outlined below.</p>
<ol>
<li><code>3rdparty-scripts</code> &#8211; Always load 3rd party external scripts asynchronously in a non-blocking pattern</li>
<li><code>application-js</code> &#8211; Load application JS in a non-blocking pattern or towards the end of page</li>
<li><code>fontface-stylesheet</code> &#8211; Try to inline @font-face style. Also make the font files compressed and cacheable</li>
<li><code>fontface-inline</code> &#8211; Make sure the font files are compressed, cached and small in size</li>
<li><code>fontface-inline-precede-script-IE</code> &#8211; Make sure inlined @font-face is not preceded by a SCRIPT tag, since this causes SPOF in IE</li>
</ol>
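<p>To give a feel for what a check like <code>3rdparty-scripts</code> looks for, here is a minimal sketch. It is not SPOFCheck&#8217;s actual implementation: the function name, the shape of the <code>scripts</code> array and the &#8220;different hostname means third party&#8221; test are all assumptions made for this illustration.</p>

```javascript
// Hypothetical check: list external scripts from another host that are
// loaded synchronously (neither async nor defer) and could block the page.
// Assumption: `scripts` is an array of { src, async, defer } descriptors.
function findBlockingThirdPartyScripts(pageUrl, scripts) {
  const pageHost = new URL(pageUrl).hostname;
  return scripts.filter((script) => {
    if (!script.src) return false; // inline script, nothing to fetch
    // Resolve relative URLs against the page before comparing hosts.
    const host = new URL(script.src, pageUrl).hostname;
    const thirdParty = host !== pageHost;
    const blocking = !script.async && !script.defer;
    return thirdParty && blocking;
  });
}
```

<p>A real rule would need a better definition of &#8220;third party&#8221; (for example, comparing registered domains rather than full hostnames) and would also have to cover dynamically inserted scripts.</p>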
<h2 id="output">Output</h2>
<p>SPOFCheck creates a file and writes results in one of these formats:</p>
<ul>
<li><code>junit-xml</code> &#8211; a format most CI servers can parse (the default format)</li>
<li><code>spof-xml</code> &#8211; an XML format that can be consumed by other utilities</li>
<li><code>text</code> &#8211; a textual representation of the results</li>
</ul>
<p>The format can be specified using the <code>--format</code> or <code>-f</code> option. For just printing results, i.e. no file creation, use the <code>--print</code> or <code>-p</code> option.</p>
<h2 id="a_spof_free_world">A SPOF Free World</h2>
<p>Our primary goal was to eradicate frontend SPOF before it creeps in, and SPOFCheck is a step towards it. Go ahead, give <a href="http://senthilp.github.com/spofcheck/">SPOFCheck</a> a try and integrate it with your CI environments. Let’s build a SPOF Free World <img alt=":-)" src="http://calendar.perfplanet.com/wp-includes/images/smilies/icon_smile.gif" />.</p>
<p>Thanks to github projects <a href="https://github.com/pmeenan/spof-o-matic">spof-o-matic</a> and <a href="https://github.com/stoyan/yslow">3po</a>, a lot of the code logic has been re-used here. The design and packaging of the tool is based on <a href="https://github.com/stubbornella/csslint">csslint</a>, thanks to <a href="https://twitter.com/slicknet">Nicholas Zakas</a> and <a href="https://twitter.com/stubbornella">Nicole Sullivan</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/spofcheck-fighting-frontend-spof-at-its-root/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Deciphering the Critical Rendering Path</title>
		<link>http://calendar.perfplanet.com/2012/deciphering-the-critical-rendering-path/</link>
		<comments>http://calendar.perfplanet.com/2012/deciphering-the-critical-rendering-path/#comments</comments>
		<pubDate>Wed, 26 Dec 2012 23:30:20 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1548</guid>
		<description><![CDATA[As Steve pointed out in an earlier post, window.onload is not the best metric for measuring website speed. It is a convenient metric, and a familiar one, but it fails to capture the dynamic nature of most modern pages. Instead, we want to think about the user perceived performance of the page: how quickly can [...]]]></description>
				<content:encoded><![CDATA[<p>As Steve pointed out in an <a href="http://calendar.perfplanet.com/2012/moving-beyond-window-onload/">earlier post</a>, window.onload is not the best metric for measuring website speed. It is a convenient metric, and a familiar one, but it fails to capture the dynamic nature of most modern pages. Instead,  we want to think about the user perceived performance of the page: how quickly can the user begin interacting with the page?</p>
<p>The definition of &#8220;interacting&#8221; will vary depending on your page. For some, this may be as simple as getting the text visible on the page, such that the user can begin consuming the information they requested (e.g. this page). For others, this may require wiring up dozens of JavaScript components to build up a JavaScript UI (e.g. Gmail). However, in both cases, there is one prerequisite: the user must be able to see the page, which is to say, the browser needs to render <em>something</em> to the screen.</p>
<p>So, with that in mind, what does it actually take to do a first content render in a modern browser?</p>
<h2>DOM + CSSOM = Render Tree</h2>
<p>The exact timing and behavior of the rendering pipeline will, of course, vary based on the parsing, layout and compositing pipelines of the browser. However, implementation differences aside, to get anything visible on the screen, all browsers must construct something resembling a &#8220;render tree&#8221;.</p>
<p><img src="http://www.igvita.com/posts/12/doc-render.png" alt="document render steps" /></p>
<p>The parsing of the HTML document is what constructs the DOM. In parallel, there is an oft forgotten cousin, the CSSOM, which is constructed from the specified stylesheet rules and resources. The two are then combined to create the &#8220;render tree&#8221;, at which point the browser has enough information to perform a layout and paint something to the screen. So far, so good.</p>
<p>However, the diagram above shows an optimistic case: both the CSSOM and the DOM trees are shown as being constructed in parallel. This is where we must, unfortunately, introduce our favorite friend and foe &#8211; JavaScript.</p>
<ul>
<li>Synchronous JavaScript can issue a doc.write at any point; hence the DOM tree construction is blocked anytime a synchronous script is encountered</li>
<li>JavaScript can query for a computed style of any object, which means it can also block on CSS</li>
</ul>
<p><img src="http://www.igvita.com/posts/12/doc-render-js.png" alt="document render steps, with JavaScript" /></p>
<p>Instead of nice, parallel construction of the DOM and CSSOM objects shown in the earlier diagram, the two are now potentially intertwined: DOM construction can&#8217;t proceed until JavaScript is executed, and JavaScript can&#8217;t proceed until CSSOM is available. Yikes.</p>
<p>Depending on how this dependency graph is resolved on your pages, which is governed by how, and how many resources you include in that first &#8220;critical path&#8221; of the page load, the time to first render will vary accordingly. Can we get some metrics, or insights into this process? Turns out, yes we can!</p>
<h2>Document Interactive &amp; DOMContentLoaded</h2>
<p>The HTML5 spec defines a <a href="http://www.w3.org/TR/html5/syntax.html#the-end">well documented sequence of steps</a> which the user agent must follow while constructing the page. Specifically, the end sequence captures two states, which can help answer our earlier question:</p>
<ul>
<li>The document is marked as &#8220;interactive&#8221; when the user agent stops parsing the document. Meaning, the DOM tree is ready.</li>
<li>The user agent fires the DOMContentLoaded (DCL) event once any scripts marked with &#8220;defer&#8221; have been executed, and there are no stylesheets that are blocking scripts. Meaning, the CSSOM is ready.</li>
</ul>
<p>If no synchronous JavaScript is thrown into the mix, then the DOM and CSSOM construction can proceed in parallel. Things get more interesting once we introduce JavaScript into the picture.</p>
<p>If you add a script and tag it with &#8220;defer&#8221;, then you unblock the construction of the DOM: the document interactive state does not have to wait for execution of JavaScript. However, note that this same script will be executed <b>before</b> DCL is fired. Further, recall that JavaScript may query CSSOM, which means that the DCL event may be held until the CSSOM is ready, at which point the script will be executed. In short: we&#8217;ve unblocked the &#8220;document interactive&#8221; state, but we&#8217;re still potentially blocking DCL.</p>
<p>If you add a script and tag it with &#8220;async&#8221;, then you inherit similar behavior as above, but with one distinction: DCL does not have to wait for execution of async scripts!</p>
<p>The first important takeaway here is that by default, JavaScript will block DOM construction, which may block on CSSOM. Sync scripts are bad, but you already knew that. Marking scripts with &#8220;defer&#8221; and &#8220;async&#8221; makes an implicit promise to the document parser that you will not use doc.write, which in turn allows it to unblock DOM construction.</p>
<p>The second takeaway is: if at any point we must wait for JavaScript execution, then we will have to first wait for the CSSOM construction to finish. In other words, there is a hard dependency edge between JavaScript and CSS&#8230; Stylesheets at the top, scripts at the bottom? Now you know why.</p>
<p>Ok! This is all great in theory, but is this practical knowledge to help us optimize pages? Neither metric is a direct indicator of when the page will be painted, but monitoring either or both is a step in the right direction towards our ultimate goal of improving perceived performance.</p>
<h2>Tracking the critical path of your page</h2>
<p>If nothing else, <a href="https://developer.mozilla.org/en-US/docs/DOM/document.readyState">monitoring</a> &#8220;document interactive&#8221; will give you a good indicator of whether you are blocking DOM construction due to synchronous scripts. Sometimes, there is no way around this behavior, but this should be a known fact and a tradeoff, not an implicit &#8220;that&#8217;s how it works&#8221;.</p>
<p>The DCL event is also a critical milestone. Many popular libraries, such as jQuery, will begin executing their code once it fires. In other words, this is likely the first point at which your client code can begin interacting with the page, as well as provide meaningful feedback to the user. If you do your job right, then through the magic of progressive enhancement, you can get the skeleton of the page up, such that the user can begin interacting with the page while the browser continues to load the remaining assets. The IE team has an excellent example illustrating the <a href="http://ie.microsoft.com/testdrive/HTML5/DOMContentLoaded/Default.html">difference between the DCL and window.onload</a> events.</p>
<h2>When does your DOMContentLoaded fire?</h2>
<p>What you can measure, you can optimize. Even better, the Navigation Timing spec already captures all the events we need: domInteractive, domContentLoadedEvent{Start,End}, and loadEvent{Start,End}. If you are already tracking the onload event, then you might want to add the two events we&#8217;ve covered here as well!</p>
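<p>As a rough sketch of how these values could be collected by hand, the snippet below turns the absolute timestamps from <code>window.performance.timing</code> into offsets from the start of navigation. The helper name <code>criticalPathTimings</code> is made up for this example, and what you do with the result (here it is simply logged) is up to you.</p>

```javascript
// Hypothetical helper: convert Navigation Timing timestamps (milliseconds
// since the epoch) into offsets from navigationStart.
function criticalPathTimings(timing) {
  const since = (timestamp) => timestamp - timing.navigationStart;
  return {
    domInteractive: since(timing.domInteractive),
    domContentLoaded: since(timing.domContentLoadedEventStart),
    load: since(timing.loadEventStart),
  };
}

// Only runs in a browser that exposes the Navigation Timing API.
if (typeof window !== "undefined" && window.performance && window.performance.timing) {
  window.addEventListener("load", function () {
    // loadEventStart is already set while the load handlers are running.
    console.log(criticalPathTimings(window.performance.timing));
  });
}
```

<p>A large gap between <code>domInteractive</code> and <code>domContentLoaded</code> would hint at deferred scripts or stylesheets holding up DCL.</p>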
<p>On that note, if you are using Google Analytics, then Christmas came early this year. The team recently added a new &#8220;<b>DOM Timings</b>&#8221; section. Guess which values it tracks? Yep.</p>
<p><img src="http://www.igvita.com/posts/12/ga-dcl.png" alt="Google Analytics DOM timing report" /></p>
<p>Log in to your GA account and head to &#8220;Content &gt; <a href="http://support.google.com/analytics/bin/answer.py?hl=en&amp;answer=1205784">Site Speed</a>&#8221;. Once there, head to the &#8220;Performance&#8221; tab to see the timing histograms for all of your pages, or drill into the stats for a particular page. From there, you can track your document interactive, DCL, and onload events.</p>
<p>Just for fun, here is a side by side comparison of the DCL vs. onload histograms for my site:</p>
<p><img src="http://www.igvita.com/posts/12/igvita-dcl-onload.png" alt="DCL vs. onload histogram" /></p>
<p>The median time to DCL is under 1s, whereas the median for onload is ~1.5s. The relatively high DCL timing immediately tells me that there is likely a script that is blocking the construction of the DOM &#8211; something I should revisit. Having said that, the fact that there is a ~0.5s delta between DCL and onload tells me that I&#8217;m not forcing users to wait for all the assets to download before they can see <i>some of the content</i>.</p>
<p>When do your document interactive and DCL events fire?</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/deciphering-the-critical-rendering-path/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Life on the edge with ESI</title>
		<link>http://calendar.perfplanet.com/2012/life-on-the-edge-with-esi/</link>
		<comments>http://calendar.perfplanet.com/2012/life-on-the-edge-with-esi/#comments</comments>
		<pubDate>Tue, 25 Dec 2012 23:41:52 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1546</guid>
		<description><![CDATA[Let&#8217;s start with some numbers. The speed of light is 299,792 kilometers per second. The refractive index of optic fiber is about 1.5. That means light travels slower inside optic fiber. How slow? That&#8217;s roughly 299,792 kilometers per second divided by 1.5. In general, we round it off to 200,000 kilometers per second. The distance [...]]]></description>
				<content:encoded><![CDATA[<p>Let&#8217;s start with some numbers. The speed of light is 299,792 kilometers per second. The refractive index of optic fiber is about 1.5. That means light travels slower inside optic fiber. How slow? That&#8217;s roughly 299,792 kilometers per second divided by 1.5. In general, we round it off to 200,000 kilometers per second. The distance from Seoul to Buenos Aires is about 19454 kilometers. So theoretically it will take around 100 ms to transfer a signal from Seoul to Buenos Aires. </p>
<p>In reality, there are many more things to consider for web applications when transferring web content from Seoul to Buenos Aires. First of all, the optic fiber line is definitely not straight, so the actual distance is much longer. Also, a three-way handshake is needed to establish a TCP connection between the two points. Then there can be multiple objects such as JavaScript, CSS and images to transfer in addition to the HTML. Finally, there is the actual time the web server needs to process a request and provide a response.</p>
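<p>The arithmetic above can be written down as a small sketch. The constants come from the numbers quoted earlier; the function names are made up for this example, and the &#8220;two round trips&#8221; figure is a simplifying assumption (one round trip for the TCP handshake, one for the request and response) that ignores server processing time, DNS and any detours in the fiber route.</p>

```javascript
const SPEED_OF_LIGHT_KM_S = 299792;  // speed of light in a vacuum
const FIBER_REFRACTIVE_INDEX = 1.5;  // approximate value for optic fiber

// One-way propagation delay over a straight fiber of the given length.
function oneWayLatencyMs(distanceKm) {
  const speedInFiberKmS = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX;
  return (distanceKm / speedInFiberKmS) * 1000;
}

// Assumption: handshake (1 round trip) + request/response (1 round trip).
function minTimeToFirstByteMs(distanceKm) {
  const roundTripMs = 2 * oneWayLatencyMs(distanceKm);
  return 2 * roundTripMs;
}

console.log(Math.round(oneWayLatencyMs(19454)));      // 97
console.log(Math.round(minTimeToFirstByteMs(19454))); // 389
```

<p>So even in this idealized model, the first byte of a response from the other side of the world arrives after several hundred milliseconds.</p>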
<p>According to some studies from Google, users consider anything more than 100ms to be a perceivable delay, i.e. if your web site does not respond within 100ms, it is considered slow. The interaction has to be smooth, just like flipping through a magazine. However, as the numbers above have shown, it can be quite a challenge.</p>
<p>One of the more popular tricks to lower the latency for web applications is to use an edge cache. The idea is to cache JavaScript, CSS and images on servers that are close to the user (i.e. at the edge of the Internet), so the latency to access these resources will be much lower.</p>
<p>But what about the HTML page? Normally an HTML page contains many kinds of information. Some (e.g. a news article) is cacheable. Some (e.g. personal information) is not. That makes it hard to cache the entire HTML page. And that&#8217;s where Edge Side Includes (ESI) can help.</p>
<h3>ESI</h3>
<p>So how can ESI help? Take a look at the diagram below. Suppose we have a news article page with some user-specific modules such as advertisements and recently read news. The edge cache will forward the user request for the article page to a special ESI service. The ESI service will then return a cacheable ESI document. The document will contain the content of the news article as well as an &#8220;ESI include&#8221; for each of the user-specific modules. The ESI includes are markup that instructs the edge cache to request these modules from origin servers. The edge cache will replace the ESI include markup with the actual contents of these responses before sending the final processed response back to the user.</p>
<p><img src="http://calendar.perfplanet.com/wp-content/uploads/2012/12/esi.png" alt="esi" width="851" height="644" class="alignnone size-full wp-image-1547" /></p>
<p>For subsequent requests from other users, the ESI document with the news article is already cached in the edge cache and the ESI includes will be processed for each user to request the corresponding personalized modules. The overall latency can be lower because much of the HTML page is coming from an edge cache server close to the user and only parts for the personalized module are coming directly from origin servers where the latency in between could be higher.</p>
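<p>To make the mechanics more concrete, here is a toy sketch of what an edge cache does with a cached ESI document. It only understands the self-closing <code>esi:include</code> tag with a <code>src</code> attribute; the function names and the module URLs used with it are made up for illustration, and a real edge cache handles far more of the specification.</p>

```javascript
// Hypothetical helper: split a cached ESI document into literal HTML
// chunks and the URLs of the includes that sit between them.
function splitEsiDocument(html) {
  const includeTag = /<esi:include\s+src="([^"]+)"\s*\/>/g;
  const parts = [];
  let last = 0;
  let match;
  while ((match = includeTag.exec(html)) !== null) {
    if (match.index > last) parts.push({ html: html.slice(last, match.index) });
    parts.push({ include: match[1] });
    last = match.index + match[0].length;
  }
  if (last < html.length) parts.push({ html: html.slice(last) });
  return parts;
}

// Per request: fill each include with that user's fragment and stitch
// the page back together. `fragmentFor` stands in for the origin request.
function assemble(parts, fragmentFor) {
  return parts
    .map((part) => ("include" in part ? fragmentFor(part.include) : part.html))
    .join("");
}
```

<p>The split result can be cached once per article, while <code>assemble</code> runs for every user with a different <code>fragmentFor</code>.</p>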
<p>One can also use client-side Ajax to fetch modules to display on the page. However, JavaScript may not always be supported (e.g. on some cell phones) or enabled (e.g. the user can disable it in the browser). Also, this requires extra coding, while the ESI solution returns a normal HTML page that is transparent to the browser. Finally, as a rule of thumb, we would always like to limit the number of HTTP connections from the client, but the Ajax solution actually creates more.</p>
<h3>Performance Characteristics</h3>
<p>There are two performance characteristics that we need to consider before we talk about how to start using ESI.</p>
<ol>
<li><b>Concurrent Requests for ESI includes</b> &#8211; When there is more than one ESI include in the ESI document, we want all of these requests to be executed concurrently.</li>
<li><b>First Byte Flush</b> &#8211; After the edge cache server issues the requests for the ESI includes, it should start flushing the ESI document to the user up to the first ESI include. Then, when the response for that include arrives, the edge cache should flush the content of the include to the user, followed by the rest of the ESI document until the next ESI include is reached.</li>
</ol>
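<p>The two characteristics can be sketched together in a few lines. This is only an illustration of the idea, not how any particular edge cache is built: the document is assumed to be already split into an array of literal strings and <code>{ include: url }</code> objects, and <code>fetchFragment</code> and <code>write</code> are hypothetical callbacks for the origin request and the client connection. Error handling is left out.</p>

```javascript
// Hypothetical sketch of an edge cache processing one ESI document.
// Assumptions: `parts` mixes literal strings and { include: url } objects,
// `fetchFragment(url)` returns a Promise for the fragment body,
// `write(chunk)` sends a chunk to the user.
async function renderEsi(parts, fetchFragment, write) {
  // Concurrent requests: start every include request before waiting on any.
  const pending = parts.map((part) =>
    typeof part === "string" ? part : fetchFragment(part.include)
  );
  // First byte flush: send each piece in document order as soon as it is
  // ready, instead of buffering until the slowest include has finished.
  for (const piece of pending) {
    write(await piece);
  }
}
```

<p>With this shape, the total time is roughly that of the slowest include rather than the sum of all of them, and the user starts receiving the cached part of the page right away.</p>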
<h3>Where do I get started?</h3>
<p>CDN vendors such as F5 and Akamai provide support for ESI. So if you are using services from these vendors, you can take advantage of that. Akamai even extended the original specification. You can read more about it on its <a href="http://www.akamai.com/dl/technical_publications/akamai_esi_extensions.pdf">web site</a>.</p>
<p>If you aren&#8217;t using a CDN vendor, there is still open source proxy software that supports ESI, such as Varnish and Apache Traffic Server. Varnish supports a minimal subset of the ESI specification. However, error handling is missing, so when an ESI include fails, you cannot use ESI markup to instruct Varnish to take alternative action, such as rendering an error message or fetching a different module. You can read more about the Varnish ESI support <a href="https://www.varnish-cache.org/docs/3.0/tutorial/esi.html">here</a>. Apache Traffic Server currently has an experimental ESI plugin available. It supports most of the specification and you can read more about it <a href="https://github.com/apache/trafficserver/blob/master/plugins/experimental/esi/README">here</a>.</p>
<p>In terms of the performance characteristics described above, Varnish supports first byte flush but not concurrent requests for ESI includes. And it is the opposite for Apache Traffic Server ESI plugin. Work is under way to provide first byte flush support for the Apache Traffic Server ESI plugin and it should be available soon.</p>
<h3>Future &amp; Conclusion</h3>
<p>ESI can be a very powerful tool to lower latency for web sites and applications. However, it has been around for over 10 years without any change, and the markup is not powerful and expressive enough to support some modern use cases. For example, if we want to render different modules for different devices, we need to be able to apply a custom function to the User-Agent request header to determine the device type and use different ESI includes for the modules. So an update to the specification is definitely needed, and we should see some progress on it in 2013.</p>
<p>For more information, you can also check out my <a href="http://velocity.oreilly.com.cn/2012/index.php?func=session&#038;id=2">presentation</a> and the corresponding <a href="http://velocity.oreilly.com.cn/2012/ppts/VelocityChina2012Kit.pdf">pdf</a> on this topic at <a href="http://velocity.oreilly.com.cn/2012/">Velocity China 2012</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/life-on-the-edge-with-esi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Proactive Web Performance Optimization</title>
		<link>http://calendar.perfplanet.com/2012/proactive-web-performance-optimization/</link>
		<comments>http://calendar.perfplanet.com/2012/proactive-web-performance-optimization/#comments</comments>
		<pubDate>Mon, 24 Dec 2012 19:06:55 +0000</pubDate>
		<dc:creator>editor</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1531</guid>
		<description><![CDATA[I recently spoke at a few conferences about how to avoid performance regressions, an approach I call Proactive Web Performance Optimization or PWPO. This is nothing more than the ordinary WPO we already know. The only difference is where/when in the development cycle we should apply it proactively. Performance is a task of vigilance and as such [...]]]></description>
				<content:encoded><![CDATA[<p>I recently spoke at a <a href="http://2012.highload.co/">few</a> <a href="http://velocity.oreilly.com.cn/2012/index.php?func=autobio&amp;id=36">conferences</a> about how to avoid performance regressions, an approach I call <a href="http://www.slideshare.net/marcelduran/velocity-china-2012-pwpo">Proactive Web Performance Optimization</a> or PWPO. This is nothing more than the ordinary <a href="http://en.wikipedia.org/wiki/Web_performance_optimization">WPO</a> we already know. The only difference is where/when in the development cycle we should apply it proactively.</p>
<p>Performance is a task of vigilance, and as such one always has to keep an eye on application performance monitoring. This is especially true after new releases, when new features, bug fixes and other changes might unintentionally affect application performance and eventually break the end user experience.</p>
<p>Web Performance Optimization best practices should always be applied while developing an application. While some tools can help identify potential performance issues during development, the question is where in the development cycle WPO tools should be run.</p>
<h3>Worst case scenario: no instrumentation</h3>
<p>In a development cycle without any instrumentation, we have no idea how the application is performing for end users. Even worse, we have no clue how good or bad the user experience is. In this scenario, when a performance regression is introduced, the end user is the one having a bad experience and ultimately raising the red flag for performance. With luck, a bad review will be published, forcing us to reactively fix the issue and start over. This might last a few cycles until no one cares and then, sadly, the application is abandoned for good.</p>
<p><img alt="worst case scenario: no instrumentation development cycle" src="http://i.imgur.com/lgcte.jpg" /></p>
<ul>
<li>Build the application</li>
<li>Test to ensure nothing is broken</li>
<li>Deploy to production</li>
<li>Possible happy users and angry ones raising the red flag for performance</li>
<li>Angry users write a bad review</li>
<li>Reading the news, it&#8217;s time to improve performance and start over</li>
</ul>
<h3>Better case: RUM</h3>
<p><a href="http://en.wikipedia.org/wiki/Real_user_monitoring">Real User Measurement</a> (RUM) is an essential piece of instrumentation for every web application. RUM shows what is really going on at the user end. It provides valuable data such as bandwidth and page load times, which allows monitoring and estimating what the end user experience is like. In the case of a performance regression, RUM tells exactly when the regression happened. Nevertheless, the end users are still the ones suffering a bad experience, and the issue has to be fixed reactively in the next release cycle.</p>
<p><img alt="better case: RUM development cycle" src="http://i.imgur.com/CiMIq.jpg" /></p>
<ul>
<li>Build the application</li>
<li>Test to ensure nothing is broken</li>
<li>Deploy to production</li>
<li>Possible happy and angry users</li>
<li>RUM possibly raising the red flag for performance, knowing end users get a bad experience</li>
<li>It&#8217;s time to improve performance and start over</li>
</ul>
<h3>YSlow</h3>
<p><a href="http://yslow.org/">YSlow</a> was initially developed to run manually in order to perform static performance analysis of the page, reporting any issues found based on a set of performance rules. There were some attempts at automation, such as hosting a real browser with YSlow installed as an extension and scheduling URLs to be loaded and analyzed.</p>
<p>Since 2011 YSlow has also been available from the command line for NodeJS, using HAR files to perform the static analysis. As of early 2012 YSlow is also available for <a href="http://phantomjs.org/">PhantomJS</a> (a headless WebKit browser), which allows a URL to be loaded by PhantomJS and analyzed by YSlow, with the results reported entirely via the command line. <a href="https://github.com/marcelduran/yslow/wiki/PhantomJS">YSlow for PhantomJS</a> also provides two new test output formats: <a href="http://en.wikipedia.org/wiki/Test_Anything_Protocol">TAP</a> and <a href="http://en.wikipedia.org/wiki/Junit">JUnit</a>. Both formats test all the rules against a configurable threshold, indicating exactly which tests pass.</p>
<h3>Even better case: RUM + YSlow on CI</h3>
<p>With the advent of YSlow for PhantomJS, it becomes easy to integrate YSlow into the development cycle by plugging it into the continuous integration (CI) pipeline. If there is a performance regression, it breaks the build, preventing the regression from being pushed to production. This saves the users from getting a bad experience, as CI is the one raising the red flag for performance regressions. RUM should then show that no regression was introduced by the release; however, other causes might still affect performance, and RUM will flag that something went wrong.</p>
<p>There is a comprehensive section on the <a href="https://github.com/marcelduran/yslow/wiki">YSlow Wiki</a> that explains <a href="https://github.com/marcelduran/yslow/wiki/PhantomJS#wiki-jenkins-integration">how to plug YSlow + PhantomJS into Jenkins</a>, but it&#8217;s also worth noting that the <code>--threshold</code> parameter is the ultimate way to configure the desired performance acceptance criteria for CI.</p>
<p><img alt="even better case: RUM + YSlow on CI development cycle" src="http://i.imgur.com/RBl9J.jpg" /></p>
<ul>
<li>Build the application</li>
<li>Test to ensure nothing is broken</li>
<li>Analyze build with YSlow to either pass or fail web performance acceptance criteria</li>
<li><em>If it fails, proactively go back and fix the performance issue</em></li>
<li>Once performance is fine, deploy to production</li>
<li>Hopefully happy users only</li>
<li>Keep monitoring RUM for performance issues</li>
<li>It&#8217;s always time to improve performance and start over</li>
</ul>
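<p>The analyze step in the list above could be sketched roughly as follows. This is a minimal Python sketch and not part of YSlow: it assumes the TAP report has already been captured as text, and it relies only on the TAP convention that a failing test line starts with <code>not ok</code>. The function names and the sample report are hypothetical and were written by hand for illustration.</p>

```python
from typing import List


def failed_tap_checks(tap_output: str) -> List[str]:
    """Return the descriptions of all TAP test lines that report a failure."""
    failures = []
    for raw_line in tap_output.splitlines():
        line = raw_line.strip()
        # In TAP a failing test line starts with "not ok"; passing lines
        # start with "ok". Directives such as TODO or SKIP are ignored
        # here to keep the sketch short.
        if line.startswith("not ok"):
            failures.append(line[len("not ok"):].strip())
    return failures


def build_passes(tap_output: str) -> bool:
    """The build passes only when no performance check failed."""
    return not failed_tap_checks(tap_output)


# Hypothetical report written by hand, not real YSlow output.
SAMPLE_REPORT = """\
TAP version 13
1..3
ok 1 overall score
not ok 2 hypothetical rule: too many requests
ok 3 hypothetical rule: compress components
"""

for failure in failed_tap_checks(SAMPLE_REPORT):
    print("performance check failed:", failure)
```

<p>A CI job could exit with a non-zero status whenever <code>build_passes</code> returns <code>False</code>, which is what breaks the build.</p>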
<h3>Best case: RUM + YSlow on CI + WPT</h3>
<p>For high performance applications in a well-defined build cycle, YSlow scores might become stale, always reporting A or B. This doesn&#8217;t tell much about smaller performance regressions and might still lead to some regression being pushed to the end user. It&#8217;s very important to keep monitoring RUM data in order to detect any unplanned variation. This is the most accurate info one can get about the user experience.</p>
<p>Once the YSlow score is satisfied, i.e., it doesn&#8217;t break the build, the next layer of proactive WPO is benchmarking the build in real browsers with a reasonable sample of runs (the greater the better) to smooth out variation. Use either the median or the average of these runs. This value should be compared to the current production baseline: if it stays within a certain threshold the build passes, otherwise the build breaks, which catches finer performance regressions.</p>
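<p>As a rough illustration of the comparison described above, the sketch below takes the median of several benchmark runs and checks it against a production baseline. The function name, the 5% tolerance and the sample timings are assumptions made for this example; they do not come from WebPagetest or any other tool.</p>

```python
from statistics import median


def within_baseline(candidate_runs_ms, baseline_ms, tolerance=0.05):
    """Return True when the median run is no slower than baseline plus tolerance.

    candidate_runs_ms: load times (ms) measured for the new build.
    baseline_ms: the current production figure to compare against.
    tolerance: allowed slowdown as a fraction (0.05 means 5%, an assumed value).
    """
    if not candidate_runs_ms:
        raise ValueError("at least one benchmark run is required")
    limit_ms = baseline_ms * (1 + tolerance)
    return limit_ms >= median(candidate_runs_ms)


# Made-up timings: one noisy run (1500 ms) does not move the median much.
steady_build = [980, 1010, 1500, 990, 1000]
slower_build = [1100, 1120, 1090]

print(within_baseline(steady_build, baseline_ms=1000))  # True, median is 1000
print(within_baseline(slower_build, baseline_ms=1000))  # False, median is 1100
```

<p>The median is used here because a single outlier run barely moves it; an average would work the same way in this sketch but is more sensitive to noisy runs.</p>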
<p>To automate the benchmarking part, <a href="http://www.webpagetest.org/">WebPagetest</a> is a good fit and the <a href="http://marcelduran.com/webpagetest-api/">WebPagetest API Wrapper</a> can be used to power NodeJS applications (more info in a <a href="http://calendar.perfplanet.com/2012/xmas-gift-webpagetest-api-swiss-army-knife/">previous post</a>).</p>
<p><img alt="best case: RUM + YSlow on CI + WPT development cycle" src="http://i.imgur.com/3UsfC.jpg" /></p>
<ul>
<li>Build the application</li>
<li>Test to ensure nothing is broken</li>
<li>Analyze build with YSlow to either pass or fail web performance acceptance criteria</li>
<li><em>If it fails, proactively go back and fix the performance issue</em></li>
<li>Once YSlow is satisfied, it&#8217;s time to benchmark and compare against the current production baseline</li>
<li><em>If it fails, proactively go back and fix the performance issue</em></li>
<li>Once the two layers of performance prevention are satisfied, deploy to production</li>
<li>Very likely lots of happy users only</li>
<li>Keep monitoring RUM for other performance issues</li>
<li>There are always a few <em>ms</em> to squeeze out in order to improve performance and start over</li>
</ul>
<p>When comparing new builds (branches) to the production baseline, the ideal scenario is to have performance boxes as close as possible to production boxes (ideally replicas) and to run WPT benchmarks in an isolated environment so test results are reproducible and don&#8217;t deviate too much.</p>
<p>The last two cases, <strong>RUM + YSlow on CI</strong> <em>vs</em> <strong>RUM + YSlow on CI + WPT</strong>, are quite similar in how they prevent performance regressions, but the latter gathers more in-depth performance metrics to either pass or fail the minimum performance acceptance criteria. <strong>RUM + YSlow on CI</strong> is analogous to going to the doctor for a routine checkup. The doctor asks a few questions and checks the heartbeat, amongst other superficial exams, eventually recommending lab exams. <strong>RUM + YSlow on CI + WPT</strong>, on the other hand, is analogous to going straight to the lab for a full body exam. It&#8217;s more invasive but more precise, and would tell exactly what&#8217;s wrong.</p>
<h3>Takeaway</h3>
<p>Stop introducing performance regressions. Don&#8217;t let your end users be the ones raising the red flag for performance when you can proactively prevent regressions by simply plugging YSlow into the CI pipeline, and even better by benchmarking before releasing.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/proactive-web-performance-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Make your mobile pages render in under one second</title>
		<link>http://calendar.perfplanet.com/2012/make-your-mobile-pages-render-in-under-one-second/</link>
		<comments>http://calendar.perfplanet.com/2012/make-your-mobile-pages-render-in-under-one-second/#comments</comments>
		<pubDate>Sun, 23 Dec 2012 21:28:46 +0000</pubDate>
		<dc:creator>stoyan</dc:creator>
				<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://calendar.perfplanet.com/?p=1550</guid>
		<description><![CDATA[Over the past few years, we&#8217;ve made great strides in understanding and optimizing mobile web performance. However, for the most part, mobile web browsing continues to be slow. Google Analytics data shows that the average web page takes over 10 seconds to load on mobile. We know that a user&#8217;s thought process is typically interrupted [...]]]></description>
				<content:encoded><![CDATA[<p>Over the past few years, we&#8217;ve made great strides in understanding and optimizing mobile web performance. However, for the most part, mobile web browsing continues to be slow. Google Analytics data shows that the average web page takes over <a href="http://analytics.blogspot.com/2012/04/global-site-speed-overview-how-fast-are.html">10 seconds to load on mobile</a>. We know that a user&#8217;s thought process is typically interrupted after waiting for just one second, resulting in that <a href="http://www.useit.com/papers/responsetime.html">user starting to become disengaged</a>. So at a minimum, the &#8220;above the fold&#8221; content of a web page should render in less than one second. Clearly, we&#8217;ve still got a lot of work to do.</p>
<p>But where should we focus our attention when optimizing mobile web performance? We know that mobile networks have highly variable latency and bandwidth characteristics, and that in general <a href="http://calendar.perfplanet.com/2011/carrier-networks-down-the-rabbit-hole/">mobile network latency is substantially higher</a> than that on desktop connections. We also know that for modern networks, <a href="http://www.belshe.com/2010/05/24/more-bandwidth-doesnt-matter-much/">it is round trip time, not bandwidth</a>, that is the dominating factor in page load time. Given this, to make the mobile web fast, our attention should be focused on minimizing the number of blocking round trips incurred before a web page can render its content to the device&#8217;s screen.</p>
<h2>What blocks rendering a web page to the screen?</h2>
<p>First, let&#8217;s look at the sequence of events that happens between the time a user initiates a page navigation and the time the browser can render that page to the screen. Round trips may be incurred for DNS resolution, TCP connection, and the request being sent to the server and the response being streamed back. Unfortunately, there&#8217;s not much developers can do to avoid these round trips. For repeat visitors, a longer DNS TTL can help, but TCP connection and request/response overhead will be incurred on every new navigation to a page (assuming there is no warm TCP connection ready to be reused by the client).</p>
<p>Once these initial round trips are incurred, the mobile device can begin parsing the HTML response. But the browser can&#8217;t paint content to the screen just yet. Before content in the HTML can be painted to the screen, the browser must construct the render tree to determine where the elements in the DOM will appear on screen. And before the render tree can be constructed, the DOM tree must be constructed. The DOM tree is constructed through a combination of parsing HTML and possibly JavaScript execution.</p>
<p>So what are the things that block parsing of HTML, DOM tree construction, and render tree construction? Most of the time, parsing, DOM tree, and render tree construction are very fast. However, there are a few antipatterns that can cause these processes to get blocked on the network.</p>
<h2>Sources of delay during rendering: external JavaScript and CSS</h2>
<p>The most significant source of delay during HTML parsing is external JavaScript. When a browser encounters a (non-async) external script during HTML parsing, it must halt parsing of subsequent HTML until that JavaScript is downloaded, parsed, and executed. This incurs additional round trips, which are especially expensive on mobile. If the script is loaded from a hostname other than the hostname the HTML was served from, additional round trips may be incurred for DNS resolution and TCP connection.</p>
<p>In addition, render tree construction gets blocked on stylesheets, so just as external JavaScript introduces delays during DOM tree construction, external stylesheets introduce delays during render tree construction.</p>
<p>In short, external JavaScript and CSS loaded early in the document (e.g. in the <code>&lt;head&gt;</code>) are performance killers, and they are especially expensive on mobile due to the higher round trip times associated with mobile networks.</p>
<h2>Making mobile pages fast</h2>
<p>To be fast, a mobile web page must include all of the content needed to render the above the fold region in the initial HTML payload without blocking on external JavaScript or CSS resources. Ideally, all the content needed to render the above the fold region should be in the first 15kB on the network (this is post-gzip-compression size; pre-gzip can be larger), since this is the size of the initial congestion window on modern Linux kernels. This does not mean simply inlining all of the JavaScript and CSS that used to be loaded externally. Instead, just the JavaScript and CSS needed to render the above the fold region should be inlined, and JavaScript or CSS needed to add additional functionality to the page should be loaded asynchronously. For instance, if we have a page like the following:</p>
<div class="hl-main">
<pre><span class="hl-brackets">&lt;</span><span class="hl-reserved">html</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;</span><span class="hl-reserved">head</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">link</span><span class="hl-code"> </span><span class="hl-var">rel</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">stylesheet</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-var">href</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">my.css</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">script</span><span class="hl-code"> </span><span class="hl-var">src</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">my.js</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">script</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">head</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;</span><span class="hl-reserved">body</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">div</span><span class="hl-code"> </span><span class="hl-var">class</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">main</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-code">
    Here is my content.
  </span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">div</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">div</span><span class="hl-code"> </span><span class="hl-var">class</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">leftnav</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-code">
    Perhaps there is a left nav bar here.
  </span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">div</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  ...
</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">body</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">html</span><span class="hl-brackets">&gt;</span></pre>
</div>
<p>We need to identify the parts of <code>my.js</code> and <code>my.css</code> needed to render the initial content, inline those parts, and delay or async load the remaining JavaScript and CSS needed for the page. This may end up looking something like:</p>
<div class="hl-main">
<pre><span class="hl-brackets">&lt;</span><span class="hl-reserved">html</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;</span><span class="hl-reserved">head</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">style</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  .main { ... }
  .leftnav { ... }
  /* ... any other styles needed for the initial render here ... */
  </span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">style</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">script</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  // Any script needed for initial render here.
  // Ideally, there should be no JS needed for the initial render
  </span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">script</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">head</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;</span><span class="hl-reserved">body</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">div</span><span class="hl-code"> </span><span class="hl-var">class</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">main</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-code">
    Here is my content.
  </span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">div</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">div</span><span class="hl-code"> </span><span class="hl-var">class</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">leftnav</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-code">
    Perhaps there is a left nav bar here.
  </span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">div</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  ...
  </span><span class="hl-comment">&lt;!--</span><span class="hl-comment"> 
    NOTE: delay loading of script and stylesheet may best be done
     in an asynchronous callback such as `requestAnimationFrame` 
     rather than inline in HTML, since the callback will be invoked 
     after the browser has rendered the earlier HTML content to the screen.
   </span><span class="hl-comment">--&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">link</span><span class="hl-code"> </span><span class="hl-var">rel</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">stylesheet</span><span class="hl-quotes">&quot;</span><span class="hl-code"> </span><span class="hl-var">href</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">my_leftover.css</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-code">
  </span><span class="hl-brackets">&lt;</span><span class="hl-reserved">script</span><span class="hl-code"> </span><span class="hl-var">src</span><span class="hl-code">=</span><span class="hl-quotes">&quot;</span><span class="hl-string">my_leftover.js</span><span class="hl-quotes">&quot;</span><span class="hl-brackets">&gt;</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">script</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">body</span><span class="hl-brackets">&gt;</span><span class="hl-code">
</span><span class="hl-brackets">&lt;/</span><span class="hl-reserved">html</span><span class="hl-brackets">&gt;</span></pre>
</div>
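<p>To check a page against the 15kB guideline mentioned earlier, one option is to gzip the HTML and measure the result. The Python sketch below does that with the standard <code>gzip</code> module. The function names are hypothetical, the budget is an assumption that reads 15kB as 15,000 bytes, and the default gzip level may not match what a given web server uses. HTTP response headers travel in the same initial packets, so the real margin is somewhat smaller than this check suggests.</p>

```python
import gzip

BUDGET_BYTES = 15_000  # assumption: "15kB" read as 15,000 bytes


def compressed_size(html: str) -> int:
    """Size in bytes of the gzip-compressed, UTF-8 encoded document."""
    return len(gzip.compress(html.encode("utf-8")))


def fits_initial_window(html: str, budget_bytes: int = BUDGET_BYTES) -> bool:
    """Return True when the compressed document stays within the budget."""
    return budget_bytes >= compressed_size(html)


# Placeholder text standing in for a real HTML payload.
page = "Here is my content. " * 50
print(compressed_size(page), "bytes after gzip")
print("within budget:", fits_initial_window(page))
```

<p>Run against the final HTML, inlined CSS and JavaScript included, this gives a quick signal when the initial payload grows past the budget.</p>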
<h2>The current state of the mobile web</h2>
<p>A brief survey of mobile web pages shows that nearly all pages include blocking external JavaScript and/or CSS before any above the fold content is displayed. Exceptions include Google Maps, Google Search, Yahoo! News, and sites like <a href="http://www.yummly.com/">http://www.yummly.com/</a> and Kayak. Unfortunately, some of these sites inline JavaScript and CSS that isn&#8217;t needed for the initial render, which unnecessarily delays the time it takes to render these pages.</p>
<p>Interestingly, browsing the mobile web with JavaScript disabled reveals that even though most pages load blocking external JavaScript in the head, few of these pages actually need that JavaScript to render their initial content. These pages would benefit from delay or async loading their JavaScript to get the JavaScript out of the critical path of the initial render of the page.</p>
<h2>What else can block the initial render of a web page?</h2>
<p>Blocking external JavaScript and CSS are the most common sources of delay on the web. However, there are other common sources of delay that mobile developers should be aware of. One source is HTTP redirects of the main HTML document. These redirects incur additional round trips, and if the redirect navigates to a different hostname (e.g. <code>www.example.com</code> to <code>m.example.com</code>), they add even greater delays due to DNS resolution and TCP connection times. A second source of delay is server backend time spent generating the HTML response. All of the time spent generating the initial HTML response will block rendering on the screen, so server backend time should be kept to a minimum.</p>
<h2>Rendering in under one second</h2>
<p>If we estimate 3G network round trip time at 250ms, we can compute the minimum estimated time between when a user initiates a web page navigation and when that page renders its above the fold content on the screen. Assuming no blocking external JavaScript or CSS, we incur three round trips for DNS, TCP, and request/response, for a total of 750ms, plus 100ms for backend time. This brings us to 850ms. As long as render-blocking JavaScript and CSS are inlined and the size of the initial HTML payload is kept to a minimum (e.g. under 15kB compressed), the time it takes to parse and render should be well under 100ms, bringing us in at 950ms, just under our one second target.</p>
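<p>The arithmetic above can be written out as a small Python sketch. The function and parameter names are hypothetical; the default values are the estimates used in this article, with parse and render time taken at its 100ms upper bound.</p>

```python
def estimated_render_ms(rtt_ms=250, round_trips=3, backend_ms=100,
                        parse_render_ms=100):
    """Estimate the time to first render from a simple round trip budget.

    round_trips counts the blocking network round trips before rendering:
    DNS, TCP connection, and request/response in the base case.
    """
    return rtt_ms * round_trips + backend_ms + parse_render_ms


# Base case from the text: 3 * 250 + 100 + 100
print(estimated_render_ms())               # 950

# One extra blocking round trip under the same assumptions: 4 * 250 + 100 + 100
print(estimated_render_ms(round_trips=4))  # 1200
```

<p>With these estimates, a single additional blocking round trip is enough to miss the one second target.</p>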
<h2>Summary</h2>
<p>In summary, to make your mobile web page render in under one second, you should:</p>
<ul>
<li>keep server backend time to generate HTML to a minimum (under 100ms)</li>
<li>avoid HTTP redirects for the main HTML resource</li>
<li>avoid loading blocking external JavaScript and CSS before the initial render</li>
<li>inline just the JavaScript and CSS needed for the initial render</li>
<li>delay or async load any JavaScript and CSS not needed for the initial render</li>
<li>keep HTML payload needed to render initial content to under 15kB compressed</li>
</ul>
<p>If you are looking to improve the performance of your mobile web pages, give these optimizations a try.</p>
]]></content:encoded>
			<wfw:commentRss>http://calendar.perfplanet.com/2012/make-your-mobile-pages-render-in-under-one-second/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
