ABOUT THE AUTHOR

For the last sixteen years Alex Podelko (@apodelko) has worked as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. He maintains a collection of performance-related links and documents. Alex currently serves as a director for the Computer Measurement Group (CMG), an organization of performance and capacity planning professionals.

When people talk about web performance, they may mean different aspects of the subject depending on their role and the task at hand. Real life is rather messy, so we use abstractions that let us set aside details that are not important at the moment. The same reality may look quite different depending on how we look at it. Adjusting our view to our specific needs, we can highlight four major angles from which to look at web performance.

1. How Fast is Fast Enough?

This angle focuses on the requirements side: what performance should be, usually without diving into implementation details. The traditional approach here discusses system usability and user perception of performance. Older research (which may be traced back at least to Robert Miller’s paper published in 1968 – still very good reading) usually focused on how fast the system should be to optimize user productivity (and the typical scenario involved users working with an internal system all day – for example, entering orders). Most researchers agreed that there are several threshold levels of human attention fundamental to human-computer interaction.
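For reference, the numbers most often cited for these thresholds (going back to Miller’s work and popularized in later usability literature) are roughly 0.1, 1, and 10 seconds. A minimal sketch of the usual interpretation – the exact cutoffs vary from study to study:

```typescript
// Classic human-attention thresholds as commonly cited in the usability
// literature (Miller 1968 and later work). The exact cutoffs vary by
// study; these are the usual round numbers.
function classifyResponseTime(seconds: number): string {
  if (seconds <= 0.1) return "instantaneous";      // no perceptible delay
  if (seconds <= 1.0) return "flow kept";          // delay noticed, train of thought intact
  if (seconds <= 10)  return "attention kept";     // user waits, but needs feedback
  return "attention lost";                         // user switches to something else
}

console.log(classifyResponseTime(0.4)); // "flow kept"
```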

I tried to summarize this research in my 2011 Performance Calendar post How Response Times Impact Business? and later factored in more recent developments in Response Times: Digesting the Latest Information.

While nobody seems to attack the idea of several fundamental threshold levels of human attention, the specific numbers vary. Some reports suggest that response time expectations become more demanding over time. Forrester research from 2009 suggested a two-second response time; similar research in 2006 suggested four seconds (both research efforts were sponsored by Akamai, a provider of web acceleration solutions). While the trend probably exists (at least for Internet and mobile applications, where expectations have changed a lot recently), the approach of these reports has often been questioned because they simply asked users – and user perception of time is known to be misleading.

While we still have users working with internal systems (probably many more now), they are no longer the focus. All discussions are about sales now (or, more generically, conversions) – and the typical model is free users in the Internet jungle, free to choose where to go and what to abandon.

So the major topic of interest now is the impact of performance on conversions. Many specific numbers are quoted on the Internet – for example, that a page slowdown of just one second would cost Amazon $1.6 billion in sales each year, or that slowing search by just 0.4 seconds would cost Google 8 million searches per day – but it is sometimes difficult to find the original source of such information. A good example of real research is Case study: The impact of HTML delay on mobile business metrics, which found that delaying the investigated site by just one second on a mobile device resulted in losing 3.5% of conversions. Many other references are mentioned in two excellent presentations by Tammy Everts: The Real Cost of Slow Time vs Downtime and Mobile Web Stress: Understanding the Neurological Impact of Poor Performance.

While it is important to know and understand these data, we shouldn’t forget that response time expectations depend on the number of elements viewed, the repetitiveness of the task, user assumptions about what the system is doing (see, for example, How Fast Is Fast Enough by Peter Sevcik), and how the interface interacts with the user (see, for example, An Introduction to Perceived Performance by Matt West). Stating a standard without specifying what kind of page we are talking about may be an oversimplification.

We should be even more careful with statements about how much a change in performance will cost in sales and conversions. The published numbers are important data points giving us an idea of what the relationship can be, but we shouldn’t assume it would be exactly the same in our specific case.

Discussing this “how fast is fast enough” angle, we see that it usually doesn’t go into details and often results in mantra-like statements – details are either absent altogether or take a lot of effort to dig out from behind the numbers. This angle concentrates on requirements and the cost of deviating from them, leaving most other details (such as specific metrics, ways of aggregation, and levels of load) out of the discussion.

2. Web Performance Optimization

Another angle is Web Performance Optimization (WPO), which has, in a way, established itself as a separate engineering discipline. WPO looks into the minute performance details of every element of a specific page. Basically, we analyze single-user performance, focusing on the front end, for specific pages and client configurations. We look into where exactly time is spent – while usually abstracting away other details such as the variability of response times or the level of load on the system. WPO is the central topic of the Performance Calendar, so I’d rather leave further description of this angle to the experts in the field.

For the purpose of this discussion, I only want to highlight that here we have discussions about how exactly we should measure performance, since there are many relevant metrics for a web page (not to mention that a user action may not result in loading a web page at all – and we will probably see more of that in the future). See, for example, A non-geeky guide to understanding performance measurement terms by Joshua Bixby or Moving beyond window.onload() by Steve Souders for a discussion of the available options. The topic of web performance metrics has been getting a lot of attention recently, with several new approaches suggested – such as the Speed Index (see, for example, Measuring web performance by Patrick Meenan).
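As a small illustration of why there is no single obvious number, here is a minimal sketch using the W3C Navigation Timing API, which exposes several candidate “page load time” timestamps for one and the same navigation:

```typescript
// A minimal sketch using the W3C Navigation Timing API: one page load
// yields several candidate "page load time" metrics.
// Values are milliseconds relative to navigationStart.
window.addEventListener("load", () => {
  // setTimeout lets loadEventEnd be populated before we read it
  setTimeout(() => {
    const t = performance.timing;
    const start = t.navigationStart;
    console.log("Time to first byte:", t.responseStart - start);
    console.log("DOM content loaded:", t.domContentLoadedEventEnd - start);
    console.log("onload (full page):", t.loadEventEnd - start);
    // None of these captures when the page *looks* ready to the user –
    // hence newer approaches such as the Speed Index.
  }, 0);
});
```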

3. Presenting Data

The third angle is data aggregation and presentation: how do we monitor, analyze, and report? Response times, even if we agree on how to measure them, are not a single number – and not even a few numbers for typical configurations. They form a huge array of individual response times, with at least one number for every single action of every single user (or several, if we measure different metrics). The full raw data are simply not comprehensible to a human mind – you need to find a way to aggregate, present, and visualize this information to make it useful.

The problem is that however you aggregate information, you lose some of it. Different ways of aggregation – averages, percentiles, min and max values, etc. – have their pros and cons, but none is ideal. Rigorous Performance Testing on the Web by Grant Ellis has a nice discussion of the topic starting at slide 26, up to using histograms and CDFs (Cumulative Distribution Functions). We need different ways of aggregation for different purposes. For example, to track down issues we need a way to slice and dice the information to narrow down the problematic area. In that case you need access to rather granular data – because once problematic results are averaged together with other data, they become practically useless for further analysis.
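A minimal sketch, with hypothetical numbers, of how different aggregates of the same data tell different stories – and why you need the granular data to find the actual problem:

```typescript
// Hypothetical data: nine healthy responses and one pathological one.
// The median says the system is fine, the mean says it is slow, and only
// the tail percentile points at the real problem.
const responseTimesMs = [180, 190, 200, 210, 220, 190, 205, 195, 215, 9000];

const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Nearest-rank percentile: smallest value with at least p% of samples at or below it.
function percentile(xs: number[], p: number): number {
  const sorted = [...xs].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1];
}

console.log("median:", percentile(responseTimesMs, 50), "ms"); // 200 ms – looks fine
console.log("mean:  ", Math.round(mean(responseTimesMs)), "ms"); // 1081 ms – looks bad, but why?
console.log("p95:   ", percentile(responseTimesMs, 95), "ms"); // 9000 ms – there is the outlier
```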

A completely different task is high-level reporting of a system’s health. Here you want to see the whole picture and the overall trend at once. No ideal solution has been suggested here either. One of the most interesting approaches is probably Apdex (Application Performance Index). While many are skeptical about it, and not much seems to have happened with Apdex for many years, it still attracts a lot of interest. For example, Apdex is used by New Relic and, with some modification, by Dynatrace as the User Experience Index.

At first I was rather confused by the way satisfied / tolerating / frustrated users are defined in Apdex. Now I understand Apdex not as a method of defining specific satisfaction levels (though I consider Peter Sevcik’s presentation on how to find T, the threshold between satisfied and tolerating users, one of the best documents describing how to do it), but as a method of aggregating data against given criteria (requirements).
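For reference, the aggregation itself is simple. Per the published Apdex specification, given a threshold T, samples at or below T count as satisfied, samples between T and 4T as tolerating (at half weight), and anything slower as frustrated:

```typescript
// Apdex score per the published specification: a value in [0, 1] for a
// chosen threshold T (in the same units as the response times).
// Satisfied: t <= T; tolerating: T < t <= 4T (half weight); frustrated: t > 4T.
function apdex(responseTimesSec: number[], T: number): number {
  const satisfied = responseTimesSec.filter(t => t <= T).length;
  const tolerating = responseTimesSec.filter(t => t > T && t <= 4 * T).length;
  return (satisfied + tolerating / 2) / responseTimesSec.length;
}

// Hypothetical sample: with T = 2 s, three satisfied, one tolerating, one frustrated.
console.log(apdex([0.8, 1.5, 1.9, 5.0, 11.0], 2)); // (3 + 0.5) / 5 = 0.7
```

The point is that all the requirement-specific judgment is packed into the choice of T; the rest is just counting.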

So we have different ways to aggregate and report information – and we need all of them, from a high-level health indicator to deep slicing and dicing of information to get at specific issues. No ideal solutions have been found yet, but it looks like this angle has attracted a lot of interest recently as a new generation of monitoring products reaches maturity.

4. Load and Scalability

The fourth angle is load and scalability. It is most prominent in the realms of back-end design and development, load testing, and back-end monitoring. For a high-level summary see, for example, Andy Hawkes’ post When 80/20 Becomes 20/80 and my post Performance vs. Scalability.

Historically, back-end performance was at the center of performance engineering. The main issue is that response times grow non-linearly as you approach resource limits, which is not intuitive. So if you pick up a classic book about performance, you will probably find a lot of queuing theory in it. (By the way, for those who want to understand these issues but don’t want to study queuing theory too deeply, I’d recommend Every Computer Performance Book: How to Avoid and Solve Performance Problems on The Computers You Work With by Bob Wescott – very good explanations in plain English, and some chapters are available online.)
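To see why the growth is non-linear, here is a minimal sketch using the textbook M/M/1 queuing result, where mean response time is the service time divided by (1 - utilization) – a simplification of any real system, but it captures the shape of the curve:

```typescript
// Textbook M/M/1 queuing result: mean response time R = S / (1 - U),
// where S is the service time and U is resource utilization (0 <= U < 1).
// A simplification of real systems, but the non-linear shape is the point.
const serviceTimeMs = 100;
const responseTime = (u: number) => serviceTimeMs / (1 - u);

for (const u of [0.5, 0.8, 0.9, 0.95, 0.99]) {
  console.log(`utilization ${(u * 100).toFixed(0)}%: ${responseTime(u).toFixed(0)} ms`);
}
// 50% -> 200 ms, 80% -> 500 ms, 90% -> 1000 ms, 95% -> 2000 ms, 99% -> 10000 ms:
// response time explodes long before the resource is "fully" used.
```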

Nowadays there are systems, with parallelized architectures and auto-scaling, where load may not noticeably impact response times during normal operation (if we forget about third-party components and services). In such cases this angle may be less important. But, unfortunately, such systems are much rarer in real life than Internet discussions may suggest – and when you do see such a system, it means that somebody did a very good job designing and optimizing the back end.

Why Do We Care?

These different angles are useful abstractions that let us concentrate on what is important at the moment and cut through the excessive detail of real life. In a way, they are four different dimensions, and for some particular tasks they may be considered orthogonal. In general, though, they are not – in reality they are heavily interconnected at some level. So it is important to remember that the subject has other facets you may need to factor in.

Ideally, performance should be addressed at all phases of the system lifecycle: from the very beginning (performance requirements: how fast the system should be and how much load it should handle), through design and development (using scalable designs and performance best practices, both back-end and front-end), through testing (for both single-user performance and load), to support and maintenance (closely monitoring performance in production and providing input to both development and testing for further improvement). We look at performance from different angles depending on the lifecycle phase and the task at hand – but we need all of them for a holistic view.