I would like to return to last year's Performance Calendar post Magic numbers by Gilles Dubuc. When I read it, I had an urge to reply – but then realized that too many subtle topics are involved and the reply would be too long for a comment. So I am replying with a full post this year. I believe that Gilles has some good points – but is throwing the baby out with the bathwater.

Are these thresholds suspiciously round magic numbers – exactly 100 ms? Absolutely. Is most of the cited research rather shabby and misinterpreted? Sure. Is the whole area rather vague and inconclusive, and thus somewhat difficult to apply to practical work? Yes, I would completely agree with that.

I can easily add to the list. My favorite example is the 8.5-second limit for user abandonment that was quoted everywhere some time ago (I haven't heard about it for a while – but those who have been in the trade longer should remember it), referring to Peter Bickford's research. But when I checked the source – Worth the Wait? (published in 1997) – it turned out that Peter Bickford investigated user reactions when, after 27 almost instantaneous responses, there was a two-minute wait loop on the 28th repetition of the same operation. It took only 8.5 seconds for half of the subjects to either walk out or hit the reboot. Switching to a watch cursor during the wait delayed the subjects' departure by about 20 seconds. An animated watch cursor was good for more than a minute, and a progress bar kept users waiting until the end. Interesting research, for sure – but it can hardly be used directly to set response time requirements for Web applications.

In 2005, my employer at the time charged me with developing clear response time guidelines for development. Well, of course I wrote something – and have remained interested in the subject ever since. I wrote more about it later – including my 2014 Performance Calendar post Different Angles of Web Performance and, in a more generic form, Performance Requirements: An Attempt at a Systematic View, Part 1 and Part 2. And I still struggle to make full sense of it. My take is that we do have some science behind these limits and performance requirements – but there are so many layers on top of it that we can't just blindly use the numbers as magic. And, yes, some interesting research results were often misused that way.

I see two points that Gilles appears to miss in his post. First, the focus of research has completely changed. For the first 30+ years, researchers tried to find performance requirements that optimize user efficiency – assuming corporate users doing internal tasks, as there were no external users to speak of. For the last 20+ years, nobody appears to care about corporate user efficiency anymore – everybody talks about how much money would be lost due to bad performance. Great input for the business case for performance – but quite a different focus.

Second, psychology and physiology are not as exact as math. Gilles correctly points out that people are different and their reactions do differ. That doesn't mean we don't have some basic underlying psychological limits – only that these limits are fuzzy ranges rather than exact numbers.

Different Layers of Web Performance Requirements

Actually, if we look at the larger picture, I see at least four different layers in Web performance:

  • Natural human psychology thresholds (which I consider the subject of the discussion here) and their impact on Human-Computer Interaction (HCI).
  • Expectations of Web performance (which change with time, technology, etc.) and, closely related, user satisfaction (when performance meets user expectations). A closely related question is how we aggregate the satisfaction of numerous diverse users with different expectations. There is no good way to measure it directly – the most interesting attempt in that direction was probably APDEX.
  • Perceived performance (and everything that makes performance appear better – from progress bars and animations to providing some information to look at while work continues in the background). A way to satisfy the above expectations indirectly when you can't do it directly (and not only that, of course – but user experience is another large topic, so let's keep it performance-only here).
  • Performance metrics we use to measure performance – Page Load Time (PLT), Time To Interactive (TTI), Above The Fold (ATF), etc. These are attempts to quantify the above – and the issue is that none of these metrics fits it perfectly (at least not generically). Each of them, in a way, is just an approximation. And we have the related question of how to aggregate these measures across numerous diverse users.
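To make the aggregation point concrete, here is a minimal sketch (with made-up sample values) of one common way to aggregate a metric such as PLT across many users – reporting percentiles rather than the mean, since the mean hides the slow tail:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of the samples are at or below it."""
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(k, 0)]

# Hypothetical page load times (seconds) collected from many users
plt_samples = [0.8, 1.2, 1.5, 2.1, 2.3, 2.9, 3.4, 4.8, 7.5, 12.0]
print(percentile(plt_samples, 50))  # median: 2.3
print(percentile(plt_samples, 95))  # the slow tail: 12.0
```

Note how different the median and the 95th percentile are for the same population – which is exactly why a single aggregate number is always an approximation.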

While the last three layers are, of course, very important and rightly get a lot of attention – they are still quite separate from the first one, the subject of this discussion (well, it may be more correct to say they are based on these psychological thresholds – but that is a loose, fundamental relationship, not an exact equation).

Getting back to the first layer, let's consider the three thresholds that are usually highlighted there. There are many other aspects to it and abundant other research not mentioned here. Steven Seow's book Designing and Engineering Time: The Psychology of Time Perception in Software, published in 2008, may be a good introduction – it appears to be available online too.

Another great review is System Latency Guidelines Then and Now – is Zero Latency Really Considered Necessary? by Christiane Attig, Nadine Rauh, Thomas Franke, and Josef Krems.

Instantaneous Response

Gilles wrote:

Are 100 ms Fast Enough? Characterizing Latency Perception Thresholds in Mouse-Based Interaction by Forch, Franke, Rauh, Krems 2017 looked into one of the most popular magic numbers from the Miller/Nielsen playbook: 100ms as the threshold for what feels instantaneous. Here’s the key result of that study:
The latency perception thresholds’ range was 34-137 ms with a mean of 65 ms (Median = 54 ms) and a standard deviation of 30 ms.
This is quite different than the 100ms universal threshold we keep hearing about.

Well, I’d rather see it as proving the point than disproving it. Jakob Nielsen in Response Times: The Three Important Limits wrote:

0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.

Note “about the limit” and “feel”. Not checking whether it is exactly instantaneous (which would be faster, at least for some people), but “feeling instantaneous” from a practical point of view. Say, the time between typing a symbol and its appearance on the screen. Even if I can see a delay when I explicitly pay attention to it – it doesn't matter to me as long as symbols appear on the screen fast enough to keep up with my typing.

According to Peter Bickford, for example, 0.2 second forms the mental boundary between events that seem to happen together and those that appear as echoes of each other. So the specific number here – say, 0.1 or 0.2 second – doesn’t mean the exact limit for every person in the world, but rather just illustrates the order of that limit.

That limit applies rather to client-side operations – such as keystrokes and mouse moves – which we expect to appear on the screen practically instantaneously. And as soon as the delay gets over that limit (whatever it is for a specific person / context), working becomes very frustrating.

Free Interaction

This limit Nielsen describes as “about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.”

Which is somewhat interesting, as various other research puts it rather higher. Miller, for example, also wrote in his 1968 paper:

For a routine request (as defined by the user performing a task) the acknowledgment should be within two seconds. A routine request is likely to be a demand for an information image in the store. For an impromptu, complex request, the delay may extend to five seconds.

Actually there was a lot more research – Bailey gave a short overview of it in his Response Times post. Some of it concluded that for problem-solving tasks (which are more like Web interaction) there is no reliable effect on user productivity up to a 5-second delay.

If we follow the three states of user satisfaction – as they were defined, for example, in APDEX as satisfied, tolerating, and frustrated (the APDEX implementation is another story – but these three categories definitely make sense) – we probably may say that this limit separates satisfied users from tolerating users.
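For reference, the Apdex score itself is a simple formula: the fraction of satisfied samples plus half the fraction of tolerating ones, with the tolerating zone running from the chosen threshold T up to 4T. A minimal sketch (the 0.5-second threshold is just an example value, not a recommendation):

```python
def apdex(samples_sec, t=0.5):
    """Apdex score: (satisfied + tolerating / 2) / total.
    Per the Apdex spec, samples at or under t are satisfied,
    between t and 4t tolerating, above 4t frustrated."""
    satisfied = sum(1 for s in samples_sec if s <= t)
    tolerating = sum(1 for s in samples_sec if t < s <= 4 * t)
    return (satisfied + tolerating / 2) / len(samples_sec)

# 2 satisfied, 1 tolerating, 1 frustrated -> (2 + 0.5) / 4 = 0.625
print(apdex([0.3, 0.3, 1.0, 3.0], t=0.5))
```

Whether this particular weighting is the right way to aggregate diverse users is, as noted above, debatable – but it does encode the three-state model directly.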

But this limit is not so straightforward. Peter Sevcik in his article How Fast is Fast Enough identified two key factors impacting this threshold: the number of elements viewed and the repetitiveness of the task. The amount of time the user is willing to wait appears to be a function of the perceived complexity of the request. The complexity of the user interface and the number of elements on the screen both impact the threshold. Back in the 1960s through the 1980s, the terminal interface was rather simple and a typical task was data entry, often one element at a time. So earlier researchers reported that one to two seconds was the threshold for keeping maximal productivity. Modern complex user interfaces with many elements may have higher response times without adversely impacting user productivity. Users also interact with applications at a certain pace depending on how repetitive each task is. Some tasks are highly repetitive; others require the user to think and make choices before proceeding to the next screen. The more repetitive the task, the shorter the expected response time.

Some research suggests that response time expectations change with time. A 2009 Forrester study suggested a two-second response time; similar research in 2006 suggested four seconds (both research efforts were sponsored by Akamai). While the trend probably exists, the approach of this research was often questioned because it simply asked users – and it is known that user perception of time may be misleading. Also, as mentioned earlier, response time expectations depend on the number of elements viewed, the repetitiveness of the task, user assumptions about what the system is doing, and whether the UI shows status. Stating a standard without specifying what kind of page we are talking about may be an overgeneralization.

Do we have basic psychological or physiological limits underneath? Alla Gringaus pointed me to a completely different set of research stating a 3-second threshold. Jean Paul Zogby in his book The Power of Time Perception: Control the Speed of Time to Make Every Second Count wrote: “Scientists have estimated that information is normally stored in short-term memory for two to five seconds.” and “The brain’s neural circuitry defines the mental switch period and determines its duration. This is similar to a computer screen’s refresh rate. The brain’s refresh rate seems to be three second long.”

Ernst Pöppel in his paper Lost in time: a historical frame, elementary processing units and the 3-second window mentions much research in that area: “All these observations suggest that conscious activities are temporally segmented into intervals of a few seconds and that this segmentation is based on an automatic (pre-semantic) integration process providing a temporal platform for conscious activity. It should be stressed that the temporal platform does not have the characteristics of a physical constant but that an operating range of approximately 2 to 3 seconds is basic to mentation; obviously, one has to expect subjective variability for such a temporal integration window.”

(And three seconds is not an exact magic number here either – it spans two to five seconds in one quote above and two to three seconds in another.)

Focus on the Dialog

Nielsen describes the third threshold as “10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.”

We can probably say that this limit separates tolerating users from frustrated users – and it definitely has some relationship to user abandonment. And, of course, it is not exactly 10 seconds – most researchers place it between 5 and 10 seconds.

Bickford's research, already mentioned above, produced the 8.5-second figure for user abandonment when there are no other ways to mitigate the delay – although applying this observation to the general problem of user abandonment was clearly a misuse.

Another interesting study, at HP Laboratories, attempted to identify how long users would wait for pages to load. Users were presented with Web pages that had predetermined delays ranging from 2 to 73 seconds. While performing the task, users rated the latency (delay) of each page they accessed as good, average, or poor. Latency was defined as the delay between a request for a Web page and the moment when the page was fully rendered. They reported the following ratings:

  • Good: up to 5 seconds
  • Average: from 6 to 10 seconds
  • Poor: over 10 seconds
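If you wanted to bucket real-user measurements the same way, the bands are trivial to encode – a sketch using only the cutoffs reported above (and, per the whole argument of this post, these cutoffs should be treated as fuzzy, not universal):

```python
def rate_latency(seconds):
    """Bucket a fully-rendered page load time into the
    good / average / poor ratings from the HP Labs study."""
    if seconds <= 5:
        return "good"
    if seconds <= 10:
        return "average"
    return "poor"

print(rate_latency(3))   # good
print(rate_latency(12))  # poor
```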

This is the threshold that gives us response time requirements for most user-interactive applications. Response times above it cause users to lose focus and lead to frustration. Exact numbers vary significantly depending on the interface used, but it looks like response times should not exceed 8-10 seconds in most cases. Still, the threshold shouldn't be applied blindly; there are many cases when significantly higher response times may be acceptable, provided an appropriate user interface is implemented to alleviate the problem.

This limit also appears to be heavily dependent on user expectations (which do change) and context. Do we have basic psychological or physiological limits underneath? Probably the memory and attention limits mentioned above are directly involved (that is exactly what “losing focus” means, isn't it?), but it looks like a good area for further research.

Other Research

For the last 20+ years, most research and discussion has focused on the business cost of Web performance. Good lists can be found in Tammy Everts' book Time Is Money: The Business Value of Web Performance (published in 2016) and on the WPO Stats site. Shaun Anderson also compiled a nice list of quotes and references in How Fast Should A Website Load in 2019?

I reached out to Gilles asking about the “original research of my own” he mentioned in his original post. He pointed to the excellent paper he wrote with co-authors: A large-scale study of Wikipedia users’ quality of experience. The research they did is very valuable input into the discussion about customer satisfaction (and the dataset they collected will probably provide quite a lot of valuable performance insights) – but it doesn't actually discuss psychological limits per se. And, of course, we shouldn't generalize the findings too much, as Wikipedia has a very specific context (although it would actually be much more general than a lot of the research mentioned above).

Of course, there is a lot of other research touching on different aspects of time and speed – those mentioned above are just the ones that can't be missed when we are talking about Web performance.

Summary

It appears that we have some psychological thresholds underneath our perception of performance. But we definitely shouldn't treat them as magic numbers – whatever limits we have (there may be several layers even at the physiology / psychology level), they are not exact numbers and probably vary significantly for different people and in different contexts. While it is important to understand them, and it would be great to see more rigorous research in that area – they are probably too fundamental / low-level to derive performance requirements from directly. And we have at least three more layers on top of these limits that disguise them.

ABOUT THE AUTHOR

Alex Podelko (@apodelko) has specialized in performance since 1997, working as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products.

Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His collection of performance-related links and documents (including his recent articles and presentations) can be found at alexanderpodelko.com. Alex currently serves as a director for the Computer Measurement Group (CMG), an organization of performance and capacity planning professionals.
