The Golden Rule of Web Performance and Different Performance Engineering Specializations

20th Dec 2023 by Alex Podelko

ABOUT THE AUTHOR

Alex Podelko (@apodelko) is a senior performance engineer at Amazon Web Services (AWS), responsible for performance testing and optimization of Amazon Aurora. He has specialized in performance since 1997, working in different performance-related roles for MongoDB, Oracle/Hyperion, Aetna, and Intel before joining AWS. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His recent talks and articles can be found at alexanderpodelko.com. He currently serves as a board director for the Computer Measurement Group (CMG) and as a member of the SPEC Research Group Steering Committee.

Performance engineering, a rather narrow field by itself, has many well-established specializations. While the main generic principles are the same, the overlap in specific skills is surprisingly small, and when working in one specialization it is easy to miss what is going on in another – making them almost isolated silos. While I am addressing rather trivial topics here, they fall somewhere between these specializations and may be worth discussing to make sure that we are all on the same page.

The Golden Rule of Performance Engineering

Let’s start with Steve Souders’ Golden Rule: “80-90% of the end-user response time is spent on the frontend. Start there.” Tim Kadlec makes some interesting points in his update on the subject. Actually, this statement bothered me for a long time because it is easy to draw wrong conclusions from it without understanding all its aspects.

First of all, this statement specifies what you need to do to improve response time. It may appear trivial – basically, it is an application of Amdahl’s law to the Web. I believe that the genius of Steve Souders here was not only in pointing out where we should spend effort to improve response time, but in pointing out that response time matters by itself and that we should improve it. Not that it was a completely novel idea either – but he started a real movement with his books and the Velocity conference, creating a separate discipline. Before that, practically nobody cared. Well, of course, there were some discussions that response time shouldn’t be too long – but the time spent in the client’s browser was mostly the client’s problem. It used the client’s resources. What concerned performance professionals was the server and its utilization. Most efforts were focused on the server side – those were the resources that you needed to provide and pay for. Performance optimization efforts were usually evaluated by how many resources were saved – and the frontend didn’t impact that (except in some cases that were exotic, at least then; the line is more blurred now, as Tim Kadlec elaborated in the above-mentioned post). Of course, the business impact of performance was discussed much earlier (see, for example, Business Case for SPE) – but it wasn’t the mainstream understanding then (and, actually, isn’t even now – although there has been good progress since).

The main change was understanding that improving response time has significant business value completely independent of backend costs – and that bad response times may have a devastating impact on business. Later it was elaborated, for example, in Tammy Everts’ book Time is Money: The Business Value of Web Performance (or her older presentation on the topic). The WPO Stats web site has many good examples. A little more background info can be found in Business Case for Performance.

Wrong Conclusions Only

The worst conclusion that may be drawn from the Golden Rule is that backend time doesn’t matter as it doesn’t add much to end-to-end response time. Even if we assume that the backend is properly optimized and scales seamlessly (a very big if, as it is often not the case), every improvement in backend time may yield huge savings in infrastructure costs. Assuming that all backend time is server processing (not waiting in a queue), it is quite possible that decreasing backend time from 5% to 4% of the end-to-end response time may save 20% of infrastructure costs – while, of course, it will save only 1% of response time. Optimizing the frontend won’t help with saving backend cloud costs – which is still the major concern of such disciplines as Cloud Economics and FinOps.
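The arithmetic behind the 5% to 4% example can be spelled out in a few lines. This is only a sketch under stated assumptions: the 2000 ms end-to-end time and the function name are hypothetical, and infrastructure cost is assumed to scale linearly with backend processing time.

```python
def backend_optimization_effect(total_ms, share_before, share_after):
    """Return (relative backend saving, relative end-to-end saving).

    Shares are fractions of the ORIGINAL end-to-end response time.
    Assumption: backend time is pure processing, so server cost
    scales linearly with it.
    """
    backend_before = total_ms * share_before
    backend_after = total_ms * share_after
    saved_ms = backend_before - backend_after
    backend_saving = saved_ms / backend_before   # what the servers see
    end_to_end_saving = saved_ms / total_ms      # what the user sees
    return backend_saving, end_to_end_saving


# Hypothetical page: 2000 ms end to end, backend share goes from 5% to 4%.
backend_saving, end_to_end_saving = backend_optimization_effect(2000.0, 0.05, 0.04)
print(f"backend work saved: {backend_saving:.0%}")
print(f"end-to-end response time saved: {end_to_end_saving:.0%}")
```

The same 20 ms shows up as a fifth of the backend work but as only a hundredth of what the user waits for, which is why the two groups weigh it so differently.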

Unfortunately, rather few systems scale seamlessly, and many have various resource limitations. When you hit these limitations, response times skyrocket until the system becomes unusable or crashes (resilience engineering tries to address that by making sure that systems handle it in a nicer way). In this case it is not important at all whether backend time was 5% or 15% – overall response time will skyrocket as its backend part skyrockets (while the frontend part remains the same).
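One way to see this effect is with the simplest textbook queueing model. The sketch below assumes a single-server M/M/1 queue, which is my simplification and not something real systems follow exactly; the frontend time, service time, and request rates are hypothetical numbers.

```python
def end_to_end_ms(frontend_ms, service_ms, arrivals_per_sec):
    """Frontend time plus M/M/1 backend response time S / (1 - utilization)."""
    utilization = arrivals_per_sec * service_ms / 1000.0
    if utilization >= 1.0:
        # Saturated: the queue grows without bound in this model.
        return float("inf")
    backend_ms = service_ms / (1.0 - utilization)
    return frontend_ms + backend_ms


FRONTEND_MS = 1900.0  # hypothetical; does not depend on server load
SERVICE_MS = 100.0    # hypothetical backend service time (5% of 2000 ms)

for rate in (1, 5, 9, 9.9, 10):
    total = end_to_end_ms(FRONTEND_MS, SERVICE_MS, rate)
    print(f"{rate:>4} req/s -> {total:,.0f} ms end to end")
```

At low load the backend stays a small part of the total. Close to saturation it dominates, although the frontend part has not changed at all.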

Frontend vs Backend Performance

While in some cases the difference between frontend and backend response times may get a little blurred, performance engineering for frontend and backend remain completely different specializations. And the main difference is single-user vs multi-user.

Single-user is not something specific to the Web frontend – it applies to anything from desktop and mobile applications to any backend component when you look at the performance of a single user. It doesn’t mean that it is trivial – you may still have all kinds of multi-threading effects, as execution environments and technologies have become extremely sophisticated. Quite a lot of performance engineering principles may be re-used – although each technology has so many idiosyncrasies of its own that it would take some time to move between technologies even for experienced performance professionals.

Single-user performance engineering is a must in every technology – if it is slow for one user, it won’t be any better for multiple users. But multi-user aspects add new dimensions – you need to think about throughput, capacity, scalability, resource utilization, contention, and many other things. Actually, most books concentrate on multi-user performance – and many are very heavy on math and queueing theory, which may not be trivial to apply. Two good books to start with to understand multi-user performance implications are Every Computer Performance Book: How to Avoid and Solve Performance Problems on The Computers You Work With by Bob Wescott and Fundamentals of Performance Engineering; You can’t spell firefighter without IT Perfect by Keith Smith and Bob Wescott.
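To give a flavor of the multi-user quantities mentioned above, here is a minimal sketch of two standard operational laws, the Utilization Law and Little’s Law. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def utilization(throughput_per_sec, service_time_sec):
    """Utilization Law: U = X * S (fraction of time the resource is busy)."""
    return throughput_per_sec * service_time_sec


def concurrent_requests(throughput_per_sec, response_time_sec):
    """Little's Law: N = X * R (average number of requests in the system)."""
    return throughput_per_sec * response_time_sec


def max_throughput(service_time_sec, servers=1):
    """Capacity bound: each server completes at most 1/S requests per second."""
    return servers / service_time_sec


# Hypothetical measurements: 40 req/s, 20 ms of service time per request,
# 50 ms average response time.
print(f"utilization:         {utilization(40, 0.020):.0%}")
print(f"requests in flight:  {concurrent_requests(40, 0.050):.1f}")
print(f"throughput ceiling:  {max_throughput(0.020):.0f} req/s")
```

None of these questions arise when there is only one user. They appear as soon as requests share a resource.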

Backend Performance

I guess the term “backend performance” is used only in the Web Performance crowd (for something outside their interest – although Web servers, I guess, are somewhat in the middle and fall into both categories). People who work in backend performance (which usually focuses on applications, application servers, databases, containers, and similar stuff) usually refer to it just as performance – as performance engineering started as a discipline when there were only dumb terminals. I was able to trace the history of performance engineering at least back to 1966, when System Management Facilities (SMF) were introduced (which is, basically, instrumentation and tracing). By the way, Response time in man-computer conversational transactions by Robert Miller was published in 1968 – starting a long conversation on what a good response time is. The first performance professionals were performance analysts and capacity planners charged with the efficient usage of mainframe resources.

Performance / load testing appeared at the end of the 1980s as a response to the spread of distributed systems, which didn’t provide much insight into system performance beyond system monitoring. The only way to ensure the performance of an application was to test it under load – so tools were created to generate synthetic load. Designing and implementing this load was a separate craft – which needed to be supplemented by traditional performance engineering skills to properly analyze results and provide recommendations. It had more overlap with performance analysis and capacity planning (first of all in workload characterization) than with functional testing.

Cloud computing brought back centralization, a specific price tag for resources, and flexibility of deployments – which increased demand for, practically, the same performance analysts and capacity planners, but under new names. They are usually referred to as performance engineers, efficiency engineers, cloud economists, and FinOps professionals. The need for traditional performance testers somewhat decreased, but Continuous Performance Testing became a new trend.

All these groups are separate specializations with limited overlap – usually only in the most generic performance principles. The specific sets of skills are quite different, and each group has its own interests, events, and organizations.

Holistic View vs Specializations

I have always advocated a holistic view of performance and context-driven performance engineering – as performance depends on all underlying parts and how they work together. If you address each performance silo separately, you may have gaps where performance issues happen. However, modern technologies have become so sophisticated (and change so quickly) that you can’t have deep expertise across all of them. So, we probably should talk about a combination of generic performance professionals, who can see a holistic view of performance and develop a performance engineering strategy, and professionals specializing in specific areas (from databases to Web Performance). Of course, we are talking mostly about large systems where performance is critically important – small startups probably can’t afford dedicated performance specialists, and responsibilities get spread across other members of the team.

One Response to “The Golden Rule of Web Performance and Different Performance Engineering Specializations”

Sascha December 21st, 2023
80-90% of the end-user response time is spent on the Cloudflare.

😀

Web Performance Calendar