Informal Thoughts about the Winding Road of Performance Engineering

12th Dec 2024 by Alex Podelko

ABOUT THE AUTHOR

Alex Podelko (@apodelko) is a senior performance engineer at Amazon Web Services (AWS), responsible for performance testing and optimization of Amazon Aurora. He has specialized in performance since 1997, working in different performance-related roles for MongoDB, Oracle/Hyperion, Aetna, and Intel before joining AWS. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His recent talks and articles can be found at alexanderpodelko.com. He currently serves as a member of the SPEC Research Group Steering Committee.

There are very interesting trends in performance engineering nowadays, so I’d like to share a few personal observations and thoughts (they do not represent my current or former employers in any way, and I don’t pretend there is any real research behind them). I have spoken about Current Trends in Performance Engineering before, but here I want to contemplate some other developments.

History

When I refer to performance engineering here, I mean the discipline that covers all aspects related to performance and efficiency of computer systems. However, except at the very beginning, it has never existed as a holistic discipline. It has been very fragmented at least since alternatives to mainframes emerged.

We may trace the history of performance engineering back at least to 1966. Originally it was centered around performance analysts and capacity planners for mainframes and grew into a rather advanced discipline. It took a different approach with the advance of distributed systems, when it was mostly centered around load testing. Later we got Web Performance, which developed into its own discipline. Then we got two tectonic shifts in the industry – namely agile and cloud – which changed things a lot. While all previously existing branches of performance engineering are still around (adjusting to new realities), we got several new developments that took rather different approaches – and things haven’t settled down yet.

Mostly it happened because things indeed changed – but in some cases, I guess, it happened because performance professionals, in a way, dropped the ball. One such area was reliability testing (which led to the emergence of Chaos Engineering – later becoming Resilience Engineering). Another was the financial aspects of performance (which led to the emergence of FinOps). This happened partially because these areas were further away from the core expertise of performance engineers, and partially because performance engineers usually didn’t have access to production systems (in the case of Chaos Engineering) or to financial data (in the case of FinOps). One more thing was a failure to maintain a performance engineering community that represents the profession (CMG played that role in the beginning, but failed to keep up with the changes in the community and has now somewhat deviated from performance).

Holistic Approach

I have been advocating for a more holistic approach to performance engineering and for breaking silos between areas of specialization. Here are a couple of my older posts on the topic: Shift Left, Shift Right – Is It Time for a Holistic Approach? and Context-Driven Performance Engineering. Of course, there are reasons why a holistic approach is a real challenge and why we are not there yet, but we do see some interesting developments.

On the one hand, performance got much more attention in the industry, and some things that were rather specialized knowledge before became common wisdom. And we still see attempts to unify performance engineering – for example, DevPerfOps.

On the other hand, it appears that we got several groups somewhat overlapping with performance engineering – such as SRE and FinOps – which took some attention away.

Performance Advances in the Industry

We definitely see increasing attention to performance (and all related areas) in the industry. You can hardly get a developer job nowadays without talking about the time and space complexity of algorithms.

It is interesting that even System Design interviews have started to center on performance (as in this example). While that was always my view of system design, it is interesting to see it becoming common wisdom now.

All major cloud vendors define performance as one of the pillars of good architecture.

The AWS Well-Architected Framework is a unique collection of materials that is a must-read for everybody who develops for the cloud (at least for AWS – but it would be beneficial in general, although probably not to the same degree).

It is somewhat interesting that the Azure Well-Architected Framework uses exactly the same terminology as the AWS framework, and the Google Cloud Architecture Framework made only a few minor changes – even though the terminology is not completely obvious. For example, cost optimization and performance efficiency often overlap – probably that was the reason why Oracle put them together. Disclaimer: I currently work for AWS and worked for Oracle before, but wasn’t involved in the creation of either framework in any way.

SRE

Site (or System) Reliability Engineering (SRE) was a big development, and highlighting performance and cost-optimization work as a part of overall SRE responsibilities definitely helped promote performance knowledge and expertise in the industry. But it is just one part of the long list of SRE responsibilities – and probably not the largest part in most cases. It appears that in some places SREs were considered a replacement for specialized performance professionals (not sure if it was the original idea or its misinterpretation). I don’t see SREs as a replacement for performance engineering professionals; I’d rather see these two groups as complementing each other.

SRE proliferation definitely improved performance culture overall, but I am not sure how it impacted performance engineering as a separate discipline, as it drew a lot of attention away from it.

FinOps

Another major development was FinOps and the Foundation behind it. I was actually surprised to see that FinOps got a lot of traction optimizing cloud costs mostly from the financial side, leaving performance and capacity teams behind. While I guess I see why, I believe that it is a major shortcoming of FinOps for now. What can be done from the organizational and financial side is limited – to go further, you need to integrate performance engineering.

But it looks like the Foundation is taking drastic steps to extend beyond Cloud and Finance at the moment. The two most interesting publications in that direction are Cost-Aware Product Decisions and The Scope of FinOps Extends Beyond Public Cloud.

Cost-Aware Product Decisions closely parallels existing performance engineering ideas (just replace the word cost with performance), such as “Performance by Design” – see, for example, Daniel Menasce’s book (as well as many other sources).

J.R. Storment, Executive Director at FinOps Foundation, stated in his post:

People talk about “Shift Left FinOps” but usually the “Left” starts at the development or architecture process. That’s better than thinking about cost after you deploy, but it’s worse than the ideal: applying FinOps when you are still designing the product you are considering building.

It is a very interesting statement, indicating that the idea of “shift left” is getting traction in FinOps. That basically means incorporating performance engineering and getting to a holistic view of costs and performance (as these two concepts have a lot of overlap – although, of course, they are still different things). It would be interesting to see if FinOps will indeed be able to shift left – as it requires quite different expertise from what has been included in FinOps so far.

The Business Value of Performance

The business value of performance is one very interesting aspect that was usually ignored by both traditional performance engineering and FinOps. They usually concentrate on the cost and how to minimize it – basically considering the business value as given. That approach actually limits the value of performance engineering quite a bit and misses that we have a much more complicated equation here – as nowadays performance (here more in the form of speed) by itself directly impacts business outcomes. Web Performance championed that idea and collected quite a lot of interesting data to confirm it. The most prominent sources are probably the WPO Stats site (case studies and experiments demonstrating the impact of web performance optimization (WPO) on user experience and business metrics) and Tammy Everts’ Time is Money: The Business Value of Web Performance. It is not an exact science, as it is business specific and the relationship is not straightforward – but if you ignore the business value of performance, you severely limit your understanding of why we need performance engineering. Check The Business Case for Performance for more information on the subject.
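To make the “more complicated equation” a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the function names, the linear drop in conversion rate with latency, the inverse relationship between infrastructure cost and latency, and all the numbers. It is not taken from WPO Stats or any other source mentioned here, and real relationships are business specific and rarely this simple. It only shows how a cost-only view and a value-aware view can pick different latency targets.

```python
# Toy model: all shapes and numbers below are hypothetical assumptions.

def conversion_rate(latency_s, base_rate=0.03, loss_per_second=0.07):
    """Assumed: each second of latency loses a fixed share of conversions."""
    return base_rate * max(0.0, 1.0 - loss_per_second * latency_s)


def monthly_revenue(latency_s, visits=1_000_000, order_value=50.0):
    """Revenue = visits * conversion rate * average order value."""
    return visits * conversion_rate(latency_s) * order_value


def monthly_infra_cost(latency_s, base_cost=20_000.0):
    """Assumed: a lower latency target costs more, inversely to latency."""
    return base_cost / latency_s


def net_value(latency_s):
    """The equation that a cost-only view leaves out: revenue minus cost."""
    return monthly_revenue(latency_s) - monthly_infra_cost(latency_s)


# Candidate response-time targets, in seconds.
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]

# Cost-only view: treat business value as given and minimize spend.
cheapest = min(candidates, key=monthly_infra_cost)

# Value-aware view: maximize revenue minus cost.
best_by_value = max(candidates, key=net_value)

if __name__ == "__main__":
    for latency in candidates:
        print(f"{latency:>5.2f}s  cost={monthly_infra_cost(latency):>10.0f}  "
              f"net={net_value(latency):>12.0f}")
    print("cost-only choice:", cheapest)
    print("value-aware choice:", best_by_value)
```

With these made-up inputs, the cost-only view picks the slowest target while the value-aware view picks a faster one. The point is only that the answer depends on the revenue side of the equation, which is exactly the part that gets ignored when business value is treated as given.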

DevPerfOps

Recently the DevPerfOps foundation was created with its own manifesto. See also Scott Moore’s post for details. Apparently DevPerfOps is taking a more holistic approach that is not limited to technical issues. For example, the DevPerfOps manifesto explicitly states “Performance is not just a technical issue, but a business driver. We believe that performance directly impacts user satisfaction, brand reputation, and ultimately, business success.”

DevPerfOps appears to be a very promising development – but it remains to be seen if it will become an organization that consolidates performance engineering.

As a side note, riding on top of popular industry terms is somewhat dangerous, as new terminology arrives all the time – for example, there is now a lot of discussion that Platform Engineering is a replacement for DevOps. That said, apparently the main idea of Platform Engineering is that it should be a holistic approach to what we are running and how, which is the idea that was behind the emergence of performance engineering in the first place 50+ years ago.

Community

After CMG somewhat deviated from performance, I believe that the International Conference on Performance Engineering (ICPE) is the most interesting performance-centered event at the moment – with the next conference to be held in Toronto, Canada on May 5-9. It has multiple interesting workshops and an industry presentation track (the call for proposals is still open).

Of course, there are other more specialized communities. FinOps has its own community and events – but they currently appear to be rather centered around “Fin”. The Web Performance community has its own events. Each vendor has some performance-related content at its events (and we see more of it). But I am not aware of other generic performance engineering events at the moment that cover it as a holistic discipline.

Summary

We definitely see drastic changes in performance engineering – mostly triggered by agile development and cloud. Performance culture improved overall, but the future of performance engineering as a separate discipline is not defined yet. The transition is still happening and we see multiple trends – but it is not clear which trends are here to stay and in what form.

Web Performance Calendar
