Web Performance Calendar

The speed geek's favorite time of year
2021 Edition
ABOUT THE AUTHOR

Yoav Weiss (@yoavweiss) has been working on mobile web performance for longer than he cares to admit, on the server side as well as in browsers. He now works as part of the Google Chrome developer relations team, helping to fix web performance once and for all.

He takes image bloat on the web as a personal insult, which is why he joined the Responsive Images Community Group and implemented the various responsive images features in Blink and WebKit. That was his gateway drug into the wonderfully complex world of browsers and standards.

When he's not writing code, he's probably slapping his bass, mowing the lawn in the French countryside, or playing board games with his family.

Largest Contentful Paint is a loading performance metric that’s part of the Core Web Vitals program. It measures the time at which the largest contentful element (a block of text or an image) was displayed to the user.

When thinking about expanding LCP’s definition and adapting it to animated images, video resources, or progressive images, we intuitively want to expose values that most closely represent the user’s experience. This post outlines the emotional roller-coaster that is my thinking around this problem, and ends up with a not-so-satisfying solution. Apologies in advance.

Anyway…

Exposing values that represent the user experience means slightly different things in those various LCP cases: for animated images, we’d want to expose the time at which the first image frame was displayed to the user, since from the user’s perspective, the image’s content is “there” and starts playing. The timestamp currently exposed (the time at which the last frame was displayed) is less relevant, and can vary widely depending on the image itself, the animation’s length and other factors.

For videos (which we currently don’t report at all), we’d want to report a very similar value – the time the first frame was displayed.

For progressive images, things are a bit fuzzier, but if we were to take them into account, we’d want to expose the time at which the image reached “good enough” quality, such that most users wouldn’t be able to notice that the image is not complete. I’ve played around with such heuristics, but we’re still waiting on data to prove that this is a desirable outcome, and that the extra bytes are not simply something we should encourage developers to drop entirely.

One point that those 3 cases have in common is that all of them expose information about various points inside the resource’s byte stream. That’s not an issue for same-origin resources, but when we start talking about cross-origin resources, it’s less obvious that this is a thing we can do.

Cross-origin leaks!!

On the web, cross-origin leaks are a type of security or privacy breach that is enabled when one origin can draw conclusions about the user’s past activity on other origins by downloading and inspecting resources from those origins. That’s something we’re generally trying to block on the platform, and the same-origin policy is the main means by which we do that.

So, for example, when Resource Timing exposes information about a cross-origin resource, it doesn’t expose any information that’s not already available to the embedder page unless the resource specifically opts in to expose its timing info (through the Timing-Allow-Origin header).

That way we ensure that unsuspecting origins that, for example, change their resource’s byte size based on the user’s login status or some other user state will not leak information about the user to, say, evil.com that downloads their image.
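To make that opt-in a bit more concrete, here is a minimal sketch of the kind of check involved. It is a simplification rather than the exact algorithm from the spec: it only looks at a single response's `Timing-Allow-Origin` value, and the function name and example origins are made up for illustration.

```typescript
// Simplified sketch of a Timing-Allow-Origin check (not the full spec algorithm).
// `timingAllowOrigin` is the raw header value, or null if the header is absent.
// `embedderOrigin` is the serialized origin of the page embedding the resource.
function passesTimingAllowCheck(
  timingAllowOrigin: string | null,
  embedderOrigin: string
): boolean {
  // No header means no opt-in: detailed timing stays hidden.
  if (timingAllowOrigin === null) return false;
  // The header is a comma-separated list of origins, or the "*" wildcard.
  return timingAllowOrigin
    .split(",")
    .map((value) => value.trim())
    .some((value) => value === "*" || value === embedderOrigin);
}

// Hypothetical origins, for illustration only.
const optedInForEveryone = passesTimingAllowCheck("*", "https://evil.example");
const optedInForPartner = passesTimingAllowCheck(
  "https://a.example, https://partner.example",
  "https://partner.example"
);
const noOptIn = passesTimingAllowCheck(null, "https://evil.example");
```

Only resources that pass a check like this would have their detailed timing exposed to the embedder; everything else gets the coarse information the page could already observe.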

There are still some potential timing side-channels which are harder to block, but as a general principle, we avoid directly exposing that info.

That’s fascinating, dad! *yawn*

How is that related to LCP and animated images?

I’m glad you asked! Currently for LCP and cross-origin images we only expose the resource’s onload time, which is a good approximation of its render time, and at the same time, doesn’t expose any extra info that the embedder site shouldn’t have. But what would happen if we start exposing the first frame render time, or the time where our heuristics estimated we have “good enough” image quality?

That could constitute a significant cross-origin leak if applied to credentialed cross-origin images/videos, as origins can respond with different resources (that have different first-frame points) which potentially expose the user’s state. So we can’t expose those times for cross-origin resources without an adequate opt-in.

At the same time, the overall resource loading time is not relevant info here and cannot approximate the value that we want to expose to the user. Beyond that, in the case of videos, it’s not even information that’s currently exposed to the web (I know, right? I was similarly surprised!)

So just don’t expose the time!

So, that means we’d expose LCP entries with no relevant timestamps, and at least in some cases, all 0 timestamps? That’s a bit weird. But it’s about to get weirder…

Currently, popular JS code to collect LCP entries (e.g. web-vitals.js) assumes that later-emitted entries represent larger LCP candidates. And even if that assumption is not made explicitly, with PerformanceObservers, entries are sorted based on their startTime, which is… the irrelevant loadTime in the cross-origin, no-opt-in case, at least today.

That means that if we naively report the loadTime as the startTime, a lot of existing collection code would collect the wrong LCP entries :/

For example, we may have 2 images on the page: an animated image and a later-loaded, larger non-animated image. And let’s say the animated image finished loading its first frame at 1 second and its overall content at 3 seconds, and the non-animated image finished loading its content at 2 seconds.

We can’t report the animated image’s first frame time, due to cross-origin restrictions. So we’d just report the load time instead, right?

But that means that the larger non-animated image would be reported as an entry before the animated image, and lots of code out there today would assume the animated image is the LCP candidate, even though the non-animated image is larger 🙁
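Here is that example as a small sketch. The entry shape is a stripped-down, hypothetical stand-in for a real LCP entry, and the pixel sizes are invented; only the timings come from the example above.

```typescript
// Hypothetical, simplified LCP entry: just the fields this example needs.
interface LcpCandidate {
  id: string;
  size: number; // rendered area in pixels (made-up values below)
  startTime: number; // milliseconds; what entries end up sorted by
}

// The "last entry wins" assumption that a lot of collection code makes.
function pickLastEmitted(entries: LcpCandidate[]): LcpCandidate | undefined {
  const byStartTime = [...entries].sort((a, b) => a.startTime - b.startTime);
  return byStartTime[byStartTime.length - 1];
}

// What we actually mean by LCP: the largest candidate.
function pickLargest(entries: LcpCandidate[]): LcpCandidate | undefined {
  let largest: LcpCandidate | undefined;
  for (const entry of entries) {
    if (largest === undefined || entry.size > largest.size) largest = entry;
  }
  return largest;
}

// Animated image: first frame at 1s, but we can only report its 3s load time.
const animatedEntry: LcpCandidate = { id: "animated", size: 40000, startTime: 3000 };
// Larger non-animated image, fully loaded at 2s.
const stillEntry: LcpCandidate = { id: "still", size: 90000, startTime: 2000 };

const assumedLcp = pickLastEmitted([animatedEntry, stillEntry]); // the animated image
const actualLcp = pickLargest([animatedEntry, stillEntry]); // the non-animated image
```

The two functions disagree, which is exactly the mismatch that existing collection code would run into.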

OK, so what if we emit the entries at the time we would mark as their start time in the same-origin case, but without explicitly marking that start time? That would work, right?

Unfortunately, attackers can observe the time at which entries were emitted and draw conclusions on user state from that time alone, even if we don’t expose the timestamp explicitly.
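A rough sketch of what that observation could look like: the callback ignores the zeroed-out timestamp and simply notes when it was called. The clock is injected so the sketch runs outside a browser (a real page would presumably pass `() => performance.now()`), and the names here are hypothetical. Real observer callbacks are delivered asynchronously, so the recovered time would be approximate rather than exact.

```typescript
// Hypothetical, minimal entry shape: startTime has been zeroed out.
interface RedactedEntry {
  id: string;
  startTime: number;
}

// Builds an observer callback that records when each entry was delivered,
// regardless of what the entry itself says.
function makeEmissionTimeRecorder(now: () => number) {
  const observedAt: Record<string, number> = {};
  const onEntry = (entry: RedactedEntry): void => {
    observedAt[entry.id] = now();
  };
  return { onEntry, observedAt };
}

// Simulated run: the entry is emitted when the first frame paints, at 1000 ms.
let simulatedClock = 0;
const recorder = makeEmissionTimeRecorder(() => simulatedClock);
simulatedClock = 1000;
recorder.onEntry({ id: "animated", startTime: 0 });

// The first-frame time is recovered even though startTime was 0.
const leakedTime = recorder.observedAt["animated"];
```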

What can we do then??

Should we give up then? Declare computers as irreparably broken, quit tech, throw out all our devices into the nearest lake and move to a sunny island somewhere?

Tempting as it may sound, there may be a path forward that doesn’t require resorting to such drastic measures.

What we could do is something like the following:

  • When a cross-origin LCP entry is created, prepare it to be queued with a 0 startTime timestamp, but don’t queue it just yet
  • When the resource is done, queue the entry. Alternatively, when a larger LCP entry is about to be queued, queue the previous one before it, in order to maintain order

That would mean that LCP entries still fire in order and the latest one remains the largest. It can delay the point at which the entries are actually queued, but that seems like a lesser evil.

We would probably need to somehow mark these entries as ones where the startTime is not meaningful, potentially using a boolean attribute on the entry.
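The steps above, plus the marker attribute, could be sketched roughly like this. It's a toy model, not browser code: the class, its method names, and the `startTimeIsMeaningful` attribute are all hypothetical, and the sizes are invented.

```typescript
// Hypothetical entry shape, including the proposed boolean marker.
interface QueuedLcpEntry {
  id: string;
  size: number;
  startTime: number;
  startTimeIsMeaningful: boolean;
}

class LcpEntryQueue {
  readonly emitted: QueuedLcpEntry[] = [];
  private held: QueuedLcpEntry | null = null;
  private largestSize = 0;

  // Called when a candidate is painted. `crossOriginWithoutOptIn` marks
  // resources whose intermediate timing we are not allowed to expose.
  addCandidate(
    id: string,
    size: number,
    renderTime: number,
    crossOriginWithoutOptIn: boolean
  ): void {
    if (size <= this.largestSize) return; // only larger candidates count
    this.largestSize = size;
    this.flushHeld(); // queue any held entry first, to maintain order
    if (crossOriginWithoutOptIn) {
      // Prepare the entry with a 0 startTime, but don't queue it just yet.
      this.held = { id, size, startTime: 0, startTimeIsMeaningful: false };
    } else {
      this.emitted.push({ id, size, startTime: renderTime, startTimeIsMeaningful: true });
    }
  }

  // Called when a resource has finished loading entirely.
  resourceDone(id: string): void {
    if (this.held !== null && this.held.id === id) this.flushHeld();
  }

  private flushHeld(): void {
    if (this.held !== null) {
      this.emitted.push(this.held);
      this.held = null;
    }
  }
}

// The earlier two-image example, with the non-animated image assumed same-origin.
const lcpQueue = new LcpEntryQueue();
lcpQueue.addCandidate("animated", 40000, 1000, true); // first frame at 1s: held back
lcpQueue.addCandidate("still", 90000, 2000, false); // queues "animated", then "still"
lcpQueue.resourceDone("animated"); // 3s: nothing left to queue
const emittedOrder = lcpQueue.emitted.map((entry) => entry.id); // ["animated", "still"]
```

In this run the animated image's entry is only queued at the 2 second mark, when the larger candidate shows up, so its emission time no longer reveals when the first frame was painted.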

That sucks!!

I know. Exposing a 0 startTime would be confusing. I don’t have a better option though. If y’all have one, hit me up (on this issue or on Twitter).

* Thanks to Nicolás Peña Moreno for poking holes in my thinking around this issue.