An HTML-first Mental Model

15th Dec 2022 by Noam Rosenthal

ABOUT THE AUTHOR

Noam Rosenthal (@noam) is a software engineer on Google Chrome's speed metrics team, co-editor of several web-perf specs, and a seasoned web developer.

… while building a fast movies app

Overview

The Movies App

The TasteJS movies app is a showcase for different frameworks. I was excited about it because it gave me an opportunity to test a hypothesis and see whether it holds in the context of an app that’s a bit less trivial than TodoMVC.

The hypothesis

The hypothesis is that by following a certain mental model, apps built without a framework can not only perform significantly better than framework-based apps (which might not be that surprising), but can also offer a compelling development experience and an unmatched debugging experience (which is perhaps the more important “DX”).

Please Don’t Shoot

I’m not saying that “frameworks are bad”. Nor am I saying that shaving a few milliseconds off your metrics is necessarily a worthy goal, or that performance is the most important thing in the world. It always depends on many factors. My goal here is to look into alternatives and see whether they hold water. Keep an open mind and choose the tradeoffs that serve you best.

Note also that some frameworks, e.g. Astro, share the values of an HTML-first approach. Their version of the movies app is an MPA (multi-page app) and looks different, so it was hard to compare. I wanted to focus more on the mental model and less on a particular solution.

Show me the demo

The (unfinished but close enough) version of the movies app is at https://h0-movies-demo.vercel.app/. It’s roughly based on the UI of the nextJS Movies app.

The code for the app is here.

How fast is it

The PageSpeed Insights (PSI) results for this version are quite good (compared with the nextjs version):

0.8s FCP (same as the nextjs version)
20ms TBT (vs 110ms)
1.1s TTI (vs 2.9s)
2.0s LCP (vs 2.9s)
3K JS bundle size (vs 120K)

Note that this is a comparison with a specific nextjs app built with React 17, not with “anything you can do if you optimize the hell out of your nextjs app”. Still, it’s the closest I got to a comparison.

How is it built

When we use frameworks, we don’t only get the framework’s runtime, but also a mental model: how we should think when we write apps using the framework. Here I’m trying to provide such a mental model without an entire UI framework.

Stable DOM, CSS “reactivity”

The most important bit is to keep the DOM relatively static. It means that, apart from lists with an unknown number of items, the hierarchy of the DOM doesn’t change. You don’t add/remove things. Instead, you change attributes and let CSS show/hide things.

In the H0 version, the HTML is there in the codebase and is not really generated from anything (with a few exceptions).

For example, in React you would do something like this:

const List = ({ array }) =>
  array.length ? (
    <ul>{array.map(item => <Item key={item.id} {...item} />)}</ul>
  ) : (
    <EmptyListIndicator />
  );

This means JS has to manage the “is the list empty” state, plus a million other local state thingies.

Here, I prefer something like this:

<ul>...</ul>
<div class="empty-list-indicator">...</div>
<style>
  ul:has(:first-child) + .empty-list-indicator {
    display: none;
  }
</style>

I use techniques like this for automatically-hiding side-drawers, image loading indicators, rating, and other UI state indications that are usually done as some sort of a framework conditional.
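On the JS side, this means a view only flips attributes on elements that always exist, and CSS decides what is shown. A minimal sketch; the helper names and the `data-open`/`data-stars` attributes are hypothetical, not taken from the movies app code:

```typescript
// Minimal structural type; a real DOM Element satisfies it.
interface AttributeTarget {
  setAttribute(name: string, value: string): void;
  toggleAttribute(name: string, force?: boolean): boolean;
}

// Boolean UI state. Example CSS: aside[data-open] { transform: none; }
function setFlag(el: AttributeTarget, flag: string, on: boolean): void {
  el.toggleAttribute(`data-${flag}`, on);
}

// Enumerated UI state. Example CSS: .rating[data-stars="4"] { ... }
function setState(el: AttributeTarget, name: string, value: string | number): void {
  el.setAttribute(`data-${name}`, String(value));
}

// e.g. setFlag(drawer, "open", true) or setState(rating, "stars", 4):
// no nodes are added or removed, so there is no extra DOM state to sync.
```

Because the DOM shape never changes, the JS never has to decide which elements exist, only which attributes they carry.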

Why does it matter?

By authoring small interactions with zero (or minimal) JS, I can reduce the overall JS size significantly, which helps with LCP, and I let the browser optimize reactivity instead of doing it in JS, which helps with long tasks and responsiveness.

Simple, shallow DOM

In many apps today I see a very deep DOM and a very deep React component tree, used to draw just a few lines of text and an image. Often these DOM elements and nested React components are part of a design system, e.g. one based on Tailwind CSS. Here, the DOM structure is shallow: when looking at the HTML code you can see the product, rather than something to do with how it was designed/developed or with the developing company’s org chart.

Instead of deeply nesting components and propagating design-system attributes, I keep a shallow DOM, and use cascading and CSS grid to create the layouts I want.

A snippet from the DOM:

<article id="movie">
  <h1>Title</h1>
  <h2>Tagline</h2>
  <img class="artwork gradient"
    width="780"
    alt="movie artwork"
    height="1170">
  <h3>Genres</h3>
  <ul id="movieGenresList"></ul>
  <h3>Synopsis</h3>
  <p id="synopsys"></p>
  <section class="rating"></section>
  <h3>Cast</h3>
  <section id="cast">
    <ul id="castList"></ul>
  </section>
</article>

Another note about this HTML: I don’t use classes much. Often the tag name is enough for styling, which makes the HTML even thinner and more readable.

Why does it matter?

A deep DOM doesn’t come for free. It affects style calculations, virtual DOM calculations, and rendering/layout. Also, since frameworks hold the HTML in JS, a more complex DOM means more JS, more libraries to manage state (even more JS), and then complicated solutions like resumability.

By keeping things tight and simple, we have fewer problems, so we need fewer solutions.

Breathe when updating

To avoid long-tasks and blocking, I update different parts of the UI in their own tasks. This allows the browser to “breathe”, and handle events between updates. This makes the app feel more responsive, without needing complicated concepts. Simply breathe when you can.

// view.ts
for (const renderer of [
    renderNav,
    renderMovieList,
    renderPagination,
    // ... more renderers
]) {
  // Make changes
  renderer(root, model);

  // Let events in!
  await yieldScheduler(); // This uses requestIdleCallback/requestAnimationFrame
}

Note that this technique reduces blocking time, but it means the app performs redundant style calculations, since the browser may recalculate styles after each partial update rather than once at the end. Measure what works best for your use case, and choose your tradeoffs.
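The post doesn’t show `yieldScheduler()` itself; its comment only says it uses requestIdleCallback/requestAnimationFrame. Below is a hypothetical stand-in, `yieldToBrowser()`, that prefers `requestIdleCallback` (with a timeout so updates are never starved) and swaps requestAnimationFrame for a plain `setTimeout` fallback where `requestIdleCallback` isn’t available (e.g. Node). `renderInChunks` is likewise a hypothetical wrapper around the loop above:

```typescript
// Hypothetical stand-in for the article's yieldScheduler(): resolves in a later
// task, so input events queued in the meantime get handled first.
function yieldToBrowser(): Promise<void> {
  return new Promise<void>((resolve) => {
    const g = globalThis as unknown as {
      requestIdleCallback?: (cb: () => void, opts?: { timeout: number }) => number;
    };
    if (typeof g.requestIdleCallback === "function") {
      // Resume when the browser is idle, but after at most ~50 ms.
      g.requestIdleCallback(() => resolve(), { timeout: 50 });
    } else {
      // A new macrotask still lets already-queued events run first.
      setTimeout(resolve, 0);
    }
  });
}

type Renderer<R, M> = (root: R, model: M) => void;

// Each renderer runs in its own task, as in the view.ts loop above.
async function renderInChunks<R, M>(
  root: R,
  model: M,
  renderers: ReadonlyArray<Renderer<R, M>>,
): Promise<void> {
  for (const renderer of renderers) {
    renderer(root, model);
    await yieldToBrowser();
  }
}
```

The `timeout` option trades some responsiveness for a guarantee that the UI eventually finishes updating even on a busy page.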

MVC + Vanilla + Edge

It’s great to have all the static parts in HTML/CSS, but what about the dynamic parts that update the UI state based on interactions? Some JavaScript is surely needed, especially if you want the “SPA”-style experience, where the page doesn’t reload with each click.

The Mental Model

For this, instead of a framework, I use vanilla JS with a specific mental model/design, which is really a version of MVC:

The app code is divided into three operations: fetch model, render view, apply behaviors.
Model fetching looks like an HTTP fetch. It takes a Request and returns a Response, usually JSON-based.
View rendering takes the Response that comes from the model fetching and updates a Document.
Applying behaviors happens once when the JS loads.
Both the model & view functions are “isomorphic”: they can run in the window, in Node, in a service worker, on the edge… anything that supports Requests, Responses and Documents. I use the LinkeDOM library for DOM APIs in Node/edge environments.
The behaviors code is window-only.
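The three operations map naturally onto types. A sketch, assuming the Fetch API’s `Request`/`Response` globals (available in browsers, service workers, edge runtimes and Node 18+); the names `ModelFetcher`, `ViewRenderer`, `DocumentLike`, `update` and the toy model are hypothetical, and `DocumentLike` is a deliberately minimal stand-in for a browser or LinkeDOM `Document`:

```typescript
// Model fetching: Request in, Response (JSON) out.
type ModelFetcher = (request: Request) => Promise<Response>;

// View rendering: Response in, document updated. Only the DOM surface the view
// actually uses is declared, so a real Document or a test double both fit.
interface ElementLike { textContent: string | null }
interface DocumentLike { querySelector(selector: string): ElementLike | null }
type ViewRenderer = (response: Response, doc: DocumentLike) => Promise<void>;

// A toy model: the page number comes from the URL, as a real route's would.
const fetchModel: ModelFetcher = async (request) => {
  const page = Number(new URL(request.url).searchParams.get("page") ?? "1");
  const body = JSON.stringify({ title: "Popular movies", page });
  return new Response(body, { headers: { "content-type": "application/json" } });
};

const renderView: ViewRenderer = async (response, doc) => {
  const model = (await response.json()) as { title: string; page: number };
  const h1 = doc.querySelector("h1");
  if (h1) h1.textContent = model.title;
  const page = doc.querySelector("#page");
  if (page) page.textContent = String(model.page);
};

// The same composition can run on the edge (initial HTML) and in the window
// (subsequent updates); behaviors are window-only and not shown here.
async function update(request: Request, doc: DocumentLike): Promise<void> {
  await renderView(await fetchModel(request), doc);
}
```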

Model Fetching

The model fetcher builds everything I need to display the current state of the UI as a JSON object. In the case of the movies app, it performs several fetches to the TMDB API, in parallel whenever possible, and eventually slices and dices the results into JSON.
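A sketch of the “in parallel whenever possible” part, assuming TMDB’s `/movie/{id}` and `/movie/{id}/credits` endpoints and trimming their responses to a few fields. `fetchJson` is a hypothetical injected function (in production it would wrap `fetch()` with the API base URL and key) so the sketch runs without network access; the interfaces list only the fields used, and the cast limit of 15 is arbitrary:

```typescript
// Only the fields this sketch uses; real TMDB responses carry many more.
interface MovieDetails { id: number; title: string; overview: string; genres: { name: string }[] }
interface Credits { cast: { id: number; name: string; profile_path: string | null }[] }

// Hypothetical injected fetcher: path in, parsed JSON out.
type FetchJson = (path: string) => Promise<unknown>;

interface MovieViewModel {
  title: string;
  synopsis: string;
  genres: string[];
  cast: { id: number; name: string }[];
}

async function fetchMovieModel(fetchJson: FetchJson, movieId: number): Promise<MovieViewModel> {
  // Independent requests are started together rather than one after the other.
  const [details, credits] = (await Promise.all([
    fetchJson(`/movie/${movieId}`),
    fetchJson(`/movie/${movieId}/credits`),
  ])) as [MovieDetails, Credits];

  // "Slice and dice": keep only what the view renders.
  return {
    title: details.title,
    synopsis: details.overview,
    genres: details.genres.map((g) => g.name),
    cast: credits.cast.slice(0, 15).map(({ id, name }) => ({ id, name })),
  };
}
```

The model fetcher would then serialize this object into a JSON `Response`, so the view never sees raw API payloads.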

The model fetcher also handles POST requests (e.g. adding a movie to a list), as well as authentication and redirects.

View Rendering

When the view-model is “cooked”, the view renderer can do the minimum work possible – update a document with the result. This looks like a bunch of querySelectorAll and setAttribute calls etc. It’s perhaps less “pretty” than React code, but it’s so damn easy to debug, and so small!

Having a small update function relies on two earlier principles – when the DOM is both stable and shallow, updating it doesn’t take that much code.
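For the `#movie` article above, a view renderer in this style might look like the following. The `MovieViewModel` shape, `posterUrl` and the `data-rating` attribute are hypothetical, and `Root`/`El` are minimal structural types that a real `Document`/`Element` satisfy:

```typescript
// Minimal structural DOM types; a real Document/Element satisfies them.
interface El {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
}
interface Root {
  querySelector(selector: string): El | null;
}

interface MovieViewModel { title: string; tagline: string; synopsis: string; posterUrl: string; rating: number }

// No diffing and no virtual DOM: direct writes to elements that always exist.
function renderMovie(root: Root, movie: MovieViewModel): void {
  const setText = (selector: string, text: string) => {
    const el = root.querySelector(selector);
    if (el) el.textContent = text;
  };
  setText("#movie h1", movie.title);
  setText("#movie h2", movie.tagline);
  setText("#synopsys", movie.synopsis);

  const img = root.querySelector("#movie img.artwork");
  img?.setAttribute("src", movie.posterUrl);
  img?.setAttribute("alt", `${movie.title} artwork`);

  // CSS can draw the stars from this attribute; the DOM shape never changes.
  root.querySelector("#movie .rating")?.setAttribute("data-rating", String(movie.rating));
}
```

When something renders wrong, the selector and the value written to it are right there in one line, which is what makes this style easy to debug.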

But…

List Reconciliation

The exception is list reconciliation. Frameworks do this very nicely, by mapping arrays into the DOM and doing everything “under the hood”. But here we’re into doing everything above the hood.

The technique here is to have an HTML template element, and use a little function I wrote to “reconcile” a model to the list – only add/remove/update the necessary items.

Example:

The HTML:

<ul id="castList"></ul>
<template id="castPerson">
<li>
  <a href="#">
    <img width="45" height="45">
  </a>
</li>
</template>

The update function:

reconcileChildren<Person>({
  model: arrayModel(movie.cast ?? [], "id"),
  view: templateView({
    container: movieRoot.querySelector("#castList")!,
    template: root.querySelector("template#castPerson")!,
    updateItem: (listItem, person) => {
      const anchor = listItem.querySelector("a")!
      // ... update anchor ...
      const img = listItem.querySelector("img") as HTMLImageElement;
      // ... update img...
    }
  })
});
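The `reconcileChildren`, `arrayModel` and `templateView` helpers themselves aren’t shown in the post. To give a rough idea of what keyed reconciliation involves, here is a hypothetical, DOM-agnostic `reconcile` function (not the author’s implementation): it reuses views by key, creates missing ones, removes stale ones, and returns the views in model order.

```typescript
// Hypothetical keyed reconciliation; V is whatever represents an item on screen, e.g. an <li>.
interface Reconciler<T, V> {
  key: (item: T) => string;           // stable identity, like the "id" passed to arrayModel
  create: (item: T) => V;             // e.g. clone the <template>'s <li>
  update: (view: V, item: T) => void; // e.g. set the anchor's href and the img's src
  remove: (view: V) => void;          // e.g. li.remove()
}

function reconcile<T, V>(
  current: ReadonlyMap<string, V>,
  items: readonly T[],
  r: Reconciler<T, V>,
): Map<string, V> {
  const next = new Map<string, V>();
  for (const item of items) {
    const k = r.key(item);
    // Reuse the existing view for this key, or create one if the key is new.
    const view = current.get(k) ?? r.create(item);
    r.update(view, item);
    next.set(k, view);
  }
  // Views whose key no longer appears in the model are removed.
  current.forEach((view, k) => {
    if (!next.has(k)) r.remove(view);
  });
  // Insertion order of `next` follows the model order.
  return next;
}
```

In the DOM, `create` would clone the template’s first element child and a final `container.append(...next.values())` would put the items in model order, since appending an existing node moves it. That is the simplest approach, though it moves every node rather than only the ones whose position changed.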

Forms

When using HTTP/REST as the mental model, forms fit like a glove. By populating the form’s method, action and inputs, and performing the corresponding action in the model fetcher, we can do a lot without client-side JS.

In the movies app, we have 3 forms:

Searching
Logging in
Adding/removing a movie to/from a list

The first one is the obvious one. By expressing log-in and list actions as forms, we can update things like the list ID or next URL as inputs when updating the view, which means we don’t have to create bespoke click event handlers for every button (or at all).

The code for the movies app does not contain a single button-specific onclick handler! This also means that all those interactions work without JavaScript, or before the JavaScript is loaded.
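On the model-fetcher side, a list form can be handled like any HTTP endpoint. A sketch under assumptions: the `/list` route, the `action`, `list_id`, `movie_id` and `next_url` field names, and the `ListStore` interface are all hypothetical. It answers with a 303 so the browser follows up with a GET (Post/Redirect/Get), which works the same with or without client-side JS:

```typescript
// Hypothetical persistence interface (e.g. backed by a remote API).
interface ListStore {
  add(listId: string, movieId: string): Promise<void>;
  remove(listId: string, movieId: string): Promise<void>;
}

async function handleListForm(request: Request, store: ListStore): Promise<Response> {
  if (request.method !== "POST") {
    return new Response("Method Not Allowed", { status: 405, headers: { allow: "POST" } });
  }
  // Parses application/x-www-form-urlencoded and multipart/form-data bodies.
  const form = await request.formData();
  const action = form.get("action");
  const listId = String(form.get("list_id") ?? "");
  const movieId = String(form.get("movie_id") ?? "");
  if ((action !== "add" && action !== "remove") || !listId || !movieId) {
    return new Response("Bad Request", { status: 400 });
  }
  if (action === "add") await store.add(listId, movieId);
  else await store.remove(listId, movieId);

  // next_url is a hidden input the view filled in when rendering the form.
  // Only same-origin targets are honored, to avoid an open redirect.
  const origin = new URL(request.url).origin;
  const next = new URL(String(form.get("next_url") ?? "/"), request.url);
  const location = next.origin === origin ? next.href : `${origin}/`;
  // 303 See Other: the browser follows up with a GET.
  return new Response(null, { status: 303, headers: { location } });
}
```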

Behaviors

Not all logic is related to updating a view from a model, and unfortunately not everything can be done with CSS tricks & forms. The movies app has several behaviors, most of them generic to single-page applications and one custom:

Click and submit events are captured and converted into model fetches (this is what makes this an “SPA”)
Back/forward navigations are captured with the popstate event.
The page scrolls to the top upon navigation (this is a custom app behavior). There is actually a pure HTML trick to do this (using something like a #top URL fragment) but I preferred to do this particular thing using JavaScript.

In frameworks this kind of work is traditionally done in components. That perhaps gives nicer encapsulation, but often at the cost of a lot of added code. By using behaviors and event propagation/capturing, we can apply complex behaviors without much code overhead.
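The first behavior boils down to turning an intercepted submit (or click) into the same `Request` the browser would have sent, and handing it to the model fetcher. A hypothetical `requestFromForm` helper shows the form half: GET fields go into the query string, POST fields into a urlencoded body. It ignores `enctype` and the submitter button for brevity. The browser-only wiring is left as a comment so the sketch runs anywhere, and `navigate` there is hypothetical too:

```typescript
// Hypothetical helper: the Request a plain form submission would produce.
function requestFromForm(
  action: string,
  method: string,
  fields: [string, string][],
  base: string,
): Request {
  const url = new URL(action, base);
  const params = new URLSearchParams(fields);
  if (method.toUpperCase() === "POST") {
    // POST: fields travel as an application/x-www-form-urlencoded body.
    return new Request(url.href, { method: "POST", body: params });
  }
  // GET (the HTML default): fields replace the query string of the action URL.
  url.search = params.toString();
  return new Request(url.href);
}

// Browser-only wiring (hypothetical `navigate` = fetch model, then render view):
// document.addEventListener("submit", (event) => {
//   const form = event.target as HTMLFormElement;
//   event.preventDefault();
//   const fields = [...new FormData(form)].map(([k, v]) => [k, String(v)] as [string, string]);
//   navigate(requestFromForm(form.action, form.method, fields, location.href));
// }, { capture: true });
// window.addEventListener("popstate", () => navigate(new Request(location.href)));
```

One capturing listener on the document covers every form in the page, which is why no per-button handlers are needed.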

Deployment

When I divide things in this way, I have flexibility in how to deploy things. In this case:

The model fetching method runs on Edge (e.g. Vercel).
The view rendering method runs once on Edge to generate the initial HTML (“Server-Side Rendering”). Subsequent updates run on the client.
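A sketch of how the two halves could meet in an edge function, assuming a runtime whose handlers take a `Request` and return a `Response` (the shape Vercel’s Edge Functions use). `createEdgeHandler` and `renderToHtml` are hypothetical: in the setup described above, `renderToHtml` would presumably load the static HTML into a LinkeDOM document, run the same view renderer the client uses, and serialize the result.

```typescript
type ModelFetcher = (request: Request) => Promise<Response>;
// Hypothetical: model in, full HTML document out.
type RenderToHtml = (model: unknown) => Promise<string>;

function createEdgeHandler(fetchModel: ModelFetcher, renderToHtml: RenderToHtml) {
  return async (request: Request): Promise<Response> => {
    const modelResponse = await fetchModel(request);
    // Redirects produced by the model fetcher (e.g. after a form POST) pass through.
    if (modelResponse.status >= 300 && modelResponse.status < 400) return modelResponse;
    const html = await renderToHtml(await modelResponse.json());
    return new Response(html, {
      status: modelResponse.status,
      headers: { "content-type": "text/html; charset=utf-8" },
    });
  };
}
```

On the client, the same `fetchModel` output feeds the view renderer directly against the live document, so only the view-update and behavior code needs to ship to the browser.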

Why does all of this matter?

By doing most of the view-model aggregation work on the edge, rendering with vanilla JS (the first time on the edge), parsing and executing only the minimal code needed for subsequent updates and behaviors, and using forms, I keep the resulting JS bundle minuscule: 3K in the case of the movies app, which is 2.5% of the nextjs app’s 120K bundle. In a bigger app, this can make a big difference in performance. Even if you adopt just a few of these approaches, you can potentially reduce your bundle size. And with this mental model, I found the debugging experience to be super straightforward, which takes away some of the pain of having code that arguably doesn’t look as nice as JSX (you judge).

Wait, what about web components?

I simply didn’t need them in this project: regular HTML & CSS with a handful of capture-based behaviors were enough. I think that in today’s frontend world we got used to over-componentizing things, creating more code than necessary to express simple behaviors and styles. However, I can see how in a bigger project with elaborate reusable components, Shadow DOM and web components would make a great addition.

Pain Points & Wish List

Building this was a lot of fun, and I would love to test this approach on larger apps. I know for sure that other large apps out there are built in a similar fashion.

Of course, not everything was perfect. There were points where I said to myself “I wish I had this and that handy”.

Built-in optimization, e.g. next/image

At the time of writing this post, the Movies app’s PSI performance score is 99. The missing point is due to unoptimized images. The nextjs app optimizes images using next/image, Next.js’s image optimization component, which Vercel supports internally.

The Navigation API

I love the new Navigation API: it allows implementing some of this app’s SPA behaviors without capturing click/submit events. I would love to see it in all browsers, as it’s difficult to polyfill effectively.

CSS Ergonomics

Authoring CSS by hand is almost great. There are several things from SCSS that I would have loved to have: nesting and custom env() variables.

Copy-pasting widths for media queries is not fun.

HTML/CSS/JS intellisense

Writing HTML/JS/CSS separately requires a lot of copy/pasting of classes and IDs… I wish there were tools for cross-language type-safety. There are zero-runtime tools like Vanilla Extract that help with this, but I wish I had something that was more of a dev-tool than a code generator.

Summary

Writing apps without a framework is not trivial. There are so many mental models to choose from, and as a result people often reach for a framework, which is a mental model that comes with a JS runtime (whether pre-bundled or generated).

I offer an alternative mental model that focuses on minimizing JavaScript. Hopefully it can help you build faster, nicer vanilla apps, or still give you value when you use a framework:

The mental model in short:

Start with HTML. Keep the DOM stable and shallow. Don’t overuse classes
Use CSS tricks for little UI-behavior things when possible
Aggregate your view-model on edge
Server-side render with vanilla
Behaviors instead of components
Breathe

Happy holidays!

Web Performance Calendar