Beyond JSON Performance

28th Dec 2021 by Andrea Giammarchi

ABOUT THE AUTHOR

Andrea Giammarchi (@WebReflection) is a Web standards advocate, developer, and veteran with a strong focus on performance and cross-platform compatibility (polyfills author), and a passion for pushing the Web forward.

Despite JSON being likely the most deployed data exchange format in the software industry to date, and besides the competing implementations chasing the best way to stringify or parse faster than a speeding bullet, there are still various “gotchas” around this standard that most developers are unaware of, and knowing some of them could arguably be a life-saver!

The Homogeneous Collection Case

Nowadays deprecated, yet still working as a module, JSONH has been quite successful so far, and not only because it’s also available in PHP, Ruby, and Python 2 variants.

The concept behind homogeneous collections is pretty simple: if it’s an array of objects that always have the same shape, we can drop all repeated keys and flatten ‘em out instead.

That is:

// regular list
const current = [
  // all shapes are {a, b}
  {"a":"A", "b":"B"},
  {"a":"C", "b":"D"},
  {"a":"E", "b":"F"}
];

// optimized one
const hc = [
  // key amount + names
  2, "a", "b",
  // and values
     "A", "B",
     "C", "D",
     "E", "F",
];

A very simple implementation of a revival for that hc value would be:

const hcRevive = array => {
  // collect all keys
  const [length] = array;
  const keys = [];
  for (let i = 1; i <= length; i++)
    keys.push(array[i]);

  // regenerate all object literals
  const values = [];
  for (let i = length + 1; i < array.length; i += length) {
    const object = {};
    for (let k = 0; k < keys.length; k++)
      object[keys[k]] = array[i + k];
    values.push(object);
  }
  return values;
};
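The format only works when every object really has the same shape, so it can be worth checking that before packing. What follows is a minimal sketch of such a check; `isHomogeneous` is a hypothetical helper written for this post, not part of JSONH, and it assumes "same shape" means "same set of own enumerable keys", regardless of their order.

```javascript
// hypothetical guard, not part of JSONH: true when every item is a
// plain object exposing exactly the same own keys as the first one
const isHomogeneous = list => {
  if (list.length < 1)
    return true;

  const isObject = item => item !== null && typeof item === 'object';
  if (!list.every(isObject))
    return false;

  const keys = Object.keys(list[0]);
  return list.every(item => {
    const own = Object.keys(item);
    return own.length === keys.length &&
           keys.every(key => own.includes(key));
  });
};

console.log(isHomogeneous([{a: "A", b: "B"}, {a: "C", b: "D"}])); // true
console.log(isHomogeneous([{a: "A", b: "B"}, {a: "C"}]));         // false
```

Key order is deliberately ignored here, because a packer that reads values by key name does not depend on it.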

The usual counter-arguments about this technique are:

Compression already makes repeated values irrelevant
A specialized format requires extra code over the well-known JSON standard
There’s no standard about homogeneous collections … so why even bother?

Fair enough, but my counter-arguments have always been:

Compression could improve everything else, instead of taking care of needlessly repeated keys. Also 37% of application/json HTTP requests in the wild are not even compressed
If parsing the JSON is faster out of the box, and recreating objects specific to a programming language or a domain is easier, all environments win
Standards are very often born out of necessity or de-facto adoption of practices … so why even worry about this?

The Repeated Case

While repetition has already been discussed in a few paragraphs, its meaning spans various patterns and situations, some of which might also be critical.

Avoiding repeated identical strings or values, as an example, is one of the rarely discussed issues:

const a = "a";
const same = [a, a];
const maybe = JSON.parse(
  JSON.stringify(same)
);

Now … I know this might look and sound like one of those “premature optimization” cases, but the truth is that you can reduce the heap, or the storage, needed to deal with those same strings by 2X up to 10X or more: read this post if you’re in doubt.

What is going on ?!?

The math is simple:

multiple strings, once parsed (deserialized), each allocate their own size, because strings are immutable values in JS, but equal strings are not necessarily the same reference
parsing doesn’t take previously parsed strings into account, assuming duplicates are not so common in JSON (structured clones or …) structures

… about that … I am pretty sure we all know that lists and values, where those lists are the source of such values, result in duplicated strings all over our serialized JSON content, don’t we?
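Before reaching for any tool, it can help to measure how many duplicated strings a parsed payload really carries. This is a rough sketch with a hypothetical `stringStats` helper; it only counts string values (not keys), and whether each duplicate actually costs extra heap depends on the engine.

```javascript
// hypothetical helper: walks any parsed JSON value and counts
// how many string values it holds, and how many of those are unique
const stringStats = value => {
  const seen = new Map();
  const visit = current => {
    if (typeof current === 'string')
      seen.set(current, (seen.get(current) || 0) + 1);
    else if (Array.isArray(current))
      current.forEach(visit);
    else if (current !== null && typeof current === 'object')
      Object.values(current).forEach(visit);
  };
  visit(value);

  let total = 0;
  for (const count of seen.values())
    total += count;
  return {total, unique: seen.size};
};

const parsed = JSON.parse(
  '[{"what":["one"]},{"what":["one","or-more"]},{"what":["value"]}]'
);
const stats = stringStats(parsed);
// 4 strings in total, 3 of them unique
console.log(stats.total, stats.unique);
```

The wider the gap between `total` and `unique`, the more there is to gain from indexing the values.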

In this case, indexed-values comes to the rescue: it is described in more detail in this post, and it basically does the following:

// current, and common, status
const travelingData = [
  {"what": ["one"]},
  {"what": ["one", "or-more"]},
  {"what": ["value"]}
];

// after indexed-values
const indexedData = {
  "what": ["one", "or-more", "value"],
  "data": [
    {"what": [0]},
    {"what": [0, 1]},
    {"what": [2]}
  ]
};
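To make the transformation concrete, here is a dependency-free sketch of the same idea. The `indexValues` function is a hypothetical helper written for illustration, not the indexed-values API, and it assumes the given key always holds an array of strings.

```javascript
// hypothetical helper, not the indexed-values API: replaces every
// value found under `key` with its position in a shared list
const indexValues = (list, key) => {
  const known = new Map(); // value => index
  const data = list.map(item => ({
    ...item,
    [key]: item[key].map(value => {
      if (!known.has(value))
        known.set(value, known.size);
      return known.get(value);
    })
  }));
  return {[key]: [...known.keys()], data};
};

const travelingData = [
  {"what": ["one"]},
  {"what": ["one", "or-more"]},
  {"what": ["value"]}
];

const indexed = indexValues(travelingData, 'what');
console.log(JSON.stringify(indexed));
// {"what":["one","or-more","value"],"data":[{"what":[0]},{"what":[0,1]},{"what":[2]}]}
```

A `Map` is used because it preserves insertion order, so each value's index is simply the order in which it was first seen.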

If we look closely enough though, this example can play perfectly with the previous JSONH case too, producing the best outcome of them all:

never repeated, or redundant, data traveling around
the least amount of physical space needed to store the value
a fast-enough/simple logic to restore the values
a heap-memory, postMessage, and IndexedDB friendly way to save, read, send, or consume, data

And that would be:

const idealData = {
  "what": ["one", "or-more", "value"],
  "data": [1, "what", [0], [0, 1], [2]]
};

Wait… what ???

Ok, ok, I understand the confusion, and I’d like to clarify that, so… let’s start with avoiding same value redundancy:

import('https://cdn.skypack.dev/indexed-values')
  .then(({IndexedValues}) => {

    // the source of not-repeated data (as Set)
    const what = new IndexedValues;
    what
      .add('one')
      .add('or-more')
      .add('value')
    ;

    // any list of items that points at same data
    const data = [];
    data.push(
      {what: what.bindValues(['one'])},
      {what: what.bindValues(['one', 'or-more'])},
      {what: what.bindValues(['value'])}
    );

    // serialize all data related items
    console.log(JSON.stringify({what, data}));
    //  {
    //    "what":["one","or-more","value"],
    //    "data":[{"what":[0]},{"what":[0,1]},{"what":[2]}]
    //  }
  });

At that specific point, the only missing extra optimization to have the most dense output that’s network, compression, and heap memory friendly, is to remember the homogeneous collection part of this post:

// previous code ...

// the Homogeneous Collection Packer
const hcPack = list => {
  if (list.length < 1)
    return list;

  const keys = Object.keys(list[0]);
  return [
    keys.length,
    ...keys,
    ...list.flatMap(value => keys.map(k => value[k]))
  ];
};

console.log(JSON.stringify({what, data: hcPack(data)}));

// the unicorn outcome
//  {
//    "what":["one","or-more","value"],
//    "data":[1,"what",[0],[0,1],[2]]
//  }
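With only three items, the fixed overhead of the extra `what` list can outweigh the savings, so the benefit is easier to see on a longer list. The sketch below compares the serialized size of the three forms on generated data; the 1,000-row data set, the `vocabulary` list, and the `bytes` helper are all assumptions made for this illustration, and real numbers depend entirely on your data.

```javascript
// hypothetical data: 1,000 rows whose values come from a tiny vocabulary
const vocabulary = ['one', 'or-more', 'value'];
const rows = Array.from({length: 1000}, (_, i) => ({
  what: [vocabulary[i % 3], vocabulary[(i + 1) % 3]]
}));

// replace each string with its index in the vocabulary
const indexedRows = rows.map(row => ({
  what: row.what.map(value => vocabulary.indexOf(value))
}));

// the same Homogeneous Collection Packer used in this post
const hcPack = list => {
  if (list.length < 1)
    return list;

  const keys = Object.keys(list[0]);
  return [
    keys.length,
    ...keys,
    ...list.flatMap(value => keys.map(k => value[k]))
  ];
};

// size of the serialized value in UTF-8 bytes, before any compression
const bytes = value =>
  new TextEncoder().encode(JSON.stringify(value)).length;

const sizes = {
  plain: bytes(rows),
  indexed: bytes({what: vocabulary, data: indexedRows}),
  packed: bytes({what: vocabulary, data: hcPack(indexedRows)})
};

console.log(sizes);
```

These are uncompressed sizes only; measuring the compressed payload and the parsing time on your own data is the only way to know whether the trade-off pays off.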

Now we’re talking:

the traveling data is smaller out of the box
the computation power needed to restore the data costs less than the network time saved by the smaller payload
the lower memory footprint to deal with traveling data works even on cheap mobile hardware
because the parsing is reduced, it’s also very possible the retrieved data is easier, and faster, to consume too!

To demonstrate that, let’s retrieve our initial {what, data} intent, shall we ?!?

// our previous good old friend
const hcRevive = array => {
  // collect all keys
  const [length] = array;
  const keys = [];
  for (let i = 1; i <= length; i++)
    keys.push(array[i]);

  // regenerate all object literals
  const values = [];
  for (let i = length + 1; i < array.length; i += length) {
    const object = {};
    for (let k = 0; k < keys.length; k++)
      object[keys[k]] = array[i + k];
    values.push(object);
  }
  return values;
};

// after JSON.parse(...)
const parsed = {
  "what": ["one", "or-more", "value"],
  "data": hcRevive([1, "what", [0], [0, 1], [2]])
};

// use the foreign indexed-values utility
import('https://cdn.skypack.dev/indexed-values')
  .then(({IndexedValues}) => {
    // restore all values as Set
    const what = new IndexedValues(parsed.what);

    // restore all values previously parsed as index
    for (const data of parsed.data) {
      data.what = what.fromIndexes(data.what);

      // for the sake of this post:
      // read all related values for this data
      // pointing always at the `what` source
      console.log([...data.what.values()]);
    }
  });
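As a sanity check that nothing is lost along the way, the whole round trip can also be sketched without any dependency. The `unpack` function below is a compact, hypothetical variant of `hcRevive` (with an extra guard against a zero key count), and the indexes are resolved with a plain array lookup instead of the indexed-values utility.

```javascript
// what would arrive over the wire
const idealData = JSON.parse(
  '{"what":["one","or-more","value"],"data":[1,"what",[0],[0,1],[2]]}'
);

// compact variant of hcRevive: key count first, then keys, then values
const unpack = ([length, ...rest]) => {
  // nothing to restore if there are no keys (also avoids an endless loop)
  if (!(length > 0))
    return [];

  const keys = rest.slice(0, length);
  const values = rest.slice(length);
  const list = [];
  for (let i = 0; i < values.length; i += length)
    list.push(
      Object.fromEntries(keys.map((key, k) => [key, values[i + k]]))
    );
  return list;
};

// resolve every index back to the shared string it points at
const restored = unpack(idealData.data).map(item => ({
  ...item,
  what: item.what.map(index => idealData.what[index])
}));

console.log(JSON.stringify(restored));
// [{"what":["one"]},{"what":["one","or-more"]},{"what":["value"]}]
```

Every restored string is read from the single `idealData.what` list, which is the point of indexing the values in the first place.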

Summary

The goal of this post is not to tell anyone they are using JSON badly or anything, but rather to point out that in some very specific circumstances there’s more than just the parsing library to care about, especially when it comes to constrained environments, such as Internet of Things devices, low-memory and low-CPU mobile devices, and so on.

But on top of that, because both network and storage are usually problems to either consider or take care of, this post suggests a few simple techniques to work around possible constraints and limitations, whenever these are actually an issue.

Above all that, consider the ability to add recursion to the mix, and realize that JSON is extremely portable, handy, simple, and full of possible features that don’t come out of the box, but can be easily implemented.

Web Performance Calendar