Web Performance Calendar

The speed geek's favorite time of year
2018 Edition
ABOUT THE AUTHOR

Neil Gunther

Neil Gunther, M.Sc., Ph.D. (@DrQz), is a researcher and teacher at Performance Dynamics where, amongst other things, he developed both the Universal Scalability Law (USL) model and the PDQ (Pretty Damn Quick) open-source performance analyzer, and wrote some books based on them. Prior to that, Dr. Gunther was the Performance Manager at Pyramid Technology and also spent a decade at the Xerox Palo Alto Research Center. Dr. Gunther received the A.A. Michelson Award in 2008 and is a Senior member of ACM and IEEE. He sporadically blogs at The Pith of Performance but much prefers tweeting as @DrQz.

Time: The Zeroth Performance Metric

That’s the title of the third chapter in my book, Analyzing Computer System Performance with Perl::PDQ. There, the discussion centers on various technical concepts of time, such as Unix epoch time, virtual clocks, benchmark timers, response time distributions, and so on. Time, whether explicit or implicit, is fundamental to the definition of all performance metrics. One aspect of time that might have deserved even more emphasis in that chapter is the vital importance of timestamps.

Timestamp formats involve more subtleties than you might recall. As a refresher, here are some examples generated in the R language (my go-to weapon for performance analysis).

# Human-readable format
> date()
[1] "Fri Nov 16 08:44:06 2018"

# Canonical machine-readable format
> Sys.time()
[1] "2018-11-16 08:46:09 PST"

# Change TZ (death to time zones!)
> as.POSIXlt(Sys.time(), "UTC")
[1] "2018-11-16 17:27:37 UTC"

# Unix epoch time (serenity now!)
> as.numeric(as.POSIXct(Sys.time()))
[1] 1542386998
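Because the same instant can be rendered in all of these forms, it may help to see one round trip explicitly. The following is a minimal sketch (not taken from the book) that interprets the epoch value printed above as seconds since 1970-01-01 UTC, renders it as an ISO 8601 string in UTC and in Pacific time, and checks that the epoch number survives unchanged. The helper name `iso_utc` is hypothetical.

```r
# Hypothetical helper: format an instant as an ISO 8601 string in UTC
iso_utc <- function(t) format(t, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# The Unix epoch value shown above, as seconds since 1970-01-01 UTC
epoch <- 1542386998
t <- as.POSIXct(epoch, origin = "1970-01-01", tz = "UTC")

iso_utc(t)                                  # "2018-11-16T16:49:58Z"
format(t, "%Y-%m-%d %H:%M:%S %Z",
       tz = "America/Los_Angeles")          # "2018-11-16 08:49:58 PST"

# Round trip: converting back to a number recovers the original epoch value
stopifnot(as.numeric(t) == epoch)
```

A UTC string in this fixed-width format has the practical bonus that sorting the strings lexically also sorts them chronologically.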

Although most modern performance tools do record timestamps as part of the monitoring and logging process, the importance of timestamps goes well beyond collecting performance metrics. What I mean is timestamping all information, most particularly web pages.

  • Performance and capacity management reports, which these days are commonly published as internal web pages, should always be timestamped. Two years from now, when you’ve completely forgotten, you’re bound to want to know why you wrote them.
  • When diagnosing a performance problem, it can be vital to know if the time at which the performance incident occurred was correlated with server activity involving the publication of certain web pages.

At first glance this may seem a rather trivial point, but the lack of support for, and enforcement of, timestamps in web pages and blog posts has long been a pet peeve of mine, because I think the implications are more significant than most people realize. As a chemist in a former life, one of the first things I learnt, at ten years old, was that writing the date and time on all lab work is de rigueur. That has never been consistent practice in IT.

Historically, timestamps have been standard in pre-web printed publications, viz., books, newspapers, journal articles, and the like. Even personal letters (does anyone still write those?) are expected to have a date. The static nature of the printed medium made affixing dates both natural and meaningful (even in Italian).

On the other hand, the omission of timestamps that accompanies the dynamic nature of web publication is a relatively new phenomenon, one that has gained creeping acceptance largely by going unnoticed. Although I know of no obvious or easy solution to this problem, no significant effort seems to be devoted to finding one, either.

To emphasize what I mean, consider the following rather dramatic case in point.

Dead Docs

A few months ago, I happened to land on some very interesting web pages written by a mathematician. Ultimately, I wanted to cite his work, but that requires quoting the date when the web pages were published (to a reasonable approximation, at least, given the dynamic nature of web editing). Unfortunately, these particular web pages (of which there are many) don’t have any kind of date on them. I even rifled through the page source. The pages look like late-1990s-era HTML.

Luckily, the author did provide his email address, so I shot off a request for the original publication date. The reply I got, however, was not from him but from his wife. Sadly, she informed me that he had since developed severe Alzheimer’s disease. In other words, there was no longer any way that he could provide the date information. His wife had no idea, either.

Since the pages are hosted at a .edu website, another possibility might have been to go through the webmaster to get the creation dates of the HTML files. That would have required permission from his wife and possibly getting a login, all of which was just too messy, given the sensitive circumstances. Effectively, the publication date was lost, as if the author had actually died. All for the lack of timestamp enforcement.

I’m Not Alone

As the following posts demonstrate, others have voiced similar frustration at not being able to determine web publication dates conveniently.

  1. How do I find when a web page was last updated?, Stackoverflow, May 14, 2014

    A: “Open your browsers console and enter the following: javascript:alert(document.lastModified)”

  2. How to get file creation date on browser using javascript or jquery, Stackoverflow, Jan 6, 2015

    A: “You can not get the creation date. Only the last modified date is available in file properties.”

  3. I Want A Google No Date/Timestamp Penalty, Search Engine Roundtable, Feb 17, 2016

It Can Be Done

The situation is not hopeless, however. The HTML <time> element and its datetime attribute are defined by the W3C, but the W3C does not seem to have given any consideration to enforcing timestamps, or even encouraging their use, within HTML. By contrast, some markup conversion tools, like TtH and LaTeXML, automatically append a timestamp to each HTML page, e.g., “Generated on Fri Feb 9 15:23:09 2018 by LaTeXML”.
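For pages that are generated by a script rather than by hand, adding such a timestamp costs almost nothing. A minimal sketch, assuming the page is assembled in R: the hypothetical `time_tag()` below wraps an instant in an HTML <time> element whose datetime attribute carries a machine-readable UTC value, alongside a human-readable rendering, and the hypothetical `footer` mimics the LaTeXML-style “Generated on” line.

```r
# Hypothetical helper: emit an HTML <time> element for a given instant
time_tag <- function(t = Sys.time()) {
  iso  <- format(t, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")  # machine-readable
  text <- format(t, "%Y-%m-%d %H:%M UTC", tz = "UTC")  # human-readable
  sprintf('<time datetime="%s">%s</time>', iso, text)
}

# Hypothetical page footer stamped at generation time
footer <- paste0("<p>Generated on ", time_tag(), "</p>")

writeLines(time_tag(as.POSIXct(1542386998, origin = "1970-01-01", tz = "UTC")))
# <time datetime="2018-11-16T16:49:58Z">2018-11-16 16:49 UTC</time>
```

Using a numeric date layout for the visible text sidesteps locale-dependent month names; a site could, of course, choose any human-readable form as long as the datetime attribute stays canonical.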

More sophisticated websites that are cognizant of their status as information repositories, like arXiv and Stackoverflow, incorporate a human-readable timestamp corresponding to the original publication date or comment date. Even Twitter, YouTube and Facebook incorporate a dynamic timestamp corresponding to the elapsed time since publication. That is not ideal, and not as precise as a canonical timestamp, but it is better than nothing.
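To see why an elapsed-time label loses information, consider a minimal sketch of a relative-time formatter. The function `elapsed_label()` and its minute/hour/day cut-offs are hypothetical and are not how any of those sites actually computes its labels; the dates are made up for illustration. Two posts published more than seven hours apart receive the same label, and the same post gets a different label depending on when the page is viewed.

```r
# Hypothetical relative-time formatter: whole minutes, hours, or days
elapsed_label <- function(published, now = Sys.time()) {
  secs <- as.numeric(difftime(now, published, units = "secs"))
  if (secs < 3600) {
    sprintf("%dm", as.integer(secs %/% 60))
  } else if (secs < 86400) {
    sprintf("%dh", as.integer(secs %/% 3600))
  } else {
    sprintf("%dd", as.integer(secs %/% 86400))
  }
}

now <- as.POSIXct("2018-11-16 17:00:00", tz = "UTC")
a   <- as.POSIXct("2018-11-13 09:15:00", tz = "UTC")  # hypothetical post A
b   <- as.POSIXct("2018-11-13 16:40:00", tz = "UTC")  # hypothetical post B

elapsed_label(a, now)          # "3d"
elapsed_label(b, now)          # "3d": same label, different instants
elapsed_label(b, now + 86400)  # "4d": same post, viewed one day later
```

The canonical timestamp can always be turned into such a label, but not the other way around.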

Twitter Threads

A related, more recent peeve is the burgeoning popularity of Twitter Threads. For those not familiar with them: in the basic Twitter paradigm, a user sees a stream of tweets published by the accounts that user has chosen to follow. Although these tweets are temporally ordered into a Twitter Timeline, each one is issued independently, so their contents are generally unrelated. Content correlation increases when a particular topic is Trending, i.e., many users are tweeting about the same thing. The intellectual value of the tweets is a function of who you choose to follow: not everything is a famebot or flamebot rant.

The most significant difference with Twitter Threads is that their tweets are related: unlike the standard Timeline, a Thread comprises a contiguous set of time-ordered tweets with related content, like a series of paragraphs in a blog post. Another difference is that Followers can also chime in on the same Thread, and their tweets are integrated in temporal order.

Usually, the originating author initiates a sequence of tweets (sometimes a dozen or more, including images and hyperlinks) by Replying-to-self to generate the Thread. This sequence can then become extended with Replies from Followers and other readers. As far as I can see, this unwieldy structure has arisen as a workaround for Twitter’s (revised) limit of 280 characters per tweet. Creative Twitter-heads discovered a way to approximate a full-blown blog with micro-blog parts.

Although each tweet in a Thread has a timestamp (or some kind of time reference), a new problem arises. There’s no convenient way to record the entire Thread, which can easily run to the equivalent of many printed pages when all the embedded Replies are included.
That can make answering simple questions like:

  • What things were said?
  • When were they said?

a much trickier task than it ought to be. Other than resorting to the irksome process of taking a succession of disjoint screenshots, there seems to be no easy way to capture a Thread as a separate file for later perusal. Even when a Thread is referenced by an external URL, it remains trapped inside the standard Twitter paradigm. Of course, that may be a deliberate ploy on the part of Twitter Inc. to garner more users.
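Part of the difficulty is structural: a Thread is a chain of Replies-to-self, with other readers’ Replies hanging off it. Once the tweets are available as plain records, though, flattening that chain into a single timestamped file is straightforward. A minimal sketch, assuming a hypothetical data frame with columns `id`, `in_reply_to`, `author`, `created_at` and `text` (all values below are made up); actually obtaining such records from Twitter is exactly the part that keeps breaking, and is not shown.

```r
# Hypothetical tweet records: t1 -> t2 -> t3 is the author's self-reply chain,
# r1 is a Reply from another reader
tweets <- data.frame(
  id          = c("t3", "t1", "r1", "t2"),
  in_reply_to = c("t2", NA, "t1", "t1"),
  author      = c("@author", "@author", "@reader", "@author"),
  created_at  = as.POSIXct(c("2018-11-01 18:05:00", "2018-11-01 18:00:00",
                             "2018-11-01 18:01:00", "2018-11-01 18:02:00"),
                           tz = "UTC"),
  text        = c("3/ ...", "1/ ...", "reply ...", "2/ ..."),
  stringsAsFactors = FALSE
)

# Walk the author's self-replies, starting from the root tweet (no parent)
thread_order <- function(tw) {
  cur    <- tw$id[is.na(tw$in_reply_to)]
  author <- tw$author[tw$id == cur]
  chain  <- character(0)
  while (length(cur) == 1) {
    chain <- c(chain, cur)
    kids  <- tw[!is.na(tw$in_reply_to) & tw$in_reply_to == cur &
                  tw$author == author, ]
    # If the author replied to the same tweet more than once, take the earliest
    cur   <- kids$id[which.min(kids$created_at)]
  }
  tw[match(chain, tw$id), ]
}

# Save the Thread as one file with canonical UTC timestamps
thread <- thread_order(tweets)
thread$created_at <- format(thread$created_at, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
out <- tempfile(fileext = ".csv")
write.csv(thread, out, row.names = FALSE)
```

This sketch deliberately follows only the author’s own chain; other readers’ Replies (here `r1`) could be appended and sorted by `created_at` if the full conversation is wanted.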

Further research indicates that third-party apps for capturing an entire Thread as a single file have existed. However, they no longer work, since Twitter Inc. scrambled to change its APIs in response to unwanted criticism about abusive tweet-bots, political malfeasance, and other craziness. In any other serious context, this kind of functional brittleness would be totally unacceptable.

We sincerely apologize for any inconvenience, but there are no wheels on your car today because we upgraded the design overnight and the wheel manufacturer is no longer compliant. Such innovation shows our passion for creating the best cars in the industry.

I love Twitter because I generally learn interesting and useful things there long before I see them anywhere else. But, I find this unintended encapsulation of Threads very frustrating — especially when taken together with the timestamp issue already discussed. One step forward, two back.

Aggravated Archeology

Missing timestamps, and the concomitant inability to cite archival content, are reminiscent of bit rot: the loss of tools that know how to translate binary data whose formats have been forgotten or lost in the pell-mell rush toward each new generation of technology. Can you still read your Windows email from 1995?

Harking back to the mathematical web pages mentioned above, even if I’d been given permission to work with the webmaster to obtain the file timestamps, there is no guarantee those would have been correct. Although the Unix stat command reports a file’s timestamps (last modification and last status change, plus a birth time on some file systems), none of these is necessarily the date when the HTML file was created or published. Files at .edu websites often get copied for backup or otherwise migrated across different storage subsystems, and that changes what the file timestamps actually mean. The genesis timestamp really needs to come from the HTML source itself. Perhaps this is yet another application for a distributed-ledger technology like blockchain, but scalable performance is likely to be even more problematic at web scale than it already is for financial transactions.
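The point about copied files is easy to demonstrate on a throwaway file. A minimal sketch using base R’s `Sys.setFileTime()`, `file.copy()` and `file.info()`; the 1998 date stands in for a hypothetical original publication date. A plain copy gets a fresh modification time, while `copy.date = TRUE` carries the old one over, so what a copy reports depends on how it was made, not on when the page was written.

```r
# Create a throwaway "HTML page" and backdate it (hypothetical date)
src <- tempfile(fileext = ".html")
writeLines("<p>Original page</p>", src)
Sys.setFileTime(src, as.POSIXct("1998-06-01 12:00:00", tz = "UTC"))

# Copy it twice: once plainly, once asking R to preserve the date
plain <- tempfile(fileext = ".html")
kept  <- tempfile(fileext = ".html")
stopifnot(file.copy(src, plain),                  # copy.date = FALSE by default
          file.copy(src, kept, copy.date = TRUE))

# mtime = last modification; on Unix-alikes ctime = last status change,
# not creation (file.info reports creation time as ctime only on Windows)
info <- file.info(c(src, plain, kept))[, c("mtime", "ctime")]
print(info)
```

Note also that backdating the original with `Sys.setFileTime()` itself updates its ctime on Unix-alikes, which is one more reason the “c” should not be read as “create”.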

So, here we are in the twenty-first century, beset with a regressive problem of missing timestamps that leaves us in an embarrassing situation akin to trying to do carbon dating without the radioactivity. In my view, this temporal and spatial brittleness of the web is an indictment of the obsession with monetizing anything that looks remotely like “innovation”, while vital problems go unaddressed. Vitals are not always sexy.


Thanks to the following followers on Twitter: @jdevoo, @SrPerf, @sPerformance, @pjpuglia, @ruslanrusu, @peterlharding, and @JonDHill for comments on earlier drafts of this piece, which grew out of a brief Twitter exchange with @stoyanstefanov on October 17, 2018.