Carrier Networks: Down the Rabbit Hole

5th Dec 2011 by Tim Kadlec

ABOUT THE AUTHOR

Tim Kadlec is a web developer living and working in northern Wisconsin with a propensity for efficient, standards-based front-end development. His diverse background, working with everything from small companies to large publishers and industrial corporations, has allowed him to see how these standards can be effectively utilized for businesses of all sizes.

His current interests include creating cross-platform sites and applications using the open web stack and improving the state of performance optimization on the web.

He sporadically writes about a variety of topics at timkadlec.com. You can also find him sharing his thoughts in a briefer format on @tkadlec.

There’s a point in Lewis Carroll’s “Alice’s Adventures in Wonderland” where Alice believes she may never be able to leave the room she has found herself in after following the rabbit down its hole, and she starts to question her decision:

“I almost wish I hadn’t gone down that rabbit hole—and yet—and yet—it’s rather curious, you know, this kind of life.”

The world of mobile performance can feel the same—particularly when you start to explore mobile carrier networks. If you’re looking for consistency and stability, you should look elsewhere. If, on the other hand, you enjoy the energy and excitement that so often can be found in the chaos that surrounds an unstable environment, then you’ll find yourself right at home.

Variability

The complexity of a system may be determined by the number of its variables, and carrier networks have a lot of variables.

The performance of a network varies dramatically depending on factors such as location, the number of people using a network, the weather, the carrier—there isn’t much that you can rely on to remain static.

One study demonstrated just how much variance there can be from location to location. The test involved checking bandwidth on 3G networks for three different mobile carriers—Sprint, Verizon, and AT&T—in various cities across the United States. The results were stunning for their diversity.

The highest recorded bandwidth was 1425 kbps in New Orleans on Verizon’s network. The lowest was 477 kbps in New York City on AT&T’s network, a difference of 948 kbps. Even within a single carrier, the variation was remarkable. While Verizon topped out at 1425 kbps, its lowest recorded bandwidth was 622 kbps in Portland.

Another informal experiment was recently conducted by Joshua Bixby. Joshua recorded the bandwidth and latency on his 3G network at random times. Even from a single location, his house, the latency varied from just over 100 ms all the way up to 350 ms.

Latency

There is remarkably little information about the amount of latency on mobile networks. In 2010, Yahoo! released some information based on a small study they had done. They monitored traffic coming into the YUI blog, recording both bandwidth and latency. They then averaged those numbers by connection type and published a graph demonstrating the results. Their study showed that the average latency for a mobile connection was 430 ms, compared to only 130 ms for an average cable connection.
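To make the averaging step concrete, here is a minimal sketch of how per-connection-type figures like these could be computed. The sample shape, the function name `summarizeByType`, and the readings are all made up for illustration; they are not Yahoo!'s data or code. Reporting the minimum and maximum next to the mean is an extra, since an average alone hides the kind of spread described in the previous section.

```typescript
// Hypothetical sample shape; the format of the original study's data is not known.
interface LatencySample {
  connectionType: string; // e.g. "mobile" or "cable"
  latencyMs: number;
}

interface LatencySummary {
  count: number;
  meanMs: number;
  minMs: number;
  maxMs: number;
}

// Group samples by connection type, then summarize each group.
function summarizeByType(samples: LatencySample[]): Record<string, LatencySummary> {
  const groups: Record<string, number[]> = {};
  for (const s of samples) {
    if (!groups[s.connectionType]) {
      groups[s.connectionType] = [];
    }
    groups[s.connectionType].push(s.latencyMs);
  }

  const result: Record<string, LatencySummary> = {};
  for (const type of Object.keys(groups)) {
    const values = groups[type];
    let sum = 0;
    for (const v of values) {
      sum += v;
    }
    result[type] = {
      count: values.length,
      meanMs: sum / values.length,
      minMs: Math.min(...values),
      maxMs: Math.max(...values),
    };
  }
  return result;
}

// Made-up readings for illustration only.
const exampleSamples: LatencySample[] = [
  { connectionType: "mobile", latencyMs: 400 },
  { connectionType: "mobile", latencyMs: 500 },
  { connectionType: "cable", latencyMs: 120 },
  { connectionType: "cable", latencyMs: 140 },
];

const latencySummary = summarizeByType(exampleSamples);
```

With these invented readings, the mobile group averages 450 ms and the cable group 130 ms. The point is the shape of the calculation, not the numbers.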

The study isn’t foolproof. The sample size was small and the type of audience that would be visiting the YUI blog is not exactly a representation of the average person. At least it was publicly released data. Most of the rest of the latency numbers I see mentioned come without much context; there usually isn’t any mention of how it was measured.

Transcoding

Another concern with mobile networks is carrier transcoding, which frequently causes issues. Many networks, for example, attempt to reduce the file size of images. Sometimes this goes unnoticed. Often, though, the images become grainy or blurry and the appearance of the site suffers.

The Financial Times worked to avoid this issue with their mobile web app by using data URIs instead, but even this technique is not entirely safe. While the issue is not well documented or isolated yet, a few developers in the UK have reported that O2, one of the largest mobile providers in the UK, will sometimes strip out data URIs.

Transcoding doesn’t stop at images. T-Mobile was recently found to be stripping out anything that looked like a JavaScript comment. The intentions were mostly honorable, but the method leads to issues. The jQuery library, for example, has a string that contains “*/*”. Later on in the library, you can again find the same string. Seeing these two strings, T-Mobile would then strip out everything that was in between, breaking many sites in the process.
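Here is a rough sketch of how this kind of breakage can happen. The function below is a deliberately naive comment stripper that knows nothing about string literals. It is an assumption about the failure mode, not T-Mobile's actual implementation, and the three-line script is a stand-in for the relevant parts of jQuery.

```typescript
// Naive stripper: delete everything from "/*" up to the next "*/".
// It does not know about string literals, which is exactly the problem.
// (Assumed failure mode for illustration; not the carrier's real code.)
function naiveStripBlockComments(source: string): string {
  return source.replace(/\/\*[\s\S]*?\*\//g, "");
}

// Stand-in for a library that contains the string "*/*" twice.
const originalScript = [
  'var accepts = "*/*";',
  "var important = doSomething();",
  'var wildcard = "*/*";',
].join("\n");

// The "/*" inside the first string is treated as a comment opener and the
// "*/" inside the second string as its closer, so the code between is lost.
const strippedScript = naiveStripBlockComments(originalScript);
```

After stripping, all that remains is `var accepts = "**";`. The call to `doSomething()` and the second variable are gone, even though the original script never contained a comment.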

This method of transcoding could also create issues for anyone who is trying to lazy-load their JavaScript by first commenting it out, a popular and effective technique for improving parse and page load time.
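For anyone who hasn't seen the commented-out technique, a minimal sketch follows. The helper names `unwrapCommentedCode` and `runDeferred` are hypothetical, and the payload is a plain string here so the example runs anywhere; in a page it would typically be read from the text of a script element. The last line applies a simple comment-stripping regular expression, standing in for an over-eager transcoder, to show what would be left to execute.

```typescript
// Payload as it might be delivered: real code wrapped in a block comment so
// that it is downloaded but not parsed or executed until needed.
const deliveredPayload = "/*\nreturn 6 * 7;\n*/";

// Remove the leading "/*" and trailing "*/" to recover the executable source.
function unwrapCommentedCode(text: string): string {
  const trimmed = text.trim();
  if (trimmed.slice(0, 2) !== "/*" || trimmed.slice(-2) !== "*/") {
    throw new Error("payload is not wrapped in a block comment");
  }
  return trimmed.slice(2, -2);
}

// Evaluate the code on demand. new Function keeps the sketch short;
// a real page might inject a script element instead.
function runDeferred(text: string): unknown {
  return new Function(unwrapCommentedCode(text))();
}

const deferredResult = runDeferred(deliveredPayload); // 42

// What a comment-stripping transcoder would leave behind: an empty string.
const payloadAfterStripping = deliveredPayload.replace(/\/\*[\s\S]*?\*\//g, "");
```

Once the comment has been stripped in transit there is nothing left to unwrap, so the deferred code silently never runs.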

One carrier, Optus, not only causes blurry images by lowering the image resolution, but also injects an external script into the page in a blocking manner.

Unfortunately, most of these transcoding issues and techniques are not widely known or well documented. I suspect countless others are just waiting to be discovered.

Gold in them there hills

This can all sound a bit discouraging, but that’s not the goal here. We need to explore carrier networks further because there is an incredible wealth of information we can unearth if we’re willing to dig far enough.

One example of this is the idea of inactivity timers and state machines that Steve Souders was recently testing. As it turns out, mobile networks rely on a set of states that determine how much throughput a device is allotted, which in turn affects battery drain. To down-switch between states (thereby reducing battery drain, but also throughput), the carrier relies on an inactivity timer. When the timer fires, it signals to the device that it should shift to a more energy-efficient state.

This can have a large impact on performance because it can take a second or two to ramp back up to the highest state. This inactivity timer, as you might suspect, varies from carrier to carrier. Steve has set up a test that you can run in an attempt to identify where the inactivity timer might fire on your current connection. The results, while not foolproof, do strongly suggest that these timers can be dramatically different.
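The general idea behind this kind of test can be sketched with a toy model. This is not Steve's test, and every number below is an assumption chosen for illustration: a 5 second inactivity timer, a 2 second ramp-up cost, and a 200 ms round trip while the radio is in its highest state. The probe sends requests after progressively longer idle gaps and reports the first gap at which latency jumps.

```typescript
// Simplified two-state radio model. All values are assumptions.
interface RadioModel {
  inactivityTimeoutMs: number; // idle time after which the radio down-switches
  promotionDelayMs: number;    // cost of ramping back up to the highest state
  baseLatencyMs: number;       // round trip while already in the highest state
}

// Latency of a request sent after `idleGapMs` of silence on the connection.
function requestLatency(model: RadioModel, idleGapMs: number): number {
  const downSwitched = idleGapMs >= model.inactivityTimeoutMs;
  return model.baseLatencyMs + (downSwitched ? model.promotionDelayMs : 0);
}

// Probe with growing idle gaps; return the first gap whose latency exceeds
// the baseline by more than `jumpThresholdMs`, or null if no jump is seen.
function estimateInactivityTimeout(
  measure: (idleGapMs: number) => number,
  gapsMs: number[],
  jumpThresholdMs: number
): number | null {
  if (gapsMs.length === 0) {
    return null;
  }
  const baseline = measure(gapsMs[0]);
  for (const gap of gapsMs) {
    if (measure(gap) - baseline > jumpThresholdMs) {
      return gap;
    }
  }
  return null;
}

const assumedRadio: RadioModel = {
  inactivityTimeoutMs: 5000,
  promotionDelayMs: 2000,
  baseLatencyMs: 200,
};
const probeGapsMs = [1000, 2000, 3000, 4000, 5000, 6000, 7000];
const timeoutEstimate = estimateInactivityTimeout(
  (gap) => requestLatency(assumedRadio, gap),
  probeGapsMs,
  500
);
```

Against this model the probe reports 5000, the assumed timer. On a real network the `measure` callback would time an actual request, and the readings would be noisy enough that a single pass is only a hint.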

We need more of this kind of information and testing. Networks weren’t originally optimized for data; they were optimized for voice. When 3G networks were rolled out, the expectation was that the major source of data traffic would come from things like picture messaging. The only accessible “mobile” Internet was WAP—a very simplified version of the web.

As devices became more and more capable, however, it became possible to experience the full Internet on these devices. People started expecting to see not just a limited version of the Internet, but the whole thing (videos, cat pictures, and all), leaving the networks overwhelmed.

There are undoubtedly other techniques, similar to these transcoding methods and state machines, that carriers use to get around the limitations of their networks in order to provide faster data and serve more people.

4G won’t save us

Many people like to point to the upcoming roll-out of 4G networks as a way of alleviating many of these concerns. To some extent, they’re right: it will indeed help with some of the latency and bandwidth issues. However, it’s a pretty costly endeavor for carriers to make that switch, meaning that we shouldn’t expect widespread roll-out overnight.

Even when the switch has been made we can expect that the quality, coverage and methods of optimization used by the carriers will not be uniform. William Gibson said, “The future is already here—it’s just not very evenly distributed.” Something very similar could be said of mobile connectivity.

Where do we go from here

To move this discussion forward, we need a few things. For starters, some improved communication between developers, manufacturers, and carriers would go a long, long way. If not for AT&T’s research paper, we might still not be aware of the performance impact of carrier state machines and inactivity timers. More information like this not only clues us in to the unique considerations of optimizing for mobile performance, but also gives us a bit of perspective. We are reminded that it’s not just about load time; there are other factors at play and we need to consider the trade-offs.

Improved communication could also go a long way towards reducing the issues caused by transcoding methods. Take the case of T-Mobile’s erroneous comment stripping. Had there been some sort of open dialogue with developers before implementing this method, the issues would probably have been caught well before the feature made it live.

We could also use a few more tools. The number and quality of mobile performance testing tools is improving. Yet we still have precious few tools at our disposal for testing performance on real devices, over real networks. As the Navigation Timing API gains adoption, that will help to improve the situation. However, there will still be ample room for the creation of more robust testing tools as well.
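As a small taste of what the Navigation Timing API makes possible, here is a sketch that turns its timestamps into a few durations. The attribute names match those the API exposes on `performance.timing`; the `TimingSubset` interface, the function name `summarizeTiming`, and the sample values are hypothetical and exist only so the example can run outside a browser.

```typescript
// Subset of the Navigation Timing attributes used in this sketch.
// Each value is a timestamp in milliseconds.
interface TimingSubset {
  navigationStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  loadEventEnd: number;
}

interface TimingSummary {
  dnsMs: number;
  connectMs: number;
  requestToFirstByteMs: number;
  totalLoadMs: number;
}

// Convert raw timestamps into the durations that matter on a slow network.
function summarizeTiming(t: TimingSubset): TimingSummary {
  return {
    dnsMs: t.domainLookupEnd - t.domainLookupStart,
    connectMs: t.connectEnd - t.connectStart,
    requestToFirstByteMs: t.responseStart - t.requestStart,
    totalLoadMs: t.loadEventEnd - t.navigationStart,
  };
}

// Invented timestamps for illustration; not measurements from a real device.
const sampleTiming: TimingSubset = {
  navigationStart: 1000,
  domainLookupStart: 1010,
  domainLookupEnd: 1050,
  connectStart: 1050,
  connectEnd: 1250,
  requestStart: 1260,
  responseStart: 1690,
  loadEventEnd: 4000,
};

const timingSummary = summarizeTiming(sampleTiming);
```

In a browser that supports the API, the same function could be handed `window.performance.timing` once the load event has finished, and the results sent back to a server to build up a picture of real devices on real networks.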

There’s light at the end of the tunnel

You know, eventually Alice gets out of that little room. She goes on to have many adventures and meet many interesting creatures. After she wakes up, she thinks “what a wonderful dream it had been.”

As our tools continue to improve and we explore this rabbit hole further, one day we too will be able to make some sense of all of this. When we do, our applications and our sites will be better for it.

Web Performance Calendar