At Shunra earlier this year, we decided to create a library of global network conditions to enable our customers to better test the performance of their applications under conditions similar to those experienced by their users. Our focus was, quite naturally these days, on mobile networks. We expected to see high variability in conditions due to the client’s location, provider, device, even the day of the week. All of those indeed turned out to be important factors, but we were surprised to find that the location of gateways and the routing decisions made by providers also play a major role in the performance of mobile networks.
This article details some of our findings. It is based on over three hundred thousand samples collected during the first six months of 2012 using our Network Catcher Express mobile app, our mobile website and other sources. Each sample comprises the user's location, public IP address, observed latency, device model and operating system and, for Android devices, the mobile technology in use (e.g. UMTS, HSDPA, LTE). Unless otherwise specified, we only used samples where users accessed nearby servers, so that physical distance would not affect the measured latency. We used traceroute and geo-IP services to estimate the location of each client's gateway. Although those methods are not fully reliable, it will become apparent that our results do not depend on those estimates.
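The "nearby servers only" filter can be sketched as follows. This is a minimal sketch under assumptions: the field names are hypothetical, the server coordinates (`server_lat`, `server_lon`) are not among the sample fields listed above, and the 100 km cutoff is an arbitrary illustrative threshold. Distance is the great-circle (haversine) distance.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

@dataclass
class Sample:
    """One latency sample. Field names are hypothetical; the server location
    fields are an assumption, not part of the sample description above."""
    user_lat: float
    user_lon: float
    server_lat: float
    server_lon: float
    public_ip: str
    latency_ms: float
    device_model: str
    os: str
    technology: Optional[str] = None  # e.g. "UMTS", "HSDPA", "LTE" (Android only)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby_samples(samples: List[Sample], max_km: float = 100.0) -> List[Sample]:
    """Keep only samples whose user and server are within max_km of each other.
    The 100 km default is an arbitrary illustrative threshold."""
    return [
        s for s in samples
        if haversine_km(s.user_lat, s.user_lon, s.server_lat, s.server_lon) <= max_km
    ]
```

Filtering on distance up front means any latency differences that remain have to be explained by something other than the straight-line distance between user and server.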
What Happens in Vegas Doesn’t Stay in Vegas
Simplifying considerably, the connectivity between the internet and a 3G mobile provider's network is managed by one or more elements called Gateway GPRS Support Nodes, or GGSNs for short. All web traffic from a mobile device to the outside world, or coming back into the mobile network, passes through one of those GGSNs. We can identify the GGSN a user is routed through by looking at her public IP address, which was assigned to her by that GGSN.
GGSNs provide many important services, such as packet filtering, billing and even interfaces for lawful interception. Mobile providers therefore typically keep the number of GGSNs low and prefer a centralized rather than a distributed deployment (see for example this white paper from Cisco). In other words, we expect to see the same GGSN serving many geographically dispersed clients.
Compare, for example, the locations of Comcast's fixed-line broadband subscribers who were given an arbitrary IP address (Figure 1; look for the red dot in Minnesota) with those of AT&T subscribers who were given the public address 144.160.98.xxx (Figure 2; addresses were censored to protect the privacy of the innocent).
Although extreme, this kind of geographical spread of users is not specific to that address, or even to that provider; it's a fact of life in mobile networks. Obviously, those unfortunate users who access the network far from the nearest GGSN will observe longer delays and a worse experience. Las Vegas is a good example.
One would assume that users in Las Vegas experience low latencies when accessing websites (or, more precisely, servers) located in Vegas. On mobile networks, however, they are usually routed first through a GGSN located in California or Arizona, and only then are their packets sent on to their destination. Thus, the round-trip time for a user in Vegas accessing a website hosted in Vegas will be at least 50 milliseconds. This is because packets travel from Nevada to California and back twice: once between the user and the GGSN, and again between the GGSN and the website, with each round trip taking about 25ms.
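The 50 ms lower bound is just the sum of the two Nevada-California round trips. Below is a minimal latency-budget sketch that uses only the ~25 ms per-round-trip figure quoted above. Radio access, queuing and server time are deliberately ignored, which is why the result is a floor and not a prediction. The function name is hypothetical.

```python
# Minimal latency-budget sketch for "hairpin" routing through a distant GGSN.
# Assumption: each Nevada<->California round trip costs ~25 ms (figure from the text);
# radio, queuing and server processing time are ignored, so this is a lower bound.

NV_CA_ROUND_TRIP_MS = 25.0

def min_rtt_ms(segments_ms):
    """Lower bound on the user-to-server RTT: the sum of its round-trip segments."""
    return sum(segments_ms)

# Las Vegas user -> GGSN in California -> server back in Las Vegas:
# the packet crosses the state line twice in each direction.
hairpin_segments = [
    NV_CA_ROUND_TRIP_MS,  # user <-> GGSN
    NV_CA_ROUND_TRIP_MS,  # GGSN <-> server
]
print(min_rtt_ms(hairpin_segments))  # 50.0
```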
To illustrate this point further (and take it to the extreme), Figure 4 charts histograms of the latencies experienced by Sprint users in Las Vegas when accessing servers based in Las Vegas versus servers based in California. It is immediately apparent that, contrary to our prior assumption, the physical distance between user and server in mobile networks is not necessarily correlated with the measured latency. You may notice that the difference between the two peaks is even larger than expected: slightly above 100ms. Avid performance fans already know that 100ms has a big impact on revenue and on user engagement. Moreover, an additional 100ms of round-trip latency usually translates into even larger delays in transaction and webpage load times, since these typically involve many sequential round trips.
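Comparing "the two peaks" amounts to finding the most populated bin in each latency histogram and subtracting. Below is a minimal sketch of that comparison. The 10 ms bin width and the function names are assumptions, and the sample lists are made-up numbers for illustration only; they are not our measurements.

```python
from collections import Counter
from typing import Iterable

def histogram_peak_ms(latencies_ms: Iterable[float], bin_width_ms: float = 10.0) -> float:
    """Lower edge of the most populated histogram bin (ties go to the lower bin)."""
    bins = Counter(int(lat // bin_width_ms) for lat in latencies_ms)
    if not bins:
        raise ValueError("no samples")
    # Highest count wins; among equal counts, -bin_index prefers the lower bin.
    peak_bin, _ = max(bins.items(), key=lambda item: (item[1], -item[0]))
    return peak_bin * bin_width_ms

def peak_delta_ms(a, b, bin_width_ms: float = 10.0) -> float:
    """Difference between the histogram peaks of two latency distributions."""
    return histogram_peak_ms(a, bin_width_ms) - histogram_peak_ms(b, bin_width_ms)

# Made-up numbers for illustration only; these are not our measurements.
to_vegas = [132, 135, 138, 141, 180]
to_california = [28, 31, 33, 35, 36, 90]
print(peak_delta_ms(to_vegas, to_california))  # 100.0
```

Comparing peaks (modes) rather than means keeps a few very slow outliers, like the 180 ms and 90 ms values above, from dominating the comparison.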
Residents of Las Vegas can consider themselves the lucky ones (well, in terms of mobile network latency, at least). Mobile traffic from Hawaii is also routed through California, though the added latency in this case is much greater.
The Wild, Wild West
Surprisingly, you don't have to go as far as Hawaii to suffer from long delays due to mobile routing "mischief". You (and the server you are accessing) can be in one of the top tech hubs in the US and your traffic may still be routed through a distant state. Take a typical AT&T iPhone user based in Seattle. This user has a 97% chance of being routed to the internet through an IP address matching the mask 18.104.22.168/22. A quick lookup reveals that 16% of AT&T's iPhone users based in California are also routed through that set of IP addresses. Hence, it's a good guess that AT&T's iPhone users based in California and Washington share the same GGSN. But where is that GGSN located?
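Shares like "97% of Seattle users" come from grouping each state's samples by the /22 network their public address falls in. Below is a minimal sketch of that grouping using Python's standard `ipaddress` module. It assumes samples are already reduced to hypothetical (state, public IP) pairs, with provider and device filtering done beforehand. The function name and the sample addresses are illustrative; 198.51.100.x is a documentation range.

```python
import ipaddress
from collections import Counter, defaultdict

def prefix_shares(samples, prefix_len=22):
    """For each state, return the share of its samples whose public IP falls
    in each /prefix_len network."""
    counts = defaultdict(Counter)
    for state, ip in samples:
        # strict=False masks off the host bits, e.g. 166.205.137.5 -> 166.205.136.0/22
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[state][net] += 1
    result = {}
    for state, per_net in counts.items():
        total = sum(per_net.values())
        result[state] = {net: n / total for net, n in per_net.items()}
    return result

# Hypothetical (state, public IP) samples, not real measurements.
samples = [
    ("WA", "198.51.100.7"),
    ("CA", "198.51.100.200"),
    ("CA", "166.205.137.5"),
    ("CA", "166.205.139.9"),
]
shares = prefix_shares(samples)
print(round(shares["CA"][ipaddress.ip_network("166.205.136.0/22")], 2))  # 0.67
```

If two states show a substantial share in the same /22, that suggests (though it does not prove) that their users share a gateway.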
Using traceroute, we can see that addresses in this range resolve to San Francisco, so we would expect users based in California accessing Californian websites to experience shorter delays than Washingtonians accessing Washington-based websites. Indeed, this is the case, as can be seen in Figure 5.
What about the other 84% of AT&T iPhone users from sunny California? Most of them are assigned public IP addresses from the range 166.205.136/22. Bafflingly, traceroute indicates that these addresses reside in, yes, Washington. However, since no user from Washington seems to be routed through those addresses, and some geo-IP services claim these addresses are located in California, we can't draw definite conclusions. What we can say is that California-based users routed through 166.205.136/22 suffer from longer delays than those lucky 16% who are routed through the San Francisco range, as demonstrated in Figure 6.
We can also say that such users experience slightly lower latencies when accessing servers in Washington than when accessing servers in California (Figure 7). In fact, the delta between the two peaks in Figure 7 is suspiciously similar to the one presented in Figure 5.
AT&T is far from the only mobile provider that makes odd routing decisions. For a few months, for example, a substantial portion of T-Mobile's traffic originating in Florida was apparently routed through Illinois.
Travelers face an eternal dilemma: whether to pay an arm and a leg for WiFi access in their hotel or for mobile roaming. In terms of performance, the former seems like the wiser choice. When you travel abroad, your traffic is often routed back through your home country. Figure 8 presents an extreme case: two users in Tel Aviv, one American (on the left) and one Israeli, measuring the latency to a popular Israeli website over a 3G connection. The American experienced more than three times the latency the Israeli did (444ms versus 130ms). Traceroute suggests that the American's traffic was routed through Atlanta.
The Future Looks Brighter
Fortunately, things are getting better. As users consume more and more content on their mobile devices, providers find it increasingly difficult to route all that traffic to remote GGSNs. The backhaul networks, and indeed the GGSNs themselves, were not designed for these higher throughputs. Providers are routinely establishing new peering points to the internet, bringing it "closer" to their users than ever before. LTE goes a step further and eliminates network elements between the user and the gateway, further decreasing latency.
In the meantime, you can check your route to the internet using this little web application. Using the data it collects, we'll create a better map of the mobile internet, which (given enough samples) we'll publish in the near future.
I’m grateful to Dror Vinkler. This article is based on his work and findings.
Israel Nir (@shunra) likes to create stuff, break other stuff apart, code, the number 0x17 and playing the ukulele. He also works as a team leader at Shunra, where he builds tools to make applications run faster.