Web Performance Calendar

The speed geek's favorite time of year
2019 Edition
Lohith Bellad

Lohith Bellad (@lohith_bellad) is a Protocol Engineer at Cloudflare, helping to scale the Internet.

At Cloudflare, we develop protocols at multiple layers of the network stack. Earlier we focused on HTTP/1.1, HTTP/2 and TLS 1.3. Now we are working on QUIC and HTTP/3, which are still IETF drafts but have been gaining a lot of interest recently. Cloudflare also had a UDP-based transport protocol for mobile apps from the Neumob acquisition.

QUIC combines several network protocol layers into a single spec, mostly to improve performance and security. Its drafts cover a transport layer, which specifies the packet format and basic state machine; recovery and congestion control; crypto based on TLS 1.3; and even an HTTP application layer, which is now called HTTP/3.

Let’s focus on the transport and recovery layers first. They define what is sent on the wire (the binary packet format) and how we send it reliably. This includes how we open a connection, how we handshake a new secure session with the help of the crypto/TLS layer, and how we react to packet loss or packet reordering. It also includes flow control and congestion control, so that the protocol plays nicely with other transport protocols on the same network. Once you have some confidence in the basic transport and recovery layers, you can work on higher application layers such as HTTP/3.

Developing such a transport protocol requires several stages of development environments. Since this is a network protocol, it’s best to test on an actual physical network to see how packets are sent and received. You may start development on localhost, but after some time you will want to exchange packets with other hosts. You can build a lab with a couple of virtual machines using VirtualBox, VMware or even Docker. We also have a local testing environment with a Linux VM. But sometimes these setups have limited networking (localhost only) or are noisy because of other processes or virtual machines on the same host.

The next step is a test lab: typically an isolated network of dedicated x86 hosts, used only for protocol analysis. Lab configuration is particularly important for testing various cases – there is no one-size-fits-all scenario for protocol testing. For example, EDGE is still running in production mobile networks, LTE is dominant, and 5G deployment is in its early stages. Wifi is also very common these days. You want to test your protocol in all of those environments. Of course you can buy dedicated equipment or a very expensive network simulator for each type of environment, but with cheap hardware and an open source OS you can configure a similar environment.

Mobile devices have become a crucial part of our day-to-day life, so testing a new transport protocol on mobile devices is critically important for app performance. To facilitate that, you need to write a mobile testapp that proxies data over the new protocol under development. Embedding this mobile test device in the test lab makes it possible to analyse protocol functionality and performance under different network conditions.

QUIC Protocol Testing Lab

The goal of the QUIC testing lab is to aid transport layer protocol development. To develop a transport protocol you need a way to control your network environment and a way to collect as many kinds of debugging data as possible. You also need numbers/metrics for comparison with other protocols in production.

QUIC Testing Lab has the following goals:

  • Help the development of multiple transport protocols: Developing a new transport layer requires many iterations, from building and validating packets as per the protocol spec to making sure everything works under moderate load and under very harsh conditions such as low bandwidth and high packet loss. You need a way to run tests repeatedly under various network conditions to catch any unexpected issues.
  • Debug multiple transport protocols: Getting as much debugging info as you can is important for fixing bugs. Looking at packet captures definitely helps, but you also need detailed debugging logs from the server and client to understand what each packet is and why it was sent. For example, when a packet is sent, you want to know why: is it because an application wants to send some data, is it a retransmission of data previously declared lost, or is it a loss probe, which is sent without an actual packet loss having been detected?
  • Compare performance between protocols: You want to compare the performance of a new protocol with existing protocols such as TCP, or with a previous version of the protocol under development. You also want to test with varying parameters, such as a different congestion control mechanism, different timeouts, or different buffer sizes at various levels.
  • Find bottlenecks or errors easily: While running the tests above, you may see unexpected errors – a transfer can time out, end with an error, or be corrupted at the client side. Each test needs to verify that it ran correctly, by comparing a checksum of the original file with that of what was actually downloaded, and by checking error codes at the protocol level or API level.
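
The checksum comparison mentioned in the last goal can be sketched in a few lines of Python. This is only an illustration, not the lab's actual script: the function names are made up for this example, and SHA-256 is an assumed choice since the article does not say which checksum is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so a 32MB object is never held fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(original: Path, downloaded: Path) -> bool:
    """True when the downloaded copy has the same digest as the original."""
    return sha256_of(original) == sha256_of(downloaded)
```

A test run would call transfer_is_intact() after each download and record a failure when it returns False, alongside any protocol-level or API-level error code.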

A test lab with separate hardware has the following benefits:

  • You can configure the testing lab without public Internet access – safe and quiet
  • You have handy access to the hardware and its console for maintenance, or for adding/updating hardware
  • You can try other CPU architectures. For clients we use Raspberry Pi for regular testing because it is an ARM architecture (32-bit or 64-bit), similar to modern smartphones. Testing on ARM therefore helps with compatibility testing before moving to a smartphone OS.
  • You can add a real smartphone for testing, such as an Android phone or iPhone. You can test over Wifi, but these devices also support Ethernet, so you can test them on a wired network for better consistency.

Lab Configuration

Here is a diagram of QUIC Protocol Testing Lab:

image alt text

This is a conceptual diagram; you need to configure a switch to connect the machines. Currently we have Raspberry Pi boards (2 and 3) as the Origin and Client, small Intel x86 boxes as the Traffic Shaper and Edge server, and Ethernet switches for interconnectivity.

  • Origin simply serves http and https test objects using nginx. Client may download a file from Origin directly to simulate a download from a customer origin server.
  • Client downloads a test object from Origin or Edge, using different protocols. In a typical CDN configuration Client connects to Edge instead of Origin, so this simulates a real-world CDN edge server. For TCP we use the curl command line client, and for QUIC, quiche’s http3_client with some modifications.
  • Edge runs Cloudflare nginx, serving http/https over TCP and also over QUIC using quiche. The Edge server is installed with the Linux kernel used in Cloudflare production to match real network conditions.
  • Traffic Shaper sits between Client and Edge (and Origin), controlling network conditions. Currently we use FreeBSD with ipfw + dummynet. Traffic shaping can also be done using Linux netem, which provides more simulation features.

The goal is to run tests under various network conditions, such as bandwidth, latency and packet loss, both upstream and downstream. Since QUIC runs over UDP, we need to control both TCP and UDP traffic. The lab is able to run plaintext HTTP tests, but currently our testing focuses on HTTPS (TCP) and QUIC (UDP).

Debugging and Performance Analysis Using a Smartphone

Adding a smartphone to the testbed helps in understanding real performance issues. The major smartphone operating systems, iOS and Android, have quite different networking stacks. Having a smartphone in the testbed makes it possible to understand these networking stacks in depth, which aids new protocol designs.

Debugging network performance issues is hard when it comes to mobile devices. Adding an actual smartphone to the testbed gives the ability to take packet captures at different layers.

These packet captures are critical for analysing and understanding protocol performance.

The mobile device, either Android or iOS, is installed with a testapp built with proprietary client proxy (cproxy) software, which proxies data over the new transport protocol under development, in our case QUIC. The testapp can also make HTTP requests over TCP for comparison purposes. Using the testapp, multiple HTTP requests are issued either sequentially or concurrently, with QUIC or TCP as the underlying protocol.

image alt text

The above figure shows the networking block diagram of another, similar lab testbed used for protocol testing, where a smartphone is connected both wired and wirelessly. A Linux netem-based traffic shaper sits between client and server, shaping the traffic. Various networking profiles are fed to the traffic shaper to mimic real-world scenarios. The client can be either an Android or iOS smartphone; the server is vanilla NGINX serving static files. Client, server and traffic shaper are all connected to the Internet as well as to a private lab network for management purposes.

Test Automation and Visualization

In the lab, a script installed on Client can run a batch of tests with configuration parameters. For each test combination, you can define a test configuration, including:

  • Network Condition – Bandwidth, Latency, Packet Loss (upstream and downstream). For example, using the netem traffic shaper we can simulate an LTE network as below:

    (RTT=50ms, BW=22Mbps upstream and downstream, with BDP queue size)

    $ tc qdisc add dev eth0 root handle 1:0 netem delay 25ms
    $ tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 22mbit buffer 68750 limit 70000
  • Test Object sizes – 1KB, 8KB, … 32MB
  • Test Protocols: HTTPS (TCP) and QUIC (UDP)
  • Number of runs and number of requests in a single connection
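
The numbers in the LTE example above can be checked with a little arithmetic. 22 Mbit/s over the full 50ms RTT is 137,500 bytes, and over the 25ms netem delay it is 68,750 bytes, which matches the tbf `buffer` value. The Python sketch below reproduces that calculation and rebuilds the two commands from parameters. It assumes the 25ms delay is applied in each direction to give the 50ms RTT, and it takes the `limit` value as an input because the article does not say how 70000 was derived. The function names are made up for this example.

```python
from typing import List


def bdp_bytes(rate_bits_per_s: int, delay_ms: int) -> int:
    """Bytes in flight at the given rate over the given delay (rate x delay / 8)."""
    return rate_bits_per_s * delay_ms // (1000 * 8)


def shaper_commands(dev: str, one_way_delay_ms: int, rate_mbit: int,
                    limit_bytes: int) -> List[str]:
    """Build the netem + tbf commands in the same form as the example above."""
    # Assumption: the tbf buffer is sized as rate x one-way delay.
    buffer_bytes = bdp_bytes(rate_mbit * 1_000_000, one_way_delay_ms)
    return [
        f"tc qdisc add dev {dev} root handle 1:0 netem delay {one_way_delay_ms}ms",
        f"tc qdisc add dev {dev} parent 1:1 handle 10: "
        f"tbf rate {rate_mbit}mbit buffer {buffer_bytes} limit {limit_bytes}",
    ]


if __name__ == "__main__":
    for command in shaper_commands("eth0", 25, 22, 70000):
        print(command)
```

Generating the commands from a profile in this way makes it easier to keep the delay, rate and buffer size consistent when adding other network profiles.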

As a result, the test script produces CSV text for importing into other tools for data processing and visualization, such as Google Sheets, Excel or even a Jupyter notebook. It can also post the results to a SQL database (ClickHouse), so you can query and visualize them.
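
As a rough illustration of the CSV output, the sketch below writes one row per request using Python's standard csv module. The column names and the TransferResult type are hypothetical; the article does not list the fields that the real script records.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class TransferResult:
    # Hypothetical columns, chosen for illustration only.
    profile: str        # network profile name, e.g. "LTE"
    protocol: str       # "HTTPS" or "QUIC"
    object_bytes: int   # size of the test object
    run: int            # run number within the batch
    ttotal_ms: float    # total time of the request
    ok: bool            # whether the transfer passed its checks


def results_to_csv(results: Iterable[TransferResult]) -> str:
    """Serialize results to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(TransferResult)],
        lineterminator="\n",
    )
    writer.writeheader()
    for result in results:
        writer.writerow(asdict(result))
    return buf.getvalue()
```

Text in this shape imports directly into a spreadsheet or a notebook, and the same rows could be inserted into a database table with matching columns.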

Sometimes a whole test combination takes a long time – the current standard test set, with simulated 2G, 3G, LTE and Wifi, various object sizes and 10 runs of each request, may take several hours. Testing large objects on a slow network takes most of that time, so sometimes we also need to run a limited test (e.g. testing LTE-like conditions only, as a sanity check) for quick debugging.
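
To see why the full set grows so quickly, and how a limited run cuts it down, the test combinations can be enumerated as a simple product. The sketch below is an assumption about how such a batch might be built, not the lab's actual script. The size list contains only the three sizes named above, since the intermediate sizes are not listed in the article.

```python
from itertools import product
from typing import List, Sequence, Tuple

PROFILES = ["2G", "3G", "LTE", "Wifi"]
SIZES = ["1KB", "8KB", "32MB"]   # only the sizes named in the article
PROTOCOLS = ["HTTPS", "QUIC"]


def build_matrix(profiles: Sequence[str], sizes: Sequence[str],
                 protocols: Sequence[str], runs: int) -> List[Tuple[str, str, str, int]]:
    """Return every (profile, size, protocol, run) combination to execute."""
    return [
        (profile, size, protocol, run)
        for profile, size, protocol in product(profiles, sizes, protocols)
        for run in range(1, runs + 1)
    ]


full_set = build_matrix(PROFILES, SIZES, PROTOCOLS, runs=10)
sanity_set = build_matrix(["LTE"], SIZES, PROTOCOLS, runs=10)
print(len(full_set), len(sanity_set))
```

Restricting the profile list to LTE alone divides the number of transfers by four, and it also drops the slow-network transfers of large objects that dominate the run time.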

Following are some of the visualizations:

Chart using Apache Zeppelin:

image alt text

Chart using Google Sheets

image alt text

The Android or iOS testapp can be used to issue multiple HTTPS requests for different object sizes, sequentially and concurrently, using TCP and QUIC as the underlying transport protocol. Later, the TTOTAL of each HTTPS request is used to compare TCP and QUIC performance under different network conditions. One such comparison is shown below:

image alt text
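
A comparison like this can be reduced to one number per object size. The sketch below groups TTOTAL samples by protocol and size, takes the median of each group, and reports the QUIC/TCP ratio. The use of the median and the (protocol, size, ttotal) tuple layout are assumptions made for this example; the article does not say how the charts aggregate repeated runs.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, int, float]  # (protocol, object_bytes, ttotal_ms)


def median_ttotal(samples: Iterable[Sample]) -> Dict[Tuple[str, int], float]:
    """Median TTOTAL for each (protocol, object size) pair."""
    grouped = defaultdict(list)
    for protocol, size, ttotal in samples:
        grouped[(protocol, size)].append(ttotal)
    return {key: median(values) for key, values in grouped.items()}


def quic_to_tcp_ratio(samples: Iterable[Sample]) -> Dict[int, float]:
    """Per object size, median QUIC TTOTAL divided by median TCP TTOTAL.

    A ratio below 1.0 means QUIC finished faster for that size.
    Sizes that lack samples for either protocol are skipped.
    """
    medians = median_ttotal(samples)
    sizes = sorted({size for _, size in medians})
    return {
        size: medians[("QUIC", size)] / medians[("TCP", size)]
        for size in sizes
        if ("QUIC", size) in medians and ("TCP", size) in medians
    }
```

The median is used here because a single timed-out or retransmission-heavy run can skew a mean badly when each combination is only run a handful of times.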

Debugging transport protocol

It’s easy and straightforward to capture and analyse packets using the tcpdump tool on x86 boxes, but it’s a challenge to capture packets on iOS and Android devices. On an iOS device, ‘rvictl’ lets us capture packets on an external interface. But ‘rvictl’ has some drawbacks, such as inaccurate timestamps. Since we are dealing with millisecond-level events, timestamps need to be accurate to analyse and root-cause a problem.

We can capture packets on internal loopback interfaces by jailbreaking iPhones and rooting Android devices. Jailbreaking a recent iOS device is nontrivial. We also need to make sure that auto-update of any sort is disabled on such a phone, or else it would undo the jailbreak and you’d have to do the whole dance again. With a jailbroken phone we have root access to the device, which lets us take pcaps as needed using tcpdump.

Packet captures taken on a jailbroken iOS device or rooted Android device connected to the lab testbed help us analyse performance bottlenecks and improve protocol performance.

iOS and Android devices have different network stacks in their core operating systems. These packet captures also help us understand the network stacks of these mobile devices; for example, on iOS devices, packets punted through the loopback interface had a mysterious delay of 5 to 7ms.