Site developers have long used web beacons to understand the behavior of their customers. Among other things, beacons are used to count the users who visit a web page, track scrolling within the page, or count clicks on a particular ad or video.
The Problem
Sometimes a beacon takes too long to respond, or a page fires too many beacons, and either one slows down the site.
Most beacon servers have a proxy server fronting them, so all requests pass through the proxy. Let's look at what we can do with the proxy to minimize the impact of these beacons on site performance.
Figuring out what is Important
A simple version of a beacon is a tiny clear image that is the size of a pixel. When a web page with this image loads, it will make a call to a server for the image. These clear GIFs are invisible because they record specific activity on a web page rather than deliver content. So what is important is the “recording” part and what is irrelevant is the response from the beacon server (because it stays the same) back to the client.
What can we do to return this response faster?
The Secret Sauce
The first ingredient is stale-while-revalidate in the Cache-Control HTTP header. With it, we can instruct the proxy to respond with a cached copy of the beacon response. We specify a large value (in seconds) for stale-while-revalidate so that the cached copy stays usable for a long time.
The returned Cache-Control header will contain the following: "stale-while-revalidate={big_number}".
But what about the “recording”? With stale-while-revalidate, the proxy does not call the server while the cached copy is still fresh (freshness is determined by “max-age”). Once the copy is stale, the proxy still serves it but calls the server asynchronously. So the second ingredient is to set max-age to 0 in the Cache-Control header, which makes the copy always stale. The resulting Cache-Control header looks like the following:
Cache-Control: max-age=0, stale-while-revalidate={big_number}
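To make the interaction between the two directives concrete, here is a minimal sketch of the decision a cache makes for a stored response, loosely following the stale-while-revalidate rules. It is not how ATS implements it; the names CacheState, parse_cache_control and classify are made up for this illustration.

```python
from enum import Enum


class CacheState(Enum):
    FRESH = "fresh"                    # serve from cache, no origin call
    STALE_SERVABLE = "stale-servable"  # serve from cache, revalidate in background
    EXPIRED = "expired"                # must go to the origin before responding


def parse_cache_control(header: str) -> dict:
    """Parse 'max-age=0, stale-while-revalidate=86400' into a dict of ints."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            # Non-numeric or missing values are stored as None.
            directives[name.lower()] = int(value) if value.isdigit() else None
    return directives


def classify(age_seconds: int, header: str) -> CacheState:
    """Decide how a cached response of the given age may be used."""
    directives = parse_cache_control(header)
    max_age = directives.get("max-age") or 0
    swr = directives.get("stale-while-revalidate") or 0
    if age_seconds < max_age:
        return CacheState.FRESH
    if age_seconds < max_age + swr:
        return CacheState.STALE_SERVABLE
    return CacheState.EXPIRED
```

With max-age=0 the FRESH branch can never be taken, so every request that finds a cached copy inside the stale-while-revalidate window is answered from cache and also triggers a background call to the server.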
So every call to the proxy results in two things:
- Respond with a cached copy of the response right away
- Asynchronously call the server and “record” the action
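The two steps above can be sketched as a toy in-memory proxy. This is only an illustration under simple assumptions (a thread per background call, a plain dict as the cache); BeaconProxy, handle and drain are hypothetical names, and a real proxy such as ATS does this work internally.

```python
import threading


class BeaconProxy:
    """Toy proxy: answers from cache and lets the origin record in the background."""

    def __init__(self, origin):
        self.origin = origin   # callable(url) -> response body; it "records" the hit
        self.cache = {}        # url -> cached response body
        self.pending = []      # background revalidation threads still running

    def handle(self, url):
        if url not in self.cache:
            # Very first request: nothing is cached, so the client waits for the origin.
            self.cache[url] = self.origin(url)
            return self.cache[url]
        # max-age=0 means the cached copy is always stale, so every hit starts
        # a background revalidation while the client gets the cached copy at once.
        worker = threading.Thread(target=self._revalidate, args=(url,))
        worker.start()
        self.pending.append(worker)
        return self.cache[url]

    def _revalidate(self, url):
        # The origin sees the request here, which is where the "recording" happens.
        self.cache[url] = self.origin(url)

    def drain(self):
        """Wait for all background calls to finish (useful in tests)."""
        for worker in self.pending:
            worker.join()
        self.pending.clear()
```

Only the first request pays the full round trip to the beacon server. Every later request returns as fast as the cache lookup, and the server still sees one call per beacon.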
But wait, there is more! This solution can be enhanced further by adding a few more things.
Stop overwriting cache entries
Since the HTTP response code from the beacon server on a successful call is 200, the cache entry is rewritten on every asynchronous call. This is unnecessary, since we already know that the response doesn’t change, and writing to the cache continuously can be costly. So once a copy of the response is cached, we need to trick the proxy into not updating the cache.
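How the proxy is persuaded to skip the write depends on the proxy itself. The sketch below only illustrates the intended effect, a cache that keeps the first copy and ignores later writes; WriteOnceCache is a hypothetical name and not an ATS feature.

```python
class WriteOnceCache:
    """Keeps the first body stored under a key and ignores later writes."""

    def __init__(self):
        self._entries = {}
        self.writes = 0   # number of writes that actually touched the cache

    def put(self, key: str, body: bytes) -> bool:
        """Store body only if key is absent; return True when a write happened."""
        if key in self._entries:
            return False
        self._entries[key] = body
        self.writes += 1
        return True

    def get(self, key: str):
        """Return the cached body, or None if nothing is stored under key."""
        return self._entries.get(key)
```

With such a cache, a thousand asynchronous calls for the same beacon cost one cache write instead of a thousand.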
Use the same cache copy for all URLs to a particular beacon server
Calls to the same beacon server can be different based on the URL query parameters. We can enhance the solution by using the same cache key for all these calls. This saves space in the cache.
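One way to picture a shared cache key is to drop the query string before looking up the cache. The sketch below does that with the Python standard library. The function name beacon_cache_key and the example host are made up, and keeping the path in the key is an assumption; a setup with a single beacon endpoint per host could key on the host alone.

```python
from urllib.parse import urlsplit, urlunsplit


def beacon_cache_key(url: str) -> str:
    """Build a cache key that ignores the query string and fragment."""
    parts = urlsplit(url)
    # Host names are case-insensitive, so normalize them as well.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


# Two beacons that differ only in their query parameters share one entry.
key_a = beacon_cache_key("https://beacon.example.com/b?page=home&scroll=40")
key_b = beacon_cache_key("https://beacon.example.com/b?page=news&click=ad7")
```

The full URL, query parameters included, must still be forwarded to the beacon server on the asynchronous call, because the parameters carry the data being recorded. Only the cache lookup uses the shortened key.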
Cache pinning
We can prevent the proxy from cleaning up old cached entries by using options such as Cache Pinning (in ATS), which ensures that certain objects stay in the cache for a given amount of time.
Results
We ran a test on one of Yahoo's sites and saw an improvement of close to 90% in beacon response times.
As you can see, we can improve the performance of the site greatly by creatively using a proxy and making no change to the beacon servers themselves.
In conclusion
At Yahoo, we use Apache Traffic Server (ATS) as a proxy fronting almost all of our sites and beacon servers. ATS has the flexibility (via configurations and plugins) to allow us to implement all of the steps mentioned above. The results published above were tested against a Yahoo beacon server fronted by ATS.
Note: The open source stale-while-revalidate (SWR) plugin is currently not working, but Yahoo has a working version of this plugin. We intend to either contribute towards making SWR work in ATS or submit our plugin to open source soon.