Leon Fayer (@papa_fire) currently leads engineering at Teaching Strategies. Leon has over two decades of expertise concentrated on architecting and operating complex, web-based systems to withstand crushing traffic (often unexpectedly). Over the years, he's had a somewhat unique opportunity to design and build systems that run some of the most visited websites in the world and has the opinion that nothing really works until it works for at least a million people.
Not all content is created equal, and not all of it should be served the same way. At a high level, you can separate your content into three groups:
- infinitely static: images are a perfect example. Once an image has been published, you should never update it (for multiple reasons) and should be able to cache it almost indefinitely.
- temporarily static: content that renders into static output, such as CSS and articles. It may change in the future (requiring you to purge the cache), but it can generally be cached on demand for an extended period of time.
- dynamic: content that is individual for each user or page load. Personalization, user profile/history, and top/recent content cannot be cached for any extended period of time and require re-computation or rendering on each request.
Not only should these groups be treated differently from a caching-strategy perspective, they should also be treated differently when serving them from origin.
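As a rough illustration of how the first two groups might map to caching rules, the sketch below uses mod_headers to set a different Cache-Control lifetime for each. The extension lists and lifetimes are assumptions chosen for the example, not values from a real deployment, and the rules only apply to files served from disk.

```apache
# Sketch only: requires mod_headers to be loaded.
# Extension lists and max-age values are illustrative assumptions.

# Infinitely static: published images never change, so cache for a year.
<FilesMatch "\.(png|jpe?g|gif|ico)$">
    Header set Cache-Control "public, max-age=31536000"
</FilesMatch>

# Temporarily static: may change later, so cache for an hour
# and purge the cache when the content is updated.
<FilesMatch "\.(css|js|html)$">
    Header set Cache-Control "public, max-age=3600"
</FilesMatch>
```

Dynamic responses would typically carry something like Cache-Control: private, no-cache, usually set by the application itself rather than by file extension.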
The overhead
This applies to any web server (nginx, Node.js, Apache), but for ease of example, let’s use Apache. Chances are, your httpd.conf starts with something like this (adjusted for the language you use):
    # load modules
    LoadModule php_module /opt/apache22/libexec/amd64/mod_php.so
    LoadModule apreq_module /opt/apache22/libexec/amd64/mod_apreq2.so
    LoadModule ssl_module /opt/apache22/libexec/amd64/mod_ssl.so

    # serve on port 80 for non-ssl content
    Listen 0.0.0.0:80

    # setup mod_rewrite engine
    RewriteEngine On
    RewriteLogLevel 4
    RewriteLog /www/logs/apache/rewrites.log

    # validate against rewrite checks
    Include /www/etc/httpd-rewrite-global.conf

    # define server/thread limits
    StartServers 5
    ServerLimit 40
    MaxClients 40
    MinSpareServers 5
    MaxSpareServers 10

    # if the file doesn't exist in the static directory,
    # process dynamic content through the dispatcher function
    RewriteCond /www/htdocs/static/$1 !-f
    RewriteRule /(.*)$ /www/htdocs/dynamic/dispatcher.php [L]

    # if file is static and is found - serve it
    RewriteRule /(.*)$ /static/$1 [L]
Most people don’t think about it, but with this setup, every request is served by a process that has loaded (and may run) all of these modules, and every requested URL is compared against a (usually) long list of rewrite rules, which tend to accumulate rather quickly. So the question you have to ask yourself is: does serving logo.png require the mod_php module, and does it need to be matched against dozens of legacy URL strings the business decided to keep active after the last redesign? Or, in fewer words, does every static asset need the same overhead as dynamic content? The answer is no, it does not.
Furthermore, is the server/thread configuration you use for your dynamic content the same as the one you would need to serve your assets from origin? Again, the answer is likely no.
Separation of responsibility
Separating the serving of different types of content allows you to optimize and tune serve time for each individual type. The way to accomplish this is to run an individual web server for each type of content that requires its own optimization. For example, instead of having a generic httpd.conf that listens for requests on port 80, you create httpd-static.conf listening on port 80 and httpd-dynamic.conf listening on a different port (let’s say 8081).
httpd-static.conf
    Listen 0.0.0.0:80

    # static content configuration parameters
    StartServers 2
    ServerLimit 7
    ThreadLimit 1200
    ThreadsPerChild 200
    MaxClients 1200
    MaxRequestsPerChild 0
    MaxSpareThreads 1200

    RewriteEngine On

    # If it's a local static file, go ahead and serve it,
    # otherwise send to dynamic web server.
    # The [P] flag proxies the request and needs mod_proxy
    # and mod_proxy_http to be available.
    RewriteCond /www/htdocs/static%{REQUEST_URI} !-f
    RewriteRule ^/(.*)$ http://localhost:8081/$1 [P,L]
httpd-dynamic.conf
    # load modules
    LoadModule php_module /opt/apache22/libexec/amd64/mod_php.so
    LoadModule apreq_module /opt/apache22/libexec/amd64/mod_apreq2.so
    LoadModule ssl_module /opt/apache22/libexec/amd64/mod_ssl.so

    # serve on port 8081 for non-ssl dynamic content
    Listen 0.0.0.0:8081

    # setup mod_rewrite engine
    RewriteEngine On
    RewriteLogLevel 4
    RewriteLog /www/logs/apache/rewrites.log

    # validate against rewrite checks
    Include /www/etc/httpd-rewrite-global.conf

    # define server/thread limits
    StartServers 5
    ServerLimit 40
    MaxClients 40
    MinSpareServers 5
    MaxSpareServers 10

    # serve content through dispatcher
    RewriteRule /(.*)$ /www/htdocs/dynamic/dispatcher.php [L]
Use httpd-static.conf as a gateway to quickly serve static content and pass dynamic requests on to httpd-dynamic.conf, which does the processing with minimal proxy overhead.
You can also configure the static server to match on individual file extensions instead of on files from a certain directory. In addition to giving each type of content its own tuned configuration parameters, this separation makes it easy to add different logging, monitoring, and caching rules for different types of content.
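A minimal sketch of both ideas, written as a variation on httpd-static.conf: it routes by extension rather than by directory and gives the static server its own access log. The extension list, the log path, and the static_timed format name are hypothetical, and the [P] flag assumes mod_proxy and mod_proxy_http are loaded.

```apache
# Hypothetical variant of httpd-static.conf: route by file extension
# instead of checking for the file in the static directory.
RewriteEngine On

# Anything that does NOT end in a known static extension is proxied
# to the dynamic server (the extension list is an assumption).
RewriteCond %{REQUEST_URI} !\.(png|jpe?g|gif|ico|css|js)$ [NC]
RewriteRule ^/(.*)$ http://localhost:8081/$1 [P,L]

# Separate access log for the static server; %D adds the time taken
# to serve each request, in microseconds. The path is illustrative.
LogFormat "%h %l %u %t \"%r\" %>s %b %D" static_timed
CustomLog /www/logs/apache/static-access.log static_timed
```

One trade-off to keep in mind: with extension matching, a request for a missing static file would get a 404 from the static server instead of falling through to the dispatcher.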