Creating high performance dynamic web sites with the Varnish HTTP accelerator

This article offers an overview of the Varnish HTTP accelerator: what it is, briefly how it works and how it can benefit you, the webmaster.

The first half of this article introduces the reader to basic web caching concepts. This text is not Varnish-specific and is best skipped if you are already familiar with caching as it is used on the web. The latter half of this article delves into the technical considerations we face during most cache deployments. This article, especially the latter technical components, should be considered a work in progress. It is hoped that we will be able to publish and discuss full VCL listings for some of our more sophisticated deployments here in the future.

What is Varnish and how can it help me?

Varnish is a modern, high-performance, open source caching reverse HTTP proxy implementation. Generally speaking, a proxy is an intermediate hop between you, the user (or the web browsing software running on your PC, also known as a user agent) and an origin web server, such as the Apache HTTP Server or Microsoft's IIS. Unlike routers, reverse proxies operate at a higher layer of the OSI model (HTTP sits up in the Application layer); however, directing traffic is still a feature of their repertoire. Forward proxies are deployed close to a small set of users (say, on a university campus or by your Internet service provider) for the sole benefit of that group of users. Reverse proxies, by comparison, are deployed close to the origin web server and are designed with a different set of goals in mind. A caching reverse proxy is simply any reverse proxy that is capable of remembering information (such as HTML pages or JPEG images) across requests, with an intent to use this cached information to satisfy future requests with minimal latency. Varnish, like other caching reverse HTTP proxy implementations, is most frequently used to alleviate origin web servers of undue load, giving you greater capacity or the ability to handle a higher number of concurrent hits.

All non-trivial web sites developed over the past few years use dynamically-generated content in some form or another. Logic implemented using complex, high-level interpreted or bytecode-compiled languages and frameworks (such as PHP, Java, .NET assemblies, Python and Ruby) takes time — and usually one or more database queries — to execute. How much time will depend largely on the nature of your code (which also happens to be why it is so difficult for us to quote server capacity in hard numbers). Chances are that unless your non-trivial web application was designed from the get-go by a crew of high-performance web boffins, your server will fall over when faced with a flood of web requests. Here's where a solution like Varnish can help. Placing a cache in front of your web server can help hide deficiencies in your application's code or database design that are otherwise too difficult or expensive to address directly. Throwing inexpensive hardware at a problem is often easier than performing a complete rewrite of your application's back-end code (mind you, this is not an unconditional cure for every ailment).

Varnish (the cache) sits right in front of your web server, still within the Anchor network border. If you happen to operate a cluster of dedicated servers with us, Varnish is capable of load balancing requests between your origin machines whilst keeping tabs on which ones are unavailable (down for scheduled maintenance or hardware failure, for instance). For most high-traffic sites, it makes sense to situate the cache on its own hardware, which typically need not be anywhere near as powerful as the origin server. For this reason, when planning how to scale up your web infrastructure, it may make more sense to add a small, yet efficient, cache server before moving to a costly cluster of application or database servers. The remainder of this article will assume you are operating only a single origin web server, although the ideas discussed here apply equally well to clustered architectures.
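As a rough sketch of what such a clustered configuration looks like (the addresses are placeholders, and the exact director syntax varies between Varnish releases), the origin servers and a load-balancing policy can be declared in VCL:

backend app1 {
    .host = "192.0.2.10";                      # placeholder addresses
    .port = "80";
    .probe = { .url = "/"; .interval = 5s; }   # basic health check
}

backend app2 {
    .host = "192.0.2.11";
    .port = "80";
    .probe = { .url = "/"; .interval = 5s; }
}

director origin random {
    { .backend = app1; .weight = 1; }
    { .backend = app2; .weight = 1; }
}

sub vcl_recv {
    set req.backend = origin;   # balance requests across healthy origins
}

Backends that fail their health probes are automatically taken out of the rotation until they recover.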

Sadly, life isn't all fun and games. Content caching is not for everyone. Remember that by placing a cache in front of the origin server, you are intentionally serving slightly out-of-date web content to your visitors. This is often not a problem for content like your company logo or CSS stylesheet files, but if your core business revolves around providing nearly up-to-the-second-accurate financial data to your visitors, a cache like Varnish may not be able to sustain hit rates high enough to provide a noticeable gain in interactive browsing performance or capacity. Objects that are only held in the cache for a very short period of time before expiring do not amass enough hits to justify the existence of the cache. Web caching, like so many other topics in optimisation and performance, is a game of trade-offs. Varnish can cache as aggressively as your business requirements will allow.

Terminology

  • Cache
    With respect to this article, a cache is a server-side software device used to accelerate interactive web browsing. Varnish is one particular cache implementation.

  • Cache consistency
    Caching is a loose form of data synchronisation. A cached copy of data is likely to fall behind updates made to the master copy unless a method of cache consistency is employed. Here, consistency refers to the extent of differences between the cached and master copies.

  • Empty cache
    A cache will be empty upon initial start up. During this initial period it will be unable to serve many objects from its cache and will thus suffer from a poor hit rate. See also: primed cache.

  • Fresh object
    Related to cache consistency. A fresh object is any object whose time-to-live counter has not yet expired. Objects will typically only move from being fresh objects to stale objects with the passage of time.

  • Hit
    Any request that is satisfied with a response taken from the cache.

  • Miss
    Any request that is not satisfied with a response taken from the cache. With every miss, the cache must fall back on querying an origin server. This is an unavoidable, sub-optimal condition.

  • Hit rate
    A ratio of hits to total incoming requests. A higher value indicates better utilisation of the cache and higher load transference from the origin servers.

  • Origin server
    Where your web content is resident. The cache fetches its content from here.

  • Primed cache
    A primed cache is any cache that has been running sufficiently long to build up a large set of cached objects. The cache will be operating at its peak during this period. See also: empty cache.

  • Stale object
    Related to cache consistency. A stale object is any object whose time-to-live counter has expired. A stale object can only be made fresh again through validation.

  • Time-to-live
    A counter maintained for all cached objects in order to enforce cache consistency. An object's time-to-live (or TTL) is highest immediately after it is fetched or validated from an origin server and lowest just prior to expiry.

  • Validation
    A process conducted between the origin server and the cache to check the validity of cached objects. Any changes made to the master copy will be exposed during validation.

Caching and SSL/TLS

It is obviously not possible to sit a cache between a pair of SSL/TLS end-points (the origin server and the client's web browser). SSL and TLS were engineered to withstand man-in-the-middle (eavesdropping) attacks, so this 'limitation' is simply by design. Caching secure content is made possible by moving the server-side SSL/TLS end-point from the origin web server to the Varnish cache host. Open source programs like stunnel make trivial work of this task.
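A minimal stunnel configuration sketch, assuming the certificate path is illustrative and that Varnish listens for plain HTTP on port 6081:

; /etc/stunnel/stunnel.conf -- illustrative values only
cert = /etc/stunnel/server.pem

; Terminate SSL/TLS on port 443 and hand the decrypted
; requests to Varnish, assumed here to listen on port 6081.
[https]
accept = 443
connect = 6081

With this in place, Varnish only ever sees ordinary HTTP and can cache the 'secure' content like any other.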

How quickly can I get started with Varnish?

Anchor Systems Administrators have successfully deployed simple Varnish configurations in under two and a half hours for emergency 'capacity boosts' in the past. The scope of the work will vary with the complexity of your web application and target hit rates. Properly planned and tested deployments are likely to span a matter of days or weeks, with e-mails and calls thrown back and forth between our systems administrators and your web developers. The best results are often only realised with changes to the application itself (see below), obviously lengthening the project if such changes are deemed worthwhile.

What is important to note is that the switch from your original architecture to the cache need not involve any downtime and is instantaneously controllable, like a light switch. The illustration below explains how we implement the switch-over with no change to your DNS records or your server's IP addressing:

Integrating your web applications with the cache

Properly deploying a cache in front of your web server is not as simple as installing a handful of packages, ensuring network connectivity and walking away, hoping the cache will magically boost your customers' browsing experiences. Many fine details of the HTTP protocol will dictate how effective (or otherwise) your cache will be. While it is not possible to address all the finer points of such a complex protocol in this short article, we have made note of some of the more obvious targets, below.

HTTP

Related headers

The HTTP/1.0 specification was largely drafted and solidified before the widespread adoption of web caches; particularly, shared caches. As such, it is mostly devoid of useful cache control directives. The HTTP/1.0 Expires header is one notable exception and does more or less what it advertises. By responding with this header, a server can include the validity period for an object. Caches must consider an object stale once this period elapses. Stale objects cannot be used without revalidation by the origin server. Varnish will take the Expires header into account when calculating whether or not it is appropriate to cache an object from an origin response, but will not automatically modify the Expires header when you make changes to an object's TTL (obj.ttl) in VCL.
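For illustration, a response marking a JPEG as valid for one hour might carry headers like these (the time stamps are examples only):

HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 04:00:00 GMT
Expires: Mon, 16 Mar 2009 05:00:00 GMT
Content-Type: image/jpeg

A cache may reuse this image without revalidation until the Expires time stamp passes.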

HTTP/1.1, a revision to the original HTTP/1.0 protocol specification, made an allowance for the Cache-control header. It can be thought of as a superior replacement for HTTP/1.0's Expires header. (Nothing stops you from implementing both on the same server for the sake of compatibility; HTTP/1.1 clients should always treat the Cache-control header with higher priority.) Presented below is a rough overview of the various options to Cache-control; a short example tying several of them together follows the list. For a complete reference, refer to the RFC document.

  • Cache-control: max-age: The HTTP/1.1 replacement for the Expires header. While Expires specifies the validity period using a full time stamp (valid until X day at Y time), max-age uses a relative system (valid for Z many seconds) to avoid potential problems with time zones and unsynchronised clocks.

  • Cache-control: no-cache: Responses may be stored, but must not be reused without revalidation. (Revalidate with every request.) Field names can be supplied to no-cache. Cache-control: no-cache="Set-Cookie", for instance, should be used to signal that re-use of the cookie information sent from the server (this is usually session-specific information and is not shared between visitors) is forbidden. In this example, the cache would be free to satisfy future requests using other parts of the response.

  • Cache-control: private: User agents are free to cache responses. Intermediate shared caches (like Varnish) are forbidden from caching the response. Dynamically-generated pages intended for a single user should include this header in the response.

  • Cache-control: public: Open the cache gates: a free-for-all cache fest. Any cache, shared or private, may store and reuse the response.

  • Cache-control: no-store: Originally designed to prevent sensitive data from being written to non-volatile storage, this directive is now used to switch off caching at the wall for both intermediate caches and user agents.
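Tying these directives together: a stylesheet that every visitor may share could be served with headers along the following (hypothetical) lines, with the Expires line retained for the benefit of HTTP/1.0 caches:

HTTP/1.1 200 OK
Cache-control: public, max-age=3600
Expires: Mon, 16 Mar 2009 05:00:00 GMT
Content-Type: text/css

Here both shared caches like Varnish and the visitor's own browser are free to hold the object for an hour.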

Also of interest is the Vary header, which can lead to undesired double-caching (or more) of single objects if used too liberally. Both the RFC and the Varnish wiki go on to discuss this header in more detail.
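One common mitigation, sketched below, is to normalise the request header being varied upon. An origin that sends Vary: Accept-Encoding can cause one cached copy of an object per distinct Accept-Encoding string sent by clients; collapsing the header to a few canonical values in vcl_recv keeps the variants to a minimum (this is adapted from the sort of example the Varnish wiki discusses):

sub vcl_recv {
    if (req.http.Accept-Encoding) {
        if (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # An encoding we do not recognise; ask for plain content.
            remove req.http.Accept-Encoding;
        }
    }
}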

Varnish does not support all these HTTP/1.1 extensions out of the box. (max-age, as discussed in RFC 2616, is respected with no additional work.) Any missing support can be added through the use of the VCL configuration language. VCL may also be used to inject HTTP headers at will.
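For instance, a header can be injected into every fetched response with a couple of lines of VCL. The header name here is our own invention, and note that vcl_fetch's object is named differently in some Varnish releases:

sub vcl_fetch {
    # Stamp each object fetched from the origin; X-Handled-By is hypothetical.
    set obj.http.X-Handled-By = "varnish";
}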

One of the first problems we encounter on a new Varnish deployment is a 0% cache hit rate. Many existing application servers were designed with an anti-cache mentality, which is fine if consistency is your primary goal. On closer analysis, it often becomes clear that the HTTP headers sent in responses from the origin server were crafted, using the fields described above, in such a way as to block Varnish from doing its job. The proper fix for such a problem is to have the origin server relax its attitude towards caching. Microsoft have published an article describing cache control for ASP and IIS. Similar documentation is available for the Apache HTTP Server. Ideally, the origin server's cache control should be granular enough to signal different cache behaviour based on the object requested. If that level of granularity is simply not available, it is always possible to override Varnish's default (safe) caching behaviour to achieve the desired results.
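As an intentionally blunt sketch of such an override, the following VCL forces common static file types to be cached for an hour regardless of any cookies in play; the file extensions and TTL are assumptions to be tuned per deployment:

sub vcl_recv {
    if (req.url ~ "\.(css|js|gif|jpg|png)$") {
        unset req.http.Cookie;   # cookies would otherwise force a miss
        lookup;
    }
}

sub vcl_fetch {
    if (req.url ~ "\.(css|js|gif|jpg|png)$") {
        set obj.ttl = 1h;        # cache static objects for one hour
        deliver;
    }
}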

Parallelism

Web pages are composed of multiple objects. X/HTML markup, CSS stylesheets, images and script files are all common objects. Web servers and clients do not transfer complete pages; they transfer individual objects, which are then assembled at the finish line before being displayed as a complete page to the user. (Technically, this is not entirely true, but do play along for the sake of simplicity.)

Traditionally, these objects were transferred in a serial fashion from server to client. A client would first make a request for a page's markup. After having been transferred, this markup would give the client a list of other objects to fetch (in src= attributes), which it would proceed to do one by one. Slowly.

Some modern web browsers are capable of making parallel connections to a web server (or group of web servers) in order to request more than one object at a time. After implementing a cache, it is in your best interests to maximise this parallelism to make better use of the new resource. Reverse proxies like Varnish are usually capable of handling far greater concurrent loads than an application or web server already tied down with dynamic content generation workloads.

HTTP/1.0 and HTTP/1.1 clients enforce per-server concurrency or connection limits. HTTP/1.1 — the version of the protocol you are likely to be using — limits clients to two (2) concurrent, persistent connections per proxy or origin server. Microsoft Internet Explorer 7 (and earlier) make up to two (2) concurrent HTTP/1.0 (the older revision of the protocol) connections to a single host. IE 8 bumps this limit up to six (6). Mozilla Firefox also ships with a default limit of six (6) for HTTP/1.0. Also worth mentioning is the fact that while most modern web servers do support HTTP/1.1 pipelining, the most popular web browsers in use do not make use of this optimisation by default. HTTP/1.1 pipelining is disabled by default on Firefox and not supported in IE 7. Opera, on the other hand, supports HTTP/1.1 pipelining out of the box.

Now while there is nothing you can do to directly override these limits, it is possible to trick user agents (web browsers) into overcoming them by divvying your site content up between different host names. For example: css.domain.com, images.domain.com and media.domain.com. By pointing all host names at the same IP address and configuring them as host-based aliases in the origin web server, user agents are free to make up to two HTTP/1.1 connections per host name, raising the concurrency limit proportionally with the number of host names you have configured. Varnish can be told about this type of arrangement through the use of host normalisation. Remember that without configuring the cache appropriately, you will actually be damaging your potential, as Varnish will be caching multiple copies of the same object; one per alias.
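A sketch of that normalisation, using the hypothetical aliases above, so that Varnish caches each object exactly once:

sub vcl_recv {
    # Collapse the content aliases down to a single canonical host name.
    if (req.http.host ~ "^(css|images|media)\.domain\.com$") {
        set req.http.host = "www.domain.com";
    }
}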

Crafting up additional host names is not without a downside. This approach is yet another trade-off: additional host names will result in more delays as clients wait for replies to DNS lookups. Having said that, this approach should still have a net beneficial effect for two to four (2–4) host names given the locality of most DNS servers to the client in comparison to your web server. Clients will typically have a dedicated DNS cache server within three hops' reach (run by either their ISP or their organisation's IT Department). Both Firefox and Internet Explorer will cache DNS records for their own use before falling back on the underlying resolver routines of the operating system which, again, will typically maintain a cache. Simply put, it is likely to be cheaper for the client to fetch (and cache) a couple more DNS A records than it is to wait for a free HTTP connection.

  • Scripts block parallel connections. Each script is downloaded serially and blocks all other HTTP connections. It makes sense to place scripts as late in your pages as possible.
  • Don't inline JavaScript and CSS. This type of content is ripe for caching. By splitting up your content, it becomes possible to set high cache lifetimes for static stuff (like CSS) and have your volatile content (dynamically-generated HTML) cached for only short periods, or not at all.

  • Consider removing the ETag (entity tag) header from the responses sent back by your web server. It can be counter-productive in multi-server clusters and often duplicates functionality that is already available in the form of the Last-Modified header. A brief Apache example follows this list.
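For Apache, something along the following lines will strip the header; treat this as a sketch (it assumes mod_headers is loaded) rather than a drop-in configuration:

# httpd.conf: stop generating ETags and strip any that remain.
FileETag None
Header unset ETag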

Varnish's management interface

Varnish binds to two TCP sockets when starting up. One must obviously be the socket used for HTTP communications with user agents (web browsers). The other is a little more interesting. Somewhat like Squid, Varnish maintains an out-of-band management communications channel that can be used to monitor and tweak the cache during normal operation. (However, unlike Squid, Varnish does not overload a single TCP socket with multiple roles, making it a little easier to filter management connections at the network layer.) Cache administrators may communicate with Varnish simply by telnetting to the management port (usually 127.0.0.1:6082) and issuing text-based commands like

vcl.list 

Owing to the simple TCP socket and plain text protocol, your applications can communicate with varnishd (the Varnish cache daemon) in precisely the same, simple manner.
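By way of illustration, a short management session might look like the following; the addresses and the listing itself are examples only and will differ per installation:

$ telnet 127.0.0.1 6082
Trying 127.0.0.1...
Connected to 127.0.0.1.
vcl.list
200 13
active          0 boot

The numeric status line (here, 200) precedes each response, making the protocol easy for scripts to parse.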

Dynamic server-initiated cache invalidation

Now the usual way to maintain cache consistency on the web is through the use of time-to-live metadata (see the Cache-control and Expires headers in HTTP/1.1 and 1.0, respectively). We had a particular use case at Anchor for active server-initiated cache object invalidation. Thankfully, Varnish made this task absolutely trivial. By implementing a simple Java TCP client (the customer's application was Java-based), our client's developers were able to have cache invalidations automatically fired off as it suited them. All that was required was the following short command sent from the application to Varnish over the management channel:

purge.url <REGEX>

<REGEX> is any regular expression. This will be dynamically generated by your application. Varnish will return with the string:

200 

to signify success.
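The sketch below shows the general shape of such a client; the host, port and regular expression are illustrative only, and our customer's actual implementation is not reproduced here:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class VarnishPurger {

    public static void purge(String regex) throws IOException {
        // Varnish's management port, assumed to be on the same host.
        Socket sock = new Socket("127.0.0.1", 6082);
        try {
            Writer out = new OutputStreamWriter(sock.getOutputStream(), "US-ASCII");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), "US-ASCII"));
            out.write("purge.url " + regex + "\n");
            out.flush();
            String status = in.readLine();  // a "200 ..." line signals success
            if (status == null || !status.startsWith("200")) {
                throw new IOException("purge.url failed: " + status);
            }
        } finally {
            sock.close();
        }
    }

    public static void main(String[] args) throws IOException {
        purge("^/news/");  // invalidate every cached object under /news/
    }
}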

Remember that Varnish's management socket is bound to localhost by default for good reason. There is no application-layer authentication applied to management connections, so it is absolutely imperative that they are thoroughly secured at the network layer if the cache server is not also your application server. A packet filter with source address filtering would be considered a minimal requirement. Ideally, such management traffic should be routed over a secured, dedicated VLAN set up specifically for the cache and application servers.
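A minimal packet filter sketch with iptables, where the application server's address (192.0.2.20) is a placeholder:

# Allow only the application server to reach the management port.
iptables -A INPUT -p tcp --dport 6082 -s 192.0.2.20 -j ACCEPT
iptables -A INPUT -p tcp --dport 6082 -j DROP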

Negative impacts of implementing a cache

Information leak

Cache implementations should always be thoroughly tested prior to being rolled out to a production environment. While, in theory, a default Varnish configuration should not land you in hot water, it is not inconceivable for a heavily customised configuration to return totally inappropriate responses to a client's request. When overriding Varnish's inbuilt safety mechanisms (read: when intentionally violating the HTTP protocol specification in order to work around a broken origin server implementation), it is highly likely that user sessions will be accidentally cross-contaminated, leading to an obvious and highly undesirable information leak.

Incomplete web statistics

For our customers who have not yet moved onto advanced web traffic analysis services like Google Analytics, the reports produced by their existing log-based analysers (we support both AWStats and The Webalizer) will fall horribly out of whack with reality. As the nature of the cache will prevent many requests from hitting the origin web server, such analysers will have an incomplete view of your traffic to work from. varnishncsa, an included logging daemon that complements the varnishd cache server, can be used to log requests to files in a format readable by most log parsers. These logs can be shipped from the cache server as part of a daily cron job for statistics generation.
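As a sketch, varnishncsa can be daemonised to write such a log; the paths are placeholders and the available flags may vary between Varnish releases:

# Append NCSA-format log entries to a file, running as a daemon.
varnishncsa -a -D -w /var/log/varnish/access.log -P /var/run/varnishncsa.pid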

