Optimizing Cache Performance on a Rapidly Growing Site

Cache performance is essential to site performance, but most folks don’t understand their cache at a deep enough level to make proper engineering decisions. Tools like SimCache can help by predicting cache performance ahead of potentially costly ops decisions.

Introduction

Modern web applications depend heavily on caching to maintain site performance and reduce load on their primary databases. In many cases, however, caching strategies are deployed ad hoc, without much understanding of how underlying usage patterns affect cache performance. In practice, developers tend to spin up a cache (typically Memcache) and keep adding capacity until site performance is “good enough.” Without a deeper understanding, capacity planning and performance tuning become harder as traffic grows or usage patterns change.

Posterous was no exception; early in our history, we began to use Memcache heavily to quell the increasing load on our MySQL servers, sizing our Memcache cluster with simple heuristics. However, as we began to use Memcache in different ways and as our traffic grew, the cache began to act erratically, leading to site performance issues even though we were rapidly increasing the size of our Memcache cluster. We realized that understanding how our cache performed given our observed usage patterns was essential for appropriately sizing our cache. To do so, we developed SimCache, a tool for predicting cache performance based on observed usage patterns, and used it to plan the second version of our cache, based on Redis.

Background

As a consumer-oriented blogging platform, Posterous is an extremely read-heavy app. Moreover, our usage patterns are extremely “long-tail”; at any given moment, we’ll serve thousands of requests for a heavily-visited site like the Gap’s consumer facing blog but just a few for Mrs. Henry’s sixth grade class blog.

To serve our normal stream of requests, we had been using a fairly large Memcache cluster to store formatted blog posts. However, the cache began to behave erratically, with wild swings in the speed of some requests that we couldn’t explain. Moreover, the cache was being asked to serve a growing number of requests:

traffic

Understanding that this was unacceptable, I began working closely with Chris Burnett, another engineer at Posterous, to assess the degradation in cache performance.

Assessing the Situation

The importance of proper logging and measurement cannot be overstated when assessing the performance of a given caching strategy. Without collecting statistics on your cache performance, you’re essentially blind, with no understanding of how well your cache is working, or how it could be improved.

For a typical key => value cache, collecting statistics is pretty easy to implement. Anytime a key is accessed, simply log whether or not the cache request resulted in a hit or a miss:

Feb 28 01:14:54 hit!  key = posts/1432
Feb 28 01:14:54 hit!  key = posts/2442
Feb 28 01:14:55 miss! key = posts/2970
Feb 28 01:14:55 hit!  key = posts/6917
Feb 28 01:14:57 miss! key = posts/9363
Feb 28 01:14:57 hit!  key = posts/2969

Such simple data can reveal a wealth of insights. Most important is the cache’s miss rate: how frequently do we need to regenerate data? It is the miss rate that ultimately impacts site performance. Using such data, we were shocked to discover that we were caching a lot less than we thought, and that our cache actually behaved quite erratically, with a greater than 2x difference between peak and trough miss rates (1 = baseline):

plot1
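
As a rough illustration of how a hit/miss log like the one above can be reduced to a miss rate, here is a minimal Python sketch. The regular expression and the function name `miss_rate` are assumptions made for this example; they simply follow the log format shown earlier and are not taken from our actual tooling.

```python
import re
from collections import Counter

# Matches lines such as "Feb 28 01:14:55 miss! key = posts/2970".
# The pattern is an assumption based on the sample log format above.
LOG_LINE = re.compile(r"(?P<outcome>hit|miss)!\s+key = (?P<key>\S+)")


def miss_rate(lines):
    """Return the fraction of logged cache requests that were misses."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[match.group("outcome")] += 1
    total = counts["hit"] + counts["miss"]
    return counts["miss"] / total if total else 0.0


sample = [
    "Feb 28 01:14:54 hit!  key = posts/1432",
    "Feb 28 01:14:54 hit!  key = posts/2442",
    "Feb 28 01:14:55 miss! key = posts/2970",
    "Feb 28 01:14:55 hit!  key = posts/6917",
    "Feb 28 01:14:57 miss! key = posts/9363",
    "Feb 28 01:14:57 hit!  key = posts/2969",
]
print(miss_rate(sample))
```

Grouping the same lines by time window before counting gives a miss rate over time rather than a single number.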

Using SimCache to Choose a Caching Strategy

Given our initial assessment, it was clear that we would need to increase the size of our cache. But by how much? Could we expect much improvement if we increased the cache size by a third? What about doubling the cache? Would that sufficiently improve site performance? To answer these questions, I wrote a tool called SimCache which would replay our observed cache access patterns against a simulated cache of a given size, measuring how cache size would affect cache miss rates and other important metrics of caching performance. Using SimCache, we tested how cache performance varied if we increased our existing cache size (red) by:

  • 33% (green)
  • 66% (blue)
  • 100% (magenta):

plot2

The data indicated that our cache was too small by a factor of almost 2x. Moreover, the undersized cache was responsible for the wild swings in miss rate. Keys were evicted from the cache far too soon; as the cache size was steadily increased, the variation in miss rate went down dramatically, leading to better consistency in hit rates from our cache.
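
To make the replay idea concrete, here is a minimal Python sketch of trace-driven cache simulation. It is not SimCache itself: it assumes a plain LRU eviction policy and measures capacity as a number of keys rather than bytes, and the function name `simulate_miss_rate` and the toy trace are hypothetical.

```python
from collections import OrderedDict


def simulate_miss_rate(keys, capacity):
    """Replay a list of accessed keys against an LRU cache of `capacity` items.

    Assumption: LRU eviction with item-count capacity. A real cache may
    evict differently and is sized in bytes, not keys.
    """
    cache = OrderedDict()
    misses = 0
    for key in keys:
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return misses / len(keys) if keys else 0.0


# Toy trace: three keys requested in a repeating cycle.
trace = ["a", "b", "c"] * 100
for capacity in (2, 3, 4):
    print(capacity, simulate_miss_rate(trace, capacity))
```

In this toy trace, a cache one key too small misses on every request, because each key is evicted just before it is needed again. Once the cache holds the whole working set, only the first three requests miss. Real traces are less extreme, but the same replay loop can be run at each candidate size.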

Using the simulation results, we increased our cache to the appropriate size. Of course, it is important to collect statistics afterwards to verify that the change had its intended effect. In our case, the results were pretty good. At time=0, the newer cache was inserted, resulting in a spike in miss rate. However, as the larger cache began to fill, the measured cache performance (green points) matched the predicted cache performance (blue line) very well:

plot3

Conclusion

Using the data from SimCache allowed us to understand why our cache performance was degraded and how to improve it. Moreover, by predicting the required cache size ahead of time, we avoided costly “ops iteration”: we did not have to add servers, wait to see if site performance improved, add more cache, rinse and repeat. Instead, we were able to size our cache appropriately from the beginning.

Interested in working on problems like this? We’re hiring.

Thanks to J. Hui, C. Burnett, R. Pearson, D. Meredith, and G. Tan for reading and commenting on different drafts of this post.

Making Posterous faster with Varnish

Posterous serves an enormous amount of data from our servers every day. As our site grows, we have needed to think about new and interesting ways to improve performance. So, we spent the last couple of months trying to improve performance any way we could. We stripped out much of the inline JavaScript that our theming engine generated, we started using asset bundling and compression, and we audited the site for inefficient database queries.

Finally, we decided to add full page caching to Posterous blogs. This has resulted in one of our largest performance boosts to date. How we accomplished full page caching on Posterous is the subject of today's post.

Enter Varnish

Varnish, often called an "HTTP Accelerator" or a "Reverse Proxy", is the mechanism we chose to speed up Posterous. Until Varnish, we relied only on fragment caching to save precious CPU cycles. Now that Varnish is in place, we are caching entire pages. This takes a huge amount of load off of our application and database servers, and results in noticeable speed increases for our users.

To be precise: our tests have shown that pages served out of Varnish see a ~67% speed improvement in total page load time. 

Configuration

For those not familiar with Varnish, it is configured using a VCL (Varnish Configuration Language) file. VCL resembles nginx config files and is divided into subroutines (e.g. vcl_hash, vcl_recv, vcl_pipe).

You can learn more about VCL here.

We have included the actual VCL file we are using at the bottom of this post.

Dynamic content

While implementing a full page cache, we had quite a few challenges. The first and most important one was how we were going to render dynamic content on pages that are statically cached. For example, in the upper right corner of Posterous sites, there are elements that are unique to each user (like the list of their Posterous sites, their name, etc.). Since we only cache one version of a page, we cannot include this personalized information in the page that gets cached.

We addressed this by gathering information about the static page being served and making a secondary AJAX call to our servers to account for a user's logged-in state, post view counts, and other dynamic content. Since this AJAX call is significantly less processor-intensive, and it is only made after the page finishes rendering, users see a significant increase in performance.

To make our lives a little easier, we also removed all inline JavaScript generated by theme elements. We used HTML5's new data- attributes to add information to HTML elements directly so they can be more efficiently interpreted by JavaScript.

Private posts on blog index pages

Another challenge we faced was allowing site owners and contributors to view private posts within their blog list pages. Since we want to store only one version of a page in the cache, our only alternative was to disable caching if a user is looking at one of their own sites. VCL alone didn't have any mechanisms for accomplishing this, so we turned to a really powerful feature of VCL: the ability to embed C code directly in the configuration file.

This snippet did the trick:

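C{
  /* Header names passed to VRT_GetHdr/VRT_SetHdr are prefixed with their
     length in octal, colon included: "Host:" is 5 bytes, hence "\005". */
  char *host = VRT_GetHdr(sp, HDR_REQ, "\005Host:");
  char *cookie = VRT_GetHdr(sp, HDR_REQ, "\007Cookie:");
  char* result = NULL;
  /* Treat a missing header as an empty string so strstr() is safe to call. */
  if (cookie == NULL) {
    cookie = "";
  }
  if (host == NULL) {
    host = "";
  }
  /* Look for the requested hostname anywhere in the Cookie header. */
  result = strstr(cookie, host);
  if (result != NULL) {
    /* Flag the request so the VCL check that follows can bypass the cache. */
    VRT_SetHdr(sp, HDR_REQ, "\013X-No-Cache:", "YES", vrt_magic_string_end);
  }
}C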

if (req.http.X-No-Cache ~ "YES") {
  return(pass);
}

Essentially, we write to a user's cookies the list of all sites they own. If the currently-viewed hostname (the current site) matches this cookie, we simply bypass the cache.

We did it this way because even if a malicious user spoofed this cookie, the worst that could happen is that they would see the non-cached version of the page, and since the malicious user isn't actually logged in as the site owner, they won't see any of the private posts.
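
The decision made by the C snippet can be mirrored in a few lines of Python, which makes its edge cases easier to see. This is only an illustration of the substring check: the function name `should_bypass_cache` and the cookie name `owned_sites` are hypothetical, since the real cookie format is not shown in this post.

```python
def should_bypass_cache(host, cookie):
    """Mirror the strstr() check: bypass when the host appears in the cookie.

    Missing headers are treated as empty strings, as in the C snippet.
    """
    host = host or ""
    cookie = cookie or ""
    return host in cookie


# Hypothetical cookie listing the sites a user owns.
owner_cookie = "owned_sites=blog.example.com|photos.example.com"

print(should_bypass_cache("blog.example.com", owner_cookie))      # owner: bypass
print(should_bypass_cache("blog.example.com", "session=abc123"))  # visitor: cached
print(should_bypass_cache("blog.example.com", None))              # no cookie: cached
```

Because this is a plain substring match, it can also fire when the hostname happens to appear elsewhere in the cookie, or when the Host header is missing. In both cases the only effect is a cache bypass, which is consistent with the reasoning above: the uncached page is still subject to the application's own access checks.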

Lacquer

On the back-end, we are using a gem called Lacquer. The gem was originally developed by Russ Smith (original gem). Lacquer communicates with the Varnish administration port to deal with the purging of stale pages. It also makes it easy to instruct Varnish that an outbound page should (or should not) be cached.

We found there were several limitations and a few bugs with the original gem, so we forked it and added some enhancements, including the ability to purge to multiple Varnish servers in a performant way. You can check out our forked version here.

Wrapping up

At Posterous, we are always looking for awesome ways to improve our service. Varnish is just one of the many enhancements we have added, and we will be writing about other ones very soon.

As always, we love feedback! If you have any suggestions for us, or questions about Varnish, please don't hesitate to ask.

If you found this post interesting, know that Posterous is hiring Infrastructure Engineers, Front-end Engineers, and more.

Reference

Here is the Posterous VCL file, in its full glory:

#This is a basic VCL configuration file for varnish.  See the vcl(7)
#man page for details on VCL syntax and semantics.
#
#Default backend definition.  Set this to point to your content
#server.
#

C{
  #include <string.h> 
  #include <stdlib.h> 
  #include <stdio.h>
}C

backend default {
.host = "127.0.0.1";
.port = "8282";
.max_connections = 2000;
.connect_timeout = 600s;
.first_byte_timeout = 600s;
.between_bytes_timeout = 600s;
}

sub vcl_hash {
  ### these 2 entries are the default ones used for vcl. Below we add our own.
  set req.hash += req.url;
  set req.hash += req.http.host;

  # This will make sure that the mobile version of our site gets cached under a different hash
  if (req.http.cookie ~ "mobile_view=true") {
    set req.hash += "mobile";
  }
  
  if (req.http.user-agent ~ "(?i)palm|blackberry|nokia|phone|midp|mobi|symbian|chtml|ericsson|minimo|audiovox|motorola|samsung|telit|upg1|windows ce|ucweb|astel|plucker|x320|x240|j2me|sgh|portable|sprint|docomo|kddi|softbank|android|mmp|pdxgw|netfront|xiino|vodafone|portalmmm|sagem|mot-|sie-|ipod|up\\.b|webos|amoi|novarra|cdm|alcatel|pocket|iphone|mobileexplorer|mobile" && !(req.http.user-agent ~ "(?i)ipad") && !(req.http.cookie ~ "full_site=true")) {
    set req.hash += "mobile";
  }
  
  return(hash);
}

#
# Handling of requests that are received from clients.
# First decide whether or not to lookup data in the cache.
#
sub vcl_recv {  
  # Pipe requests that are non-RFC2616 or CONNECT which is weird.
  if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
    return(pipe);
  }

  # Pass requests that are not GET or HEAD
  if (req.request != "GET" && req.request != "HEAD") {
    return(pass);
  }

  # Pass requests for blog pages greater than page 3
  if (req.url ~ "page=([4-9]|[1-9][0-9]+)$") {
    return(pass);
  }

  # Never cache private posts
  if (req.url ~ "\/private\/") {
    return(pass);
  }
  
  # Don't cache the result of a redirect
  if (req.http.Referer ~ "jumpto" || req.http.Origin ~ "poster") {
    return(pass);
  }
  
  # Since we don't want site owners and contributors to view the
  # cached version of their site, we match a special cookie we set
  # with the current host. This assures that site owners see
  # private posts, while other users do not.
  C{
    char *host = VRT_GetHdr(sp, HDR_REQ, "\005Host:");
    char *cookie = VRT_GetHdr(sp, HDR_REQ, "\007Cookie:");
    char* result = NULL;
    if (cookie == NULL) {
      cookie = "";
    }
    if (host == NULL) {
      host = "";
    }
    result = strstr(cookie, host);
    if (result != NULL) {
      VRT_SetHdr(sp, HDR_REQ, "\013X-No-Cache:", "YES", vrt_magic_string_end);
    }
  }C
  
  if (req.http.X-No-Cache ~ "YES") {
    return(pass);
  }

  #
  # Everything below here should be cached
  #

  # Handle compression correctly. Varnish treats headers literally, not
  # semantically. So it is very well possible that there are cache misses
  # because the headers sent by different browsers aren't the same.
  # @see: http://varnish.projects.linpro.no/wiki/FAQ/Compression
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      # if the browser supports it, we'll use gzip
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      # next, try deflate if it is supported
      set req.http.Accept-Encoding = "deflate";
    } else {
      # unknown algorithm. Probably junk, remove it
      remove req.http.Accept-Encoding;
    }
  }

  # Clear cookie and authorization headers, set grace time, lookup in the cache
  #unset req.http.Cookie;
  #unset req.http.Authorization;
  set req.grace = 1s;
  return(lookup);
}

#
# Called when entering pipe mode
#  
sub vcl_pipe {
  # If we don't set the Connection: close header, any following
  # requests from the client will also be piped through and
  # left untouched by varnish. We don't want that.
  set req.http.connection = "close";
  return(pipe);
}


#
# Called when the requested object has been retrieved from the
# backend, or the request to the backend has failed
#
sub vcl_fetch {
  # Comments are now fetched via ESI.
  esi;
  
  # Do not cache the object if the backend application does not want us to.
  if (beresp.http.Cache-Control ~ "(no-cache|no-store|private|must-revalidate)") {
    return(pass);
  }

  # Do not cache the object if the status is not in the 200s
  if (beresp.status >= 300) {
    # Remove the Set-Cookie header
    #remove beresp.http.Set-Cookie;
    return(pass);
  }

  #
  # Everything below here should be cached
  #

  # Don't cache the comments ESI
  if (req.url ~ "\/posts\/comments") {
    set beresp.ttl = 0s;
  }

  # Remove the Set-Cookie header
  remove beresp.http.Set-Cookie;

  # Set the grace time
  set beresp.grace = 1s;

  # Static assets aren't served out of Varnish just yet, but when they are, this will
  # make sure the browser caches them for a long time.
  if (req.url ~ "\.(css|js|jpg|jpeg|gif|ico|png)\??\d*$") {
    /* Remove Expires from backend, it's not long enough */
    unset beresp.http.expires;

    /* Set the clients TTL on this object */
    set beresp.http.cache-control = "public, max-age=31536000";

    /* marker for vcl_deliver to reset Age: */
    set beresp.http.magicmarker = "1";
  } else {
    set beresp.http.Cache-Control = "private, max-age=0, must-revalidate";
    set beresp.http.Pragma = "no-cache";
  }

  # Deliver the object
  return(deliver);
}

sub vcl_deliver {
  if (resp.http.magicmarker) {
    /* Remove the magic marker */
    unset resp.http.magicmarker;

    /* By definition we have a fresh object */
    set resp.http.age = "0";
  }   

  # Add a header to indicate a cache HIT/MISS
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}
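
For readers who want to see the effect of the Accept-Encoding block in vcl_recv outside of Varnish, here is a small Python sketch of the same normalization. The function name `normalize_accept_encoding` is made up for this example, and returning `None` stands in for removing the header.

```python
def normalize_accept_encoding(value):
    """Collapse an Accept-Encoding header to "gzip", "deflate", or None.

    Mirrors the order of checks in vcl_recv: prefer gzip, then deflate,
    otherwise drop the header (represented here by None).
    """
    if not value:
        return None
    if "gzip" in value:
        return "gzip"
    if "deflate" in value:
        return "deflate"
    return None


for header in ("gzip, deflate", "gzip,deflate,sdch", "deflate", "identity", None):
    print(repr(header), "->", repr(normalize_accept_encoding(header)))
```

The point of the normalization is that browsers send many spellings of what is effectively the same preference. Reducing them to a handful of values keeps equivalent requests from being treated as different ones.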