Posterous API status
API requests are taking longer than usual due to a surge in post volume. We're working on a longer term solution to add capacity and apologize for the inconvenience. We'll keep you updated as we go.
To build physical Ubuntu servers we use Ubuntu preseed.
This works great, but if you use a static preseed file you end up building a host that doesn’t have its hostname or static IP address set. That means you have to set them manually afterward, so we decided to automate it.
BTW it took us a while to figure out how to set a static ip in a preseed file. We blogged about it here: network-preseeding-debianubuntu-with-a-static
To do this we wrote a small Sinatra app that dynamically generates the preseed file with the hostname and static IP address filled in.
This is done by looking up the MAC address of the requesting host in the ARP table and matching it against a pipe-delimited file that lists each host's static IP address, MAC address, and hostname.
The list is stored in a file named ip2mac.txt and was populated by a script.
The ip2mac.txt file looks like this:
172.28.0.71|a4:ba:db:35:e6:09|chi-devops11a
172.28.0.72|78:2b:cb:03:c5:44|chi-devops11b
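As a rough illustration of the file format, here is a minimal sketch that parses lines like these into a hash keyed by MAC address. The helper name parse_ip2mac and the hash layout are assumptions made for this example. The app shown later in this post scans the file line by line instead and gets the hostname from reverse DNS.

```ruby
# Hypothetical helper: parse "ip|mac|hostname" lines into a hash keyed by MAC.
def parse_ip2mac(text)
  text.each_line.with_object({}) do |line, hosts|
    ip, mac, name = line.strip.split('|')
    next if mac.nil? || name.nil? # skip blank or malformed lines
    hosts[mac.downcase] = { ip: ip, hostname: name }
  end
end

# Sample data copied from the two lines shown above.
SAMPLE_IP2MAC = <<~EOS
  172.28.0.71|a4:ba:db:35:e6:09|chi-devops11a
  172.28.0.72|78:2b:cb:03:c5:44|chi-devops11b
EOS

hosts = parse_ip2mac(SAMPLE_IP2MAC)
p hosts['78:2b:cb:03:c5:44']
```

Keying on a lowercased MAC makes the lookup exact rather than a substring match, which may matter if the list grows.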
Instead of pointing the pxelinux.cfg/default file at a static preseed file, we make a request to the Sinatra app, which generates the preseed dynamically. The line we use in the default file looks like this:
append console=tty0 console=ttyS1,115200n8 initrd=ubuntu-10.04-server-amd64-initrd.gz auto=true priority=critical preseed/url=http://172.27.0.115:4567/lucid-preseed-noraid interface=eth0 netcfg/dhcp_timeout=60 console-setup/ask_detect=false console-setup/layoutcode=us console-keymaps-at/keymap=us locale=en_US --
When the request is made the sinatra app does the following:
1. looks up the MAC address of the requesting host in the ARP table
2. finds the line in ip2mac.txt that matches that MAC address
3. uses the IP and hostname to populate the hostname and IP variables in the preseed file
4. returns the preseed file to the host making the request
The code:
require 'rubygems' # skip this line in Ruby 1.9
require 'sinatra'
require "erb"
require 'logger'

# Append a message to foo.log.
def log(message)
  flog = Logger.new('foo.log')
  flog.info(message)
end

# Return the IP of the first ip2mac.txt entry whose MAC matches.
def lookup_mac(mac)
  rr = Array.new
  hostfile = File.open("ip2mac.txt","r")
  hostfile.readline # discards the first line of the file
  hostfile.each do |line|
    list_ip,list_mac,name = line.split('|')
    if mac.match(list_mac)
      rr.push(list_ip)
    end
  end
  return rr[0]
end

# Look up the requesting host's MAC address in the ARP table.
def get_mac_address()
  ip = @env['REMOTE_ADDR']
  cmd = "arp -n " + ip.chomp + " | grep -v Address | awk '{print \$3}'"
  mac = `#{cmd}`
  return mac
end

# Reverse DNS lookup; chop twice to drop the newline and the trailing dot.
def rev_lookup(ip)
  cmd = "host " + ip + " | awk '{print \$5}'"
  hostname = `#{cmd}`
  fqdn = hostname.chop.chop
  return fqdn
end

get '/lucid-preseed-noraid' do
  mac = get_mac_address()
  log(mac)
  ips = lookup_mac(mac)
  log(ips)
  fqdns = rev_lookup(ips)
  @ip = ips
  @fqdn = fqdns
  log(fqdns)
  erb :lucid_preseed_noraid
end

get '/lucid-preseed-nosrv' do
  mac = get_mac_address()
  log(mac)
  ips = lookup_mac(mac)
  log(ips)
  fqdns = rev_lookup(ips)
  @ip = ips
  @fqdn = fqdns
  log(fqdns)
  erb :lucid_preseed_nosrv
end

get '/' do
  "ops11"
end
To start the sinatra app just run the following:
ruby preseeder.rb
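The ERB templates that the routes render (lucid_preseed_noraid and lucid_preseed_nosrv) aren't shown in this post. As a minimal sketch of what their network section might look like, the standalone example below renders the static network keys with ERB. The render_preseed helper is made up for this example, and so are the netmask, gateway, nameserver, and example.com hostname values. Inside the Sinatra app the template would read @ip and @fqdn rather than local variables.

```ruby
require 'erb'

# Hypothetical network section of a preseed template.
# The d-i netcfg keys are standard debian-installer keys.
PRESEED_TEMPLATE = <<~TEMPLATE
  d-i netcfg/disable_dhcp boolean true
  d-i netcfg/get_hostname string <%= fqdn %>
  d-i netcfg/get_ipaddress string <%= ip %>
  d-i netcfg/get_netmask string <%= netmask %>
  d-i netcfg/get_gateway string <%= gateway %>
  d-i netcfg/get_nameservers string <%= nameserver %>
  d-i netcfg/confirm_static boolean true
TEMPLATE

# Render the template; the keyword defaults are assumptions, not values from the post.
def render_preseed(ip, fqdn, netmask: '255.255.255.0',
                   gateway: '172.28.0.1', nameserver: '172.28.0.1')
  ERB.new(PRESEED_TEMPLATE).result(binding)
end

puts render_preseed('172.28.0.71', 'chi-devops11a.example.com')
```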
I had a hard time figuring this out and there seems to be lots of conflicting information out there so I thought I'd write down my thoughts about this immediately after I figured it out.
I was trying to get a fully automated network install of Ubuntu 10.04 working with a static ip address that gets set up in the preseed file. My entire preseed was working but the static ip address was not. The example preseed has the following entries:
# If you prefer to configure the network manually, uncomment this line and
# the static network configuration below.
#d-i netcfg/disable_dhcp boolean true

# If you want the preconfiguration file to work on systems both with and
# without a dhcp server, uncomment these lines and the static network
# configuration below.
#d-i netcfg/dhcp_failed note
#d-i netcfg/dhcp_options select Configure network manually

# Static network configuration.
#d-i netcfg/get_nameservers string 192.168.1.1
#d-i netcfg/get_ipaddress string 192.168.1.42
#d-i netcfg/get_netmask string 255.255.255.0
#d-i netcfg/get_gateway string 192.168.1.1
#d-i netcfg/confirm_static boolean true
Great...so I should just be able to uncomment the disable_dhcp and the Static network configuration entries and it should work right? WRONG. I went through various permutations and could not get it to work. Finally, I broke down and decided to RTFM and found this:
http://d-i.alioth.debian.org/manual/en.amd64/apbs04.html#preseed-network
It reads:
“Although preseeding the network configuration is normally not possible when using network preseeding (using “preseed/url”), you can use the following hack to work around that, for example if you'd like to set a static address for the network interface. The hack is to force the network configuration to run again after the preconfiguration file has been loaded by creating a “preseed/run” script containing the following commands:”
killall.sh; netcfg
So... I added the following to my preseed.cfg file:
d-i preseed/early_command string /bin/killall.sh; /bin/netcfg
and boom the static network config works. Yay!
Man, it's been a while since we've updated this Space. Well, no more! I'm here to tell you a little bit about the technologies behind the Posterous Spaces redesign.
As you may have noticed, Posterous has taken on a new name and a new look. To go along with those cosmetic changes, we've rewritten our app from the ground up to take advantage of cutting-edge technologies. Using these technologies has allowed our team to work at a feverish pace to deliver you a Posterous that is faster, more engaging, and more fun!
At the core of our new stack is our very own Posterous API. This has effectively made us the biggest consumer of our own API. That's pretty nifty, but I'm not going to talk about the API today.
To interact with the API, we've used some awesome front-end technologies. Among all the awesome stuff we've been able to work with while creating Spaces, the most notable are Backbone, CoffeeScript, Haml.js, Sass, and Compass. Today I will touch on our use of Backbone.
For those unfamiliar, Backbone is an MVC-esque framework for JavaScript. It separates large JavaScript applications into models, views, collections, and routers.
Backbone provides some basic structure to a large JavaScript codebase. This has allowed us to create readable, and most importantly reusable, classes that separate functionality from presentation, which is a constant struggle in front-end programming.
Lest this turn into a primer on MVC 101, I'll just outline how we're using Backbone classes in our application:
Models & Collections: These serve as an interface to our API. For each model we want to interact with in the front-end—for example, a post—we create a subclass of Posterous.Model (a subclass itself of Backbone.Model). If we want to deal with lists of a particular model, as we do with lists of posts, we must also create a subclass of Posterous.Collection (a subclass of Backbone.Collection). With both a Posterous.Model and Posterous.Collection, we now have a link between the front-end and our RESTful API.
Routers: For those familiar with Rails development, a Backbone.Router is very similar to your routes.rb file. For each URL on Posterous Spaces, a router fires and tells our app to render a view (or sometimes two views in the case of multi-column layouts).
Views: The meat of our business logic occurs here. The term "View" is a bit misleading to us; we tend to use Backbone views in a manner similar to UIViewControllers in the iOS world. Views observe events, and fire responses. Views also render templates that we have built in Haml.js.
A typical page in Posterous Spaces is actually a tree of views and subviews, each observing behavior within its outermost DOM elements. For example: when you click on the "Reader" tab, we are actually instantiating a ReaderListView, which in turn contains many PostListItemView instances. Within the PostListItemView, we instantiate a LikeButtonView, among other things.
I'll leave it at that for now. I know this is just a bird's-eye view of our architecture, so please feel free to ask any questions you may have about our use of Backbone (or other front-end technologies) in the comments.
We're hiring!
If you're interested in using cutting-edge technologies to build user interfaces that delight millions of people, definitely check out our open job listings page!
We like t-shirts. And we love people who build on the Posterous API. So we've decided to give the first 50 in-person attendees on July 16th the official Posterous Hack Day shirt.
Don't forget to tell us you are coming
Fresh off the launch of our new API, we're hosting a Hack Day on Saturday July 16th to provide anyone using the Posterous API with direct access to our development team.
Who: Any developer interested in using the Posterous API to build something cool. So far, we know of mobile apps and a few web services built on top of Posterous. Whether you need ideas on what to build, are looking to team up with someone else or are already working on an app, the Posterous dev team will be here to help.
What: API overview sessions, office hours for anyone with questions and end of day demos. Plenty of food, red bull and beer. After we're done, we'll take everyone out for drinks.
When: Saturday July 16th 10am - 6pm PDT
Where: Posterous HQ at 2973 16th Street in San Francisco. Also available via IRC ( #posterous-dev on freenode).
July 12th update: The first 50 in-person attendees will receive the official Posterous Hack Day shirt.
Today, we're happy to announce a new API that allows third-party developers to access the full Posterous technology stack.
The new API gives developers unprecedented access to methods and actions that were formerly available only to the core Posterous engineering team, including the ability to create sites, add users for those sites, and assign custom themes for each user. Additionally, we've added API methods for retrieving and manipulating data around sites, users, posts, comments, and a number of other Posterous data types.
Aside from just adding new endpoints, we've also designed the API to be super easy to use. The new API is RESTful and presents a clean and concise set of URLs whose intent is easy to parse and understand. Moreover, we've designed the API site to be a powerful developer tool for working with the new API. Not only does this site document every single available method, it also allows developers to interface directly with the API from their browsers. Using this new tool, developers can experiment by dynamically changing the parameters and inspecting the response.
The use cases for the new API are impressive: whether you are empowering your users to be editors like Pulse did or distributing your content in real time like Turner Broadcasting did for March Madness, our API can power it. Another great example is Oxfam, who are using the Posterous API to drive awareness and participation in their recently announced campaign to grow a better future. Anyone visiting the Oxfam grow site will soon be able to sign up and create a blog, hosted on the grow.gd domain (e.g. chris.grow.gd) with a custom theme developed by Obox. All of the Grow blogs will contain an embedded grow widget to educate consumers on the growing food crisis and solicit their ideas for creating a different future. And, of course, anyone signing up through Oxfam will have a full Posterous account and access to all our features.
We can't wait to see what you do with it and welcome your feedback on how we can make it better. Start by checking out our new API site.
I am reposting this from my personal blog because the solution I described is in use at Posterous, and I just think it was a really interesting problem to solve.
Cache performance is essential to site performance, but most folks don’t understand their cache at a deep enough level to make proper engineering decisions. Tools like SimCache can help by predicting cache performance ahead of potentially costly ops decisions.
Modern web applications depend heavily on caching to maintain site performance and reduce loads on their primary databases. However, in many cases, caching strategies are deployed in an ad hoc fashion, without much understanding of how underlying usage patterns affect cache performance. In practice, developers tend to spin up a cache (typically Memcache) and continue adding capacity until site performance is “good enough.” However, without deeper understanding, capacity planning and performance tuning will become harder as traffic grows or usage patterns change.
Posterous was no exception; early in our history, we began to use Memcache heavily to quell the increasing load on our MySQL servers, sizing our Memcache cluster with simple heuristics. However, as we began to use Memcache in different ways and as our traffic grew, the cache began to act erratically, leading to site performance issues, despite rapidly increasing the size of our Memcache cluster. We realized that understanding how our cache performed given our observed usage patterns was essential for appropriately sizing our cache. To do so, we developed SimCache, a tool for predicting cache performance based on observed usage patterns, and used it to plan the second version of our cache, based on Redis.
As a consumer-oriented blogging platform, Posterous is an extremely read-heavy app. Moreover, our usage patterns are extremely “long-tail”; at any given moment, we’ll serve thousands of requests for a heavily-visited site like the Gap’s consumer facing blog but just a few for Mrs. Henry’s sixth grade class blog.
To serve our normal stream of requests, we had been using a fairly large Memcache cluster to store formatted blog posts. However, cache performance began to act erratically, with wild swings in the speed of some requests that we couldn’t really understand. Moreover, the cache was being asked to serve a growing number of requests:

Understanding that this was unacceptable, I began working closely with Chris Burnett, another engineer at Posterous, to assess the degradation in cache performance.
The importance of proper logging and measurement cannot be overstated when assessing the performance of a given caching strategy. Without collecting statistics on your cache performance, you’re essentially blind, with no understanding of how well your cache is working, or how it could be improved.
For a typical key => value cache, collecting statistics is pretty easy to implement. Anytime a key is accessed, simply log whether or not the cache request resulted in a hit or a miss:
Feb 28 01:14:54 hit! key = posts/1432
Feb 28 01:14:54 hit! key = posts/2442
Feb 28 01:14:55 miss! key = posts/2970
Feb 28 01:14:55 hit! key = posts/6917
Feb 28 01:14:57 miss! key = posts/9363
Feb 28 01:14:57 hit! key = posts/2969
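Log lines in this shape are easy to tally. The snippet below is a minimal sketch of computing a miss rate from them, not the tooling we actually ran. The regular expression and the miss_rate helper are assumptions based only on the sample lines above.

```ruby
# Matches the "hit!" / "miss!" marker and the key in each log line.
LOG_LINE = /(hit|miss)! key = (\S+)/

# Return the fraction of logged cache accesses that were misses.
def miss_rate(lines)
  hits = 0
  misses = 0
  lines.each do |line|
    case line[LOG_LINE, 1]
    when 'hit'  then hits += 1
    when 'miss' then misses += 1
    end
  end
  total = hits + misses
  total.zero? ? 0.0 : misses.to_f / total
end

SAMPLE_LOG = <<~LOG
  Feb 28 01:14:54 hit! key = posts/1432
  Feb 28 01:14:54 hit! key = posts/2442
  Feb 28 01:14:55 miss! key = posts/2970
  Feb 28 01:14:55 hit! key = posts/6917
  Feb 28 01:14:57 miss! key = posts/9363
  Feb 28 01:14:57 hit! key = posts/2969
LOG

puts miss_rate(SAMPLE_LOG.lines)
```

Grouping the same tally by hour or by minute would show the kind of peak and trough variation discussed next.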
Such simple data can reveal a wealth of insights. Most important is the cache’s miss rate: how frequently do we need to regenerate data? It is the miss rate that ultimately impacts site performance. Using such data, we were shocked to discover that we were caching a lot less than we thought, and that our cache actually behaved quite erratically, with a greater than 2x difference between peak and trough miss rates (1 = baseline):

Given our initial assessment, it was clear that we would need to increase the size of our cache. But by how much? Could we expect much improvement if we increased the cache size by a third? What about doubling the cache? Would that sufficiently improve site performance? To answer these questions, I wrote a tool called SimCache which would replay our observed cache access patterns against a simulated cache of a given size, measuring how cache size would affect cache miss rates and other important metrics of caching performance. Using SimCache, we tested how cache performance varied if we increased our existing cache size (red) by:

The data indicated that our cache was too small by a factor of almost 2x. Moreover, the undersized cache was responsible for the wild swings in miss rate. Keys were evicted from the cache far too soon; as the cache size was steadily increased, the variation in miss rate went down dramatically, leading to better consistency in hit rates from our cache.
Using the simulation results, we increased our cache to the appropriate size. Of course, it is important to collect statistics afterwards to verify that the change had its intended effect. In our case, the results were pretty good. At time=0, the newer cache was inserted, resulting in a spike in miss rate. However, as the larger cache began to fill, the measured cache performance (green points) matched the predicted cache performance (blue line) very well:
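To make the replay idea concrete, here is a minimal sketch of a simulator in that spirit. It is not SimCache itself. It assumes plain LRU eviction and measures capacity in number of keys rather than bytes, and the names LruCache and simulate_miss_rate are made up for this example.

```ruby
# A tiny LRU cache that tracks keys only. Ruby hashes keep insertion order,
# so the first key in the hash is always the least recently used one.
class LruCache
  def initialize(capacity)
    @capacity = capacity # assumed to be at least 1
    @store = {}
  end

  # Returns true on a hit and false on a miss; a miss inserts the key.
  def access(key)
    if @store.key?(key)
      @store.delete(key) # re-insert below to mark as most recently used
      @store[key] = true
      true
    else
      @store.shift if @store.size >= @capacity # evict least recently used
      @store[key] = true
      false
    end
  end
end

# Replay a recorded sequence of keys against a cache of the given size.
def simulate_miss_rate(keys, capacity)
  cache = LruCache.new(capacity)
  misses = keys.count { |key| !cache.access(key) }
  misses.to_f / keys.size
end

trace = %w[a b a c b a d a] # stand-in for keys pulled from the access log
[2, 3, 4].each do |size|
  puts "capacity=#{size} miss_rate=#{simulate_miss_rate(trace, size)}"
end
```

Running the same trace at several capacities gives a miss-rate-versus-size curve of the kind described above, without touching production servers.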

Using the data from SimCache allowed us to understand why our cache performance was degraded and how to improve it. Moreover, by predicting the required cache size ahead of time, we avoided costly “ops iteration”; that is, we did not have to add servers, wait to see if site performance improved, add more cache, rinse and repeat. Instead we were able to size our cache appropriately from the beginning.
Interested in working on problems like this? We’re hiring.
Thanks to J. Hui, C. Burnett, R. Pearson, D. Meredith, and G. Tan for reading and commenting on different drafts of this post.