Posted by: Matt Heusser
deployment, IT, metrics, monitoring
Last January I spent a week in New York City, including one day at the international headquarters of Etsy, the folks behind the massive 24-hour shopping mall for crafters. I took pictures (lots of pictures), took notes (lots of notes), and even uploaded a video or two.
On the plane ride home, I took my notes and typed. And typed, and typed, then submitted the results to my editor, and waited. Just last week, the folks at CIO.com published my article “Continuous Deployment at Etsy.”
But this is Uncharted Waters, and you want the rest of the story, right?
Read on, dear reader. Read on.
The article covered a great deal of material, but the biggest thing I thought was missing was a picture of the wall of metrics. No, really, it’s a collection of monitors that work together in real time; here’s a picture:
If you are in operations and worried about a deploy at Etsy, you don’t need to feel uncomfortable and fish for a cigarette. Instead, you can walk over to the board and watch it. The wall measures everything, from the errors that show up in the logs, to the time it takes to log in, the number of simultaneous users, the response time of specific URLs, even the number of code modules that do not yet have automated unit tests.
But don’t take my word for it; here is a little video of Noah Sussman, then Test Architect at Etsy, talking about the massive wall of monitors:
Video: http://www.youtube.com/v/1W2vIE1nSxg
As for the how of the infrastructure, it’s pretty simple: The developers have an API for their PHP code that can send a packet to a logging server, then the logging server writes to a database, and the reports come out of the database in real time, using a free open-source tool called Graphite.
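To make the "send a packet to a logging server" step concrete, here is a minimal sketch of what the client side of such an API might look like. It is written in Python rather than PHP, and it is not Etsy's actual code: the host, port, metric names, and the `name:value|type` line format (borrowed from the StatsD convention) are all assumptions for illustration. The idea is that the application fires off a single UDP datagram and moves on, so instrumenting code costs almost nothing at runtime.

```python
import socket

# Hypothetical logging-server address; not taken from the article.
METRICS_HOST = "127.0.0.1"
METRICS_PORT = 8125


def format_metric(name: str, value: int, kind: str = "c") -> bytes:
    """Encode one measurement as a StatsD-style line: name:value|kind.

    kind is assumed to be "c" for a counter or "ms" for a timing.
    """
    return f"{name}:{value}|{kind}".encode("ascii")


def send_metric(name: str, value: int, kind: str = "c",
                host: str = METRICS_HOST, port: int = METRICS_PORT) -> bytes:
    """Send one metric as a single UDP datagram and return the payload.

    UDP is fire-and-forget: the caller never waits for a reply.
    """
    payload = format_metric(name, value, kind)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Illustrative metric names only.
    send_metric("logins.success", 1)        # count one successful login
    send_metric("login.time", 320, "ms")    # record a 320 ms login
```

In a setup like the one described, the logging server on the other end would aggregate these lines and write them to the database that Graphite charts from. Because nothing waits on a reply, a slow or unavailable metrics server does not hold up the page request.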
Here’s Noah again, showing us around the office:
Video: http://www.youtube.com/v/FPvqKwQssAg
Here’s The Key
The biggest thing I took away from Etsy was that the company consciously built infrastructure that was designed to free developers.
Instead of serving as instruments of heavy-handed command and control, the tools make it possible to deploy quickly, and the monitors allow the staff to notice a problem and fix it. This mitigates the risk of a quick rollout, which means the technical staff can afford to take a few risks.
And some of those risks won’t burn; they will pay off.
Multiply a net positive risk by ten thousand, and, all of a sudden, you have competitive advantage.
It was a great trip out to Etsy, and I wish them the best of luck, but somehow, I suspect they are making their own.