How One Startup Is Training Engineers To Handle Real-World Tech Disaster Scenarios
Payments startup Stripe’s annual “capture the flag” event is designed to teach engineers how to handle intense, worst-case situations before they arise.
Most engineers never learn to keep a website running during a “denial of service” attack or a massive security breach until one happens. And by then it is usually too late, with each passing second in these very real scenarios potentially costing companies millions upon millions of dollars.
Enter payments startup Stripe, which runs an annual event for senior engineers called “capture the flag” that pits them against each other to solve sophisticated tech disaster scenarios its business may face. The goal is to teach engineers how to handle these situations before they arise rather than force them to learn on the fly. While the exercise sounds like the traditional Silicon Valley hackathon, there is one key difference: the event is designed to simulate a real-world crisis rather than simply to build an application.
For instance, the theme of this year’s games, which concluded earlier this month, was “distributed systems,” or problems that are too hard for a single computer to solve. The stakes are not all that high — basically bragging rights and a t-shirt for the winners — though in technical circles that’s typically more than enough to entice engineers.
“We all get the internet is growing very fast, the stakes are very high, but the thing that we don’t talk enough about is the lack of skills between a computer science degree and a software engineer,” said Siddarth Chandrasekaran, a lead engineer at Stripe. “I studied computer science at Harvard and had the same problem: I’d learned a bunch about security, but didn’t understand it in such a way I could sit down and fix it in a hands-on manner. The thing that helped me the most was to get my hands on an actual problem, so we figured we’d build one ourselves.”
Players capture a “flag” by completing levels of increasing difficulty, each of which requires solving a complex technical problem of the kind programmers typically encounter. For “distributed systems,” that meant building pieces of technology that could quickly solve large problems and scale up to millions of users without crashing, a typical demand on a startup that is either launching new products or rushing out a fix for an application that is getting crushed under load.
This is precisely the kind of problem Stripe, which handles billions of dollars in payments, theoretically may face. The startup, which raised $80 million last month at a $1.75 billion valuation — placing its worth right up there with buzzy startups like Dropbox, Snapchat, and Pinterest — handles the pipes for developers, making it quick and easy for a business to start accepting payments. It counts companies like Lyft and Rackspace as customers.
Chandrasekaran invited me to join part of this year’s event, which took place in a small conference room at Stripe’s new offices in San Francisco’s Mission District.
This year’s event featured five levels in total, and you had to capture the previous flag just to get access to details about the following level. The challenge I participated in required writing software that would continuously generate a series of random letters and numbers and publish it to GitHub, an online code-hosting service. If the program generated a specific string of numbers (in this case “001”), the player was awarded a “Gitcoin,” a name that nods to the digital currency Bitcoin because the process of “mining” a Gitcoin resembles the transaction-verifying work done to earn Bitcoins.
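For readers curious what such a miner looks like, here is a minimal Python sketch of the idea. It is not Stripe's actual challenge code, and the article does not spell out the exact winning rule. The sketch assumes a Bitcoin-style check: each random candidate string is hashed with SHA-1, and it counts as a win if the hex digest begins with the target “001”. The function names are hypothetical, and the step of publishing results to GitHub is left out.

```python
import hashlib
import random
import string

# "001" comes from the article; treating it as a required hash prefix
# is an assumption made for this illustration.
TARGET_PREFIX = "001"
ALPHABET = string.ascii_lowercase + string.digits


def random_candidate(rng, length=16):
    """Return a random string of lowercase letters and digits."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def is_winner(candidate, prefix=TARGET_PREFIX):
    """Check whether the SHA-1 hex digest of the candidate starts with the prefix."""
    digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
    return digest.startswith(prefix)


def mine(rng, prefix=TARGET_PREFIX, max_attempts=1_000_000):
    """Try random candidates until one wins or the attempt budget runs out.

    Returns (candidate, attempts_used), or (None, max_attempts) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = random_candidate(rng)
        if is_winner(candidate, prefix):
            return candidate, attempt
    return None, max_attempts


if __name__ == "__main__":
    found, attempts = mine(random.Random())
    print(f"winning string: {found} (found after {attempts} attempts)")
```

With a three-character hexadecimal prefix, roughly one candidate in 4,096 wins, so a laptop finds one almost instantly. Requiring a longer prefix makes each win exponentially rarer, which is how this kind of puzzle can be made harder.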
I sat next to Chandrasekaran and fired up Sublime Text 2 — a popular text editor among programmers in Silicon Valley because it colors certain phrases depending on the language in which you are programming, making it easier to spot specific kinds of syntax (like an “if” statement) and quickly edit large blocks of code. Other participants sat huddled at a table in the common room adjacent to us, typing but not speaking.
Given my limited coding experience, we opted to write the script in Python, a popular introductory programming language used by many Web applications. Chandrasekaran scoured the internet for documentation on how to modify GitHub files through a Python script, sending me links and lines of code to add to the script through instant message, much in the same way engineers actually communicate with each other during the workday.
After about an hour of programming, the miner was up and running, though at a significantly reduced difficulty compared to the rest of the game. Making it operative and competitive with the other programmers would have essentially required my turning the reins over to Chandrasekaran, who had already built the software earlier using Ruby instead of Python.
In total, 216 people were able to capture the final “flag,” which required players to build a very sophisticated and robust distributed system that wouldn’t crash and could be deployed in a few days rather than a few months.
“Generally, I think the existence of challenges like this is really good, although that’s at least partly because my career has been shaped by them: I dropped out of high school and college, and it would have been very difficult for Facebook to find me (or for me to find Facebook) through normal recruiting channels,” said Evan Priestley, an early Facebook employee and current co-founder at Phacility who completed the challenge. “But I was able to get my foot in the door by solving a challenge, and I’d like to think I was a successful hire. Facebook managed to make a few other really good non-traditional hires from similar channels, too. As a recruiting tool, challenges like this can give companies access to engineers who don’t fit the traditional mold and would be hard to find otherwise.”
Indeed, Facebook also runs “capture the flag” events as a way to get its employees prepared for security threats like denial of service attacks — a situation an everyday computer science student might not encounter, but something that is basically a rite of passage at a large tech company.
“Building defensive systems is a critical skill set that is often underrepresented within the security community,” said Jennifer Lesser, Director of Security Operations at Facebook. “That’s why Facebook’s CTF events have been geared toward security practitioners and have rewarded defense just as highly as offensive skills, if not more. I can see why a CTF could also be an engaging way for a programmer to gain some hands-on security experience.”
Events like these tend to serve a number of purposes, from developing new products to recruiting. But the key part is simply training the company’s engineers to handle the kinds of real-world scenarios they wouldn’t normally see until it is too late.
“In order to become a doctor you have to know biology, but residencies exist as a pathway to becoming a doctor,” Chandrasekaran said. “Knowing just biology makes it harder to be a doctor. Astronauts learn physics and are obviously very good at it, but it doesn’t convert them into an astronaut. While writing code is not rocket science, it does have the same problem, so we’re trying to build these simulators so we can have a space camp for developers.”