I started doing some research on microservices and came across a video from about 5 years ago in which Werner Vogels, Amazon’s CTO, talks about how (and why) Amazon switched to a microservices architecture. The presentation explains the challenges that amazon.com faced in its early years and how the internal solutions to those problems later became the basis for AWS cloud computing.
Werner Vogels – Amazon and the Lean Cloud
It’s a relatively short presentation – about 30 minutes – but it’s full of interesting details about those ‘early days’ of cloud computing. Here are some highlights:
- In the early 2000s, Amazon’s main e-commerce site – amazon.com – was facing some technical challenges. Its architecture at that time was typical of the web applications we still build today – a single monolithic application code base, a common technology stack across all areas of the site, and massive relational databases on the backend. What were some of the problems they were having in those early days? Code compiles and deployments were taking too long. The backend databases were massive and hard to manage. Bottlenecks existed everywhere – it was getting harder and harder to make progress, release new features and keep up with growth.
- Amazon’s technical architects analyzed the problem and realized that the path they were on would not take them very far. The decision was made to move towards a microservices architecture (they didn’t call it that back then, but that’s basically what they were building). The idea with microservices was that every little feature and capability of the retail site would be provided by a mini-service that would interact with other services through well-defined interfaces. This is the path that amazon.com followed for the next few years. According to Werner, the amazon.com homepage at the time of the talk was put together by a few hundred such microservices.
- It’s hard to believe that such an architecture could actually work at the scale that amazon.com needed – it sounds like the perfect recipe for chaos. Making it work required specific changes to how Amazon’s internal teams operated. The idea of “two-pizza teams” was at the core – a team supporting a particular microservice should be no bigger than the number of developers two pizzas could feed. This usually meant no more than 10 technical folks per team – a perfect number for a team that could do work without needing complex meetings to bring everybody up to date on progress. Teams chose the technology stack they would use for a particular microservice. Another critical concept was the idea of “you build it, you run it”. These small teams were in charge of both development and operations for their service (they were doing DevOps before it was actually cool). Amazon now had hundreds of such teams working on the amazon.com site.
- Things went well initially, but after a while Amazon realized that the rate of progress and productivity was slowing down. A more careful analysis of the situation showed that these teams were now spending close to 70% of their time doing operations work – making sure that their services met the high-availability standards required for amazon.com. Engineers were solving the same problems over and over on their own because they had no common internal infrastructure resources they could use.
- This is when the idea of infrastructure on demand started to come up – the beginning of what became AWS. First, object storage (S3) … then compute (EC2) and on they went from there. Somewhere along the way these internal elastic ‘cloud’ capabilities were exposed to external customers and the rest is history.
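To make the "one small service per feature, combined through well-defined interfaces" idea from the list above a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration: the service names, the `compose_homepage` function and the "skip a fragment if its service fails" behaviour are my assumptions, not anything shown in the talk. In a real system each service would be a network call owned by a separate team, not a local function.

```python
from typing import Callable, Dict

# Hypothetical interface: every service takes a customer id and returns
# one HTML fragment. This shared signature is the "well-defined interface".
FragmentService = Callable[[str], str]


def recommendations(customer_id: str) -> str:
    return f"<section>Recommended for {customer_id}</section>"


def recently_viewed(customer_id: str) -> str:
    return f"<section>Recently viewed by {customer_id}</section>"


def deals(customer_id: str) -> str:
    # Simulate one service being down.
    raise TimeoutError("deals service unavailable")


def compose_homepage(customer_id: str, services: Dict[str, FragmentService]) -> str:
    """Build a page by asking each service for its fragment."""
    fragments = []
    for name, service in services.items():
        try:
            fragments.append(service(customer_id))
        except Exception:
            # Assumed policy: a failing service costs one fragment,
            # not the whole page.
            fragments.append(f"<!-- {name} unavailable -->")
    return "\n".join(fragments)


page = compose_homepage(
    "alice",
    {
        "recommendations": recommendations,
        "recently_viewed": recently_viewed,
        "deals": deals,
    },
)
print(page)
```

The point of the sketch is that the page composer only knows the interface, so each team can change the inside of its own service (or its technology stack) without coordinating with the others.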
It’s indeed a fascinating inside look at how the AWS cloud was born. If you’ve wondered how Amazon, an online book retailer, ended up being a cloud computing powerhouse, then this video will give you some of the answers.