Day two of AWS re:Invent is officially in the books (not counting the afterparties). The night was capped off with Tuesday Night Live, a keynote speech by Peter DeSantis, the VP of Global Infrastructure at AWS. As Peter mentioned at the beginning of his talk, he wasn’t able to come through with beers for the crowd, so if you left early, here’s a recap of his presentation.
Innovation at scale
The theme of the night was “innovation at scale,” which makes sense given the massive global footprint of AWS. We often spend so much time thinking about the future that the past can get brushed aside, but Peter brought up a couple of interesting stats to start things off – after five years of operation, AWS offered only four regions of service. After ten years, they had increased to 11 regions. But – back to thinking about the future – over the course of 2016-2018, they’ll be adding 11 more, including their first Middle Eastern region in Bahrain.
Regions are key to building highly available infrastructure, but there are plenty of factors to consider in this expansion, with the main one being energy consumption. AWS has considered this, of course, in choosing their expansion locations. Their new region in Sweden, for example, was chosen for the country’s use of up to 50% renewable energy, which will be a great help in mitigating the carbon footprint that comes with running the world’s biggest computing platform.
Amazon’s regional expansion has big implications for everyone: higher availability, more redundancy, and – let’s face it, this is probably the most immediately noticeable – potentially lower network latency when connecting to your EC2 instances. Speaking of which…
Computing at scale
Usage of “normal” EC2 vCPUs has grown at a pretty steady clip over the last three years or so. But the growth of specialized hardware such as GPUs and FPGAs has been exponential. The implications of this are also massive, and Peter brought AWS’s GM of Artificial Intelligence, Dr. Matt Wood, to the stage to discuss these changes at length.
We’re in a machine learning renaissance, to borrow a phrase from Dr. Wood. TuSimple, a Chinese startup, recently automated a truck on a 500-mile drive. What’s impressive about this is that it was a level 4 automation – there was a human behind the wheel to intervene if the truck deemed itself unable to detect the road around it – which is just one step away from the “gold standard” of level 5, or total automation. Another example is Clemson University (an institution we’re proud to help with their cloud training), which provisioned 1.1 million vCPUs with spot instances to catalog a half million journal articles nearly overnight. Clearly, the real-world applications are tremendous.
What makes this possible is the NVIDIA Tesla V100 GPU, an absolute beast for deep learning. AWS uses 8 of these bad boys in their P3 instance type, for a combined 128 GiB of GPU memory and 40,960 CUDA cores to enable enormous amounts of parallel processing. With mixed-precision floating point (16-bit and 32-bit), the P3 instance offers up to a petaflop of performance.
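If you want to sanity-check those headline numbers, they fall out of simple multiplication. Here’s a quick sketch that does the math. The per-GPU figures are not from the keynote; they’re assumptions taken from NVIDIA’s published specs for the 16 GB V100, and the constant and function names are just illustrative.

```python
# Per-GPU figures for the NVIDIA Tesla V100 (16 GB variant). These are
# assumptions taken from NVIDIA's published specs, not from the keynote.
GPUS_PER_INSTANCE = 8
CUDA_CORES_PER_GPU = 5_120
MEMORY_GIB_PER_GPU = 16
MIXED_PRECISION_TFLOPS_PER_GPU = 125  # Tensor Core peak


def p3_totals():
    """Aggregate the per-GPU specs across all GPUs in one instance."""
    return {
        "cuda_cores": GPUS_PER_INSTANCE * CUDA_CORES_PER_GPU,
        "gpu_memory_gib": GPUS_PER_INSTANCE * MEMORY_GIB_PER_GPU,
        # 1 petaflop = 1,000 teraflops
        "mixed_precision_pflops": GPUS_PER_INSTANCE
        * MIXED_PRECISION_TFLOPS_PER_GPU
        / 1_000,
    }


if __name__ == "__main__":
    for name, value in p3_totals().items():
        print(f"{name}: {value}")
```

Keep in mind these are peak theoretical figures for the biggest configuration; what you see in practice will depend on your workload.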
EC2 under the hood
For everyday tasks, most of us take EC2 for granted, but Peter pulled back the covers and talked a bit about what goes into this powerful service’s architecture. I won’t attempt to cover its history as comprehensively as he did, but here’s the gist.
EC2 is where computing happens, and if you’ve used other hosting providers, the architecture probably isn’t too unfamiliar. You have customer instances virtualized by a hypervisor, which has to integrate with the host hardware, which includes storage, network components, and what Peter calls “management services,” meaning the actual services you interact with in the AWS console.
Virtualization does come with overhead, however, and AWS wanted to solve that problem to make EC2 feel more like bare metal servers (no spoilers yet, but you probably see where this is going!). In order to do this, they abstracted the storage, networking, and other components into what they call the Nitro Architecture, which leverages improved host software to handle data packets more efficiently, among other things.
Eventually, however, AWS reached a limit, and they had yet to make EC2 as “bare metal” as they wanted. They had to decide to either stick with commercially available hardware, fill in the gaps with FPGAs (field programmable gate arrays), or invest in their own custom ASICs (application-specific integrated circuits). You can probably guess what they did – they acquired Annapurna Labs in 2015 and that investment has finally paid off.
Earlier this month, AWS announced C5 instances, with a new EC2 hypervisor based on KVM. This instance type uses the first chip developed by Annapurna since their acquisition and marks the latest in what’s sure to be an ongoing journey to create a better and better compute platform.
EC2 Bare Metal
Most of us were probably waiting for a big announcement tonight, and here’s the first one: AWS will now be offering bare metal EC2 instances! This has big implications for enterprise users running workloads that require a specific hypervisor (or aren’t virtualized at all) or that come with restrictive licensing terms.
The announcement of bare metal EC2 is awesome news for those who need (or want) direct access to the hardware running their infrastructure. More details can be found here.
Load balancing at scale
Another key part of AWS infrastructure and its massive scale is load balancing. As an aside, this isn’t something I often think about in terms of hardware, but Peter covered it in greater detail than I’ve ever heard before, so this was a really interesting segment.
Early load balancing on AWS was almost silly by today’s standards. The original load balancer was a pretty basic machine and, to put it briefly, had far less power than the smartphones most of us carry on a daily basis. As load balancers entered a “golden age,” they became a higher percentage of costs, and that was the challenge Amazon set out to solve.
Today, we have the S3 load balancer, which handles around 37 terabits per second – and that’s just the number from one data center. To build on that success, AWS built an internal service, Hyperplane, which is used under the hood of several other services, including EFS, Managed NAT, Network Load Balancer, and PrivateLink.
Security at scale
Any of these topics could have warranted a keynote of its own, but security is especially critical to cloud infrastructure. And at AWS, it’s the top concern when designing any new service or feature. Stephen Schmidt, VP and Chief Information Security Officer at AWS, took the stage to talk a bit more about it.
Most security issues are caused by misconfiguration, and the way to address that, according to AWS, is through better tooling. Ideally, you want to keep humans away from the data because they’re the ones who create these issues. But security doesn’t happen in a vacuum – it’s something that is applied to systems created by developers. These developers have goals of their own – get things done on time – and the very nature of those goals sometimes results in security problems. Even great automation isn’t quite complete (yet) and we still need humans to calculate risk when applying security best practices. But wait…if the goal is to keep humans away from the data, but we need them to apply security to it, what’s the solution? (Side note: these kinds of questions are why I’m not a security engineer.)
To borrow words from Stephen Schmidt: “Bricks are nice but walls are better.” AWS customers wanted a complete security solution, not just primitives and tools to build their own (think IAM roles, et cetera).
Earlier this year, AWS launched Macie, a service that does two main things. First, it allows you to understand the data you have, and second, it helps you understand how people are using that data. Macie is undoubtedly a step in the right direction, but AWS wanted to build on those principles, which leads us to our next big announcement of the night…
No sense in trying to be reserved – the new security service, Amazon GuardDuty, looks absolutely incredible. GuardDuty is an intelligent threat detection service. Enabling it just involves a few clicks, and it imposes no overhead on the rest of your infrastructure. You turn it on, and everything “just works.” But what exactly does it do?
GuardDuty offers continuous account and network monitoring through machine learning. Over time, it picks up on how you use AWS, and if something happens that’s out of the ordinary, it lets you know. Actionable security insights are probably the biggest feature offered by GuardDuty, but it comes with a whole slew of others as well, including a threat detection feed created by AWS security engineers and their partners.
AWS security is always a hot topic (and probably always will be) and GuardDuty looks to make that process a whole lot simpler. You can find out more on its service page, and even start using it right away.
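If you’d rather script it than click through the console, turning GuardDuty on can be sketched in a few lines of Python. This is a minimal sketch and not an official recipe: it assumes you’re using the boto3 SDK’s GuardDuty client (its list_detectors and create_detector calls), and the enable_guardduty helper is a name I made up for illustration.

```python
def enable_guardduty(client):
    """Return a GuardDuty detector ID, creating a detector if none exists.

    `client` is assumed to be a boto3 GuardDuty client, e.g.
    boto3.client("guardduty"); this helper itself is hypothetical.
    """
    existing = client.list_detectors()["DetectorIds"]
    if existing:
        # GuardDuty is already set up in this region; reuse the detector.
        return existing[0]
    # Enable=True starts monitoring as soon as the detector is created.
    return client.create_detector(Enable=True)["DetectorId"]


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   detector_id = enable_guardduty(boto3.client("guardduty"))
```

Detectors are per-region, so you’d need to repeat this for every region you want covered.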
Peter DeSantis threw a lot at us in his keynote, from AWS history lessons to a couple of big announcements, but it’s all very welcome. Part of what makes cloud computing so powerful is how fast it changes, and AWS is doing more than their fair share of innovation, as they showed us tonight.
Tomorrow morning’s keynote will feature Andy Jassy, CEO of Amazon Web Services, so be sure to tune in to the live stream to see what’s next for the cloud. This is assuming you’re not attending re:Invent, of course – if you are, enjoy the week and stop by our booth for some swag and great cloud discussion!