AWS customers can add graphics acceleration to instances, but with little flexibility. To change that, the cloud provider has finally fulfilled a promise from early last year, with Elastic GPUs that fit enterprise needs.
Developers attach Elastic GPUs to Elastic Compute Cloud (EC2) instances to boost graphics performance in applications during intermittent workload spikes. EC2 Elastic GPUs are network-attached graphics resources available in sizes ranging from 1 GB to 8 GB of GPU memory.
GPU users were previously limited to spinning up a G2 or G3 instance. But those require investment in a full physical GPU, which overshoots some business needs, resulting in costly and wasteful resource usage. Teams can use Elastic GPUs at a lower price than G2 and G3 instances, using just a portion of the physical GPU for graphics-intensive apps.
Elastic GPUs also help customers that need graphics acceleration without being restricted to a particular instance type. They choose another instance type – such as memory- or storage-optimized – and attach an Elastic GPU to it.
Busy month for AWS
August was a busy month for AWS, with updates from both the AWS Summit in New York and VMworld in Las Vegas.
AWS and VMware finally released their hybrid cloud service nine months after they unveiled the partnership. Enterprises were particularly interested in pricing and functionality details, while small businesses might not be a fit for the service.
At the AWS Summit, AWS unveiled new services for migration and security, a variety of new features for Elastic File System (EFS), Config and CloudTrail, and an upgrade to CloudHSM. And AWS Glue, a service revealed at last year’s re:Invent, is now generally available.
More new features and support
- DynamoDB adds VPC Endpoints. Amazon DynamoDB offers more secure network traffic via a free Virtual Private Cloud (VPC) Endpoints feature, which is now generally available. VPC Endpoints keeps traffic within the AWS cloud instead of exposed in the public internet, in line with businesses’ strict compliance needs.
- More HIPAA eligibility. A new AWS Quick Start helps healthcare enterprises automate a deployment based on a CloudFormation customizable template that adheres to HIPAA regulatory requirements. Additionally, Amazon Cloud Directory implemented new controls to help teams build and run apps that meet HIPAA and PCI DSS guidelines. As with all HIPAA-eligible services, an AWS user must first execute a Business Associate Agreement before building an app that achieves compliance.
- Develop serverless functions locally. A new beta Command Line Interface tool, AWS Serverless Application Model (SAM) Local, enables dev teams to test and debug AWS Lambda functions on their own machines. Developers can write functions in Node.js, Java and Python, choose an integrated development environment, simulate function triggers and make calls via Amazon API Gateway to invoke functions.
- AWS Marketplace adds functionality, new region. Users can now visualize, analyze and control their AWS Marketplace spending via new integration with several existing cost management tools: AWS Cost Explorer, AWS Cost and Usage Report and AWS Budgets. In addition, the AWS Marketplace also is now available in the AWS GovCloud region for public sector customers.
- New capabilities for Simple Email Service. A new Reputation Dashboard helps Amazon Simple Email Service (SES) users track bounce and complaint rates for an account, and act on sending failures. Amazon SES also added dedicated IP pools so an AWS customer can send emails from a specific IP address, or organize IP addresses into configurable pools for large email sends. SES also added capabilities that enable businesses to track and optimize email recipient engagement.
- AWS adds global edge locations. AWS added three new edge locations for its Amazon CloudFront CDN service: Chicago (now home to two edge locations), Frankfurt (six locations) and Paris (three locations). In all, AWS has 93 global edge locations.
- Amazon RDS SQL Server quadruples max database size. Database instances for SQL Server on Amazon Relational Database Service (RDS) now range up to 16 TB of storage, four times the previous maximum of 4 TB. The maximum ratio of IOPS to storage also increased fivefold, from 10:1 to 50:1. With these new limits, available on Provisioned IOPS and General Purpose storage types in all regions, databases and data warehouses can support larger workloads without additional RDS instances.
- New CodeCommit features. Amazon’s code repository service, AWS CodeCommit, added several new features and integrations. The service now sends repository state changes to Amazon CloudWatch Events, which enables developers to trigger workflows based on those changes. CodeCommit users can now view, change and save preferences to customize the service’s dashboard presentation. Finally, CodeCommit added a Git tags view that eases code repository navigation.
- EFS adds more permissions. Amazon EFS added support for special permissions, enabling administrators to customize granular access permissions for directories. EFS now supports setgid, which applies ownership of new directory files to the group associated with the directory, and sticky bit special permissions, which restrict file deletion or renaming to either the file or directory owner or to the root user. EFS users can now also manage access to executable files so that end users can launch them but not read or write them.
- CloudTrail supports Lex. AWS CloudTrail now integrates with Amazon Lex to track application programming interface (API) calls to and from the conversational interface app.
- New render management tool. AWS’ new render management system, Deadline 10, is now available, allowing developers to launch and manage rendering fleets.
- Amazon Cloud Directory boosts search performance. Amazon Cloud Directory users can now optimize searches by defining facets of schema to limit queries to subsets of a directory. A schema contains multiple attributes called facets, which help create different object classes and enable multiple apps to share one directory.
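The special permissions mentioned in the EFS item above are ordinary POSIX mode bits, so they can be illustrated without any AWS-specific API. The following is a minimal Python sketch using only the standard library; the helper name `describe_mode` and the three example modes are assumptions chosen for illustration, not values from the EFS announcement.

```python
import stat

def describe_mode(mode: int) -> dict:
    """Summarize the special permission bits in a POSIX mode value."""
    return {
        "listing": stat.filemode(mode),       # the string `ls -l` would show
        "setgid": bool(mode & stat.S_ISGID),  # new files inherit the directory's group
        "sticky": bool(mode & stat.S_ISVTX),  # only the owner or root may delete or rename entries
    }

# A shared project directory: group-writable with setgid set (octal 2775).
shared_dir = describe_mode(stat.S_IFDIR | 0o2775)

# A scratch directory: world-writable with the sticky bit set (octal 1777).
scratch_dir = describe_mode(stat.S_IFDIR | 0o1777)

# An executable that group and others can launch but not read or write (octal 0711).
exec_only = describe_mode(stat.S_IFREG | 0o711)

print(shared_dir["listing"], scratch_dir["listing"], exec_only["listing"])
```

On a mounted file system, an administrator would typically apply such modes with `chmod` or `os.chmod(path, 0o2775)`; the sketch only decodes the bits so it can run anywhere.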
Amazon is upgrading its compute power to court more cloud-hosted graphics-intensive workloads, seeking to benefit from the high cost customers pay for that heavy compute power.
AWS has added a new G3 instance to its graphics-optimized Elastic Compute Cloud (EC2) instances, to power 3-D rendering or visualization, computer-aided design, video encoding, and augmented or virtual reality workloads. While the hardware upgrade could entice enterprises, IT teams should be wary of high costs and processing times with the instances.
The largest of the three G3 instances contains twice the CPU processing power and eight times the memory of the previous G2 generation. The instances, which provide enhanced video encoding and networking features, run on Intel Xeon E5-2686 v4 (Broadwell) processors and are backed by NVIDIA Tesla M60 GPUs.
AWS customers can launch EC2 instances from the AWS Management Console, AWS software development kits, AWS Command Line Interface and other libraries.
New features and support
- Amazon Inspector adds triggers. The Amazon Inspector service, which assesses security vulnerabilities in AWS deployments, can launch automatic scans through integration with CloudWatch Events. With Assessment Events, a customer can create event rules in CloudWatch that notify Inspector to run an assessment on a cloud environment. Users can also schedule recurring assessments and monitor other services to look for event triggers. Inspector displays Assessment Events in its console so a user can see all the triggers assigned to an assessment.
- Visualize resource configurations. A dashboard for AWS Config summarizes account resources and makes configuration history easily accessible. The dashboard displays the number of resources in an account and resources by type, so an administrator can quickly identify resources that fail to comply with AWS Config Rules.
- CloudWatch gains speed. Amazon CloudWatch now supports high-resolution custom metrics and alarms, enabling SysOps to monitor deployments in seconds. Metrics publish in as little as one second and alarms occur in as few as 10 seconds, for more immediate and granular visibility into a cloud environment. The support also includes dashboard widgets.
- Spot Fleets improve tagging. Users can now apply up to 50 tags to EC2 instances launched in a Spot Fleet, to quickly identify specific instances and improve access control, compliance protocols and cost accounting for those compute resources. SysOps defines which tags they want to apply to Fleets, which apply those tags to individual instances. The tagging feature is available in all regions.
- New HIPAA eligibility. Two Amazon services gained HIPAA eligibility and PCI compliance. Amazon WorkSpaces is a desktop as a service that enables administrators to deploy HIPAA-compliant work environments for employees. The service also adheres to Payment Card Industry (PCI) Security Standards, which lets applications and files safely interact with data from card holders. Amazon WorkDocs, a file sharing and collaboration service, can safely handle sensitive health or cardholder information with HIPAA eligibility and PCI DSS compliance. Both updates help AWS customers, particularly in the healthcare field, conform to strict compliance standards.
- Lambda@Edge goes GA. Eight months after its unveiling, the AWS Lambda@Edge service is generally available for developers who want to run Node.js-based Lambda functions across AWS edge locations. Developers upload code to Lambda and configure it to trigger on Amazon CloudFront events. AWS then routes the request to the edge location that’s geographically closest to the customer and executes it. For example, an IT team can create custom web pages and logic at lower latencies for individual Lambda requests based on their geographic origins.
- Reduce unwanted email. An added flow rules feature in Amazon WorkMail enables an IT team to filter inbound email traffic to reduce unwanted email messages from specific senders, route email to junk folders and ensure delivery of priority email. Rules can apply to individual email addresses and entire email domains that AWS hosts.
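To make the high-resolution CloudWatch metrics item above more concrete, here is a hedged Python sketch of publishing a one-second custom metric. It assumes the `put_metric_data` call and `StorageResolution` field as exposed by the AWS SDK for Python (boto3); the function name, namespace and metric name are hypothetical, and a small recording stand-in replaces the real client so the sketch runs without AWS credentials.

```python
def publish_high_resolution_metric(cloudwatch, namespace, name, value):
    """Publish one custom metric data point at one-second resolution.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    datapoint = {
        "MetricName": name,
        "Value": float(value),
        "Unit": "Count",
        "StorageResolution": 1,  # 1 = high resolution; 60 = standard resolution
    }
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datapoint])
    return datapoint


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)


fake = RecordingClient()
sent = publish_high_resolution_metric(fake, "Example/App", "QueueDepth", 42)
```

In a real deployment the first argument would be an actual CloudWatch client; everything else in the function stays the same.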
This is a guest blog post by Bob Reselman, a nationally known developer, system architect, writer and editor. You can read more of his work at DevOpsAgenda.com.
Serverless computing is all the rage among developers, and with good reason.
A serverless environment is the new vista in modern application development. AWS has Lambda; Microsoft has Azure Functions; Google has Cloud Functions. These technologies are not going away. In fact, we’ll see a lot more work take place to create, build and test code in which the function is the unit of deployment.
Serverless-based applications are easy to architect and easy to deploy. A developer decides the services he needs, wires them up in a script, hits the deploy button and runs some tests — that’s it. Developers don’t need to worry about hardware, capacity or scalability; the serverless provider takes care of all that. Just pay the bill for the resources you use.
It couldn’t be simpler, right? Well, maybe not.
The architecture of a serverless environment with a simple REST API architecture implemented in AWS is fairly straightforward. A set of RESTful endpoints uses Amazon API Gateway and wires each endpoint to some AWS Lambda functions. One Lambda function uses Simple Storage Service (S3) as a data store, and the others store data in an Amazon DynamoDB database.
The API Gateway provides a way to get data in and out of the application; the functions handle computation, while S3 and DynamoDB provide the data storage. What’s not to like? AWS will scale up your application as needed. All you need to do is pay the bill.
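As a rough illustration of one endpoint in that architecture, the Python sketch below shows a Lambda-style handler that receives an API Gateway proxy event and hands the parsed item to a storage function. The storage call is injected so the sketch runs locally; `make_handler`, `save_item` and the sample payload are hypothetical names. In AWS, `save_item` might wrap a DynamoDB `put_item` call or an S3 upload.

```python
import json

def make_handler(save_item):
    """Build a Lambda-style handler; `save_item` is the injected storage call."""

    def handler(event, context):
        # API Gateway's proxy integration passes the HTTP method and raw body.
        if event.get("httpMethod") != "POST":
            return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}
        try:
            item = json.loads(event.get("body") or "")
        except ValueError:
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        save_item(item)  # in AWS this would write to DynamoDB or S3
        return {"statusCode": 201, "body": json.dumps(item)}

    return handler


# Local run with an in-memory list standing in for the data store.
store = []
handler = make_handler(store.append)
response = handler({"httpMethod": "POST", "body": '{"id": "1", "name": "demo"}'}, None)
```

Keeping the data store behind a plain function keeps the handler easy to test and makes the billed work (parsing, validation, one write) easy to see.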
So, let’s talk about that bill. Let’s use Will, a systems engineer, as an example.
Will is a low-level engineer who works on content delivery networks for a major telecom. He works closely with bare metal, well below the surface of the average developer’s day-to-day dealings with the cloud. In Will’s world, memory allocation counts.
Over the years, with the growing popularity of higher-level languages such as C# and Java, the C library function malloc, which requests memory from the operating system, has become hidden in the language runtime engines, including the common language runtime for .NET and the Java VM. But memory has to be allocated no matter what, and at the lowest level a program gets that memory from the operating system through a call such as malloc:
char *str = (char *) malloc(15); /* ask the allocator for 15 bytes */
Here is where it gets interesting: the efficiency of malloc varies depending on your implementation. Standard malloc is inefficient in situations with a high degree of concurrency in multiprocessor environments, so Will won’t use it. It locks up memory, used or unused, and places extra burden on the CPU. Will prefers tcmalloc, created by Google, which exposes configuration capabilities that allow memory allocation to work more efficiently. And it avoids wasteful CPU cycling.
So, what does a memory allocation binary have to do with your AWS bill? It actually has a lot to do with it.
AWS makes money on Lambda by billing you for the time it takes to execute code, which translates into CPU utilization — though you also get billed by your request volume. Thus, every piece of code in your Lambda function that declares a variable is subject to the memory allocation executable, which is most often malloc. That means you might have created code that runs squeaky clean on your local machine or even in a private cloud. But when it gets to AWS, it kills the CPUs.
The provider’s memory allocation infrastructure might not be optimized, so wasteful cycles get spun and you get billed. It’s just like giving a package to a messenger and letting him determine the best route, which might include a lot of stop lights. You pay for the messenger’s time no matter the route efficiency.
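The billing effect can be sketched with simple arithmetic. The Python example below estimates a monthly Lambda bill from invocation count, average duration and configured memory. The default rates and the 100 ms billing increment are assumptions supplied for illustration (they ignore any free tier), so check the current AWS price list before relying on them; `lambda_monthly_cost` is a hypothetical helper name.

```python
import math

def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_gb_second=0.00001667,   # assumed rate
                        price_per_million_requests=0.20,  # assumed rate
                        billing_increment_ms=100):        # assumed rounding step
    """Estimate a monthly bill: compute time (GB-seconds) plus request charges."""
    # Duration is rounded up to the next billing increment before it is charged.
    billed_ms = math.ceil(avg_duration_ms / billing_increment_ms) * billing_increment_ms
    gb_seconds = invocations * (billed_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


# Same function, same traffic; only the average execution time differs.
efficient = lambda_monthly_cost(1_000_000, avg_duration_ms=100, memory_mb=1024)
wasteful = lambda_monthly_cost(1_000_000, avg_duration_ms=180, memory_mb=1024)
print(f"efficient: ${efficient:.3f}  wasteful: ${wasteful:.3f}")
```

Under these assumed rates, an extra 80 ms of wasted cycles per call pushes the function into the next billing increment and roughly doubles the compute portion of the bill, which is the risk the messenger analogy describes.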
Of course, I am not saying AWS is a nefarious agent; quite the opposite. But the serverless environment is theirs to run, and the IT shop doesn’t have a lot of say in the matter other than region selection.
Without the ability to optimize a serverless environment to accommodate computationally intensive applications, there is a real financial risk for enterprise IT teams. Hopefully, the major players realize that user optimization for cloud services offers a competitive advantage and more granular capabilities. Otherwise, engineers will fly blind without the aid of instruments on the control panel. And, as we’ve learned on the terrain, when disaster looms, you can’t fix what you can’t see.
Edge computing and IoT continue to infiltrate the enterprise, prompting AWS to release several services at re:Invent 2016. One service, AWS Greengrass, enables AWS Lambda functions on devices and ties together IoT and serverless technologies. Seven months after AWS pulled back the curtain on the service, Greengrass is generally available in the US-East and US-West regions.
As enterprises invest more heavily in IoT-connected devices, they want more connectivity and compute capabilities associated with them. Greengrass delivers limited AWS programming to groups of devices, enabling them to respond to real-world circumstances, such as a faulty internet connection.
AWS Greengrass enables a device to perform functions on data and securely transmit that data to the cloud for additional analytics and storage. Developers can combine Lambda with the AWS Greengrass Core SDK to execute serverless functions locally, establish secure connections from the core device to the cloud and support MQTT messaging on devices.
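A minimal Python sketch of that pattern follows: a function summarizes sensor readings on the device and publishes only the summary over MQTT. The publisher is injected so the sketch runs anywhere; on a Greengrass core it would, as far as the author of this sketch understands the Core SDK, be the `publish` method of `greengrasssdk.client("iot-data")`. The handler name, topic and event shape are hypothetical.

```python
import json

def make_sensor_handler(publish, topic="sensors/summary"):
    """Build a handler that reduces raw readings locally before sending them on."""

    def handler(event, context):
        readings = event.get("readings", [])
        if not readings:
            return None  # nothing to send; avoids a needless message to the cloud
        summary = {
            "count": len(readings),
            "max": max(readings),
            "mean": sum(readings) / len(readings),
        }
        # Only the small summary leaves the device, not every raw reading.
        publish(topic=topic, payload=json.dumps(summary))
        return summary

    return handler


# Local run with a list standing in for the MQTT publisher.
sent = []
handler = make_sensor_handler(lambda topic, payload: sent.append((topic, payload)))
summary = handler({"readings": [20.0, 22.0, 27.0]}, None)
```

Because the computation happens on the device, the handler keeps working through a faulty internet connection and can forward results once connectivity returns.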
The service also opens up hybrid cloud possibilities, another recent area of emphasis for AWS. Greengrass is one of few Amazon products that can run on premises. Greengrass can run on very lightweight or more intricate computing systems, enabling IT administrators to use the AWS programming model locally, if they choose.
Developers can access Greengrass from the AWS Management Console, API or AWS Command Line Interface, and then define and manage Greengrass groups — devices connected to each other.
In addition to Greengrass, AWS added several features and support this month, including plans for a new data center region. Here’s what you might have missed.
New AWS features and support
- DAX also goes GA. Amazon DynamoDB Accelerator (DAX), a caching service for eventually consistent, read-heavy workloads on DynamoDB, is generally available. DAX reportedly improves DynamoDB performance up to 10 times, and it is both fully-managed and compatible with existing DynamoDB API calls, lowering the barrier for developers to roll it into their deployments. DAX is available in the US-East-1 (Northern Virginia), US-West-1 (Northern California), US-West-2 (Oregon), EU-West-1 (Ireland) and Asia Pacific-Northeast-1 (Tokyo) regions.
- New region in Hong Kong. AWS will add a new geographic region in Hong Kong in 2018. The region, AWS’ eighth in Asia Pacific, appeals to local public and private sector clients as well as Asia-based businesses building multi-zone fault-tolerant applications. The region expands AWS’ global footprint to 20 regions; the public cloud provider will open other regions and availability zones in China, France and Sweden in 2017 and 2018.
- Rekognition adds region, feature. Amazon Rekognition, an image recognition and management service, is available in the AWS GovCloud (US) region. A new celebrity recognition feature enables the service to identify an image of a famous person by comparing it to a global list of thousands of celebrities across politics, entertainment, business, sports and media. The feature expands facial recognition capabilities for developers, who could roll the technology into mobile applications. It keeps pace with a similar tool within the Microsoft Cognitive Services portfolio.
- X-Ray expands latency monitoring. Two features in the AWS X-Ray service will analyze and debug distributed applications. The Visual Node and Edge latency distribution graphs, accessible in the Service Details sidebar, visualize and track latency among services; they also show current latency from the perspectives of clients, services and microservices. Developers can access the features via API call or the X-Ray console.
- Device authentication for Amazon WorkSpaces. AWS’ desktop as a service offering, Amazon WorkSpaces, added device authentication for users in BYOD work environments. Administrators establish policies to manage devices and client access, and digital certificates grant or block access to certain operating systems.
- AWS WAF adds more IP address control. AWS Web Application Firewall (WAF), a service that protects web-based apps from common malicious attacks, added a rate-based rules feature. Previously, security ops pros defined rules for requests with certain criteria, such as IP address or the size of the request, and chose to allow, block or count those requests. Rate-based rules expand the controlled response to include a large number of requests from a particular IP address, which could signal a DDoS attack or something more benign, such as a software integration that cannot connect to the app. SecOps teams use rules to add or remove an IP address from a blacklist, set higher request rates for technology partners and set CloudWatch metrics (including alarms that can fire off AWS Lambda functions) to monitor each rule. SecOps can also combine rate-based rules with other WAF conditions to establish more sophisticated rate-based policies.
- Additional AWS Direct Connect locations, monitoring capabilities. AWS Direct Connect, which targets hybrid clouds, establishes secure, dedicated network connections from on-premises resources to the AWS cloud — with increased bandwidth and reduced network costs compared to web-based connections. The list of available locations for AWS Direct Connect now totals 60 — with new ones across North America and Europe. This was the service’s second expansion this year. Admins also can now add Amazon CloudWatch monitoring to all locations (except China), to monitor physical connections to the cloud and set up alarms and triggers through Amazon Simple Notification Service (SNS).
- Lightsail available in nine new regions. AWS expanded Lightsail, its virtual private server service, to nine more regions across the United States, Europe and Asia Pacific. The service launched at re:Invent in 2016 in just the US-East region. Lightsail offers simplified servers with managed infrastructure for businesses with more basic computing needs or limited budgets.
- CloudTrail improves API tracking. Admins use AWS CloudTrail to monitor AWS API calls, and AWS recently added to those tracking capabilities. The CloudTrail console’s API Activity History page now includes API calls to CloudWatch Events, Elastic Compute Cloud (EC2), DynamoDB, Cognito, Kinesis, CloudHSM and Storage Gateway. This addition centralizes API logs and removes the need to retrieve CloudWatch Events APIs from Simple Storage Service (S3) buckets.
- EC2 Systems Manager integrates with S3. Developers can query and visualize inventory data across multiple regions and accounts with AWS’ new integration between Amazon EC2 Systems Manager and S3. Developers enable an S3 bucket to automatically collect inventory data, which eliminates the need to create custom scripts. They can then use Amazon Athena to query the data or Amazon QuickSight to visualize it.
- Convert legacy data warehouses to AWS. The AWS Schema Conversion Tool added more support for legacy data warehouses. IT teams can now export data to Amazon Redshift from Teradata (versions 13 and above) and Oracle Data Warehouse (versions 10g and above).
- AppStream added user management, web portals. Amazon AppStream 2.0 now enables admins to create and manage users without an identity federation tool. Admins grant user access with the User Pools tab in the AppStream console. Users log in via a web portal to choose which approved applications to use.
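The idea behind the rate-based rules in the AWS WAF item above can be sketched as a sliding-window counter per IP address. The Python example below is not how AWS WAF is implemented; it only illustrates the concept. The class name, the default limit and the five-minute window are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

class RateBasedRule:
    """Flag an IP address once it exceeds `limit` requests per time window."""

    def __init__(self, limit=2000, window_seconds=300):  # assumed defaults
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps still inside the window

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)  # blocked requests still count toward the rate
        return len(hits) <= self.limit


# A tiny limit makes the behavior visible: the fourth request in the window is blocked.
rule = RateBasedRule(limit=3, window_seconds=300)
decisions = [rule.allow("203.0.113.7", t) for t in (0, 1, 2, 3)]
after_cooldown = rule.allow("203.0.113.7", 400)
other_client = rule.allow("198.51.100.9", 3)
print(decisions, after_cooldown, other_client)
```

A higher `limit` for a known partner's address range corresponds to the "higher request rates for technology partners" mentioned in the item.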
In the fast-paced world of public cloud, if AWS is the hare, AWS GovCloud is the tortoise.
GovCloud launched in 2011 to meet stricter regulatory requirements for federal, state and local government. Since then, AWS has added dozens of new services and nine new private-sector regions across the globe. But AWS GovCloud was slow to incorporate new services, and it existed as only a single West Coast region – until now.
AWS will add a second GovCloud region on the East Coast in 2018. This comes on the heels of the public cloud market leader’s increased efforts to meet regulatory standards and improve feature parity among its commercial and public sector offerings.
And while AWS GovCloud might only serve as a curiosity to the private sector, its continued expansion speaks to a broader trend of the public cloud as an accepted place for workloads of all kinds.
“From a technology perspective, [GovCloud] has grown leaps and bounds, even over the last two or three years,” said Tim Israel, director of cloud engineering at Enlighten IT Consulting, a GovCloud reseller that works primarily with the Department of Defense.
AWS GovCloud has seen a 185% compound annual growth rate since it opened in 2011, according to Amazon. Some of the most important additions include new instance types already available on the general site and the addition of services such as AWS Lambda, which was added in May, more than two years after the service was first rolled out. There’s also a growing list of accreditations for various services that are often more important to regulated IT shops than the services themselves.
The real potential benefit of the new East Coast region — one that regular AWS users have had access to since 2015 — is disaster recovery across regions. Currently, AWS GovCloud users can replicate data across data centers within the region, but that’s probably not enough redundancy for mission-critical applications.
For example, users of the standard AWS public cloud, which incorporates regional failover, saw services remain uninterrupted when the US-East 1 region went down earlier this year. Those lacking cross-region replication couldn’t access applications housed in US-East 1 for up to four hours.
Still, AWS has a long way to go before there’s true parity between the two iterations of its cloud. Only 35 of its 92 services are available on GovCloud. The private cloud that AWS built specifically for the CIA is believed to have an even smaller feature set. All other U.S. regions offer at least 50 services, and across AWS’ global footprint, only the China region, which is operated by Sinnet, has fewer available services.
According to Amazon, the services available in AWS GovCloud align with the needs of government, as indicated by public sector customers.
AWS GovCloud is also generally more expensive than the commercial version. Comparable compute resources cost more in that region than they do in standard AWS regions; it’s also more expensive to transfer data out of the cloud.
Despite those limitations, AWS GovCloud does have benefits for its targeted audience. It meets certain regulatory standards that other regions do not. It’s also maintained only by U.S. citizens and provides encrypted access that meets federal guidelines.
Unlike the private sector, government agencies have to go through a competitive bidding process that puts roughly two years between a project’s conception and the actual purchase. And given that two years ago was about the time when enterprises really started to embrace the public cloud, AWS GovCloud could also be gaining steam at just the right time.
Trevor Jones is a news writer with SearchCloudComputing and SearchAWS. Contact him at email@example.com.
Price reductions are less common for compute resources than they were in the early days of cloud computing. But AWS customers can still find value in occasional Elastic Compute Cloud (EC2) price cuts.
In May, AWS reduced prices on a slew of one-year standard and three-year convertible EC2 Reserved Instances. AWS customers can save 9% to 17% on standard Reserved Instances, depending on the region, operating system and instance type — discounts apply to C4, M4, R4, I3, P2, X1 and T2 types. Discounts for convertible instance types range up to 21%.
Convertible Reserved Instances allow a user to change the instance family as application needs evolve, which provides an extra level of workload flexibility, though the user is still locked into a contract for instance capacity.
AWS also introduced no-upfront pricing for three-year Reserved Instances, a feature previously reserved for one-year terms. It also lowered On-Demand and Reserved pricing for M4 Linux instances.
New features and support
- Cost allocation tags for Elastic Block Store snapshots. Users can assign costs to a particular project or department via cost allocation tags. Navigate the AWS Management Console’s tag editor feature to find the necessary snapshot, or backup of an Elastic Block Store (EBS) volume, and apply tags. Users can also create tags via a script command or function call. Manage and activate cost allocation tags in the billing dashboard, then monitor tagged snapshots in the Cost Explorer feature.
- Lambda support for AWS X-Ray. The AWS X-Ray service, which analyzes performance of microservices-based applications, now supports AWS Lambda. Developers can enable active function tracing within Lambda to activate X-Ray or update functions in the AWS Command Line Interface. AWS X-Ray processes traces of functions between services and generates visual graphs to ease debugging.
- New features for IAM policy summaries. Administrators can evaluate and troubleshoot AWS Identity and Access Management (IAM) permissions with three new IAM policy summary features. The new resource summaries display resource types, regions and account IDs to provide a full list of defined resources for each policy action. Admins can evaluate which services or actions a policy denies, and see which possible actions remain. They also can identify typos and other errors in policies by seeing which services and actions IAM fails to recognize.
- Support for SAP clusters. AWS unveiled extended support for larger SAP clusters at SAP’s Sapphire Now conference. In addition to expanding its X1 instance type to accommodate larger SAP applications, AWS revealed plans to expand RAM on its virtual servers to better support SAP workloads in the near future.
AWS has made it a priority to win over customers in the database market, specifically Oracle shops. And the public cloud provider has a new weapon in that battle — an upgraded primary database conversion tool.
The AWS Database Migration Service (DMS) now supports NoSQL databases, enabling developers to move databases from the open source MongoDB platform onto DynamoDB, Amazon’s native NoSQL database service. AWS DMS also supports migrations to and from Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, SAP ASE and SQL Server as database sources. The cloud provider could target other NoSQL database providers for support in the future.
In addition to homogeneous migrations, the AWS Schema Conversion Tool converts database schema to enable migration from a disparate database platform to a target on Amazon Relational Database Service, such as from Oracle to Amazon Aurora.
AWS also recently added support for data lake conversions from Oracle and Teradata to Amazon Redshift, a swift response to an Oracle licensing update that hiked fees for Oracle cloud users.
Despite the potential of lock-in, enterprises are interested in the ability of the DynamoDB platform to integrate database information with other AWS tools. And AWS is happy to beat its chest over winning these database customers — it passed 22,000 database migrations in late March, AWS CEO Andy Jassy claimed on Twitter.
It’s getting crowded in the AWS toolbox
Among AWS’ slate of recent service and tool updates, here are several other noteworthy tidbits:
- A Resource Tagging API. IT teams can now apply tags, remove tags, retrieve a list of tagged resources with optional filtering and retrieve lists of tag keys and values via API. The new API enables developers to code tags into resources instead of doing it from the AWS Management Console. The Resource Tagging API is available through the newest versions of AWS SDKs and the AWS Command Line Interface. The new API functions apply across dozens of resource types and services. The cloud provider also added the ability to specify tags for Elastic Compute Cloud instances and Elastic Block Store volumes within the API call that creates them.
- Support for CloudWatch Alarms on Dashboard Widgets. Added functionality of CloudWatch Alarms for Dashboard Widgets provides AWS users with at-a-glance visibility into potential performance issues. SysOps can view CloudWatch metrics and Alarms in the same widget, and view widgets that display metrics according to number (value of a metric), line graph or stacked area graphs (layering one metric over another).
- Cross-region, cross-account capabilities for Amazon Aurora. IT teams can copy automatic or manual snapshots from one region to another and create read replicas of Aurora clusters in a new region. These features can improve disaster recovery posture or expand read operations to users in geographically-close regions. Additionally, users can share encrypted snapshots across AWS accounts, which enables them to copy or restore a snapshot depending on encryption configuration. AWS also expanded Aurora availability to the US-West region, and added support for t2.small instances.
- Amazon Elastic MapReduce instance fleets. This addition lets ops specify up to five instance types per fleet with weighted capacities, availability zones and a mix of on-demand or spot pricing. EMR instance fleets enable ops teams to craft a strategy for how they want to provision and geographically place capacity, and how much they want to pay for it. EMR automatically spins up the required capacity to support big data frameworks for Apache Hadoop, Spark or HBase, among others.
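For the Resource Tagging API item above, the following Python sketch shows what tagging from code rather than the console might look like. It assumes the `tag_resources` and `get_resources` operations of the Resource Groups Tagging API as exposed by the AWS SDK for Python (boto3); the wrapper functions, the stand-in client and the example ARN are hypothetical, and pagination of results is ignored for brevity.

```python
def tag_resources(tagging_client, arns, tags):
    """Apply the same tags to several resources; return any reported failures."""
    response = tagging_client.tag_resources(ResourceARNList=list(arns), Tags=dict(tags))
    return response.get("FailedResourcesMap", {})


def find_tagged(tagging_client, key, values):
    """Return the ARNs of resources whose tag `key` has one of `values`."""
    response = tagging_client.get_resources(
        TagFilters=[{"Key": key, "Values": list(values)}]
    )
    return [m["ResourceARN"] for m in response.get("ResourceTagMappingList", [])]


class FakeTaggingClient:
    """Hypothetical in-memory stand-in so the sketch runs without AWS credentials."""

    def __init__(self):
        self.tags = {}

    def tag_resources(self, ResourceARNList, Tags):
        for arn in ResourceARNList:
            self.tags.setdefault(arn, {}).update(Tags)
        return {"FailedResourcesMap": {}}

    def get_resources(self, TagFilters):
        matches = [
            {"ResourceARN": arn}
            for arn, tags in self.tags.items()
            if all(tags.get(f["Key"]) in f["Values"] for f in TagFilters)
        ]
        return {"ResourceTagMappingList": matches}


client = FakeTaggingClient()
volume_arn = "arn:aws:ec2:us-east-1:123456789012:volume/vol-0example"  # placeholder ARN
failures = tag_resources(client, [volume_arn], {"project": "billing-report"})
found = find_tagged(client, "project", ["billing-report"])
```

With a real client created through the SDK, the two wrapper functions would be called the same way; only the `FakeTaggingClient` would be replaced.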
AWS attributed Tuesday’s extended Amazon S3 disruption to outdated processes and human error, according to a postmortem published Thursday.
The post, which classified the incident as a “service disruption,” states that the problem started at 12:37 p.m. ET when an authorized Amazon Simple Storage Service (Amazon S3) team attempted to resolve an issue that had caused the S3 billing system to run more slowly than expected. One of the team members, following AWS guidelines, attempted to execute a command that would remove some of the servers for an S3 subsystem, but incorrectly entered one of the inputs.
As a result, too many servers were taken down, including those that supported two additional S3 subsystems: one that manages metadata and location information, and another that handles the allocation of new storage. Compounding the problem and creating a Catch-22 scenario, the latter subsystem required the former to be in operation. The capacity removal required a full restart of both subsystems, during which Amazon S3 was unable to service requests.
It appears the outage was due to fat fingering under pressure with an arcane, hardly used command, said Mike Matchett, senior analyst and consultant at Taneja Group.
“It need not ever have happened, but was too easy a mistake to make,” he said. “Once made, it cascaded into a major outage.”
The impact spread to additional services in the US-East-1 region that rely on Amazon S3, including the S3 console, new Elastic Compute Cloud instance launches, Elastic Block Store volumes and AWS Lambda.
AWS’ system is designed to support the removal of significant capacity and is built for occasional failure, according to the company. It has performed this particular operation since S3 was first built, but a complete restart of the affected subsystems hadn’t been done in one of the larger regions in years, the company said.
It’s surprising that a system like AWS is vulnerable at this scope to manual errors, said Carl Brooks, an analyst with 451 Research. The initial failure is understandable, but the compounding impact shows a larger structural flaw in how AWS manages uptime, he added.
“It says something for one manual process to have that much disruptive effect,” Brooks said. “They claim they’ve been working against [these types of failures] all this time, and [it’s] clear that work has not been completed.”
Amazon S3 fully recovered by 4:54 p.m. ET, and related services recovered afterward, depending on backlogs.
AWS said it has since modified the tool it uses to perform the debugging task so that it removes capacity more slowly, and added safeguards to prevent a subsystem from going below its minimum capacity. It’s also reviewing other operational tasks for similar architectural problems, and plans to improve the recovery time of critical S3 subsystems by breaking down services into smaller segments to limit the blast radius of a failure, the company said in the postmortem. That work was planned for S3 later this year, but the timeline was pushed up following this incident. Company officials did not comment specifically on when this work would be completed.
Critical operations that involve shutting down key resources should be fully scripted and tested often, and it shows a level of hubris that AWS hadn’t tried this restart in years, Matchett said. He was also critical of having the health dashboard connected to S3, saying the management plane for high-availability workloads should have run on infrastructure completely separate from the resources it was controlling.
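The two safeguards described in the postmortem, a limit on how quickly capacity can be removed and a floor below which a subsystem cannot drop, can be sketched in a few lines of Python. This is only an illustration of the idea, not AWS' actual tooling; the `allowed_removal` function, its parameters and the numbers used are all hypothetical.

```python
def allowed_removal(active: int, requested: int,
                    min_capacity: int, max_per_step: int) -> int:
    """Return how many servers may be removed in a single step.

    Hypothetical guard combining two checks:
      * rate limit: never remove more than max_per_step at once
      * capacity floor: never leave fewer than min_capacity servers active
    """
    if min(active, requested, min_capacity, max_per_step) < 0:
        raise ValueError("all arguments must be non-negative")
    headroom = max(active - min_capacity, 0)  # servers above the floor
    return min(requested, max_per_step, headroom)


# A mistyped request for 80 servers is throttled to the per-step limit.
print(allowed_removal(active=100, requested=80, min_capacity=60, max_per_step=10))
# Near the floor, only the remaining headroom can be removed.
print(allowed_removal(active=65, requested=80, min_capacity=60, max_per_step=10))
```

Under a guard like this, a single mistyped input would remove at most one step's worth of servers, leaving time to notice the error before the subsystem loses the capacity it needs to keep running.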
“It really looks like AWS has built a bigger house of cards than even they are aware of,” Matchett said.
AWS has also changed the Service Health Dashboard so that it runs across multiple regions.
Amazon has come far to accommodate customers running workloads on-premises; the same can’t be said, however, for how it addresses its customers using other public clouds.
At re:Invent last week, Amazon punctuated its softened stance on hybrid cloud — at least as a stopgap measure — as it touted its recent partnership with VMware and rolled out a series of services and products for use outside its own data centers. But when the notion of using multiple public clouds was broached, Amazon’s answer was simple: Don’t do it.
It’s hard to gauge how many companies actually deploy workloads in multiple infrastructure as a service public clouds. It’s a practice typically reserved for the larger enterprises that hedge their bets and avoid lock-in by storing backups on services such as Microsoft Azure and Google Cloud Platform, or for forward-leaning users that target specific tools for specific applications. It’s still early days as true competitors start to emerge, but there’s little evidence of workloads spanning clouds, much as bursting was more hype than reality when hybrid clouds emerged.
Amazon, through a series of subtle and not-so-subtle messages, advised users that the grass is not greener elsewhere. AWS CEO Andy Jassy reportedly dissuaded vendors at the partner summit from a “strategy of hedging.” He didn’t ask partners not to work with other cloud providers, but he did say Amazon would direct business to those willing to do the tightest integration with its platform.* Amazon CTO Werner Vogels laid out a series of new products intended to fill in gaps in its data services, and talked about offering primitives and designing a “comprehensive data architecture” to meet users’ every need.
“We give you the choice of how you really want to do your analytics,” Vogels said in his keynote. “We’ll make sure you can do all of this on AWS — you never have to go outside.”
There were splashy roll-outs aimed at getting as much data as possible out of customers’ data centers and onto its cloud. There was James Hamilton, AWS vice president and distinguished engineer, on stage with a beer, toasting the death of the mainframe as part of a migration service offered by Infosys. The next morning Snowmobile, a 48-foot truck for transferring exabytes of data, was driven onto the convention hall floor to much fanfare.
Amazon is essentially reinventing what it means to be a giant tech vendor, except it’s doing it as a service provider, said Carl Brooks, an analyst with 451 Research. It could become the next Oracle, not in terms of culture, but in terms of results and as a destination on which customers are reliant.
“Once you start using Oracle you never stop; once you start using AWS you never stop,” Brooks said.
There also was a big push to expand services and functionality around AWS Lambda, Amazon’s serverless compute service that inherently couples a customer to the platform. Lambda is part of an emerging space of higher-end services that offer exciting new capabilities customers likely couldn’t build on their own, but they come with the tradeoff of rendering migration off AWS prohibitive, at best.
Amazon advises its customers against using multiple clouds, and says the way to get the most out of its platform is to start looking at services such as Lambda.
“The challenge with a multicloud strategy is customers tend to innovate to what we call the lowest common denominator,” said Jim Sherhart, director of product marketing at AWS.
Of course, cloud vendors have obvious reasons for their current perspectives. Microsoft and Google, both of which regularly mention meeting multicloud demands, need it to be a reality to siphon off business, while Amazon gains little by accommodating rivals that seek to chip away at its market dominance.
It’s also worth noting that Amazon is not alone in pushing higher-level services, nor does it force users to go that route. Earlier this year it also added export capabilities to its Snowball hardware device that enable customers to pull data back on-premises.
Despite its public stance, there are indications Amazon would change if change is thrust upon it. And if its embrace of hybrid cloud, albeit in a very Amazon-centric way, is any indicator, it could eventually be open to multicloud if its customers demand it.
“They have an intense need to stick to the narrative, which is Amazon is great, Amazon is good,” Brooks said. “When they talk about alternatives it’s only in the context of solving a point-specific problem, which is essentially what the VMware partnership is.
“They are able to roll with change, they just don’t want to admit it.”
As for Jassy’s comments, they should be taken more as a call for better understanding of the platform than some kind of threat, said Jeff Cotten, senior vice president of multicloud at Rackspace, which provides managed services for AWS and Azure.
“They genuinely do understand that they can’t create everything, but I do think there’s frustration that there’s not a lot of deep experts and partners,” he said.
In fact, in Rackspace’s experience Amazon has been OK with multicloud, Cotten added.
“They’re also not going to be exclusive with any one partner, so they understand their partners may not be exclusive to them,” he said.
* – Statement changed after initial publication.
Trevor Jones is a news writer with SearchCloudComputing and SearchAWS.
The latest AWS updates hit on some common themes to provide customers with more options around compute power and discounted purchasing.
Amazon’s new P2 instance type offers up to 16 graphics processing units (GPUs), supplied by up to eight NVIDIA Tesla K80 Accelerators, each of which contains two GPUs. GPUs provide massive amounts of compute power and throughput; they were first popular with gaming companies, and are particularly well-suited for cutting-edge workloads in finance and scientific research, according to Amazon.
This latest instance type, however, and the general trend among cloud vendors to offer larger virtual machines, is as much about targeting legacy workloads and the evolving demands from customers, said Carl Brooks, an analyst with 451 Research.
“It reflects the mainstreaming of cloud computing for the bastions of traditional IT where the major reason [to release this instance type] is big gnarly, hairy systems,” he said.
Customers must use Amazon’s Virtual Private Cloud to use the P2 instances. The instances provide up to 64 vCPUs, 732 gigabytes of memory, 192 gigabytes of GPU memory, nearly 40,000 cores and 20-gigabit network performance when grouped within a single availability zone. They are available in the US East, US West and Europe regions as on-demand instances, reserved instances or dedicated hosts.
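The headline figures for the largest P2 instance follow from simple multiplication across its eight accelerators, as the short Python check below shows. The per-accelerator numbers are assumptions taken from NVIDIA's published Tesla K80 specifications rather than from this article, and the `p2_totals` helper is hypothetical.

```python
# Assumed per-accelerator figures from NVIDIA's Tesla K80 specifications
# (not stated in the article): two GPUs, 4,992 CUDA cores, 24 GB of memory.
K80_GPUS = 2
K80_CUDA_CORES = 4992
K80_MEMORY_GB = 24


def p2_totals(accelerators: int) -> dict:
    """Aggregate GPU resources for a given number of K80 accelerators."""
    return {
        "gpus": accelerators * K80_GPUS,
        "cuda_cores": accelerators * K80_CUDA_CORES,
        "gpu_memory_gb": accelerators * K80_MEMORY_GB,
    }


totals = p2_totals(8)
print(totals)  # {'gpus': 16, 'cuda_cores': 39936, 'gpu_memory_gb': 192}
```

Eight accelerators yield 16 GPUs, 39,936 cores and 192 gigabytes of GPU memory, which matches the "nearly 40,000 cores" and 192 gigabytes cited above.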
Amazon isn’t the only company to offer GPU instances. Microsoft’s N-series GPU instance, added in preview in early August, offers up to 24 cores, four NVIDIA Tesla K80 Accelerators, 224 gigabytes of memory and a 1.44 terabyte solid-state drive. Its targeted applications are high-performance computing and visualization workloads.
More ways to reserve Reserved Instances
Amazon also updated its policies around its Reserved Instances, which provide a large discount for customers that reserve EC2 instances in a given availability zone up to three years in advance. It’s been a popular service since its introduction eight years ago and has expanded to include the ability to schedule and modify instances, and to buy and sell these instances on a marketplace.
The pricing model is closer to traditional enterprise IT buying, where a customer can amortize over the life of the contract and treat it more like an investment, compared with the pay-as-you-go philosophy that helped popularize cloud computing.
“The dirty little secret is most customers are not using cloud for elastic workloads but are instead pushing it as a replacement for their data center,” Brooks said.
The latest change targets customers more concerned with price than capacity. It waives the capacity reservation normally associated with Reserved Instances; in exchange, the instance can run in any availability zone within a region and the discount is applied automatically.
Also new, Convertible Reserved Instances don’t come with the greatest discount on instances but instead allow customers to change the instance type to use a newer iteration or take advantage of different specs, or factor in overall pricing drops on the platform after an instance is initially reserved.
The updates are available across AWS regions, with the exception of Convertible Reserved Instances, which are unavailable in GovCloud and the China region, though those two are expected to be added soon.
The new regional option will help with automation and flexibility, but Amazon risks making its Reserved Instances too confusing, said Meaghan McGrath, an analyst with Technology Business Research in Hampton, N.H. She noted that Google customers appreciate that the vendor automatically calculates sustained usage discounts and applies them to customer instances.
“My feeling is that those customers who understand the system and know how to trade or convert Reserved Instances are the same people who are good at negotiating time-share trades and vacation swaps into nice resorts,” McGrath said. “But the majority of people would really rather just go book a hotel on a discount site that’s going to give them a good deal on the place to stay.”
Trevor Jones is a news writer with TechTarget’s data center and virtualization media group.