Spot Instances are great. They are cheap and offer up to 90% savings over On-Demand Instances. By design, they can be taken away at any time. “How would you then run Web / App tier on Spot Instances?” is the million-dollar question that needs an answer.

In this post, I will delve into its details. In the end, I am sure you will appreciate the value that Spot Instances provide and also recognize that they can be used for any kind of workload.

Spot Instances

First, a recap – Spot Instances are spare computing capacity available at deeply discounted prices. AWS allows users to bid on unused EC2 capacity in a region at any given point and run those instances for as long as their bid exceeds the current Spot Price. The Spot Price changes periodically based on supply and demand, and all users bids that meet or exceed it gain access to the available Spot Instances.

Simple Architecture Diagram

A simple architecture below includes the following components:

External-facing Amazon Virtual Private Cloud (VPC) containing one subnet within a single Availability Zone (AZ)
Auto Scaling Groups (ASG) for the EC2 instances to handle requests
An Elastic Load Balancer (ELB) to route requests across these instances
Standard Identity and Access Management (IAM) roles and instance policies

Credit: Imgur

Sample Request Workflow

A request is made to http://www.example.com through a browser. The browser then contacts Amazon Route 53, highly available and scalable cloud Domain Name System (DNS) web service. It then understands that there is a CNAME record associated it with, of the form example-app-XXXXXXXXXX.ap-southeast-1.elb.amazonaws.com. The request is then forwarded to the ELB which subsequently pushes it to an underlying EC2 instance, say m3.large. That instance processes the request and the response is shown on the user’s browser.

Simple right? Yes, as the implicit assumption is that the instances behind the ELB are On-Demand instances and always available. What if we switch-over to Spot Instances? How would the architecture change?

Complex Architecture Diagram

An advanced architecture below includes, and not limited to, the following components:

External-facing Amazon Virtual Private Cloud (VPC) spread across multiple Availability Zones (AZs) with separate subnets for different applications
Auto Scaling Groups (ASG) for the EC2 instances to handle requests
Elastic Load Balancers (ELB) to route requests across these instances
Standard Identity and Access Management (IAM) roles and instance policies

Credit: AWS Docs

It is a highly available architecture spread across multiple AZs and subnets. This design is common to both On-Demand and Spot Instances but makes more sense to the latter as there is an added complication of instances being taken away. As you can see, a Production VPC has multiple Private Subnets spread across us-east-1b and us-east-1c.

A request to http://www.example.com hits the ELB, which routes it to a Private Subnet in one of the AZs, say us-east-1b and the underlying EC2 instance, say m3.large processes it. If the routing policy is Round Robin, then the next request is forwarded to us-east-1c. If one of the Spot Instances is taken away, then the ASG immediately kicks in and provisions another Spot Instance. Let’s say there was a huge spike in Spot Price for m3.large in us-east-1b. All of our Spot Instances in us-east-1b would be terminated. Every request to http://www.example.com is now routed to us-east-1c and it might overwhelm the associated instances. How should you address these issues?

Best Practices

Top of my head, I can recall a few:

Spread your architecture across multiple AZs
- Consider us-east-1b and us-east-1c. If an entire AZ goes off-the-grid, say us-east-1c due to a natural calamity, then us-east-1b continues to process requests ensuring application availability.

Always have two or more instance types behind an ELB in each AZ
- Consider large and m4.large in us-east-1b. Even though all m3.large are taken away due to a spike in Spot Prices, m4.large continues to process requests in us-east-1b.

Have ASGs associated with every instance type
- Set minimum, desired and maximum counts to ensure that if a few Spot Instances are terminated, new ones can be immediately provisioned.
- Associate CloudWatch metrics with ASGs so that we have the required capacity when there is a traffic spike.

Choose a random bidding strategy
- Consider large and m4.large in us-east-1b with their current Spot Price being 0.1 USD. Set m3.large Bid Price as 0.2 USD and m4.large as 0.4 USD. If the Spot Price changes to 0.3 USD for both the instances, due to market volatility, then m3.large will be taken away and m4.large will continue to process requests.

Some of the best practices mentioned above can be offloaded through the usage of Spot Fleet, as described in our earlier post. Recently, Spot Fleet announced support for Auto-Scaling Groups which further eases the use of Spot Instances while guaranteeing compute capacity. This should suffice for a vast majority of applications most of the times.

But the real-world is the real deal; things can go drastically wrong, such as non-availability of any Spot Instance, and cause application downtime. This is totally unacceptable and a Plan B should be in place to ensure business continuity. Batch.ly follows a Hybrid model, wherein a few On-Demand Instances are launched and the remaining majority are Spot Instances. This ensures both application uptime and cost savings are never compromised. The customers are extremely happy. So are we.

Vijay Olety is a Founding Engineer and Technical Architect at Batch.ly. He likes to be called a “Garage Phase Developer” who is passionate about Cloud, Big Data, and Search. He holds a Masters Degree in Computer Science from IIIT-B.

X-Post from cmpute.io blog

Run your Production Web Application on AWS Spot Instances

Spot Instances

Simple Architecture Diagram

Sample Request Workflow

Complex Architecture Diagram

Best Practices

Leave a comment Cancel reply

My Tweets

Recent Posts

Archives

Categories

Meta

Run your Production Web Application on AWS Spot Instances

Spot Instances

Simple Architecture Diagram

Sample Request Workflow

Complex Architecture Diagram

Best Practices

Share this:

Related

Leave a comment Cancel reply

My Tweets

Recent Posts

Archives

Categories

Meta