5 Little Known Facts About Spot Instances

Picture1

Spot Instances are deeply discounted compute capacity that can provide up to 90% savings over On-demand instances. In other words, for the same budget you get 2-10x times the compute capacity. This is  lucrative for enterprises running compute intensive workloads such as video transcoding, big data analytics (Hadoop, Spark processing), log analysis and other batch jobs.

However, enterprises have not gotten around to using Amazon Spot instances effectively, though they have been around for over 7 years. RightScale published the “State of the Cloud” report recently and “cost optimization” was one of the key challenges reported by enterprise AWS users. Not so surprisingly, only 14% of the enterprises are looking at using Spot Instances as a cost optimization model.

Most of you who have explored spot instances would know that the price determination occurs dynamically based on demand-supply economics. We have listed 5 interesting things that you might not know about Spot:

    1. Spot price differs across AZ’s (Availability Zones) in the same region – Different AZ’s within a region can have completely different pricing for the same instance type. It is important to launch an instance at a specific AZ by monitoring the price whenever you need an instance. This can result in great savings (or losses if done incorrectly)
    2. Spot Blocks are instances which you can block for a finite duration (1-6 hours). These instances provides more reliability than regular spot instances (you are guaranteed availability for the entire duration) but are more pricier than spot (30-45% less than on-demand as against 80-90% savings using spot). These are really good for defined-duration workloads and if used in conjunction with spot instances and reserved instances can be used for other workloads as well (which is what Batchly does)

 

  1. You can run your Big Data/Hadoop Jobs (through Amazon EMR) on spot instances for great savings on large workloads. This however, requires one to bid, manage and monitor the instances (similar to regular spot instances). One best practice is to run the task nodes on spot while running your filesystem (HDFS, Hbase etc) in Master & core nodes on AWS on-demand.
  2. Until recently, spot instance would get taken off the moment you got outbid. This gave no time for workloads to perform any evasive action (to store the state or transfer processed information). This changed last year and now you get a 2 minute warning or “Termination Notice” to help you manage your application and take necessary preventive action.
  3. Spot Instance limits – Just like many other AWS services, Spot instances also has soft limits (only 20 spot instances to begin with). This can however be changed by submitting a request to AWS.

Bonus point: Interrupt tolerant workloads such as batch/cron jobs, load testing, video transcoding, rendering, hadoop workloads are most suited for spot. Even if one/more spot instances get taken away, the same can be processed in another spot/RI/on-demand instance(s)

X-Post from cmpute.io blog

Advertisements
Posted in Uncategorized | Leave a comment

Spot vs On-Demand: Which is better in a highly variable environment?

Be honest: Hasn’t this question crossed your mind often? I am sure it has because it has crossed mine. What’s the answer then, you ask? I would always say: “Go for Spot!”

In this post, I will explain how to make the best choice between on-demand and spot in a highly variable Amazon Web Services (AWS) environment. I hope that after reading this you will be in a better position to judge and make decisions based on your use case, scenario, and workload.

On-Demand Instances

On-Demand Instances are virtual servers that are purchased at a fixed rate per hour. Each On-Demand Instance is billed from the time it is launched until it is stopped or terminated. Partial instance hours are rounded off to the full hour during billing. Once launched, it will continue to run unless you stop it or in the rare cases of it being Scheduled for Retirement/ Unavailable.

These are the most expensive purchasing options.

Spot Instances

Spot Instances, also virtual servers, are spare computing capacity available at deeply discounted prices. AWS allows users to bid on unused EC2 capacity in a region at any given point and run those instances for as long as their bid exceeds the current Spot Price. The Spot Price changes periodically based on supply and demand, and all users bids that meet or exceed it gain access to the available Spot Instances.

These are, by far, the cheapest purchasing option providing savings of up to 90% over On-Demand costs.

Trend

Picture2

Google Trends, without any doubt, clarifies that Spot Instances are gaining popularity at a fast pace. But we have been in this space for close to a decade and know for a fact that businesses are still wary to make the shift to Spot Instances. Why is there less adoption? It’s because of a catch.

The Spot Instances Catch

Like all good things, Spot Instances, too, come with a catch: It can be taken away from you at any time!

Here’s how it works: You place a bid. As long as your bid price is higher than the current Spot Price, Spot Instances will be available to you. The moment the Spot Price goes higher than your bid price, the instances are taken away from you with a 2-minute warning, known quite aptly as Spot Instance Termination Notice. This gives your applications time to prepare for a graceful shutdown.

Most of the existing applications are designed with the assumption that the servers will never go down. It is thus an additional engineering cost and effort to handle this scenario. This involves snapshotting the application and checkpointing data to persistent storage at regular intervals to resume the operation on another server. Due to this, businesses are skeptical to adopt Spot Instances for their use cases.

 

Typical Usage Scenario

A request for a certain number of instances of a particular instance type is placed. Consider the following example: You run a web-tier application and have a need for 100  m3.large instances. The Spot Price on 04 Oct 2016 at 12:12PM UTC+0530 is 0.0285 USD. What should your bid price be? That question is worth another blog post but say you bid at 0.1 USD. As per Spot availability and your account limits, you will get at most 100 m3.large instances.

Picture3

As you move through the days, you see small Spot Price spikes hovering in the range of 0.08 USD – 0.1 USD. You did good by setting a high bid price! But the real shocker comes on 06 Oct 2016 at 04:57PM UTC+0530. The Spot Price is a massive 1.46 USD. All the 100 m3.large instances are taken away and your web-tier is unreachable. Your application is experiencing an indefinite downtime and you are losing money real quick.

 

Mitigation Strategies

The fault, which is apparent in hindsight, was relying only on 1 instance type. You now know that at least 2 instance types must be used in a Spot-only environment to achieve better availability.

You request for 50 m3.large and 50 m4.large. You also increase your bid price to 0.2 USD, just to be safe.

Picture4

The unexpected has occurred. At 06 Oct 2016 at 08:27PM UTC+0530, the Spot Price is 0.66 USD. Your bid price is low and both m3.large and  m4.large are taken away from you and your application is experiencing downtime again! What did you do wrong?

The mistake, again clearly apparent in hindsight, is that both instance types were launched in the same Availability Zone, us-east-1e.

You now request for 50 m3.large in us-east-1e and 50 r3.large in us-east-1c. You increase your bid price to 0.3 USD. Again, just to be safe.

Picture5

My God, it happened again! At 06 Oct 2016 at 08:27PM UTC+0530, the Spot Price spiked to 1.7500 USD. All your Spot Instances are taken away and your application is witnessing uncertain downtime again! What went wrong now?

In summary, mitigation strategies that we looked into are:

  • Never launch all servers of a single instance type
  • At least 2 instance types are a must for better availability
  • Never launch all in a single Availability Zone

Even with these in place, we still saw a downtime. It is a never ending cat-and-mouse game. No matter how safe you think you are, there is always an edge case that causes havoc.

There are several other mitigation approaches that have to be in place to make Spot Instances truly available. Top of my head, I can recall:

  • Choosing the best bid price for a particular instance type
    • It really is an art!
  • A Spot Availability Predictor which predicts the likelihood of an instance being taken away
    • Having access to historical Spot Prices helps build robust Machine Learning prediction models. This helps you answer questions like:
      • Which is a good instance type to launch?
      • Should I launch it in us-east-1a or us-east-1c?
      • Is it ok to launch now or 10 minutes later for better availability?
  • Falling back to an appropriate similar instance
    • Spot Instances are subject to availability as well as your account limits. Would you rather go for On-Demand or a similar instance on Spot?
  • Inclusion of Spot Fleet
    • Spot Fleet is a boon. It vastly simplifies the management of thousands of Spot Instances in one request. You can choose between lowestPrice and diversified as your allocation strategy to meet your requirements.
  • Spot Fleet has limitations though, to name a few:
    • No native support to attach Spot Fleet instances to an ELB.
    • It cannot intelligently decide which instance type to launch if all the Spot Instances are down.
  • Usage of Spot Blocks
    • Spot Blocks are a good fit for Defined-Duration workloads. Just specify the Block Duration parameter when placing a Spot Instance request. But it is twice as costly as a regular Spot Instance. Why should you go for it?

As you can see, Spot Instances management is very complex. There are 50+ instance types, 11 regions, 35+ Availability Zones, and 600+ bidding strategies.

Back to the Original Question

It really boils down to, “Do you truly want 100% availability at all times OR 100% availability with a teeny-weeny chance of 99.99% availability sometimes, if at all, at a cool savings of up to 90%?” Think about it: The worst case is around 10 minutes downtime in a year and if your annual On-Demand expenditure is 1M USD, then you get the same performance and functionality and similar uptime with savings of up to 900,000 USD.

I completely understand if you say, “Spot management is none of my business.” But, quite frankly, it is ours! Register now for a free 14-day trial of Batchly.

I know you have one final question that still has you scratching your head: “Can I get the best of both worlds?” Yes you can! We also offer a hybrid model of On-Demand and Spot Instances for 100% availability for the really mission-critical workloads.

Remember: A penny saved is a penny earned!

 

Vijay Olety is a Technical Architect at 47Line. Fondly known as “Garage Phase Developer”, he is passionate about Cloud, Big Data and Search. He holds an M.Tech in Computer Science from IIIT-B.

X-Post from cmpute.io blog

Posted in Uncategorized | Leave a comment

DynamoDB: An Inside Look Into NoSQL – Part 7

In Part 6, we discussed handling failures via Hinted Handoff & Replica Synchronization. We also talked about the advantages of using a Sloppy Quorum and Merkle Trees.

In this last & final part of the series, we will look into Membership and Failure Detection.

Ring Membership

In any production environment, node outages are often transient. But it rarely signifies permanent failure and hence there is no need for repair or rebalancing the partition. On the other hand, manual errors might result in unintentional startup of new DynamoDB nodes. A proper mechanism is essential for the addition and removal of nodes from the DynamoDB Ring. An administrator uses a tool to issue a membership change command to either add / remove a node. The node that picks up this request writes into its persistent store the membership change request and the timestamp. A gossip-based protocol transfers the membership changes and maintains a consistent view of membership across all nodes. Each node contacts a peer chosen at random every second and the two nodes efficiently reconcile their persisted membership change histories. Partitioning & placement information also propagates via the gossip-based protocol and each storage node is aware of the token ranges its peers are responsible for. This allows each node to forward a key’s read/write operations to the right set of nodes directly.

Ring Membership (Credit)

External Discovery

It’s best to explain with an example: An administrator joins node A to the ring. He then joins node B to the ring. Nodes A and B consider itself as part of the ring, yet neither would be immediately aware of each other. To prevent these logical partitions, DynamoDB introduced the concept of seed nodes. Seed nodes are fully functional nodes that are discovered via an external mechanism (static configuration or a configuration service) and are known to all nodes. Since each node communicates with the seed node and gossip-based protocol transfer the membership changes, logical partitions are highly unlikely.

Failure Detection

Failure detection in DynamoDB is used to avoid attempts to communicate with unreachable peers during get() and put() operations and when transferring partitions and hinted replicas. For the purpose of avoiding failed attempts at communication, a purely local notion of failure detection is entirely sufficient: node A may consider node B failed if node B does not respond to node A’s messages (even if B is responsive to node C‘s messages). In the presence of a steady rate of client requests generating inter-node communication in the DynamoDB ring, a node A quickly discovers that a node B is unresponsive when B fails to respond to a message; Node A then uses alternate nodes to service requests that map to B‘s partitions; A also periodically retries B to check for the latter’s recovery. In the absence of client requests to drive traffic between two nodes, neither node really needs to know whether the other is reachable and responsive.

Conclusion

This exhaustive 7-part series detailing every component is sufficient to understand the design and architecture of any NoSQL system. Phew! What an incredible journey it has been these couple of months delving into the internals of DynamoDB. Having patiently read this far, you are amongst the chosen few who have this sort of deep NoSQL knowledge. You can be extremely proud of yourself!

Let’s eagerly await another expedition soon!

Article authored by Vijay Olety.

X-Post from CloudAcademy.

Posted in Technical | Tagged , , , , | Leave a comment

DynamoDB: An Inside Look Into NoSQL – Part 6

In Part 5, we spoke about data versioning, the 2 reconciliation strategies and how vector clocks are used for the same. In this article, we will talk about Handling Failures.

Handling Failures – Hinted Handoff

Even under the simplest of failure conditions, DynamoDB would experience reduced durability and availability if the traditional form of quorum approach is used. In order to overcome this it uses a sloppy quorum; all READ and WRITE operations are performed on the first N healthy nodes from the preference list, which may not be the first N nodes encountered by traversing the consistent hashing ring.

Partitioning & Replication of Keys in Dynamo (Credit)

Consider the above figure: If A is temporarily not reachable during a WRITE operation, then the replica that would have lived on A will be sent to D to maintain the desired availability and durability guarantees. The replica sent to D will have a hint in its metadata which tells who the intended recipient was (in our case, it is A). The hinted replicas are stored in a separate local database which is scanned periodically to detect if A has recovered. Upon detection, D will send the replica to A and may delete the object from its local store without decreasing the total number of replicas. Using hinted handoff, DynamoDB ensures that READ and WRITE are successful even during temporary node or network failures.

Highly available storage systems must handle failures of an entire data center. DynamoDB is configured such that each object is replicated across data centers. In terms of implementation detail, the preference list of a key is constructed such that the storage nodes are spread across multiple data centers and these centers are connected via high speed network links.

Handling Permanent Failures – Replica Synchronization

Hinted Handoff is used to handle transient failures. What if the hinted replica itself becomes unavailable? To handle such situations, DynamoDB implements an anti-entropy protocol to keep the replicas synchronized. A Merkle Tree is used for the purpose of inconsistency detection and minimizing the amount of data transferred.

According to Wikipedia, a “Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels of its children nodes.”. Parent nodes higher in the tree are hashes of their respective children. The principal advantage of a Merkle tree is that each branch of the tree can be checked independently without requiring nodes to download the entire tree or the entire data set. Moreover, Merkle trees help in reducing the amount of data that needs to be transferred while checking for inconsistencies among replicas. For instance, if the hash values of the root of two trees are equal, then the values of the leaf nodes in the tree are equal and the nodes require no synchronization. If not, it implies that the values of some replicas are different. In such cases, the nodes may exchange the hash values of children and the process continues until it reaches the leaves of the trees, at which point the hosts can identify the keys that are “out of sync”.

DynamoDB uses Merkle trees for anti-entropy as follows: Each node maintains a separate Merkle tree for each key range (the set of keys covered by a virtual node) it hosts. This allows nodes to compare whether the keys within a key range are up-to-date. In this scheme, two nodes exchange the root of the Merkle tree corresponding to the key ranges that they host in common. Subsequently, using the tree traversal scheme described above the nodes determine if they have any differences and perform the appropriate synchronization action. The disadvantage with this scheme is that many key ranges change when a node joins or leaves the system thereby requiring the tree(s) to be recalculated.

In the next and the final part of this series, we will discuss Membership and Failure Detection.

Article authored by Vijay Olety.

X-Post from CloudAcademy.

Posted in Technical | Tagged , , , , , , | Leave a comment

DynamoDB: An Inside Look Into NoSQL – Part 5

In Part 4, we talked about partitioning and replication in detail. We introduced consistent hashing, virtual nodes and the concept of coordinator nodes and preference list. In this article, we will discuss Data Versioning.

Data Versioning

Eventual consistency, introduced by DynamoDB, allows for the updates to be pushed to all storage nodes asynchronously. A put operation returns before the update is pushed to all replicas, which results in scenarios where a subsequent get operation may return a value that does not reflect the latest changes. Depending on network partitions and server outages, not all replicas might have the latest updates even after an extended period of time.

But there are certain requirements that do not need latest updates and can still tolerate certain inconsistencies. One such requirement is “Add To Cart” where put operation should always succeed and a get operation returning an old object is still tolerable. If a user makes a change to an earlier version of the shopping cart object, that change is still meaningful and should be preserved at all costs. However, the currently unavailable latest state of the shopping cart can have its own version of updates which should also be preserved. It is evident that data versioning has to be implemented to handle such scenarios.

In order to achieve such guarantees, DynamoDB treats the result of every modification as a new and immutable version of data. This allows for multiple versions of an object to be present in the system at the same time. If the newer version subsumes the earlier one, then the system can automatically determine the authoritative version (syntactic reconciliation). However, in some cases the versions conflict and the client has to manually perform the reconciliation (semantic reconciliation).

Data Versioning (Credit)

DynamoDB uses vector clocks in order to capture causality between multiple versions of an object. A vector clock is a list of (node, counter) pairs. One vector clock is associated with one version of every object. By analyzing the vector clocks, you can find out if the versions have a causal ordering or are on parallel branches. When a client wishes to perform an update, it must specify which version it is updating. This version can be got from an earlier get operation.

Version evolution of an object (Credit)

Let’s understand how vector clocks works: A client writes a new object. Node Sx handles this write and creates a vector clock [(Sx, 1)] for the object D1. If the client now updates the object and node Sx again handles the request, we get a new object D2 and its vector clock [(Sx, 2)]. The client updates the object again and this time node Sy handles the request leading to object D3 with the vector clock [(Sx, 2), (Sy, 1)]. When a different client tries to update the object after reading D2, the new vector clock entry for object D4 is [(Sx, 2), (Sz, 1)] where Sz is the node that handled the request. Now when a new write request is issued by the client, it sees that there are already D3 and D4 objects. If node Sx is handling the request, it performs the reconciliation process and the new data object D5 is created with the vector clock [(Sx, 3), (Sy, 1), (Sz, 1)].

One possible issue with vector clocks is that its size may grow rapidly if multiple servers coordinate the writes to a particular object. However, this issue is unlikely since in production only the nodes from the preference list handle the operation. It is still desirable to limit the size of vector clock. DynamoDB implements the following clock truncation scheme: A timestamp, which indicates the last time that node updated an item, is stored along with (node, counter) pair. When the number of (mode, counter) pairs reaches a threshold (say 15), the oldest pair is removed from the clock.

In the next article, we will look into how to Handle Failures.

Article authored by Vijay Olety.

X-Post from CloudAcademy.com

Posted in Technical | Tagged , , , | Leave a comment

DynamoDB: An Inside Look Into NoSQL – Part 4

In Part 3, we mentioned the various distributed techniques used while architecting NoSQL data stores. A table nicely summarized these techniques and their advantages. In this article, we will go into the details of partitioning and replication.

Partitioning

As mentioned earlier, the key design requirement for DynamoDB is to scale incrementally. In order to achieve this, there must be a mechanism in place that dynamically partitions the entire data over a set of storage nodes. DynamoDB employs consistent hashing for this purpose. As per the Wikipedia page, “Consistent hashing is a special kind of hashing such that when a hash table is resized and consistent hashing is used, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.”

Consistent HashingConsistent Hashing (Credit)

Let’s understand the intuition behind it: Consistent hashing is based on mapping each object to a point on the edge of a circle. The system maps each available storage node to many pseudo-randomly distributed points on the edge of the same circle. To find where an object O should be placed, the system finds the location of that object’s key (which is MD5 hashed) on the edge of the circle; then walks around the circle until falling into the first bucket it encounters. The result is that each bucket contains all the resources located between its point and the previous bucket point. If a bucket becomes unavailable (for example because the computer it resides on is not reachable), then the angles it maps to will be removed. Requests for resources that would have mapped to each of those points now map to the next highest point. Since each bucket is associated with many pseudo-randomly distributed points, the resources that were held by that bucket will now map to many different buckets. The items that mapped to the lost bucket must be redistributed among the remaining ones, but values mapping to other buckets will still do so and do not need to be moved. A similar process occurs when a bucket is added. You can clearly see that with this approach, the additions and deletions of a storage node only affects its immediate neighbors and all the other nodes remain unaffected.

The basic implementation of consistent hashing has its limitations:

  1. Random position assignment of each node leads to non-uniform data and load distribution.
  2. Heterogeneity of the nodes is not taken into account.

To overcome these challenges, DynamoDB uses the concept of virtual nodes. A virtual node looks like a single data storage node but each node is responsible for more than one virtual node. The number of virtual nodes that a single node is responsible for depends on its processing capacity.

Replication

Every data item is replicated at N nodes (and NOT virtual nodes). Each key is assigned to a coordinator node, which is responsible for READ and WRITE operation of that particular key. The job of this coordinator node is to ensure that every data item that falls in its range is stored locally and replicated to N - 1 nodes. The list of nodes responsible for storing a certain key is called preference list. To account for node failures, preference list usually contains more than N nodes. In the case of DynamoDB, the default replication factor is 3.

In the next article, we will look into Data Versioning.

Article authored by Vijay Olety.

X-Post from CloudAcademy.com

Posted in Technical | Tagged , , , , , , , | Leave a comment

DynamoDB: An Inside Look Into NoSQL – Part 3

In Part 2, we looked at Design Considerations of NoSQL and introduced the concept of eventual consistency. In this article, we will introduce the concepts and techniques used while architecting a NoSQL system.

The core distributed systems techniques employed in DynamoDB are – partitioning, replication, versioning, membership, failure handling and scaling. Phew! Did you really think that the internals will be simple? 🙂

DynamoDB Internals

DynamoDB (Credit)

The following table summarizes the list of techniques used in DynamoDB:

Problem Technique Advantage
Partioning Consistent Hashing Incremental Scalability
High Availability for Writes Vector Clocks with reconciliation during reads Version size is decoupled from update rates
Handling temporary failures Sloppy Quorum and Hinted Handoff Provides high availability & durability guarantee when some replicas are not available
Recovery from permanent failures Anti-entropy using Merkle trees Synchronizes divergent replicas in the background
Membership & Failure Detection Gossip-based membership protocol & failure detection Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information

I know I have covered a lot of lingo and buzz words. If you really need to take a deep breath, now is the time! Fear not, in subsequent articles we will deep-dive into each of the above mentioned techniques. Remember, the devil is in the details!

System Interface

DynamoDB exposes two interfaces: get and put. The get(key) operation locates the object associated with the key and returns it along with a context. The put(key, context, object) operation uses key to determine the associated replicas and writes the object in those replicas. The context information is invisible to the user and contains metadata such as version. DynamoDB applies an MD5 hash on the key to generate a 128-bit identifier, which is used to determine the replicas that are responsible for serving the key.

Article authored by Vijay Olety

X-Post from CloudAcademy.com

Posted in Technical | Tagged , , , , , | Leave a comment