Application Rightsizing on AWS is a platform which reduces your AWS EC2 spend across three main areas:

  1. Instance Optimization – usage of Spot, Reserved and On-Demand instances
  2. Time Optimization – schedule start and stop of instances
  3. Performance Optimization – analyse application performance and rightsize instances

Time-based Optimization is the ability to schedule workloads (instances and applications) based on policies, which is most relevant for dev/test workloads. Instance Optimization is the automatic use of Spot Instances at all layers of your application – web, app, or API tier – as well as the optimal utilization of purchased Reserved Instances across multiple accounts.

In this post, I will talk about the third critical area of optimization: what we call Performance Optimization, or Rightsizing, of your application.

Application Infrastructure

One of the reasons enterprises move to the cloud is the flexibility it offers in infrastructure management and the ease with which it can be changed in an instant. Is your infrastructure over-provisioned? Just do a scale-in. Is it under-provisioned? A simple scale-out should do the trick. What assumption have you made during the scaling process? That the instances are well utilized and being used to their maximum. This is only partly true, because you are looking at one metric at a time.

Why, you ask? You perform a scaling operation when a certain threshold is breached. Let’s take an example: you have configured your scale-out policy to add one instance to the cluster when CPU utilization touches 60%. This makes sense, as you need more instances to handle traffic spikes. But what about the instance’s memory utilization? How do we ensure that memory is being used efficiently? What if memory usage always sits at 5%? Scaling does not help here, and this is where Rightsizing comes to the rescue.



Rightsizing is the process of analyzing your workloads and recommending the right instance type to minimize wastage. During the planning and provisioning of the underlying infrastructure, we tend to make certain assumptions about application performance. Over time, we capture metrics, validate our assumptions, and take corrective action if needed. But this is a manual and cumbersome DevOps exercise. Wouldn’t it be nice if it could be automated completely? CMPUTE.IO takes rightsizing very seriously and guarantees that application performance is never degraded when we make recommendations.

Recommendation Engine

When customers move their workloads onto our platform, we start orchestration of their infrastructure on RI and Spot Instances to achieve savings of 75% and more. Additionally, we start monitoring the application metrics such as CPUUtilization, memory usage (available via an opt-in during workload migration) and network latency over a two-week window. This data is fed into our proprietary Recommendation Engine.

Our engine first sifts through the data. It plots a time series and calculates the peaks, valleys, and averages, then matches these numbers against built-in breach thresholds. If the instances are being utilized efficiently, give yourself a pat on the back! You are a DevOps expert and in complete control.

More often than not, our Recommendation Engine sees that application infrastructure is under-utilized and it then proceeds to recommend an appropriate rightsized instance. This is a complex multi-stage process and broadly involves:

  • Narrow down to a subset of “suitable” instances spanning families and sizes. This is a big deal: consider an m4 instance whose CPU is maxed out while memory is under-utilized. If we restrict recommendations to the m4 family, we can only do so much. But if we move to a different family, say c4, we can improve both CPU and memory utilization, since it is a compute-optimized family.
  • For each of these “suitable” instances, find out if there is Spot capacity available in the Region and Availability Zones where the application is running. Narrow down the list further.
  • For each of these instances, categorize the Spot Instance likelihood of getting terminated as Low, Medium and High. Narrow down the list further to include instances falling into Low and Medium risk.
  • For each of these instances, calculate the Spot price and find out the cheapest one.
  • Recommend this instance to the customer!
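The multi-stage narrowing above can be sketched in a few lines of Python. This is a hypothetical illustration only: the data structures, risk labels, and prices are invented for the example and are not CMPUTE.IO's actual engine.

```python
# Hypothetical sketch of the narrowing steps above; not CMPUTE.IO's API.
def recommend_instance(candidates):
    """candidates: list of dicts with keys
    'type', 'spot_available', 'termination_risk', 'spot_price'."""
    # Step 2: keep only types with Spot capacity in the app's region/AZs
    pool = [c for c in candidates if c["spot_available"]]
    # Step 3: keep only Low/Medium termination risk
    pool = [c for c in pool if c["termination_risk"] in ("Low", "Medium")]
    # Step 4: pick the cheapest Spot price; step 5: recommend it
    return min(pool, key=lambda c: c["spot_price"]) if pool else None

candidates = [
    {"type": "c4.large", "spot_available": True,  "termination_risk": "Low",  "spot_price": 0.035},
    {"type": "m4.large", "spot_available": True,  "termination_risk": "High", "spot_price": 0.030},
    {"type": "r3.large", "spot_available": False, "termination_risk": "Low",  "spot_price": 0.028},
]
best = recommend_instance(candidates)  # c4.large: cheapest surviving filter
```

The m4.large is cheaper but is dropped in the risk-categorization stage, which is exactly why price alone is the last filter and not the first.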


Rightsizing in action

We eat our own dog food! We run rightsizing on our own infrastructure and act upon the recommendations.


As you can see from the figure above, one of our internal systems is running on m4.xlarge. Our recommendation engine analyzed the application performance metrics and suggested r3.large as the most appropriate instance type. It also displays the savings achieved by making the switch – 95 USD per instance-month, or 23% additional savings.

Register now for a 14-day free trial of CMPUTE.IO, the only unified platform that achieves all three optimizations, and see the results instantly.


Vijay Olety is a Founding Engineer and Technical Architect at CMPUTE.IO. He likes to be called a “Garage Phase Developer” and is passionate about Cloud, Big Data, and Search. He holds a Master’s Degree in Computer Science from IIIT-B.

X-Post from blog

Posted in Uncategorized | Leave a comment

5 Little Known Facts About Spot Instances


Spot Instances are deeply discounted compute capacity that can provide up to 90% savings over On-Demand instances. In other words, for the same budget you get 2–10x the compute capacity. This is lucrative for enterprises running compute-intensive workloads such as video transcoding, big data analytics (Hadoop, Spark processing), log analysis, and other batch jobs.

However, enterprises have not gotten around to using Amazon Spot Instances effectively, even though they have been around for over 7 years. RightScale recently published its “State of the Cloud” report, and cost optimization was one of the key challenges reported by enterprise AWS users. Not so surprisingly, only 14% of enterprises are looking at Spot Instances as a cost-optimization model.

Most of you who have explored Spot Instances know that the price is determined dynamically by demand-supply economics. Here are 5 interesting things you might not know about Spot:

  1. Spot price differs across AZs (Availability Zones) in the same region – Different AZs within a region can have completely different pricing for the same instance type. It is important to monitor the price and launch instances in a specific AZ accordingly. This can result in great savings (or losses, if done incorrectly).
  2. Spot Blocks are instances that you can reserve for a finite duration (1-6 hours). They provide more reliability than regular Spot Instances (you are guaranteed availability for the entire duration) but are pricier (30-45% less than On-Demand, as against 80-90% savings with regular Spot). They are a good fit for defined-duration workloads and, used in conjunction with Spot Instances and Reserved Instances, can serve other workloads as well (which is what Batchly does).
  3. You can run your Big Data/Hadoop jobs (through Amazon EMR) on Spot Instances for great savings on large workloads. This, however, requires one to bid, manage, and monitor the instances (as with regular Spot Instances). One best practice is to run the task nodes on Spot while running your filesystem (HDFS, HBase, etc.) on master and core nodes on On-Demand.
  4. Until recently, a Spot Instance would be taken away the moment you got outbid. This gave workloads no time to take evasive action (to store state or transfer processed information). This changed last year: you now get a 2-minute warning, or “Termination Notice”, to help your application take the necessary preventive action.
  5. Spot Instance limits – Just like many other AWS services, Spot Instances have soft limits (only 20 Spot Instances to begin with). This can, however, be changed by submitting a request to AWS.
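The 2-minute Termination Notice mentioned above is surfaced through the EC2 instance metadata service at the documented `spot/termination-time` path. The polling helper below is an illustrative sketch; the fetch function is injected so the logic can also be demonstrated off-instance.

```python
import urllib.request

# AWS's documented metadata endpoint for the Spot termination notice.
TERMINATION_URL = ("http://169.254.169.254/latest/meta-data/"
                   "spot/termination-time")

def metadata_fetch(url=TERMINATION_URL, timeout=1):
    """Return (status, body); 404/unreachable means no notice yet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except Exception:
        return 404, ""

def termination_pending(fetch=metadata_fetch):
    status, body = fetch()
    # A 200 response carries the timestamp at which the instance
    # will be reclaimed; anything else means no notice yet.
    return body if status == 200 else None

# Off-instance demo with stubbed fetches:
assert termination_pending(lambda: (404, "")) is None
assert termination_pending(lambda: (200, "2016-10-06T16:57:00Z")) == "2016-10-06T16:57:00Z"
```

On a real Spot Instance you would poll this every few seconds and, on a non-None result, checkpoint state and drain traffic before the reclaim time.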

Bonus point: Interrupt-tolerant workloads such as batch/cron jobs, load testing, video transcoding, rendering, and Hadoop workloads are best suited for Spot. Even if one or more Spot Instances get taken away, the same work can be processed on other Spot, RI, or On-Demand instances.

X-Post from blog

Posted in Uncategorized

Spot vs On-Demand: Which is better in a highly variable environment?

Be honest: Hasn’t this question crossed your mind often? I am sure it has because it has crossed mine. What’s the answer then, you ask? I would always say: “Go for Spot!”

In this post, I will explain how to make the best choice between on-demand and spot in a highly variable Amazon Web Services (AWS) environment. I hope that after reading this you will be in a better position to judge and make decisions based on your use case, scenario, and workload.

On-Demand Instances

On-Demand Instances are virtual servers purchased at a fixed rate per hour. Each On-Demand Instance is billed from the time it is launched until it is stopped or terminated; partial instance-hours are rounded up to the full hour during billing. Once launched, it will continue to run unless you stop it or, in rare cases, it is Scheduled for Retirement or becomes Unavailable.

This is the most expensive purchasing option.

Spot Instances

Spot Instances, also virtual servers, are spare computing capacity available at deeply discounted prices. AWS allows users to bid on unused EC2 capacity in a region at any given point and run those instances for as long as their bid exceeds the current Spot Price. The Spot Price changes periodically based on supply and demand, and all users whose bids meet or exceed it gain access to the available Spot Instances.

These are by far the cheapest purchasing option, providing savings of up to 90% over On-Demand costs.



Google Trends clearly shows that Spot Instances are gaining popularity at a fast pace. But we have been in this space for close to a decade and know for a fact that businesses are still wary of making the shift to Spot Instances. Why the low adoption? Because there is a catch.

The Spot Instances Catch

Like all good things, Spot Instances come with a catch: they can be taken away from you at any time!

Here’s how it works: You place a bid. As long as your bid price is higher than the current Spot Price, Spot Instances will be available to you. The moment the Spot Price goes higher than your bid price, the instances are taken away from you with a 2-minute warning, known quite aptly as Spot Instance Termination Notice. This gives your applications time to prepare for a graceful shutdown.

Most existing applications are designed with the assumption that the servers will never go down. Handling this scenario is thus an additional engineering cost and effort: it involves snapshotting the application and checkpointing data to persistent storage at regular intervals so that the operation can resume on another server. Because of this, businesses are skeptical about adopting Spot Instances for their use cases.


Typical Usage Scenario

A request for a certain number of instances of a particular instance type is placed. Consider the following example: You run a web-tier application and need 100 m3.large instances. The Spot Price on 04 Oct 2016 at 12:12PM UTC+0530 is 0.0285 USD. What should your bid price be? That question is worth another blog post, but say you bid 0.1 USD. Subject to Spot availability and your account limits, you will get at most 100 m3.large instances.


As the days pass, you see small Spot Price spikes hovering in the 0.08 USD – 0.1 USD range. You did well by setting a high bid price! But the real shocker comes on 06 Oct 2016 at 04:57PM UTC+0530: the Spot Price is a massive 1.46 USD. All 100 m3.large instances are taken away and your web tier is unreachable. Your application is experiencing indefinite downtime and you are losing money fast.
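Price histories like the one above can be pulled with the EC2 `describe_spot_price_history` API and scanned for spikes above your bid. The spike-detection helper and sample numbers below are illustrative; the boto3 call at the end is the real API but requires AWS credentials, so it is shown commented out.

```python
# Flag Spot price records that exceeded a given bid.
def spikes_above_bid(price_records, bid):
    """price_records: list of (timestamp, price) tuples."""
    return [(ts, p) for ts, p in price_records if p > bid]

# Sample prices echoing the scenario in the text (illustrative data).
history = [("2016-10-04T12:12", 0.0285),
           ("2016-10-05T09:00", 0.0900),
           ("2016-10-06T16:57", 1.4600)]
spikes = spikes_above_bid(history, 0.10)   # only the 1.46 USD spike

# With AWS credentials configured (real boto3 call, untested here):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# resp = ec2.describe_spot_price_history(
#     InstanceTypes=["m3.large"], ProductDescriptions=["Linux/UNIX"])
# records = [(p["Timestamp"], float(p["SpotPrice"]))
#            for p in resp["SpotPriceHistory"]]
```

Running this per AZ, rather than per region, is what surfaces the cross-AZ price differences mentioned earlier.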


Mitigation Strategies

The fault, which is apparent in hindsight, was relying on only one instance type. You now know that at least 2 instance types must be used in a Spot-only environment to achieve better availability.

You request for 50 m3.large and 50 m4.large. You also increase your bid price to 0.2 USD, just to be safe.


The unexpected has occurred: on 06 Oct 2016 at 08:27PM UTC+0530, the Spot Price is 0.66 USD. Your bid price is too low, both m3.large and m4.large are taken away, and your application is experiencing downtime again! What did you do wrong?

The mistake, again clearly apparent in hindsight, is that both instance types were launched in the same Availability Zone, us-east-1e.

You now request for 50 m3.large in us-east-1e and 50 r3.large in us-east-1c. You increase your bid price to 0.3 USD. Again, just to be safe.


It happened again! On 06 Oct 2016 at 08:27PM UTC+0530, the Spot Price spiked to 1.75 USD. All your Spot Instances are taken away and your application is down yet again. What went wrong now?

In summary, mitigation strategies that we looked into are:

  • Never launch all servers of a single instance type
  • At least 2 instance types are a must for better availability
  • Never launch all in a single Availability Zone

Even with these in place, we still saw a downtime. It is a never ending cat-and-mouse game. No matter how safe you think you are, there is always an edge case that causes havoc.

There are several other mitigation approaches that have to be in place to make Spot Instances truly available. Top of my head, I can recall:

  • Choosing the best bid price for a particular instance type
    • It really is an art!
  • A Spot Availability Predictor which predicts the likelihood of an instance being taken away
    • Having access to historical Spot Prices helps build robust Machine Learning prediction models. This helps you answer questions like:
      • Which is a good instance type to launch?
      • Should I launch it in us-east-1a or us-east-1c?
      • Is it ok to launch now or 10 minutes later for better availability?
  • Falling back to an appropriate similar instance
    • Spot Instances are subject to availability as well as your account limits. Would you rather go for On-Demand or a similar instance on Spot?
  • Inclusion of Spot Fleet
    • Spot Fleet is a boon. It vastly simplifies the management of thousands of Spot Instances in one request. You can choose between lowestPrice and diversified as your allocation strategy to meet your requirements.
    • Spot Fleet has limitations though, to name a few:
      • No native support to attach Spot Fleet instances to an ELB.
      • It cannot intelligently decide which instance type to launch if all the Spot Instances are down.
  • Usage of Spot Blocks
    • Spot Blocks are a good fit for Defined-Duration workloads. Just specify the Block Duration parameter when placing a Spot Instance request. But it is twice as costly as a regular Spot Instance. Why should you go for it?
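A Spot Fleet request can encode several of the mitigations above in one shot: multiple instance types, multiple Availability Zones, and the `diversified` allocation strategy. The config below is an illustrative sketch; the IAM role ARN and AMI IDs are placeholders, and the `request_spot_fleet` boto3 call is shown commented out since it needs real credentials.

```python
# Illustrative Spot Fleet config: 2 instance types x 2 AZs, diversified.
fleet_config = {
    "SpotPrice": "0.30",
    "TargetCapacity": 100,
    "AllocationStrategy": "diversified",   # spread capacity across pools
    "IamFleetRole": "arn:aws:iam::123456789012:role/fleet-role",  # placeholder
    "LaunchSpecifications": [
        {"InstanceType": "m3.large", "ImageId": "ami-xxxxxxxx",   # placeholder AMI
         "Placement": {"AvailabilityZone": "us-east-1c"}},
        {"InstanceType": "m4.large", "ImageId": "ami-xxxxxxxx",
         "Placement": {"AvailabilityZone": "us-east-1e"}},
    ],
}
# import boto3
# boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=fleet_config)

# Sanity checks: the request really does diversify across types and AZs.
zones = {s["Placement"]["AvailabilityZone"] for s in fleet_config["LaunchSpecifications"]}
types = {s["InstanceType"] for s in fleet_config["LaunchSpecifications"]}
assert len(zones) == 2 and len(types) == 2
```

With `diversified`, a price spike in one pool takes out only a fraction of the fleet instead of all of it, which is precisely the failure mode in the walkthrough above.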

As you can see, Spot Instance management is very complex: there are 50+ instance types, 11 regions, 35+ Availability Zones, and 600+ bidding strategies.

Back to the Original Question

It really boils down to, “Do you truly want 100% availability at all times OR 100% availability with a teeny-weeny chance of 99.99% availability sometimes, if at all, at a cool savings of up to 90%?” Think about it: The worst case is around 10 minutes downtime in a year and if your annual On-Demand expenditure is 1M USD, then you get the same performance and functionality and similar uptime with savings of up to 900,000 USD.

I completely understand if you say, “Spot management is none of my business.” But, quite frankly, it is ours! Register now for a free 14-day trial of Batchly.

I know you have one final question that still has you scratching your head: “Can I get the best of both worlds?” Yes you can! We also offer a hybrid model of On-Demand and Spot Instances for 100% availability for the really mission-critical workloads.

Remember: A penny saved is a penny earned!


Vijay Olety is a Technical Architect at 47Line. Fondly known as “Garage Phase Developer”, he is passionate about Cloud, Big Data and Search. He holds an M.Tech in Computer Science from IIIT-B.

X-Post from blog

Posted in Uncategorized

DynamoDB: An Inside Look Into NoSQL – Part 7

In Part 6, we discussed handling failures via Hinted Handoff & Replica Synchronization. We also talked about the advantages of using a Sloppy Quorum and Merkle Trees.

In this final part of the series, we will look into Membership and Failure Detection.

Ring Membership

In any production environment, node outages are often transient and rarely signify permanent failure, so they should not trigger repair or rebalancing of partitions. On the other hand, manual errors might result in the unintentional startup of new DynamoDB nodes. A proper mechanism is therefore essential for the addition and removal of nodes from the DynamoDB ring. An administrator uses a tool to issue a membership change command to either add or remove a node. The node that picks up this request writes the membership change and its timestamp into its persistent store. A gossip-based protocol propagates the membership changes and maintains an eventually consistent view of membership across all nodes: each node contacts a peer chosen at random every second, and the two nodes efficiently reconcile their persisted membership change histories. Partitioning and placement information also propagates via the gossip-based protocol, so each storage node is aware of the token ranges its peers are responsible for. This allows each node to forward a key’s read/write operations to the right set of nodes directly.
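The pairwise reconciliation step can be sketched as a simple merge of persisted change histories. This is a toy model, not DynamoDB's wire protocol: each history maps a node ID to its latest (timestamp, change) entry, and two gossiping peers keep the entry with the higher timestamp.

```python
# Toy gossip reconciliation: merge two membership-change histories,
# keeping the latest change per node (higher timestamp wins).
def reconcile(history_a, history_b):
    """histories: dict node_id -> (timestamp, 'add' | 'remove')"""
    merged = dict(history_a)
    for node, (ts, change) in history_b.items():
        if node not in merged or ts > merged[node][0]:
            merged[node] = (ts, change)
    return merged

a = {"node1": (1, "add"), "node2": (5, "remove")}
b = {"node2": (3, "add"), "node3": (2, "add")}
merged = reconcile(a, b)
# node2's later 'remove' (ts=5) wins; node3 is learned from peer B.
```

After a round of gossip both peers persist `merged`, so repeated random pairings drive every node toward the same membership view.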

Ring Membership (Credit)

External Discovery

It’s best explained with an example: an administrator joins node A to the ring, then joins node B. Nodes A and B each consider themselves part of the ring, yet neither would be immediately aware of the other. To prevent these logical partitions, DynamoDB introduced the concept of seed nodes: fully functional nodes, discovered via an external mechanism (static configuration or a configuration service), that are known to all nodes. Since each node communicates with a seed node and the gossip-based protocol propagates membership changes, logical partitions are highly unlikely.

Failure Detection

Failure detection in DynamoDB is used to avoid attempts to communicate with unreachable peers during get() and put() operations and when transferring partitions and hinted replicas. For the purpose of avoiding failed attempts at communication, a purely local notion of failure detection is entirely sufficient: node A may consider node B failed if node B does not respond to node A’s messages (even if B is responsive to node C‘s messages). In the presence of a steady rate of client requests generating inter-node communication in the DynamoDB ring, a node A quickly discovers that a node B is unresponsive when B fails to respond to a message; Node A then uses alternate nodes to service requests that map to B‘s partitions; A also periodically retries B to check for the latter’s recovery. In the absence of client requests to drive traffic between two nodes, neither node really needs to know whether the other is reachable and responsive.
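The purely local failure detection described above can be modeled with a few lines of state tracking. This is an illustrative sketch under assumed names, not DynamoDB's implementation: node A marks a peer failed after a missed response, routes around it, and periodically retries it to detect recovery.

```python
# Toy local failure detector: per-node view, no global agreement needed.
class LocalFailureDetector:
    def __init__(self, retry_interval=30):
        self.retry_interval = retry_interval
        self.failed = {}                 # peer -> time it was marked failed

    def record_timeout(self, peer, now):
        self.failed[peer] = now          # peer did not respond: consider it down

    def record_success(self, peer):
        self.failed.pop(peer, None)      # peer responded: it has recovered

    def is_alive(self, peer):
        return peer not in self.failed

    def should_retry(self, peer, now):
        # Periodically probe a failed peer to check for recovery.
        return peer in self.failed and now - self.failed[peer] >= self.retry_interval

d = LocalFailureDetector(retry_interval=30)
d.record_timeout("B", now=100)           # A's message to B timed out
alive_after_timeout = d.is_alive("B")    # False: A routes around B
retry_due = d.should_retry("B", now=131) # True: 31s elapsed, probe B again
d.record_success("B")                    # B answered the probe
```

Note that this view is local by design: A may consider B failed even while C still reaches B, which the text points out is entirely sufficient for avoiding wasted communication attempts.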


This exhaustive 7-part series detailing every component is sufficient to understand the design and architecture of any NoSQL system. Phew! What an incredible journey it has been these couple of months delving into the internals of DynamoDB. Having patiently read this far, you are amongst the chosen few who have this sort of deep NoSQL knowledge. You can be extremely proud of yourself!

Let’s eagerly await another expedition soon!

Article authored by Vijay Olety.

X-Post from CloudAcademy.

Posted in Technical

DynamoDB: An Inside Look Into NoSQL – Part 6

In Part 5, we spoke about data versioning, the 2 reconciliation strategies and how vector clocks are used for the same. In this article, we will talk about Handling Failures.

Handling Failures – Hinted Handoff

Even under the simplest of failure conditions, DynamoDB would experience reduced durability and availability if the traditional quorum approach were used. To overcome this, it uses a sloppy quorum: all READ and WRITE operations are performed on the first N healthy nodes from the preference list, which may not be the first N nodes encountered while traversing the consistent hashing ring.

Partitioning & Replication of Keys in Dynamo (Credit)

Consider the above figure: if A is temporarily unreachable during a WRITE operation, then the replica that would have lived on A is sent to D to maintain the desired availability and durability guarantees. The replica sent to D carries a hint in its metadata identifying the intended recipient (in our case, A). Hinted replicas are stored in a separate local database that is scanned periodically to detect whether A has recovered. Upon detection, D sends the replica to A and may delete the object from its local store without decreasing the total number of replicas. Using hinted handoff, DynamoDB ensures that READs and WRITEs succeed even during temporary node or network failures.
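The hint-and-handback flow can be sketched as a toy model. Everything here is illustrative (nodes as objects, stores as dicts); it only mirrors the mechanism described above: a write intended for A lands on D with a hint, and D's periodic scan hands the replica back once A recovers.

```python
# Toy hinted-handoff model; not DynamoDB's implementation.
class Node:
    def __init__(self, name):
        self.name, self.up = name, True
        self.store = {}          # key -> value
        self.hinted = []         # (intended_node, key, value) hints held here

def write(key, value, preference_list, fallback):
    for node in preference_list:
        if node.up:
            node.store[key] = value
        else:
            # Store on the fallback, with a hint naming the intended recipient.
            fallback.hinted.append((node, key, value))
            fallback.store[key] = value

def scan_hints(fallback):
    # Periodic scan: hand hinted replicas back to recovered nodes.
    for intended, key, value in list(fallback.hinted):
        if intended.up:
            intended.store[key] = value
            fallback.hinted.remove((intended, key, value))
            del fallback.store[key]

A, B, C, D = (Node(n) for n in "ABCD")
A.up = False
write("cart", "v1", [A, B, C], fallback=D)   # A is down: D holds the hint
held_by_D = "cart" in D.store                # True
A.up = True
scan_hints(D)                                # D detects recovery, hands back
```

The write still reaches N healthy nodes (B, C, D), which is the sloppy-quorum guarantee in miniature.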

Highly available storage systems must handle failures of an entire data center. DynamoDB is configured such that each object is replicated across data centers. In terms of implementation detail, the preference list of a key is constructed such that the storage nodes are spread across multiple data centers and these centers are connected via high speed network links.

Handling Permanent Failures – Replica Synchronization

Hinted Handoff is used to handle transient failures. What if the hinted replica itself becomes unavailable? To handle such situations, DynamoDB implements an anti-entropy protocol to keep the replicas synchronized. A Merkle Tree is used for the purpose of inconsistency detection and minimizing the amount of data transferred.

According to Wikipedia, a “Merkle tree is a tree in which every non-leaf node is labelled with the hash of the labels of its children nodes.” Parent nodes higher in the tree are hashes of their respective children. The principal advantage of a Merkle tree is that each branch can be checked independently, without requiring nodes to download the entire tree or data set. Merkle trees thereby reduce the amount of data that needs to be transferred while checking for inconsistencies among replicas. For instance, if the hash values of the roots of two trees are equal, then the values of the leaf nodes are equal and the nodes require no synchronization. If not, some replica values differ; the nodes then exchange the hash values of children, and the process continues until it reaches the leaves, at which point the hosts can identify the keys that are “out of sync”.

DynamoDB uses Merkle trees for anti-entropy as follows: Each node maintains a separate Merkle tree for each key range (the set of keys covered by a virtual node) it hosts. This allows nodes to compare whether the keys within a key range are up-to-date. In this scheme, two nodes exchange the root of the Merkle tree corresponding to the key ranges that they host in common. Subsequently, using the tree traversal scheme described above the nodes determine if they have any differences and perform the appropriate synchronization action. The disadvantage with this scheme is that many key ranges change when a node joins or leaves the system thereby requiring the tree(s) to be recalculated.
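A toy version of this comparison is below. It is illustrative only: it hashes key/value pairs into leaves, builds the tree bottom-up, and (for brevity) jumps from the root comparison straight to the leaves rather than walking down level by level as a real implementation would.

```python
import hashlib

def h(data):
    """MD5 hex digest of a string (Dynamo also uses MD5 for keys)."""
    return hashlib.md5(data.encode()).hexdigest()

def build(leaves):
    """leaves: list of leaf hashes; returns levels bottom-up, root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) if i + 1 < len(prev) else prev[i]
                       for i in range(0, len(prev), 2)])
    return levels

def out_of_sync(replica_a, replica_b):
    """replicas: dict key -> value over the same sorted key range."""
    keys = sorted(replica_a)
    la = build([h(k + str(replica_a[k])) for k in keys])
    lb = build([h(k + str(replica_b[k])) for k in keys])
    if la[-1] == lb[-1]:
        return []                       # equal roots: nothing to transfer
    # Roots differ: compare leaf hashes to find the divergent keys.
    return [k for k, ha, hb in zip(keys, la[0], lb[0]) if ha != hb]

a = {"k1": 1, "k2": 2, "k3": 3, "k4": 4}
b = {"k1": 1, "k2": 9, "k3": 3, "k4": 4}
diff = out_of_sync(a, b)                # only k2 needs synchronization
```

The equal-roots shortcut is the whole point: two in-sync replicas exchange one hash instead of the full key range.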

In the next and the final part of this series, we will discuss Membership and Failure Detection.

Article authored by Vijay Olety.

X-Post from CloudAcademy.

Posted in Technical

DynamoDB: An Inside Look Into NoSQL – Part 5

In Part 4, we talked about partitioning and replication in detail. We introduced consistent hashing, virtual nodes and the concept of coordinator nodes and preference list. In this article, we will discuss Data Versioning.

Data Versioning

Eventual consistency, which DynamoDB embraces, allows updates to be propagated to all storage nodes asynchronously. A put operation returns before the update reaches all replicas, so a subsequent get operation may return a value that does not reflect the latest changes. Depending on network partitions and server outages, some replicas might lack the latest updates even after an extended period of time.

But certain use cases do not need the latest updates and can tolerate some inconsistency. One such case is “Add To Cart”, where a put operation should always succeed and a get returning an old object is still tolerable. If a user makes a change to an earlier version of the shopping cart object, that change is still meaningful and should be preserved at all costs. However, the currently unavailable latest state of the shopping cart can have its own updates, which should also be preserved. It is evident that data versioning has to be implemented to handle such scenarios.

In order to achieve such guarantees, DynamoDB treats the result of every modification as a new and immutable version of data. This allows for multiple versions of an object to be present in the system at the same time. If the newer version subsumes the earlier one, then the system can automatically determine the authoritative version (syntactic reconciliation). However, in some cases the versions conflict and the client has to manually perform the reconciliation (semantic reconciliation).

Data Versioning (Credit)

DynamoDB uses vector clocks to capture causality between multiple versions of an object. A vector clock is a list of (node, counter) pairs, and one vector clock is associated with every version of every object. By analyzing the vector clocks, you can determine whether two versions have a causal ordering or lie on parallel branches. When a client wishes to perform an update, it must specify which version it is updating; this version is obtained from an earlier get operation.

Version evolution of an object (Credit)

Let’s understand how vector clocks work: A client writes a new object. Node Sx handles this write and creates a vector clock [(Sx, 1)] for the object D1. If the client now updates the object and node Sx again handles the request, we get a new object D2 and its vector clock [(Sx, 2)]. The client updates the object again and this time node Sy handles the request, leading to object D3 with the vector clock [(Sx, 2), (Sy, 1)]. When a different client tries to update the object after reading D2, the new vector clock entry for object D4 is [(Sx, 2), (Sz, 1)], where Sz is the node that handled the request. Now when a new write request is issued by the client, it sees that there are already D3 and D4 objects. If node Sx is handling the request, it performs the reconciliation process and the new data object D5 is created with the vector clock [(Sx, 3), (Sy, 1), (Sz, 1)].

One possible issue with vector clocks is that their size may grow rapidly if multiple servers coordinate the writes to a particular object. However, this is unlikely in production, since writes are usually handled by nodes from the preference list. It is still desirable to limit the size of the vector clock, so DynamoDB implements a clock truncation scheme: a timestamp, indicating the last time the node updated the item, is stored along with each (node, counter) pair. When the number of (node, counter) pairs reaches a threshold (say 15), the oldest pair is removed from the clock.
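The D1–D5 walkthrough above can be reproduced with a few lines of Python. This is an illustrative model (clocks as plain dicts), not DynamoDB's implementation.

```python
# Toy vector-clock operations mirroring the D1..D5 walkthrough.
def increment(clock, node):
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if clock a causally descends from (subsumes) clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def merge(a, b):
    """Pairwise max: the starting point for reconciliation."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

d1 = increment({}, "Sx")              # D1: [(Sx, 1)]
d2 = increment(d1, "Sx")              # D2: [(Sx, 2)]
d3 = increment(d2, "Sy")              # D3: [(Sx, 2), (Sy, 1)]
d4 = increment(d2, "Sz")              # D4: [(Sx, 2), (Sz, 1)]
conflict = not descends(d3, d4) and not descends(d4, d3)  # parallel branches
d5 = increment(merge(d3, d4), "Sx")   # reconciled: [(Sx, 3), (Sy, 1), (Sz, 1)]
```

`descends` is the syntactic-reconciliation test: when neither clock descends from the other (as with D3 and D4), the client must reconcile semantically before writing D5.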

In the next article, we will look into how to Handle Failures.

Article authored by Vijay Olety.

X-Post from

Posted in Technical

DynamoDB: An Inside Look Into NoSQL – Part 4

In Part 3, we mentioned the various distributed techniques used while architecting NoSQL data stores. A table nicely summarized these techniques and their advantages. In this article, we will go into the details of partitioning and replication.


As mentioned earlier, the key design requirement for DynamoDB is to scale incrementally. In order to achieve this, there must be a mechanism in place that dynamically partitions the entire data over a set of storage nodes. DynamoDB employs consistent hashing for this purpose. As per the Wikipedia page, “Consistent hashing is a special kind of hashing such that when a hash table is resized and consistent hashing is used, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.”

Consistent Hashing (Credit)

Let’s understand the intuition behind it: Consistent hashing is based on mapping each object to a point on the edge of a circle. The system maps each available storage node to many pseudo-randomly distributed points on the edge of the same circle. To find where an object O should be placed, the system finds the location of that object’s key (which is MD5 hashed) on the edge of the circle; then walks around the circle until falling into the first bucket it encounters. The result is that each bucket contains all the resources located between its point and the previous bucket point. If a bucket becomes unavailable (for example because the computer it resides on is not reachable), then the angles it maps to will be removed. Requests for resources that would have mapped to each of those points now map to the next highest point. Since each bucket is associated with many pseudo-randomly distributed points, the resources that were held by that bucket will now map to many different buckets. The items that mapped to the lost bucket must be redistributed among the remaining ones, but values mapping to other buckets will still do so and do not need to be moved. A similar process occurs when a bucket is added. You can clearly see that with this approach, the additions and deletions of a storage node only affects its immediate neighbors and all the other nodes remain unaffected.

The basic implementation of consistent hashing has its limitations:

  1. Random position assignment of each node leads to non-uniform data and load distribution.
  2. Heterogeneity of the nodes is not taken into account.

To overcome these challenges, DynamoDB uses the concept of virtual nodes. A virtual node looks like a single storage node on the ring, but each physical node is responsible for more than one virtual node; the number of virtual nodes a single node handles depends on its processing capacity.
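A minimal ring with virtual nodes can be sketched as follows. This is a toy model (MD5-hashed points on a ring, clockwise lookup), illustrating why removing a node only moves the keys that mapped to its points.

```python
import bisect
import hashlib

# Toy consistent-hash ring with virtual nodes; not DynamoDB's implementation.
class Ring:
    def __init__(self, vnodes=8):
        self.vnodes = vnodes
        self.points = {}         # ring position -> physical node
        self.keys = []           # sorted ring positions

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")   # many points per node
            self.points[point] = node
            bisect.insort(self.keys, point)

    def remove(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            del self.points[point]
            self.keys.remove(point)

    def lookup(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        point = self._hash(key)
        idx = bisect.bisect(self.keys, point) % len(self.keys)
        return self.points[self.keys[idx]]

ring = Ring()
for n in ("A", "B", "C"):
    ring.add(n)
owner = ring.lookup("some-key")
ring.remove(owner)               # only keys on the removed points move
new_owner = ring.lookup("some-key")
```

After the removal, `some-key` lands on a surviving node while keys hashed to other points stay put, which is the incremental-scalability property the text describes.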


Every data item is replicated at N physical nodes (not virtual nodes). Each key is assigned to a coordinator node, which is responsible for the READ and WRITE operations of that particular key. The job of the coordinator is to ensure that every data item falling in its range is stored locally and replicated to N - 1 other nodes. The list of nodes responsible for storing a certain key is called the preference list; to account for node failures, it usually contains more than N nodes. In DynamoDB, the default replication factor is 3.

In the next article, we will look into Data Versioning.

Article authored by Vijay Olety.

X-Post from

Posted in Technical

DynamoDB: An Inside Look Into NoSQL – Part 3

In Part 2, we looked at Design Considerations of NoSQL and introduced the concept of eventual consistency. In this article, we will introduce the concepts and techniques used while architecting a NoSQL system.

The core distributed systems techniques employed in DynamoDB are – partitioning, replication, versioning, membership, failure handling and scaling. Phew! Did you really think the internals would be simple? 🙂

DynamoDB Internals

DynamoDB (Credit)

The following table summarizes the list of techniques used in DynamoDB:

  • Problem: Partitioning. Technique: Consistent hashing. Advantage: Incremental scalability.
  • Problem: High availability for writes. Technique: Vector clocks with reconciliation during reads. Advantage: Version size is decoupled from update rates.
  • Problem: Handling temporary failures. Technique: Sloppy quorum and hinted handoff. Advantage: Provides high availability and durability guarantees when some replicas are not available.
  • Problem: Recovery from permanent failures. Technique: Anti-entropy using Merkle trees. Advantage: Synchronizes divergent replicas in the background.
  • Problem: Membership and failure detection. Technique: Gossip-based membership protocol and failure detection. Advantage: Preserves symmetry and avoids a centralized registry for storing membership and node-liveness information.

I know I have covered a lot of lingo and buzzwords. If you really need to take a deep breath, now is the time! Fear not, in subsequent articles we will deep-dive into each of the above-mentioned techniques. Remember, the devil is in the details!

System Interface

DynamoDB exposes two interfaces: get and put. The get(key) operation locates the object associated with the key and returns it along with a context. The put(key, context, object) operation uses key to determine the associated replicas and writes the object in those replicas. The context information is invisible to the user and contains metadata such as version. DynamoDB applies an MD5 hash on the key to generate a 128-bit identifier, which is used to determine the replicas that are responsible for serving the key.
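The shape of this interface can be illustrated with a toy single-node store. The class and names below are invented for the sketch; only the get/put signatures, the opaque context carrying version metadata, and the MD5-derived 128-bit key identifier come from the text.

```python
import hashlib

def key_id(key: str) -> int:
    """MD5 on the key yields the 128-bit identifier used to pick replicas."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class TinyStore:
    """Toy single-node store mirroring the get/put interface shape."""
    def __init__(self):
        self._data = {}

    def put(self, key, context, obj):
        # `context` is opaque to the caller and carries metadata such as
        # the object's version; here we just bump a counter to mimic that.
        version = (context or {}).get("version", 0) + 1
        self._data[key_id(key)] = (obj, {"version": version})

    def get(self, key):
        """Return (object, context) for the key, or (None, None)."""
        return self._data.get(key_id(key), (None, None))

store = TinyStore()
store.put("cart:alice", None, ["book"])
obj, ctx = store.get("cart:alice")
store.put("cart:alice", ctx, ["book", "pen"])  # pass the context back on update
```

The caller never inspects the context; it simply hands back the context obtained from the last get, which is how the store can track versions without exposing them.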

Article authored by Vijay Olety


DynamoDB: An Inside Look Into NoSQL – Part 2

In Part 1, we introduced you to NoSQL, spoke about the CAP theorem and certain assumptions that need to be made while designing NoSQL data stores. Let’s dive deeper!

Design Considerations

Traditional commercial systems and applications perform data replication synchronously. The advantage of this approach is that data is always consistent. The downside is that the system itself might not be available (CAP theorem). To put it simply: the data is unavailable until it is absolutely certain that it has been replicated across all nodes correctly.

Alas! The Web world lives in its own perceived reality. 🙂 Systems go down and networks fail regularly. Availability is the single largest factor that makes or breaks a company, so it is imperative that we handle such scenarios. Netflix’s Chaos Monkey helps us architect our product to take these failures into account. To ensure availability at all costs, optimistic asynchronous replication strategies can be put in place. The drawback, however, is that they lead to conflicting changes to data, which must be detected and resolved. Conflict resolution introduces two new problems: when to resolve conflicts and who resolves them. DynamoDB introduces the novel concept of an eventually consistent data store: all updates reach all nodes eventually.

Deciding when to perform the conflict resolution is a primary design consideration. We can perform it during the READ operation or WRITE operation. Many legacy data stores chose to do the conflict resolution during the WRITE operation. In such systems, WRITEs will be rejected if data is not replicated across all nodes. In large e-commerce companies such as Amazon, rejecting WRITEs is not an option as it leads to revenue loss and poor customer experience. Hence, DynamoDB does the complex conflict resolution during READs.

Let’s take an example to understand it better. Consider a system with 3 nodes: NODE1, NODE2 and NODE3. In a traditional system, a WRITE to NODE2 must be replicated to NODE1 and NODE3 and only then is the WRITE operation considered successful. This synchronized replication takes time to complete during which time the system is NOT available. But systems using DynamoDB have the option to defer this update in exchange for higher availability. So a WRITE to NODE2 is considered successful as long as NODE2 is able to honor that request. NODE2 eventually replicates it to NODE1 and NODE3. DynamoDB usually takes a second (or a maximum of a couple of seconds) to achieve consistency across all nodes.
Note: In case your product, like ours, needs a strongly consistent read, just set the ConsistentRead parameter to true.
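The NODE1/NODE2/NODE3 scenario can be sketched as a toy simulation. This is not how DynamoDB is implemented; it only illustrates the trade: the write is acknowledged as soon as the coordinator stores it locally, and replication to the other nodes happens later, so a read from another node may briefly see stale data.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Toy cluster: a write succeeds once the coordinator stores it locally;
    replication to the remaining nodes is deferred until sync() runs."""
    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.pending = []  # (source, key, value) tuples awaiting replication

    def write(self, coordinator, key, value):
        self.nodes[coordinator].data[key] = value  # durable on the coordinator
        self.pending.append((coordinator, key, value))
        return True  # acknowledged before replication finishes

    def sync(self):
        """Propagate pending writes to every other node (the 'eventually')."""
        for source, key, value in self.pending:
            for name, node in self.nodes.items():
                if name != source:
                    node.data[key] = value
        self.pending.clear()

cluster = Cluster(["NODE1", "NODE2", "NODE3"])
cluster.write("NODE2", "cart", ["book"])
stale = cluster.nodes["NODE1"].data.get("cart")  # None: not yet replicated
cluster.sync()
# After sync(), all three replicas agree on the value of "cart".
```

The window between write() returning and sync() completing is the (usually sub-second) inconsistency window the paragraph above describes.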

Another very important design consideration is who performs the conflict resolution. It can either be done by the data store (DynamoDB in our case) or the application. The data store usually employs simple policies and rules such as “last WRITE wins”, which is good enough in the majority of cases. If the application wishes to implement its own, more complex conflict-resolution mechanisms, it is free to do so.
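The “last WRITE wins” policy mentioned above is trivial to express. The replica format and timestamps below are invented for the sketch; a real store would carry version metadata rather than a bare timestamp field.

```python
def last_write_wins(replicas):
    """Pick the version with the newest timestamp: a simple store-side
    policy. An application needing smarter reconciliation (e.g. merging
    two divergent shopping carts) would supply its own function instead."""
    return max(replicas, key=lambda v: v["ts"])["value"]

# Two replicas diverged while a node was unreachable:
replicas = [
    {"value": ["book"], "ts": 1000.0},
    {"value": ["book", "pen"], "ts": 1007.5},  # written later
]
winner = last_write_wins(replicas)
```

The store-side policy silently drops the older write; that is exactly the trade-off that pushes applications with merge-able data toward doing their own reconciliation.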

A couple of other design considerations are as follows:

  1. Incremental Scalability: The data store should be able to scale-out 1 node at a time, with minimal or no impact on the system itself.
  2. Symmetry: All nodes in the data store are peers, i.e. all nodes are equal and share the same set of responsibilities.
  3. Decentralization: With a central authority, the most common problem faced is “single point of failure”. Decentralization helps us mitigate this and keep the system simple, more scalable and more available.
  4. Heterogeneity: Different nodes in the data store might have different configurations. Some nodes might be optimized for storage and some might be plain commodity hardware. The data store should take this heterogeneous mix of nodes into account and distribute tasks in proportion to each node’s capabilities.

In the next blog post, we will look into System Architecture.

Article authored by Vijay Olety


DynamoDB: An Inside Look Into NoSQL – Part 1

In our earlier posts (here and here), we introduced the Hadoop ecosystem & explained its various components using a real-world example from the retail industry. We now possess a fair idea of the advantages of Big Data. NoSQL datastores are used extensively in real-time & Big Data applications, which is why a look into their internals will help us make better design decisions in our applications.

NoSQL datastores provide a mechanism for storage and retrieval of data that is modeled in a non-tabular manner. Simplicity of design, horizontal scalability and control over availability form the motivations for this approach. NoSQL is governed by the CAP theorem in the same way RDBMS is governed by ACID properties.

NoSQL Triangle (Credit)

From the AWS stable, DynamoDB is the perfect choice if you are looking for a NoSQL solution. DynamoDB is a “fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. Its guaranteed throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile and many other applications.” Since it is a fully managed service, you need not worry about provisioning and management of the underlying infrastructure. All the heavy lifting is taken care of for you.

The majority of the documentation available on the Net consists of getting-started guides with examples of DynamoDB API usage. Let’s look at the thought process and design strategies that went into the making of DynamoDB.

“DynamoDB uses a synthesis of well known techniques to achieve scalability and availability: Data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. DynamoDB employs a gossip based distributed failure detection and membership protocol. Dynamo is a completely decentralized system with minimal need for manual administration. Storage nodes can be added and removed from DynamoDB without requiring any manual partitioning or redistribution.” You must be wondering – “Too much lingo for one paragraph”. Fret not, why fear when I am here 🙂 Let’s take one step at a time, shall we?

Requirements and Assumptions

This class of NoSQL storage system has the following requirements –

  • Query Model: A “key” uniquely identifies a data item. Read and write operations are performed on this data item. It must be noted that no operation spans multiple data items. There is no need for a relational schema, and DynamoDB works best when a single data item is less than 1 MB.
  • ACID Properties: As mentioned earlier, there is no need for a relational schema, and hence ACID (Atomicity, Consistency, Isolation, Durability) properties are not required. Industry and academia acknowledge that ACID guarantees lead to poor availability. Dynamo targets applications that operate with weaker consistency if it results in higher availability.
  • Efficiency: DynamoDB needs to run on commodity hardware infrastructure. Stringent SLAs (Service Level Agreements) ensure that latency and throughput requirements are met at the 99.9th percentile of the distribution. But everything has a catch – the tradeoffs involve performance, cost, availability and durability guarantees.

In subsequent articles, we will look into Design Considerations & System Architecture.

Article authored by Vijay Olety


