- Non relational
- Cluster friendly
- Schema-less : This changes the table for developing the application with relational databases. This improves flexibility in someway. You can add unstructured data in a NoSQL data-store.
- No Joints
Data-models for NoSQL
They use different data model of NoSQL databases (4 Chunks):
- Key/Value data model – You have a key and asks it to grab a value linked to the key. The database knows nothing about that the “value” of the value store. This allows you to save metadata and improved indexes on metadata values. This can be a hash map but is persistent in the disk. They have no set schema
- Document data model – The data is saved in a complex structure as a document .The best usage is JSON based structured database. They have no set schema. We can query inside this document structure. Their is an ID for indexing.
# The difference b/w last two models is a bit hazy. We can call them aggregate oriented database i.e all the data store has all the data in it without any set schema. In reality, the difference between them don’t matter that much.
# In relational DB we save the aggregate in terms of many tables as it has set schema, without the schema we cannot add a value to the database. In NoSQL we can save the whole complex structure as a data object. In relational we have many aggregate (ex: line item) that asks a object ( Order) i.e a whole unit in it self. Now in NoSQL we are just moving aggregate i.e Value in Key/value store and document in document data store. Conclusion – we have more flexibility while scaling the application layer.
- Column family data model – We have a row key that can store multiple column key and column value. It gives you advantage to pull more information from a data query. This is also schema less.
#Aggregate oriented data model is useful if you want to give and take same aggregate again and again. Its not very useful if you want to slice and dice the database (better use relational database)
- Graph databases – The notable examples are Neo4J. They break a data into many components and handle them very carefully. This is very different from all the three aggregated oriented databases. This is also schema less. It has a awesome query language.
RDBMS == ACID | NoSQL == BASE
# atomicity, consistency, integrity, and durability
#ACID is consistency and people don’t believe that NoSQL is consistent.
Problem : Suppose you have a single unit of information and when you wrote half the data,someone else reads it and vice versa. This would mess things up! We need acid updates to solve this transnational issues.
Solution : Graph databases do use ACID. Aggregated -oriented database don’t actually require ACID. Keep the transactions/ACID in a aggregate limitations i.e any aggregate update is ACID in nature.
Problem : Two users for same app is connecting to front-end to change values of a data store. if they do it at the same time, how would it work ? Since if we allow changes in same time for same piece of information – we would be having issues of maintaining consistency .
Solution : in Relational we have transaction that is typically queued for every user. It solves consistency but is not solution for all the systems. We can have “offline Lock” i.e give each aggregate data a version stamp and when user one pushes updates, user two when finishes can be used to solve the inconsistency. ACID transactions are not the same in NOSQL.
Types of Consistency :
- Logical – Sharding(use one piece of data and put on multiple nodes i.e breakdown).
- Replication – Replicate the same data object among multiple nodes. Now you have more data objects to solve this consistency issue in case of node failure.
Problem : user A & B want to book a hotel room. Both are geographically varied. The system has to decide who to give the ticket. Imagine if the communication between two nodes (Country one node & other country node) are down. In this case the system may not be connected hence booking can be made on both the sides creating confusion and issues in real world. How to solve this consistency problem :
Solution : One solution is no bookings until connection is up and other is going even though line is up. So the inconsistency can be solved by business logic in case of choice two. DynamoDB wanted shopping cart to be always live and had many business issues. So the solution to manage these inconsistency is by business logic.
Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system. A quorum-based technique is implemented to enforce consistent operation in a distributed system
RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.
A conventional relational DBMS will almost always feature RYW consistency. Some NoSQL systems feature tunable consistency, in which — depending on your settings — RYW consistency may or many not be assured.
The core ideas of RYW consistency, as implemented in various NoSQL systems, are:
- Let N = the number of copies of each record distributed across nodes of a parallel system.
- Let W = the number of nodes that must successfully acknowledge a write for it to be successfully committed. By definition, W <= N.
- Let R = the number of nodes that must send back the same value of a unit of data for it to be accepted as read by the system. By definition, R <= N.
- The greater N-R and N-W are, the more node or network failures you can typically tolerate without blocking work.
- As long as R + W > N, you are assured of RYW consistency.
Example: Let N = 3, W = 2, and R = 2. Suppose you write a record successfully to at least two nodes out of three. Further suppose that you then poll all three of the nodes. Then the only way you can get two values that agree with each other is if at least one of them — and hence both — return the value that was correctly and successfully written to at least two nodes in the first place.
In a conventional parallel DBMS, N = R = W, which is to say N-R = N-W = 0. Thus, a single hardware failure causes data operations to fail too. For some applications — e.g., highly parallel OLTP web apps — that kind of fragility is deemed unacceptable.
On the other hand, if W< N, it is possible to construct edge cases in which two or more consecutive failures cause incorrect data values to actually be returned. So you want to clean up any discrepancies quickly and bring the system back to a consistent state. That is where the idea of eventual consistency comes in, although you definitely can — and in some famous NoSQL implementations actually do — have eventual consistency in a system that is not RYW consistent.
When to use NoSQL ?
Its a self perception but some main drivers are :
- If you have larger data and or is unstructured. Easy to query and program.
- People want to program easily for natural aggregate data objects.
- People use them for Agile analytics opposite to data warehousing concept. Most people use Graph databases for it.