As I have started to learn and post about Apache Cassandra, in this post I am going to share some of the important key concepts of the Cassandra architecture.
Cassandra supports high availability through data replication.
One logical database is spread across multiple nodes of the cluster, and Cassandra keeps replicas of the data on different nodes.
If one node goes down, another node still holds the data, which avoids a single point of failure.
Consistent Hashing Algorithm:
A distributed database system has two main problems. The first is determining, on every request, which node holds a specific piece of data. The second is minimizing data movement when nodes are added or removed.
The consistent hashing algorithm addresses both problems by mapping Cassandra row keys to physical nodes.
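The idea can be sketched in a few lines of Python. This is only a minimal illustration, not Cassandra's implementation: the `HashRing` class, the `token` helper, the node names, and the use of MD5 as the hash function are all assumptions made for the example, and virtual nodes are left out.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each node owns the keys that hash at or before its token."""

    def __init__(self, nodes=()):
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def remove_node(self, node: str) -> None:
        t = token(node)
        self._tokens.remove(t)
        del self._owners[t]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position, wrapping around."""
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owners[self._tokens[i % len(self._tokens)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"row-{i}" for i in range(100)]
before = {k: ring.node_for(k) for k in keys}

# Adding a node only moves the keys that the new node takes over.
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-d")
```

Any key can be located by hashing it once, and a new node only takes keys from its neighbour on the ring instead of forcing a full reshuffle.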
Because data is replicated across different nodes, Cassandra has to keep the replicas synchronized. When it reads or writes data, it compares the timestamp (version) of the data on the replicas so that the most recent value wins.
It also provides tunable consistency, in which the user chooses the consistency level for each read or write operation.
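A common way to reason about the consistency level is the rule that reads see the latest write when the number of replicas read plus the number of replicas written is greater than the replication factor. The helper names below (`quorum`, `is_strongly_consistent`) are made up for this sketch and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """Read and write sets overlap in at least one replica when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)                                 # 2 replicas out of 3
print(is_strongly_consistent(q, q, rf))        # QUORUM reads + QUORUM writes: True
print(is_strongly_consistent(1, 1, rf))        # ONE read + ONE write: False
```

With a replication factor of 3, quorum reads and quorum writes always share at least one replica, while reading and writing at level ONE trades that guarantee for lower latency.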
Cassandra uses a gossip protocol, in which nodes discover information about the other nodes by exchanging state information with each other.
To limit network traffic, a node does not exchange information with every other node in a round; it gossips with at most three nodes at a time.
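The following toy simulation shows how information still reaches every node even though each node talks to at most three peers per round. It is a rough sketch under simplifying assumptions: the function names, the `views` dictionary, and the single integer version per node are invented for the example, and real gossip messages carry more state than this.

```python
import random


def gossip_round(views, rng, fanout=3):
    """Each node exchanges and merges its view with up to `fanout` random peers."""
    names = list(views)
    for node in names:
        others = [n for n in names if n != node]
        for peer in rng.sample(others, min(fanout, len(others))):
            merged = dict(views[peer])
            for name, version in views[node].items():
                # Keep the newest version seen for every known node.
                if version > merged.get(name, -1):
                    merged[name] = version
            views[node] = dict(merged)
            views[peer] = dict(merged)


def rounds_to_converge(node_names, seed=0, max_rounds=100):
    """Return how many rounds it takes until every node knows every other node."""
    rng = random.Random(seed)
    # Initially each node only knows about itself.
    views = {n: {n: 1} for n in node_names}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == len(node_names) for view in views.values()):
            return round_no
    return None


print(rounds_to_converge([f"node-{i}" for i in range(8)]))
```

Because every exchange merges both views, knowledge spreads quickly and the cluster typically converges in a small number of rounds.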
Snitches and Replication Strategies:
Cassandra uses snitches and replication strategies to determine how data is replicated across data centres and nodes.
A snitch determines how near the nodes are to each other in the ring, and the replication strategy uses this information to decide where to place each copy of the data.
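A bloom filter is a probabilistic data structure that tests whether an item is a member of a set. It can report a false positive but never a false negative. Because the check is very fast, it reduces disk I/O operations by skipping data files that cannot contain the requested key.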
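A bloom filter is small enough to sketch directly. The version below is an illustration only: the `BloomFilter` class, its default sizes, and the use of SHA-256 to derive bit positions are assumptions for the example and do not reflect how Cassandra sizes or hashes its filters.

```python
import hashlib


class BloomFilter:
    """Illustrative bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive several bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for key in ("row-1", "row-2", "row-3"):
    bf.add(key)

print(bf.might_contain("row-2"))   # True: added keys are always reported
```

A negative answer lets the read path skip a file entirely, which is where the I/O saving comes from.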
A Merkle tree is a type of hash tree, and Cassandra uses it to find the differences between the data held on different nodes.
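The comparison can be illustrated with a small hash tree built over the data of each replica. This is a simplified sketch: the function names, the sample data, and the requirement that the number of leaves is a power of two are assumptions made to keep the example short.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(chunks):
    """Return the tree as a list of levels: leaf hashes first, root last.

    Assumes len(chunks) is a power of two to keep the sketch simple.
    """
    levels = [[_h(chunk.encode("utf-8")) for chunk in chunks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_leaves(tree_a, tree_b):
    """Walk down from the root, descending only into subtrees whose hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top - 1, -1, -1):
        next_suspects = []
        for idx in suspects:
            for child in (2 * idx, 2 * idx + 1):
                if tree_a[depth][child] != tree_b[depth][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


replica_a = ["range-0 data", "range-1 data", "range-2 data", "range-3 data"]
replica_b = ["range-0 data", "range-1 data", "range-2 STALE", "range-3 data"]

print(diff_leaves(build_tree(replica_a), build_tree(replica_b)))  # [2]
```

Only the hashes are compared, so two nodes can find the out-of-sync range without sending all of their data to each other.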
A Sorted String Table (SSTable) is an ordered key-value map that stores large amounts of sorted data.
Write Back Cache:
A write-back cache is a cache used for write operations: writes go to a dedicated cache first and are written to the backing store later.
A memtable resides in memory and acts as this write-back cache; its data is later flushed to an SSTable on disk.
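The relationship between the memtable and the SSTables can be sketched as a tiny in-memory store. Everything here is hypothetical and simplified: the `TinyStore` class and its `flush_threshold` are invented for the example, the "SSTables" are plain sorted lists rather than files, and the commit log and compaction are left out.

```python
import bisect


class TinyStore:
    """Toy write path: writes land in a memtable, which is flushed to sorted tables."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.memtable = {}    # in-memory write-back cache
        self.sstables = []    # flushed tables, oldest first: (sorted keys, values)

    def write(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a sorted table and start a new memtable."""
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.sstables.append((keys, values))
        self.memtable = {}

    def read(self, key: str):
        # The newest data wins: check the memtable, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)   # binary search in the sorted keys
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


store = TinyStore(flush_threshold=2)
store.write("b", "1")
store.write("a", "2")      # threshold reached: memtable flushed as a sorted table
store.write("a", "3")      # newer value stays in the memtable for now
print(store.read("a"), store.read("b"))   # 3 1
```

Writes stay fast because they only touch memory, and the flushed tables stay sorted so that a key can be found with a binary search.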
A keyspace is similar to an RDBMS schema or database. It is the container for all types of data and objects.
Cassandra Column Family:
It is similar to an RDBMS table and contains columns and their related data.