Cassandra Architecture, Data Structure and Basic Operations: Part 1


What does Facebook use for its data storage? That was the first question that came up when my team lead introduced me to Cassandra, a mysterious character from Greek mythology. Just joking. Now let's get serious about Cassandra the database. Apache Cassandra is a top-level Apache project, originally developed at Facebook, whose design draws on Amazon’s Dynamo and Google’s Bigtable. It is a distributed database for managing large amounts of structured data across many commodity servers, and it provides a highly available service with no single point of failure.

Architecture and Data Structure

Cassandra’s architecture is designed around a few major goals: scalability, performance, and continuous uptime. Rather than a master-based sharded architecture, Cassandra uses a masterless “ring” design, which is elegant and easy to set up and maintain. The nodes form a peer-to-peer distributed system. Some key features of these nodes are:

  1. All the nodes in a cluster play an identical role. No node depends on another (there is no master), but all of them are interconnected.

  2. All the nodes accept read and write requests, regardless of where the data is actually located in the cluster.

  3. When a node goes down, the other nodes in the cluster can still serve read/write requests for its data, as long as that data is replicated on them.
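The three properties above can be illustrated with a toy model of the ring. This is only a sketch under stated assumptions: each node takes a single token (real clusters usually assign many virtual nodes per node), MD5 stands in for Cassandra's default Murmur3 partitioner, and replicas are placed on the next nodes clockwise, which is how SimpleStrategy behaves. The names `Ring`, `replicas` and `coordinate` are hypothetical and not part of any Cassandra API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Place a key on the ring.

    Stand-in hash (first 64 bits of MD5); Cassandra's default
    Murmur3Partitioner uses a different function.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy masterless ring: every node computes placement from the same token map."""

    def __init__(self, node_names, replication_factor=3):
        self.rf = replication_factor
        # One token per node (real clusters usually assign many virtual nodes).
        self.entries = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.entries]
        self.down = set()  # names of nodes simulated as failed

    def replicas(self, key):
        """Owner is the first node clockwise from the key's token; copies go to the next nodes."""
        n = len(self.entries)
        start = bisect.bisect_left(self.tokens, token(key)) % n  # wrap past the last token
        return [self.entries[(start + i) % n][1] for i in range(min(self.rf, n))]

    def coordinate(self, coordinator, key):
        """Any live node may accept the request and route it to a live replica.

        Simplified: a real coordinator contacts replicas according to the
        requested consistency level instead of picking just one.
        """
        if coordinator in self.down:
            raise RuntimeError(f"{coordinator} is down; the client should try another node")
        live = [name for name in self.replicas(key) if name not in self.down]
        if not live:
            raise RuntimeError("no live replica holds this key")
        return live[0]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    ring = Ring(nodes, replication_factor=3)
    owners = ring.replicas("user:42")
    print("replicas:", owners)
    ring.down.add(owners[0])  # simulate the owning node failing
    coordinator = next(name for name in nodes if name not in ring.down)
    print("served by:", ring.coordinate(coordinator, "user:42"))  # the second replica
```

Because every node derives placement from the same token map, no master is needed to decide where a key lives, and losing one node only shifts requests to the next replica on the ring.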

The key components of Cassandra are:

  1. Node − the place where data is stored.

  2. Data center − a collection of related nodes.

  3. Cluster − a component that contains one or more data centers.

  4. Commit log − a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.

  5. Mem-table − a memory-resident data structure. After the commit log, the data is written here.

  6. SSTable − a disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
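The write path formed by the last three components (commit log, then mem-table, then SSTable) can be sketched in Python. This is a toy model, not Cassandra's storage engine: `ToyNode` is a hypothetical class, the files are JSON rather than Cassandra's on-disk formats, the flush threshold counts rows instead of memory, and the whole commit log is truncated after a flush (Cassandra recycles commit-log segments once their data has been flushed). Reads check the mem-table first and then SSTables from newest to oldest, a simplification of Cassandra's timestamp-based merge.

```python
import json
import os
import tempfile


class ToyNode:
    """Toy write path: commit log -> mem-table -> SSTable (not Cassandra's real formats)."""

    def __init__(self, data_dir, memtable_threshold=3):
        self.data_dir = data_dir
        self.threshold = memtable_threshold
        self.commit_log_path = os.path.join(data_dir, "commitlog.jsonl")
        self.memtable = {}
        # Existing SSTables, oldest first (zero-padded names sort correctly).
        self.sstables = sorted(
            os.path.join(data_dir, name)
            for name in os.listdir(data_dir)
            if name.startswith("sstable-")
        )
        self._replay_commit_log()

    def write(self, key, value):
        # 1. Durability first: append the mutation to the commit log and sync it.
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Then apply it to the in-memory mem-table.
        self.memtable[key] = value
        # 3. Flush to an immutable, sorted SSTable once the threshold is reached.
        if len(self.memtable) >= self.threshold:
            self.flush()

    def flush(self):
        path = os.path.join(self.data_dir, f"sstable-{len(self.sstables):06d}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.memtable.items()), f)
        self.sstables.append(path)
        self.memtable.clear()
        # The flushed rows are now on disk, so this toy discards the commit log.
        open(self.commit_log_path, "w", encoding="utf-8").close()

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):  # newest SSTable first
            with open(path, encoding="utf-8") as f:
                rows = dict(json.load(f))
            if key in rows:
                return rows[key]
        return None

    def _replay_commit_log(self):
        # Crash recovery: rebuild the mem-table from commit-log entries not yet flushed.
        if os.path.exists(self.commit_log_path):
            with open(self.commit_log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self.memtable[key] = value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        node = ToyNode(data_dir, memtable_threshold=2)
        node.write("user:1", "Alice")
        node.write("user:2", "Bob")     # reaches the threshold, flushed to sstable-000000.json
        node.write("user:1", "Alicia")  # newer value sits in the mem-table
        print(node.read("user:1"))      # Alicia
        print(node.read("user:2"))      # Bob
```

Writing to the commit log before the mem-table is what makes the in-memory data recoverable: if the process dies before a flush, a new `ToyNode` on the same directory replays the log.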


Installation

Install Java 8 (Cassandra 3.0 runs on Java 8), set the PATH and JAVA_HOME variables in your ~/.bashrc or ~/.zshrc file, and load the settings with the source ~/.bashrc command.

Then add the Apache Cassandra repository to your APT sources, for example in /etc/apt/sources.list.d/cassandra.sources.list:

        deb http://www.apache.org/dist/cassandra/debian 30x main
        deb-src http://www.apache.org/dist/cassandra/debian 30x main

After running sudo apt-get update, you may encounter this error:

    GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 0353B12C

To fix it, fetch the missing public key (the key ID shown after NO_PUBKEY) and add it to APT:

       gpg --keyserver pgp.mit.edu --recv-keys <public key>
       gpg --export --armor <public key> | sudo apt-key add -

You can get the list of the project's public keys from the Apache Cassandra download site.

Finally, run:

       sudo apt-get update
       sudo apt-get install cassandra

If you need any customization, don't forget to update the cassandra.yaml file, which resides in the /etc/cassandra directory.

Some Basic Operations

|_. Operation |_. Syntax |_. Example |
| Create Keyspace | CREATE KEYSPACE keyspace_name WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }; | CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }; |
| Use Keyspace | USE keyspace_name; | USE test; |
| Drop Keyspace | DROP KEYSPACE keyspace_name; | DROP KEYSPACE test; |
| Create Table | CREATE TABLE table_name (column_definitions) [WITH option AND option]; | CREATE TABLE tabletest (col_1 text, col_2 text, col_3 int PRIMARY KEY); |
| Update Row | UPDATE table_name SET column_name = new_value WHERE primary_key_condition; | UPDATE tabletest SET col_1 = 'Framgia' WHERE col_3 = 2; |
| Drop Table | DROP TABLE table_name; | DROP TABLE tabletest; |
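One behavior the table does not show: in CQL, UPDATE (like INSERT) is an upsert. It does not check whether the row already exists, so an UPDATE on `tabletest` with `col_3 = 2` creates that row if it is missing, and competing writes to the same column are resolved by write timestamp (last write wins). The Python sketch below models only that behavior; `ToyTable` and `upsert` are hypothetical names, and the sketch ignores tie-breaking for equal timestamps, deletions (tombstones) and TTLs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical model of CQL upsert semantics for tabletest (col_3 is the primary key)."""

    # (primary_key, column) -> (write_timestamp, value)
    cells: dict = field(default_factory=dict)

    def upsert(self, pk, values, ts):
        """INSERT and UPDATE both just write cells; neither checks whether the row exists."""
        for column, value in values.items():
            current = self.cells.get((pk, column))
            # Last write wins: keep whichever cell has the higher write timestamp.
            if current is None or ts > current[0]:
                self.cells[(pk, column)] = (ts, value)

    def select(self, pk):
        """Return the non-key columns stored for one primary key."""
        return {col: v for (key, col), (_, v) in self.cells.items() if key == pk}


if __name__ == "__main__":
    table = ToyTable()
    # UPDATE tabletest SET col_1 = 'Framgia' WHERE col_3 = 2;  (the row is created)
    table.upsert(2, {"col_1": "Framgia"}, ts=100)
    # A delayed write with an older timestamp loses, even though it arrives later.
    table.upsert(2, {"col_1": "stale", "col_2": "x"}, ts=90)
    print(table.select(2))  # {'col_1': 'Framgia', 'col_2': 'x'}
```

Resolving conflicts per column by timestamp is what lets any replica accept a write without first reading or locking the row.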

All Rights Reserved
