An Introduction to NoSQL Databases (MongoDB)

Abstract

Ever since computing came into existence, developers and commercial institutions have been challenged to deliver a scalable, consistent and reliable database that can handle huge volumes of data while still performing efficiently. Many databases are in use today, such as MySQL, Oracle and other SQL-based systems, but one that is gaining recognition across the industry is MongoDB.

Introduction

MongoDB belongs to the NoSQL family of databases. NoSQL is getting a lot of attention these days, but in reality the term goes way back. It was first used in 1998 by Carlo Strozzi as the name of a file-based database he was developing at the time. That was not the NoSQL we know today; it was in fact a relational database without an SQL interface. Non-relational databases began to pick up momentum and popularity in the mid-2000s, when systems such as CouchDB and Google's BigTable appeared. The term itself was not used again until 2009, when Eric Evans applied it to the current surge of non-relational databases. The document-based database MongoDB was started in 2007 as part of an open-source cloud computing stack, and its first standalone release came in 2009.

What is MongoDB?

MongoDB is a cross-platform, document-oriented database. Unlike traditional table-based relational databases, MongoDB uses JSON-like documents with dynamic schemas. Written in C, C++ and JavaScript, it was first developed in 2007 by 10gen (now known as MongoDB Inc.) as a backend database capable of handling large amounts of data at little expense. Major websites such as The New York Times, eBay, Foursquare and Craigslist use MongoDB as a database.

Features of MongoDB

MongoDB offers several effective techniques and features, including:

Flexibility: Data is stored in JSON-like documents. JSON provides a rich data model that maps seamlessly to native programming language types, and the dynamic schema makes it easier to evolve your data model than in a system with enforced schemas, such as an RDBMS.
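
As a rough illustration of the dynamic schema, the sketch below uses plain Python dictionaries and the standard json module instead of a MongoDB driver. The collection name `articles`, the helper `fields_used` and all field names are made up for this example.

```python
import json

# A "collection" modelled as a plain list (assumption: a real collection
# would live on a MongoDB server and be accessed through a driver).
articles = []

# Two documents in the same collection with different fields:
# no schema change is needed to add "author" or "views".
articles.append({"title": "Intro to NoSQL", "tags": ["nosql", "mongodb"]})
articles.append({"title": "Sharding basics", "author": {"name": "Ada"}, "views": 10})

# JSON maps directly onto native types (dict, list, str, int),
# so a document survives a round trip through its JSON encoding.
encoded = json.dumps(articles[1])
decoded = json.loads(encoded)


def fields_used(collection):
    """Return the set of every top-level field name seen in the collection."""
    names = set()
    for doc in collection:
        names.update(doc.keys())
    return names
```

Here `fields_used(articles)` collects four different field names from only two documents, which is the kind of variation a fixed table schema would not allow without a migration.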

Sharding: Database systems with large data sets and high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the server's CPU capacity, larger data sets can exceed the storage capacity of a single machine, and working sets larger than the system's RAM stress the I/O capacity of the disk drives. To address these issues of scale, MongoDB uses sharding to store data across multiple machines.
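
The idea of spreading documents over several machines can be sketched with a simple hash-modulo rule. This is only an illustration in plain Python: the functions `shard_for` and `distribute` and the `user_id` shard key are hypothetical, and MongoDB's own distribution of data across shards is more involved than this.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash (illustrative only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(documents, shard_key, shard_count):
    """Place each document on the shard chosen by its shard key."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards


# 1000 small documents spread over 4 simulated shards.
docs = [{"user_id": i, "payload": "x" * 10} for i in range(1000)]
shards = distribute(docs, "user_id", 4)
```

Because the hash is stable, the same key always routes to the same shard, so a lookup by `user_id` only needs to contact one machine.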

Power: MongoDB provides many of the features of a traditional RDBMS, such as secondary indexes, dynamic queries, sorting, rich updates, upserts (update if the document exists, insert if it doesn't) and easy aggregation. This gives you the breadth of functionality you are used to from an RDBMS, with the flexibility and scaling capability that the non-relational model allows. Take the New York Times as an example: it posts over 600 pieces of content every day and often links to them on Twitter, where they are rebroadcast about 25,000 times a day on average. Using MongoDB, about 100 GB of data is stored each month without problems.
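
The upsert behaviour described above can be sketched in a few lines of plain Python. The `upsert` function and the `links` list are hypothetical stand-ins for a collection; with the PyMongo driver the comparable operation is an `update_one` call with `upsert=True`, which is not shown here because it needs a running server.

```python
def upsert(collection, query, changes):
    """Update the first document matching `query`, or insert a new one.

    Returns "updated" or "inserted" so the caller can see which path ran.
    """
    for doc in collection:
        if all(doc.get(k) == v for k, v in query.items()):
            doc.update(changes)
            return "updated"
    # No match: build a new document from the query fields plus the changes.
    collection.append({**query, **changes})
    return "inserted"


links = []
first = upsert(links, {"url": "example.com/a"}, {"shares": 1})   # no match yet
second = upsert(links, {"url": "example.com/a"}, {"shares": 2})  # match exists
```

After both calls the list still holds a single document for that URL, with the share count from the second call.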

Scaling: Because MongoDB keeps related data together in documents, queries can execute faster than in a relational database, where related data is spread across multiple tables and has to be joined later. MongoDB also makes it easier to scale out your database: auto-sharding lets you scale your cluster linearly by adding more machines. Capacity can be increased without downtime, which matters on the web, where load can rise suddenly and taking a website down for extended maintenance can cost a business large amounts of revenue.
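
To make the contrast between joined tables and embedded documents concrete, here is a small sketch in plain Python. The table and collection names and both loader functions are invented for the example; it only models the access pattern, not real query performance.

```python
# Relational style: related data lives in separate "tables"
# and has to be joined at read time.
posts_table = [{"id": 1, "title": "Hello"}]
comments_table = [
    {"post_id": 1, "text": "Nice"},
    {"post_id": 1, "text": "Thanks"},
]


def load_post_with_join(post_id):
    """Assemble a post by scanning a second table for its comments."""
    post = next(p for p in posts_table if p["id"] == post_id)
    comments = [c["text"] for c in comments_table if c["post_id"] == post_id]
    return {"title": post["title"], "comments": comments}


# Document style: the comments are embedded in the post,
# so a single lookup returns everything.
posts_collection = {1: {"title": "Hello", "comments": ["Nice", "Thanks"]}}


def load_post_embedded(post_id):
    """Return the post document as stored; no second lookup is needed."""
    return posts_collection[post_id]
```

Both functions return the same result, but the embedded version reads one record while the joined version has to consult two tables.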

Replication: MongoDB provides high availability with replica sets. A replica set consists of two or more copies of the data, and each member may act as the primary or a secondary replica (also called master and slave) at any time. By default, the primary performs all writes and reads. Secondaries maintain a copy of the primary's data using built-in replication. When the primary fails, the replica set automatically conducts an election to determine which secondary should become the new primary. Secondaries can also serve read operations, but that data is eventually consistent by default.
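
The failover described above can be approximated with a deliberately simplified model. In the sketch below, the `Member` class, its `last_applied` field and the `elect_primary` function are all hypothetical; the rule it applies (a majority of members must be healthy, then the most up-to-date healthy member wins) is an assumption made for illustration, and MongoDB's real election protocol involves voting, priorities and timeouts that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool
    last_applied: int  # position in the replication log (hypothetical field)


def elect_primary(members):
    """Pick a new primary from the healthy members.

    Requires a majority of members to be healthy, then prefers the one
    with the most recent data. Returns None if there is no majority.
    """
    healthy = [m for m in members if m.healthy]
    if len(healthy) * 2 <= len(members):
        return None  # no majority, so no primary can be elected
    return max(healthy, key=lambda m: m.last_applied)


replica_set = [
    Member("node-a", healthy=False, last_applied=120),  # failed primary
    Member("node-b", healthy=True, last_applied=118),
    Member("node-c", healthy=True, last_applied=119),
]
new_primary = elect_primary(replica_set)
```

With two of three members healthy, the sketch selects `node-c`, the healthy member that has applied the most recent data; if only one member were healthy, no primary would be chosen.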

Other features include file storage and load balancing, where MongoDB uses sharding to scale horizontally. So why should IT companies switch to NoSQL databases? Below are the advantages and disadvantages of choosing MongoDB over traditional databases.

Advantages of MongoDB/NoSQL

  1. Elastic Scaling

  2. Flexible data models

  3. Superior Performance

  4. Embraces OOP (Object-Oriented Programming), which is more flexible

  5. Cheaper maintenance and integrated caching facilities

However, there are still some limitations and drawbacks to using MongoDB instead of relational databases such as MySQL.

Limitations of MongoDB/NoSQL

1. Maturity: MongoDB and other NoSQL solutions are still young, in many cases at a pre-production stage, and many key features are yet to be implemented. They remain a work in progress and need further improvement, unlike relational systems, which have been around much longer.

2. Support: RDBMS vendors offer strong reassurance: if a system failure occurs, they provide competent support. This level of enterprise support is yet to be matched by MongoDB or any other NoSQL alternative.

3. Expertise: Most developers around the world, in IT and in every business segment, are familiar with RDBMS concepts and programming, because relational databases have been around for a long time. This is a major advantage of RDBMS that NoSQL cannot yet claim. NoSQL developers are still learning, but this situation will resolve itself naturally over time.

Conclusion

Both relational and NoSQL databases have been important inventions, used over time in distributed systems to make data storage and retrieval smooth and efficient. At present one cannot say that one is better than the other: each has its own advantages, and it is up to the developer to decide which fits a given situation. NoSQL databases are becoming an important part of the database landscape, but enterprises are proceeding with caution, aware of the legitimate limitations of these databases. One thing seems sure: NoSQL databases such as MongoDB are the future of DBMS, and the number of companies switching to them supports this. However, NoSQL is still a work in progress and needs to evolve further before it can displace RDBMS and claim the title of best database for distributed systems.