Relational to NoSQL: A Terminological Comparison of Oracle and MongoDB

Start Here

Get in touch with a
TriCore Solutions specialist

Blog | Oct 23, 2015

Relational to NoSQL: A Terminological Comparison of Oracle and MongoDB


 I recently started working on an interesting mobile application called TaskBud. This app will help users create and share tasks and domestic accounts in a small and new type of social network for the time being referred to as TaskBuddy Network.


Introduction:

I recently started working on an interesting mobile application called TaskBud
with my techie friend. This app will help users create and share tasks and domestic accounts in a small and new type of social network for the time being referred to as TaskBuddy Network. We started on Android platform and chose a Mobile Backend as a service platform. In this case- Parse. We chose this service as it saved us a lot of initial effort by avoiding the need for creating web services and costs since the initial usage was free up to a limit which was enough for us to start the app and add more than a thousand users.

Since I was responsible for the overall idea and data model of the project, I started designing the data model after a high level discussion with my partner. Being an Oracle guy and having worked with relational databases for many years, it was pretty obvious that I visualized the model in the form of tables. Just as my friend began coding the application, we hit the roadblock of a unique sequence generation. Parse doesn’t support unique sequence unlike Oracle (Sequences) and MySQL (Auto Increment column).

Having no idea why this feature was missing, I started to research and discovered that Parse actually uses MongoDB as its backend. So here I was an Oracle guy having been assigned a NoSQL project triggering my journey to the NoSQL world. To learn more on this technology, I opted for two online MongoDB DBA courses that helped me in implementing my knowledge on the subject in practice. In this article (first in its series) I’ve discussed some of the terminological differences which an Oracle DBA will find new while working with the MongoDB.

1. Rows and Columns change to Documents and Fields 

MongoDB is a document-oriented database which means everything goes into the database and is retrieved from the database in the form of documents unlike the relational databases where it is in the form of rows. Documents in MongoDB are not the usual word or text files but JSON formatted files. While internal representation is called BSON (binary-encoded serialization of JSON to support various data types), externally everything is similar to JSON.

e.g a simple two field JSON document will look like: 

{

          "groupName":"MyTasks",

          "status":"Join the taskbuddy network"

}

2. Tables change to Collections 

What we call Table in a relational database is called Collection in MongoDB. A single database can have many collections just like in relational databases where a single database can have many tables. One collection is unrelated to the other at least in terms of constraints. There is no foreign key binding the collections in MongoDB. So, there are no “joins”. Rather the joins are avoided for a special reason and a substitute called Embedding is provided for some cases. Embedding allows for de-normalizing and putting related information under one document which otherwise in normalized form demands more than one table.

e.g.  TaskGroups in our application can have more than one member. In relational world, we would have used two tables: TaskGroup (for holding the information about the group) and TaskMembers (for holding the member info) while in case of MongoDB we can do it as follows: 

{

          "groupName":"TriCoreDBA",

          "status":"Join the taskbuddy network"

          "groupMembers":["702088269902011","984425404949386","911805848878198"]

}

Always remember while it may seem fascinating to put everything into just one document, it is better to go through the basics of MongoDB and your data model before taking any decision. Use embedding only when you feel that the data will also be queried as one block. Excessive normalization and de-normalization can wreak havoc.

3. Schema-less notion 

During the development of TaskBud, we felt the need and use of this mostly talked about feature. Though you can never start totally without a schema, but schema-less actually means that you can make changes to your schema on-the-fly. Since each document in MongoDB can have different number of fields, adding new fields will not affect the old documents unless you want to it otherwise. When we started working on TaskBud, it was actually meant only for the collaboration of our roommates, but eventually we added lots of fields and features and still many are under proposal. If this is the case with relational databases and application is in production, it will be very difficult to make any changes and might require downtime and DBA intervention. While schema-less notion is an advantage of MongoDB, the stakeholders should still try to carve out proper data model and schema design for better performance.

4. Transactions 

MongoDB doesn’t have any magical wands for scaling to hundreds of nodes while providing every feature that a relational database provides. So, the notion of transactions has been avoided in the beginning. Transactions are the major bottleneck while scaling. Although MongoDB supports single document atomic transactions, yet these don’t prove very helpful for transactions such as the financial applications. For implementing transactions between two or more documents a two-phase commit mechanism can be used on the application layer.

Read more about two phase commit on http://docs.mongodb.org/master/tutorial/perform-two-phase-commits/

5) Generating a unique ID

In Oracle, we use sequences to generate continuous unique values for the fields. Similarly in MySQL, we have a column property called AUTO INCREMENT which helps in generating those unique values. On the other hand, MongoDB doesn’t provide any such feature to generate a continuous number sequence. You might do it by application code, but not on the database level. However, MongoDB does uniquely identify documents within a collection in its own way using a field called _id. You can specify a unique value by yourself or let MongoDB put it.

It uses ObjectID, a 12-byte BSON type which is constructed using: a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter, starting with a random value. 

e.g : { "_id" : ObjectId("55fc2be7077ef816ca81d2c7"), "a" : "1" } 

Although this is sufficient to provide thousands of unique values per second, in case of doubts, you can go through the link below for further explanation. 

http://stackoverflow.com/questions/4677237/possibility-of-duplicate-mongo-objectids-being-generated-in-two-different-colle

6. Query language

NoSQL doesn’t mean not using SQL. This term was coined just to organize a meeting of the next generation databases under one hash tag. But MongoDB doesn’t actually use SQL. It uses Javascript like shell commands for various operations that we perform using SQL.

e.g An update query will look like:

Suppose our database name is taskbud and collection name is taskgroup and we have to update status field where _id is 1.

On Mongo Shell:

use taskbud

db.taskgroup.update(  { _id: 1 },   {  $set: { "status":"Walk an Extra Mile" }   }  )

7. High Availability and Clustering

In Oracle, we use dataguard and create Standby Databases to keep an identical copy of data as primary. In Mongo, we call it a Replica Set. Since MongoDB was meant to scale out on low cost hardware nodes which are prone to failure, it is very common to see two or more replica sets for each primary node. Creating a replica set is very easy as compared to standby in Oracle.

Oracle and MongoDB differ a lot in the way they scale out. While Oracle uses RAC (Real Application Clusters) where each RAC node provides additional memory and CPU resources but shares the same data from the storage, MongoDB uses the concept of Sharding where each node has its different set of local data based upon the Shard Key.

Say for example, TaskBud has grown to more than 10000 users and the node that we started on can handle only 8000 efficiently, then we can use sharding and divide the data on the basis of a shard key say userCreatedOnDate in this case, so, users created before a particular date go to one shard and after that date go to the other. Thus some queries will hit shard 1 while others may hit shard 2 depending on the data they need. In contrast to this, in case of Oracle all data could be accessed from any of the two nodes since same shared storage is used.

Theoretically Oracle RAC can scale more than 100 nodes, it is very uncommon to find such a high number of nodes in practice because the inter-node communication grows tremendously and creates a bottleneck. But MongoDB can scale to hundreds or more nodes easily because there is less inter-communication of nodes reasons being:

i) transactions between multiple documents are not supported

ii) each node has its own data

 
 
relational-1
 

This article outlines few differences and related concepts in Oracle and MongoDB. In the next part, we will discuss more details on these features.

References:

TaskBud: https://www.facebook.com/taskbud

Mongo Docs: http://docs.mongodb.org/


If you have any questions for Ankur then click below:

 Ask Ankur