Introduction: MongoDB Tips

Blog | Aug 3, 2017

In this post, Pavan Paramathmuni shares a few practical tips for day-to-day work with MongoDB.

Getting started with MongoDB is easy, but once you're building applications with it, more complex questions emerge:

  • How should I re-sync a replica member in a replica set?
  • How can we recover MongoDB after a crash?
  • When should I use GridFS?
  • How do I fix corrupted data?

Here are some tips that can be quite helpful in day-to-day work with MongoDB.

Tip 1: Do Not Depend on Repair to Recover Data

If your database crashes and you were not running with --journal, do not use that server’s data.

MongoDB’s repair command goes through every document it can find and makes a clean copy of it. This takes a long time and a lot of disk space (an amount equal to the space currently in use), and it skips any corrupted records.

Remember that you must wipe the possibly corrupt data before re-syncing; MongoDB’s replication cannot “fix” corrupted data.

Tip 2: Re-sync Mongo Replica Member in Replica Set

  1. Make sure the primary and at least one secondary are up and running.
  2. Stop the Mongo service as the ORACLE user.
  3. Log in as the mongod user and move all the data files to a backup folder, so that they can be restored in case of any issues. Any old files already present in the backup folder can be removed. The data file location can be checked in the /etc/mongod.conf file.
  4. Start the Mongo service as the ORACLE user.
  5. Log in to the database to validate; database authentication will not be required until the member has synced with the replica set.
  6. Once replication completes, the member’s status changes from STARTUP2 to SECONDARY.
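On a systemd-based host, the steps above might look roughly like the following sketch. The service name, data path, and backup location are illustrative and will differ per environment; check storage.dbPath in /etc/mongod.conf for the real data directory.

```shell
# Step 2: stop the mongod service
sudo systemctl stop mongod

# Step 3: move the data files aside so they can be restored if needed
sudo mv /var/lib/mongo/* /var/lib/mongo-backup/

# Step 4: start the service; the member begins an initial sync
sudo systemctl start mongod

# Step 6: watch the member move from STARTUP2 to SECONDARY
mongo --eval "rs.status().members.forEach(function(m) { print(m.name, m.stateStr) })"
```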

Tip 3: Don’t Use GridFS for Small Binary Data

GridFS is essentially a way of breaking up large binary objects into chunks for storage in the database. Reading a file back requires two queries: one to fetch the file’s metadata and one to fetch its contents. So if you use GridFS to store small files, you are doubling the number of queries your application has to make.

[Image: GridFS interface. Source: www.slideshare.net]

GridFS is for storing big data, larger than will fit in a single document. As a rule of thumb, anything that is too big to load all at once on the client is probably not something you want to load all at once on the server.

Therefore, anything you’re going to stream to a client is a good candidate for GridFS. 
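As a rough illustration of the overhead, the sketch below estimates how many documents GridFS stores for a given file size, assuming the default 255 KB chunk size used by recent drivers, and therefore how many reads even a tiny file costs compared to a single inline document:

```python
import math

GRIDFS_CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in bytes

def gridfs_read_cost(file_size_bytes: int) -> int:
    """Queries needed to read a file out of GridFS:
    one for the fs.files metadata document, plus one per fs.chunks document."""
    chunks = max(1, math.ceil(file_size_bytes / GRIDFS_CHUNK_SIZE))
    return 1 + chunks

# A 10 KB thumbnail still costs two round trips...
print(gridfs_read_cost(10 * 1024))          # 2
# ...while a 100 MB video is exactly what GridFS is for.
print(gridfs_read_cost(100 * 1024 * 1024))  # 403
```

For a small file, an inline BSON field would be fetched in one query instead of two, which is the whole point of this tip.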

Tip 4: Minimize Disk Access 

The simple logic we all know: accessing data in RAM is fast, and accessing data on disk is slow.

Minimizing the amount of disk access is therefore a great optimization technique. But how do you minimize disk access? There are two simple ways to achieve it:

  • Use SSDs (Solid State Drives):

[Image: SSD disk space. Source: http://www.serverintellect.com]


SSDs are much faster than traditional HDDs for many workloads, though they are often smaller and more expensive. They work very well with MongoDB.

  • Add more RAM:

Adding more RAM means you hit your disk less. However, more RAM will only take you so far; at some point your data is not going to fit in RAM.

So the question is: how do we store terabytes or petabytes of data on disk, and design an application that keeps the frequently accessed data in memory and moves data from disk to memory as infrequently as possible?

If you literally access all of your data randomly in real time, you’re just going to need a lot of RAM. However, most applications don’t: recent data is accessed more than older data, certain users are more active than others, and certain regions have more customers than others. Applications like these can be designed to keep certain documents in memory and go to disk very infrequently. 
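The access pattern described above is the idea behind a least-recently-used (LRU) cache, which is essentially what the OS page cache does with MongoDB's data files. A minimal sketch (the keys and documents are illustrative, not part of MongoDB's API):

```python
from collections import OrderedDict

class LruCache:
    """Keep only the most recently used documents in memory."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None               # miss: the caller must go to disk
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

cache = LruCache(capacity=2)
cache.put("user:1", {"name": "recent"})
cache.put("user:2", {"name": "also recent"})
cache.get("user:1")                   # touch user:1 so it stays hot
cache.put("user:3", {"name": "new"})  # evicts user:2, the coldest entry
print(cache.get("user:2"))            # None -> would now require a disk read
```

Designing your schema so hot documents cluster together gives the page cache the same advantage automatically.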

Tip 5: Start MongoDB Normally After a DB Crash 

If you were running with journaling and your system crashes in a recoverable way, you can restart the database normally. Make sure you’re using all of your normal options, especially --dbpath (so it can find the journal files) and, of course, --journal.
MongoDB will fix up your data automatically before it starts accepting connections. This can take a few minutes for large data sets, but it is nowhere near the time required by a full repair on a large data set (probably five minutes or so).

Journal files are stored in the journal directory. Do not delete these files. 
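A minimal /etc/mongod.conf fragment making both settings explicit might look like the following. The path is illustrative, and note that journaling is enabled by default on 64-bit builds; it is shown here for clarity.

```yaml
# Illustrative mongod.conf fragment (YAML format)
storage:
  dbPath: /var/lib/mongo   # the directory that contains the journal/ subdirectory
  journal:
    enabled: true          # default on 64-bit builds, shown explicitly
```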

Tip 6: Compact Database Using Repair 

Repair basically does a mongodump and then a mongorestore, making a clean copy of your data and, in the process, removing any empty “holes” in your data files. 

Repair blocks all other operations and takes twice the disk space your database currently occupies. Alternatively, if you have another machine, you can do the same thing manually with mongodump and mongorestore.
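Because repair writes a clean copy of the data, it is worth checking free space first. A small sketch; the path and the 10% slack factor are illustrative:

```python
import shutil

def enough_space_for_repair(data_path: str, data_size_bytes: int) -> bool:
    """Repair makes a full clean copy of the data, so we need roughly
    the current data size in free space again (plus some slack)."""
    free = shutil.disk_usage(data_path).free
    return free >= data_size_bytes * 1.1  # 10% slack, illustrative

# e.g. for 50 GB of data files under /var/lib/mongo:
# enough_space_for_repair("/var/lib/mongo", 50 * 1024**3)
```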

Example:

Step 1: Step down the Hyd1 machine, then fsync and lock it:

rs.stepDown()

db.runCommand({fsync : 1, lock : 1}) 

Step 2: From Hyd2, dump the data from Hyd1:

Hyd2$ mongodump --host Hyd1 

Step 3: Delete the data files on Hyd1 (make a copy in case …), and restart Hyd1 with an empty data directory.

Step 4: Then restore the data from Hyd2:

Hyd2$ mongorestore --host Hyd1 --port 10000 # specify port if it's not 27017 

Conclusion:

These are some of the changes we noticed in our environment that provided a great boost to our MongoDB performance. So bookmark this article, and check off each tip the next time you’re ready to use MongoDB. In my next blog, I hope to share tips on properly designing, optimizing, and implementing some useful features of MongoDB in large enterprises. For any questions on these, click below:

Ask Pavan