“Data is a precious thing and will last longer than the systems themselves.” -Tim Berners-Lee.
Data is critical for any organization, be it in BFSI, FMCG, manufacturing, pharmaceuticals or any other industry or sector. It is hard to imagine an organization that does not care about its data. However, data cannot be a singular focus; it needs to be incorporated into the strategy of the business. Technology helps to facilitate the business, and for this reason organizations hire experts who can protect their data and make it available to the core business. These experts are called Database Administrators, or DBAs. If you are not a DBA, like Niranjan, who is a web programmer, it is hard to understand what they do. He cannot fathom how a DBA can spend an entire day writing commands less than a line long, and sometimes not typing at all. So he sits down with a DBA to try to get a better understanding of what a typical day looks like for a DBA.
The conversation goes like this:
Niranjan: I’m a programmer. I develop software and applications at least eight hours a day, five days a week. Can you explain to me in a nutshell what job you do, but please avoid jargon?
DBA: I take care of the data that your application generates, the data that your organization uses and the data that your bank uses. I work when nobody else does, sacrificing weekends, nights and family time to make sure your data is safe and available to you as it should be.
Niranjan: What do you mean by “take care”? We program our applications to insert the data into the database, which is a complete product in itself. Why do we need a DBA for that?
DBA: See, Niranjan, a database management system, whether it is relational or document oriented, SQL or NoSQL, is a complex piece of software. Consider Oracle, MySQL or MongoDB; all are complex products where a lot of administration work needs to be done.
Niranjan: And what would administration work be like?
DBA: Things like backups, performance tuning, high availability, scalability, disaster management, proactive monitoring, patching and upgrading, refreshes and clones, replication, etc.
Niranjan: Is taking a data backup a complex task? Could you elaborate a bit?
DBA: Yes, sure! See, every organization has its own backup policy, designed with every possible failure in mind, because the data must stay protected whatever happens. As they say, “There are only two types of disks in this world: those which have failed and those which are about to fail.” So we ensure that even if a disk fails, we have a contingency plan to tackle the failure.
Niranjan: Contingency plan! This reminds me of the line from the movie Abraham Lincoln: “A wise man once told me always have a contingency plan.” And that wise man would be you now. (Laughing…)
DBA: (Laughing…) Yes, correct! So, backups are taken on disks and tapes at regular intervals. However, protecting the data is not enough. Suppose our client has a database of several terabytes, which almost every client at TriCore Solutions has, and it gets corrupted due to a hardware failure, logical corruption or a natural disaster. Our job here is not only to restore the database to its original state but also to minimise the downtime during that restore, because if the production database fails, the unit of our client organization that depends on that database also fails. Nothing can be produced on the production line unless it is recorded in the database. Our job also involves designing backup and disaster management strategies for the data in order to meet strict SLAs.
Niranjan: (Laughing…) Disaster management! It’s a big term. What do you do… jump into the water in a flooded area to rescue machines, like the National Disaster Response Force guys do to rescue people?
DBA: I think we are on the same page now (with a smile on the face). But we don’t actually get the time or the opportunity to jump into the water in a flood. If it is production time and our services are unavailable for even an hour, things heat up all the way from the technical team to top management. Say there is a flood in Somerville and my data center goes down. We already have a contingency plan in place, which we call a “standby database”, in Chicago, and our proactive monitoring team ensures it is always in sync with the primary database in Somerville. Thus, we get everything up and running in no time. The same is true of the more than 5000 databases that our company manages, for clients ranging from SMEs to the Fortune 500. We ensure database and application availability of more than 99.9%, which means that the database is running as it should be for at least that percentage of the time.
Again, it is just small commands that help us get these things done, but it requires unwavering attention 24 hours a day, 7 days a week, like the firefighting services in a town. You know that a fire is not going to break out every day, but you still keep a well-prepared task force at hand to manage it when it does.
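For programmers like Niranjan, it may help to see what a figure like 99.9% means in hours. The sketch below is plain arithmetic, not TriCore’s SLA definition; the function name and the choice of a 365-day year and a 30-day month are assumptions made for illustration.

```python
def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1.0 - availability_pct / 100.0)


HOURS_PER_YEAR = 365 * 24   # assumed non-leap year
HOURS_PER_MONTH = 30 * 24   # assumed 30-day month

yearly = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly = allowed_downtime_hours(99.9, HOURS_PER_MONTH)

print(f"Per year:  {yearly:.2f} hours")         # Per year:  8.76 hours
print(f"Per month: {monthly * 60:.1f} minutes")  # Per month: 43.2 minutes
```

In other words, at 99.9% the whole year’s budget for outages is under nine hours, which is why a standby database that can take over quickly matters so much.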
Niranjan: Smart! A standby database. You used the term “proactive monitoring”. How do you manage that?
DBA: Proactive monitoring is one of the most important tasks in the life of a DBA. Take TriCore Solutions: we have hundreds of clients, and each client has several hundred to thousands of managed targets. Managed targets include everything from host machines and operating systems to applications, middleware, and databases and their subsystems. By monitoring proactively, we mean that we have tools in place for each target that help us detect an incident while it is small and contain it before it becomes critical.
Niranjan: What tools do you generally use?
DBA: We have a wide variety of tools. Take Oracle, for example: our clients generally use Oracle Enterprise Manager (OEM). For clients who do not opt for OEM, and even for those who do, we have our own monitoring scripts (TriCore Toolkit Scripts) installed on the servers as a standby monitoring method. Be it OEM or TriCore’s Toolkit, the main job is to alert us when an incident occurs. Say we have a server where a file system mount point is more than 90% full; if it reaches 100%, the database will stop working, which is something we do not want. In this case, 90% is our warning threshold, and once this limit is reached we are alerted and take appropriate action immediately to avoid a critical situation. Similarly, there are hundreds of other parameters that need to be monitored and taken care of.
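An aside for readers who think in code: the threshold check the DBA describes could be sketched roughly as follows. This is a minimal illustration only, not OEM and not the TriCore Toolkit Scripts; the function names are hypothetical, and the 90% and 100% limits are simply the figures mentioned in the conversation.

```python
import shutil

WARNING_PCT = 90.0    # warning threshold mentioned in the conversation
CRITICAL_PCT = 100.0  # a full mount point, where the database would stop working


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def classify(pct: float,
             warning: float = WARNING_PCT,
             critical: float = CRITICAL_PCT) -> str:
    """Map a usage percentage to an alert level (hypothetical level names)."""
    if pct >= critical:
        return "CRITICAL"
    if pct >= warning:
        return "WARNING"
    return "OK"


def check_mount(path: str) -> str:
    """Check one mount point using the standard library's disk usage call."""
    usage = shutil.disk_usage(path)
    return classify(percent_used(usage.total, usage.used))


if __name__ == "__main__":
    # Checks the file system that holds the current directory.
    print(check_mount("."))
```

A real monitoring setup would run a check like this on a schedule for every mount point and send the alert to the on-call DBA rather than print it.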
Niranjan: That’s a lot of stuff! I heard the term “scalability” when you started. How do you deal with it?
DBA: Yes, scalability is an important aspect of the IT applications and solutions TriCore provides. As a client’s user base grows, we recommend solutions according to their needs. First, we see whether the current system is being utilized to the maximum: we check and test various performance parameters, which is what we call performance tuning. However, if we find that the existing system cannot handle the load at the required levels, we assess the feasibility of scaling up or scaling out.
Niranjan: Scale up and Scale out… how do they differ?
DBA: Scale up means adding more resources to an existing machine, e.g. more CPU or memory, while scale out means adding more machines to an existing system: nodes in Oracle terms, or shards in MongoDB. Oracle provides the Real Application Clusters (RAC) feature for adding more nodes to a database. It recommends the Maximum Availability Architecture (MAA), where we have multiple nodes at the instance level, which helps with load balancing besides providing availability; if one node goes down, another is still up, thus eliminating the SPOF.
Niranjan: SPOF?
DBA: Single Point of Failure. We try to remove SPOFs at the instance level as discussed, at the database level by using Oracle Data Guard (standby database), at the storage level by using Oracle ASM (Automatic Storage Management), at the file system level by using technologies like RAID and ZFS, and a lot more.
Niranjan: Whoa! That’s too much. I thought you guys just sat idle there.
DBA (Laughing…): Ha! Ha! We haven’t even covered one tenth of the work yet, like patching, upgrading, replication, refreshes and cloning, plus the major part, which is troubleshooting. On top of this, so far we have discussed only the core DBA tasks. There are different tasks that an Apps DBA performs.
Niranjan (Laughing…): Maybe I should take a day off and meet you at your office.
DBA: Sure… You are most welcome.
Until Niranjan comes back to learn more about other DBA tasks, readers are requested to stay tuned to the TriCore Solutions blog.