Exadata Disk Layers and Architecture

Blog | Oct 18, 2016

Exadata has a different disk structure compared to a normal database server, and it can often confuse Oracle DBAs when they start working on Exadata. There are four layers of disks in Exadata, and I will discuss each of them in this blog.

Introduction:

In this blog we will discuss the Exadata disk layers and their architecture. Exadata has a very unique storage architecture, and DBAs need detailed knowledge of it to manage Exadata databases. Each Cell comes with 12 SAS Harddisks (600 GB each with High Performance, or 2 TB each with High Capacity).

Figure: Oracle Exadata storage structure (Image source: https://docs.oracle.com/cd/E50790_01/doc/doc.121/e50471/concepts.htm#SAGUG20341)

Each Cell also has four Flash cards built in, each divided into four Flashdisks, which gives 16 Flashdisks per Cell that deliver 384 GB of Flash Cache by default. At this stage, the first layer of abstraction comes in:

1) Physical Disks

Physical Disks can be either Harddisks or Flashdisks. You cannot create or drop them. The only administrative task on this layer is to turn on the LED at the front of the Cell before you replace a damaged Harddisk, to be sure you pull out the right one, with a command like:

CellCLI> alter physicaldisk <disk_name> serviceled on

This can also be done using ILOM, and you will need to log a service request with Oracle Support.
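
Before logging that call, it helps to confirm which disk has actually failed. As a sketch (attribute names per the CellCLI documentation; output varies with the storage software version), you can list all Physical Disks, Flashdisks included, and filter on their status:

CellCLI> list physicaldisk attributes name, diskType, status

CellCLI> list physicaldisk where status != normal detail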

2) Logical Unit Numbers (LUNs)

LUNs are the second layer of abstraction. Each Cell has a total of 12 LUNs, one per Harddisk. The first two Harddisks in every Cell are different from the other 10, as they contain the Operating System (Oracle Enterprise Linux). About 30 GB has been carved out of each of the first two Harddisks for that purpose, and the OS is mirrored across them so that the Cell can still operate if one of the first two Harddisks fails. If you investigate the first two LUNs, you will see the mirrored OS partitions. The LUNs are equally sized on each Harddisk, but the usable space (for Celldisks and hence Griddisks) is about 30 GB less on each of the first two. The purpose of the LUN is to present the available capacity to the Celldisks: on the first two disks, the LUN maps to the extents not used for the System Area, and on the remaining ten disks, the LUN represents the entire physical disk.

Backups of these first two LUNs (the System Area) should be taken on a regular basis, using the Linux-based backup features of Exadata.

As an Administrator, you do not need to bother much about the LUN layer, except to look at it with commands like:

CellCLI> list lun 
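
A quick way to see the difference between the two system LUNs and the other ten is to look at the LUN attributes; isSystemLun should show TRUE for the first two, while lunSize is the same across all twelve (attribute names as in the CellCLI documentation, so treat this as a sketch):

CellCLI> list lun attributes name, isSystemLun, lunSize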

3) Celldisks

Celldisks are the third layer of abstraction. This layer was introduced primarily to enable interleaving. Typically, Oracle Support creates all the Celldisks for you without interleaving, with a command like:

CellCLI> create celldisk all harddisk

When you investigate your Celldisks, you will see something similar to the output below:

CellCLI> list celldisk attributes name,interleaving where disktype=harddisk

         CD_disk01_cell1         none
         CD_disk02_cell1         none
         CD_disk03_cell1         none
         CD_disk04_cell1         none
         CD_disk05_cell1         none
         CD_disk06_cell1         none
         CD_disk07_cell1         none
         CD_disk08_cell1         none
         CD_disk09_cell1         none
         CD_disk10_cell1         none
         CD_disk11_cell1         none

My Celldisk #12 is not showing because I have dropped it to show the alternative creation with interleaving:

CellCLI> create celldisk all harddisk interleaving='normal_redundancy'

CellDisk CD_disk12_cell1 successfully created

In a real-world configuration, every Celldisk (on every Cell) would have the same interleaving setting (none, normal_redundancy or high_redundancy). The interleaving attribute of a Celldisk determines the placement of the Griddisks that are later created on it.
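
If you want to confirm that the interleaving setting really is identical on every Cell, a convenient way is to run the same CellCLI listing across all Cells with dcli. The sketch below assumes the standard dcli setup, with a cell_group file listing your Cells and SSH user equivalence for the celladmin user:

dcli -g cell_group -l celladmin cellcli -e "list celldisk attributes name, interleaving where disktype=harddisk"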

4) Griddisks

Griddisks are the fourth layer of abstraction, and they are the candidate disks from which you build your ASM Diskgroups. By default (interleaving=none on the Celldisk layer), the first Griddisk created upon a Celldisk is placed on the outer sectors of the underlying Harddisk and therefore has the best performance. If we follow the recommendations, we will create three Diskgroups upon our Griddisks: DATA, RECO and SYSTEMDG.

DATA is supposed to be used as the Database Area (DB_CREATE_FILE_DEST='+DATA' on the Database Layer), RECO will be the Recovery Area (DB_RECOVERY_FILE_DEST='+RECO'), and SYSTEMDG will hold the Voting Files and OCR. It makes sense that DATA has better performance than RECO, while SYSTEMDG can be placed on the slowest (inner) part of the Harddisks.
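
On the Database Layer, pointing a database at these Diskgroups is then just a matter of setting the corresponding initialization parameters; a minimal sketch (the recovery area size here is a placeholder, not a recommendation):

SQL> alter system set db_create_file_dest='+DATA' scope=both sid='*';

SQL> alter system set db_recovery_file_dest_size=500G scope=both sid='*';

SQL> alter system set db_recovery_file_dest='+RECO' scope=both sid='*';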

With interleaving specified at the Celldisk layer, this is different: the Griddisks are then created from both outer and inner parts of the Harddisk, leading to equal performance of the Griddisks and therefore of the Diskgroups later created on them. This option was introduced for customers who want to provide different Diskgroups for different Databases without favoring one Database over another.

You will need to carve Griddisks of about 30 GB out of each of the 10 non-system Harddisks of every Cell to build the SYSTEMDG Diskgroup upon. That leaves the same amount of space on each of the 12 Harddisks for the DATA and RECO Diskgroups. You may wonder why the SYSTEMDG Diskgroup gets relatively large with that approach, much larger than the space required by the Voting Files and OCR. That space gets used if you establish a DBFS filesystem with a dedicated DBFS database that uses the SYSTEMDG Diskgroup as its Database Area. In that DBFS filesystem, you can store flat files in order to process them with External Tables (or SQL*Loader) from your production Databases.

As an Administrator, you can (and most likely will) create and drop Griddisks. Typically, three Griddisks are carved out of each Celldisk; remember that the first two Harddisks also contain the OS, so their Celldisks have about 30 GB less usable space. Assuming we have High Performance Disks:

CellCLI> create griddisk all harddisk prefix=temp_dg, size=570G

This command creates 12 Griddisks, each 570G in size, from the outer (fastest) sectors of the underlying Harddisks. It fills up the first two Celldisks entirely, because they have just 570G of free space; the rest is already consumed by the OS partitions.

CellCLI> create griddisk all harddisk prefix=systemdg

This command creates 10 Griddisks for the SYSTEMDG Diskgroup, consuming all of the available space (about 30G) remaining on the 10 non-system Harddisks. Without interleaving, they are placed on the slowest part of the disks.

CellCLI> drop griddisk all prefix=temp_dg

Now we have dropped those Griddisks, leaving the faster parts empty for the next two Diskgroups:

CellCLI> create griddisk all harddisk prefix=data, size=270G

It is best practice to use the name of the future Diskgroup as the prefix for the Griddisks. We have now created 12 Griddisks for the future DATA Diskgroup on the outer sectors. The remaining space (300G) will be consumed by the reco Griddisks:

CellCLI> create griddisk all harddisk prefix=reco
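
To verify the layout, you can list the Griddisks together with their size and offset; the offset attribute shows where each Griddisk starts on its Celldisk, so the data Griddisks should show smaller offsets (outer, faster sectors) than the reco Griddisks. For example:

CellCLI> list griddisk attributes name, cellDisk, size, offset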

We are now ready to continue on the Database Layer and create ASM Diskgroups there; to ASM, Griddisks just look like candidate disks.
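
As an illustration of that last step, here is a minimal sketch of creating the DATA Diskgroup from an ASM instance, assuming the griddisk prefix data from above and normal redundancy (the compatible attribute values depend on your software versions):

SQL> create diskgroup DATA normal redundancy
     disk 'o/*/data*'
     attribute 'compatible.rdbms'='11.2.0.0.0',
               'compatible.asm'='11.2.0.0.0',
               'cell.smart_scan_capable'='TRUE',
               'au_size'='4M';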

Conclusion:

All the various disk layers in Exadata are there for a good reason, but as an Administrator you will probably only deal with Griddisks. Multiple Griddisks are carved out of each Celldisk -> LUN -> Physical Disk. On the Database Layer, Griddisks look and feel like the ASM Disks that you use for your ASM Diskgroups.