Monitoring Linux Software RAID
Introduction
RAID is an acronym for ‘Redundant Array of Independent Disks’. It is essentially a virtual device created from physical drives or partitions. Linux supports both software and hardware RAID, so you can use RAID without needing a hardware RAID controller. Most RAID levels tolerate some degree of drive failure, which makes them useful for protecting important data.
Linux supports the following software RAID levels: RAID 0 (no redundancy), RAID 1, RAID 4, RAID 5, RAID 6 and RAID 10. The RAID levels are explained in detail here: https://www.enterprisestorageforum.com/management/raid-levels-explained/. However, the most popular levels for redundancy are RAID 1 (mirroring) and RAID 10 (striping and mirroring).
Checking RAID configuration and status
All essential information about RAID devices is stored in the ‘/etc/mdadm.conf’ file, which looks similar to the following:
[root@server ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/boot level=raid1 num-devices=2 UUID=f8ce33fe:afd0b1b5:7aedbcf9:13e967d3
ARRAY /dev/md/root level=raid1 num-devices=2 UUID=a7866d4d:ec7a94a9:dc0e1c5a:76848e6b
ARRAY /dev/md/swap level=raid1 num-devices=2 UUID=faca3cc3:7da4398b:5b42d265:3d14b0e2
Note: If you are using a Debian-based operating system, the mdadm.conf file is located at /etc/mdadm/mdadm.conf.
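If this file is missing or out of date, mdadm can generate the ARRAY lines for you by scanning the currently running arrays. A minimal sketch, assuming your config lives at /etc/mdadm.conf:

[root@server ~]# mdadm --detail --scan >> /etc/mdadm.conf

Review the appended lines afterwards to make sure you have not created duplicate ARRAY entries.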
To check the health status of your RAID arrays, you can simply run the following command:
[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sdb1[1] sda2[0]
      15623168 blocks super 1.2 [2/2] [UU]

md126 : active raid1 sdb2[1] sda3[0]
      975872 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdb3[1] sda5[0]
      471641088 blocks super 1.2 [2/2] [UU]
      bitmap: 2/4 pages [8KB], 65536KB chunk

unused devices: <none>
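In this output, [UU] means both members of the mirror are up; a failed or missing member is shown as an underscore (for example [_U]). If you want to script a quick check on top of this, here is a minimal sketch suitable for cron (the script name and messages are just illustrative):

#!/bin/sh
# check-mdstat.sh - warn if any md array is degraded.
# A degraded array shows an underscore inside the [..] status field of /proc/mdstat.
if grep -q '\[.*_.*\]' /proc/mdstat; then
    echo "WARNING: a RAID array is degraded:"
    grep -B 1 '\[.*_.*\]' /proc/mdstat
    exit 1
fi
echo "All RAID arrays are healthy."

When run from cron, the output (and the non-zero exit code) can be used to trigger a notification.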
If you need more details about a specific RAID device, just run the following command, replacing /dev/md126 with the name of the device you want to check:
[root@server ~]# mdadm --detail /dev/md126
/dev/md126:
           Version : 1.2
     Creation Time : Tue Mar 23 01:37:13 2021
        Raid Level : raid1
        Array Size : 975872 (953.00 MiB 999.29 MB)
     Used Dev Size : 975872 (953.00 MiB 999.29 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Mar 23 15:24:09 2021
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : localhost:boot
              UUID : f8ce33be:afd0b1b5:7aedbcf9:13e967d3
            Events : 72

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       18        1      active sync   /dev/sdb2
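To inspect an individual member disk rather than the whole array, mdadm also provides --examine, which reads the RAID superblock directly from a component device and prints its metadata (UUID, device role, state and so on). For example, for one of the members listed above:

[root@server ~]# mdadm --examine /dev/sdb2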
Setting up Email Alerts for RAID Monitoring
Setting up email alerts is very simple and very useful at the same time: if something goes wrong with your RAID setup, you will receive an email. At Fraction Servers we do not monitor customers' RAID arrays, and we recommend that any customer using software RAID set up monitoring of their RAID arrays.
To set this up, simply edit the /etc/mdadm.conf file (or /etc/mdadm/mdadm.conf on Debian) and add the following line:
MAILADDR you@yourdomain.com
replacing you@yourdomain.com with your own email address.
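Besides MAILADDR, mdadm.conf also supports a PROGRAM line, which runs a command of your choice whenever an event is detected; mdadm passes the event name, the md device and, where relevant, the affected component device as arguments. The script path below is just an example:

PROGRAM /usr/local/sbin/raid-event-handler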
Then save the file and restart mdadm by executing the command:
/etc/init.d/mdadm restart
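On modern systemd-based distributions the init script may not exist; there the monitoring daemon is usually managed as a systemd unit (the unit name can vary by distribution, so check with systemctl list-units if unsure):

[root@server ~]# systemctl restart mdmonitor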
Now, if something goes wrong with your RAID setup, you will receive an email alert similar to this one:
From: mdadm monitoring <root@server.example.com>
To: you@yourdomain.com
Subject: DegradedArray event on /dev/md1:server.example.com

This is an automatically generated mail message from mdadm
running on server.example.com

A DegradedArray event had been detected on md device /dev/md1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid0] [raid1]
md1 : active raid1 sda2[2] sdb2[1]
      487853760 blocks [2/1] [_U]
      [>....................]  recovery =  4.3% (21448384/487853760) finish=114.3min speed=67983K/sec

md0 : active raid1 sda1[0] sdb1[1]
      530048 blocks [2/2] [UU]

unused devices: <none>
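Rather than waiting for a real failure, you can confirm that mail delivery works by asking mdadm to generate a test alert for every array it finds and then exit:

[root@server ~]# mdadm --monitor --scan --test --oneshot

You should receive one TestMessage email per array at the address configured in MAILADDR.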