Replacing failed OS RAID disks

RAID1 is configured automatically on the server. However, you may occasionally need to remove a device from the RAID array, recover the array, or add a new device to it.

In most cases, this requires removing the failing disk and installing a new one.

Although the disk can be replaced while the system is running, power the system down first if it can be taken offline.
  1. To check the status of the RAID1 disk, type:
    cat /proc/mdstat

    When RAID1 is working correctly, the terminal prints, for example:

    Personalities : [raid1]
    md126 : active raid1 sda[1] sdb[0]
          125032448 blocks super external:/md127/0 [2/2] [UU]

    md127 : inactive sdb[1](S) sda[0](S)
          4520 blocks super external:imsm

    • [UU] indicates that both disks are operational.

    • If one of the disks has a problem, the string reads [_U] or [U_] instead of [UU]. The underscore marks the missing or failed disk.

  2. Check whether disk sda or sdb has failed.
  3. Remove the failing disk from the RAID configuration.

    Take care to remove only the disk identified as failing. In the following example, it is sdb. Type:

    mdadm --manage /dev/md/imsm0 --remove /dev/sdb

    The terminal prints:

    mdadm: hot removed /dev/sdb from /dev/md/imsm0
  4. Power down the computer, replace the failing disk, and reboot.
  5. To copy the partition table from the healthy disk (sda in this example) to the replacement disk (sdb), type:
    sfdisk -d /dev/sda | sfdisk /dev/sdb
  6. To verify the partition table, type:
    fdisk -l
  7. To add the new disk to the RAID array, type:
    mdadm --manage /dev/md/imsm0 --add /dev/sdb
  8. To check the progress of the recovery, type:
    cat /proc/mdstat

    When the recovery is complete, both disks are again reported as [UU]. The terminal prints, for example:

    [root@wes-install ~]# cat /proc/mdstat
    Personalities : [raid1]
    md126 : active raid1 sdb[2] sda[1]
          125032448 blocks super external:/md127/0 [2/2] [UU]

    md127 : inactive sdb[1](S) sda[0](S)
          4520 blocks super external:imsm

    unused devices: <none>
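
The status check in steps 1 and 8 can also be scripted. The following is a minimal sketch, not part of the original procedure: degraded_arrays is a hypothetical helper name, and the sketch assumes the status string appears in the [UU] / [U_] / [_U] form shown above. It reads mdstat-formatted text on standard input and prints the name of each array whose status string contains an underscore.

```shell
#!/bin/sh
# Hypothetical helper (not from the original procedure): print the name of
# every md array whose status string, such as [U_] or [_U], shows a missing
# or failed member. Expects text in /proc/mdstat format on standard input.
degraded_arrays() {
    awk '
        # An array stanza starts with a line such as "md126 : active raid1 ...".
        /^md[0-9]+ :/ { name = $1 }
        # A status string made only of U and _ with at least one _ means
        # that a member of the current array is missing.
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}
```

On the server it could be run as degraded_arrays < /proc/mdstat. No output would mean that no array reports a missing member. The helper only reads the status text and does not change the RAID configuration.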