Replace a failed Hard Drive and rebuild the RAID1 Array


 
In a Linux software RAID array, when a failed hard disk is replaced, or when you remove a failed disk and want to add a new disk to the array without losing data, you are left with one functioning disk in the software RAID and one empty disk.
 
The same applies if you have accidentally deleted the partition table, or have altered the partition settings and corrupted the data.
 
In such a case, you need to copy the intact partition table of the functioning disk to the new empty disk and then rebuild the software RAID array with the "mdadm" command. Additional steps may be needed if the data itself has been corrupted.
 
Advisory: Double-check which drive and partition you are working on at every step. If you mix them up, you may wipe both drives and suffer a serious loss of data, and I cannot be held accountable for that.
 
Step 1:
 
Login to your server via SSH.
 
$ ssh root@yourserver.domain
 
Step 2:
 
Safety first! Your data is important, so back it up before you continue - you have been warned.
 
This procedure is dangerous, and one small mistake can cost you your data!
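
If you need a starting point, here is a minimal sketch of copying the most important directories to another machine over SSH (the host "backuphost" and the target path are placeholders for your own backup location):
 
$ rsync -a /etc /home /root root@backuphost:/backups/yourserver/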
 
Step 3:
 
If you don’t have “mdadm”, install it depending on your package manager:
$ yum install mdadm
$ apt-get install mdadm
$ aptitude install mdadm

Or use whichever method you are comfortable with for installing packages on your Linux system (such as RPM).
 
Step 4:

Check the status of the RAID array using the command:
 
$ cat /proc/mdstat

This shows the current multi-disk (md) status.

Below is an example of two functioning healthy RAID arrays.
 
[root@local~]# cat /proc/mdstat
Personalities : [raid1]

md2 : active raid1 sdb2[2] sda2[0]
      20478912 blocks [2/2] [UU]

md3 : active raid1 sdb3[2] sda3[0]
      1902310336 blocks [2/2] [UU]

unused devices: <none>

The above example shows two healthy RAID 1 arrays.
 
Each array has a status of [2/2] and [UU], which means that both of the two partitions in the array are present and functional.
 
To put it simply, sda is disk one and sdb is disk two. Each disk then has numbered partitions, e.g. (1, 2, 3, 4):
 
$ fdisk -l
 
(or)
 
$ parted -l

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  21.0GB  21.0GB  ext3            primary  raid
 3      21.0GB  1969GB  1948GB  ext3            primary  raid
 4      1969GB  2000GB  31.5GB  linux-swap(v1)  primary
 
As you can see in the above example, we are going to pair sdb2 (primary raid partition) with the matching sda2 (primary raid partition) in one array, and likewise sdb3 (primary raid partition) with the matching sda3 (primary raid partition), ignoring the number 1 bios_grub and number 4 swap partitions. This recovers your data onto the new disk, sdb.
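
If you are not sure which device names belong to which physical disks and partitions, a quick way to get an overview (the output will of course differ on your system) is:
 
$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT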
 
Step 5:
 
In our case, /dev/sdb has already been replaced and /dev/sda is functioning well. Otherwise, you would first need to remove the failed drive's partitions from the arrays, then reboot the server and replace the drive:
 
$ mdadm --manage /dev/md2 --remove /dev/sdb2

$ mdadm --manage /dev/md3 --remove /dev/sdb3
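
Note that mdadm will only remove a member that is no longer active. If the failed partitions are still shown as active in /proc/mdstat, you may first need to mark them as failed before removing them, for example:
 
$ mdadm --manage /dev/md2 --fail /dev/sdb2
 
$ mdadm --manage /dev/md3 --fail /dev/sdb3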
 
In the output of a degraded array below, the failed partitions are not listed at all, so in that case you would not need to remove them from the array.
 
[root@local~]# cat /proc/mdstat
Personalities : [raid1]

md2 : active raid1 sda2[0]
      20478912 blocks [2/1] [U_]

md3 : active raid1 sda3[0]
      1902310336 blocks [2/1] [U_]

unused devices: <none>
 
The output above shows that one partition in each array is not listed at all (compare with the healthy example earlier) - sdb2 and sdb3 are missing from md2 and md3.
 
Step 6:

Now, issue the "fdisk -l" command to list all the partitions on both disks.
Important:
 
If “fdisk -l ” returns the following message:
 
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
 
Your disk's partition table is not MBR.
 
Instead, it’s GPT (GUID Partition Table).
 
In that case, use the "parted -l" command instead.

If “fdisk -l ” displays “Disk /dev/sdb doesn’t contain a valid partition table ” for the failed disk /dev/sdb, it’s fine. 
If "fdisk -l" or "parted -l" still lists partitions on the failed disk /dev/sdb, you need to open fdisk / parted and delete those partitions - look up how to use parted, gdisk or fdisk if you are not familiar with the process.
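
As an alternative to deleting the partitions one by one, you can clear the stale partition table and filesystem signatures on the failed disk in a single step - a sketch, assuming the disk to be wiped really is /dev/sdb:
 
# wipefs -a /dev/sdb
 
(or, for a GPT disk)
 
# sgdisk --zap-all /dev/sdb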
 
Once you have removed the partitions, reboot the server to re-read the partition tables.
 
Execute either command “reboot” or “shutdown -r now”.
 
You do not always need to reboot the server; you can first try to reload the partition table using a disk utility such as:
 
partprobe
 

# partprobe /dev/sdb

 
hdparm
 

# hdparm -z /dev/sdb
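
blockdev (another utility that may be available on your system)
 

# blockdev --rereadpt /dev/sdb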

 
Step 7:
 
Now replicate the partition layout by copying the partition table of the healthy disk (/dev/sda) to the empty disk (/dev/sdb).
 
Please be extra careful to provide the right disk names, otherwise you will wipe out the data on the functioning healthy drive.
 
For MBR disks (replaced sdb drive) execute:
 
$ sfdisk -d /dev/sda | sfdisk /dev/sdb
 
Sometimes you may encounter an error such as " sfdisk: ERROR: sector 0 does not have an msdos signature "
 
In that case, execute the command with the "--force" option:
 
$ sfdisk -d /dev/sda | sfdisk --force /dev/sdb
 
For GPT disks (replaced sdb drive) execute:
 
# sgdisk --backup=table /dev/sda
# sgdisk --load-backup=table /dev/sdb
# sgdisk -G /dev/sdb

The partition tables on sda and sdb should now match.
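
A quick way to confirm this is to print both tables and compare them by eye - for example, for GPT disks:
 
# sgdisk -p /dev/sda
# sgdisk -p /dev/sdb
 
or, for MBR disks:
 
# sfdisk -d /dev/sda
# sfdisk -d /dev/sdb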
 
Step 8:

Now we are going to use the "mdadm" command to get detailed information on the status of the RAID arrays. Execute:
 
$ mdadm --misc --detail /dev/md2

$ mdadm --misc --detail /dev/md3
 
The output may look like below:
 
# mdadm --misc --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Tue Dec 30 00:01:43 2014
     Raid Level : raid1
     Array Size : 20478912 (19.53 GiB 20.97 GB)
  Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Fri Mar 30 18:34:21 2018
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : c1d4fc4e:649242d5:a4d2adc2:26fd5302
         Events : 0.714840
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      removed

Notice that "State :" will read something like "active, degraded" or "clean, degraded" for a degraded array, and "active" or "clean" when all disks are functioning. Also take note of the last three lines.
 
Step 9:
 
To find out which partition should be added to which array, execute:
 
"cat /etc/mdadm.conf” or "cat /etc/mdadm/mdadm.conf”.

[root@localhost ~]# cat /etc/mdadm.conf

ARRAY /dev/md2 UUID=c1d4fc4e:649242d5:a4d2adc2:26fd5302
ARRAY /dev/md3 UUID=af5d17c4:9a30e18f:a4d2adc2:26fd5302
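
If neither file exists on your system, you can read the same information from the running arrays and the on-disk superblocks instead, for example:
 
$ mdadm --detail --scan
 
$ mdadm --examine /dev/sda2 | grep UUID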
 
Step 10:
 
The partition table is now in place on /dev/sdb, and you need to add the /dev/sdb partitions to the correct arrays.
 
The output from Step 9 shows that /dev/sdb2 belongs to the /dev/md2 array, so execute:
 
$ mdadm /dev/md2 --manage --add /dev/sdb2
 
Check the RAID array status by issuing the “cat /proc/mdstat ” command.
 
The partition should now have been added to the array, data will begin copying over to the new drive, and the rebuild of /dev/sdb2 will start.
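
To follow the rebuild progress continuously instead of re-running the command by hand, you can, for example, use:
 
$ watch -n 5 cat /proc/mdstat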
 
Once the rebuilding process is done, the output of "mdadm --misc --detail /dev/md2" should display:
 
# mdadm --misc --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Tue Dec 30 00:01:43 2014
     Raid Level : raid1
     Array Size : 20478912 (19.53 GiB 20.97 GB)
  Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Fri Mar 30 18:34:21 2018
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : c1d4fc4e:649242d5:a4d2adc2:26fd5302
         Events : 0.714840
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
 
You have now activated RAID on partition 2 of sdb - all the data from partition 2 of the healthy disk (sda2) has been copied over to it.
 
Do the same for the /dev/sdb3 partition by executing:
 
$ mdadm /dev/md3 --manage --add /dev/sdb3

The following output will be displayed if you have been successful:
 
# mdadm --misc --detail /dev/md3
/dev/md3:
        Version : 0.90
  Creation Time : Tue Dec 30 00:01:44 2014
     Raid Level : raid1
     Array Size : 1902310336 (1814.18 GiB 1947.97 GB)
  Used Dev Size : 1902310336 (1814.18 GiB 1947.97 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent
    Update Time : Fri Mar 30 19:00:43 2018
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
 Rebuild Status : 3% complete
           UUID : af5d17c4:9a30e18f:a4d2adc2:26fd5302
         Events : 0.3799693
    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       2       8       19        1      spare rebuilding   /dev/sdb3
 
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1] sda2[0]
      20478912 blocks [2/2] [UU]
md3 : active raid1 sdb3[2] sda3[0]
      1902310336 blocks [2/1] [U_]
      [>....................]  recovery =  3.1% (59272512/1902310336) finish=2234.9min speed=13743K/sec
unused devices: <none>
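
If the estimated finish time is far too long and the server is otherwise idle, you can optionally raise the kernel's software RAID rebuild speed limits (values are in KiB/s; defaults vary by distribution, and higher values increase I/O load):
 
# sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
 
# sysctl -w dev.raid.speed_limit_min=50000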
 
Step 11:
 
Now that both partitions /dev/sdb2 and /dev/sdb3 have been added to the correct arrays and the arrays are rebuilding (or rebuilt), you need to enable the swap partition on the new drive.
 
To verify the swap partitions, execute the command:
 
$ cat /proc/swaps

Filename                                Type            Size    Used    Priority
/dev/sda4                               partition       30718972        49244   -1
 
As you can see, only the swap partition on sda is active - swap on partition 4 of sdb has not been initialised and enabled yet.
 
To enable the swap partition for /dev/sdb, issue the commands:
$ mkswap /dev/sdb4
$ swapon -p 1 /dev/sdb4

(or)


$ swapon -a
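
Keep in mind that mkswap assigns a new UUID to /dev/sdb4, so if /etc/fstab references the swap space by UUID rather than by device name, you may need to update that entry for "swapon -a" and the next reboot to pick it up. A quick check:
 
$ blkid /dev/sdb4
 
$ grep swap /etc/fstab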

Step 12:

Issue a final "fdisk -l" or "parted -l" to verify that the partitions match on both disks.
# parted -l

Model: ATA HGST HUS724020AL (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  21.0GB  21.0GB  ext3            primary  raid
 3      21.0GB  1969GB  1948GB  ext3            primary  raid
 4      1969GB  2000GB  31.5GB  linux-swap(v1)  primary

Model: ATA HGST HUS724020AL (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  21.0GB  21.0GB  ext3            primary  raid
 3      21.0GB  1969GB  1948GB  ext3            primary  raid
 4      1969GB  2000GB  31.5GB  linux-swap(v1)  primary
 
I hope this helped, and that you managed to survive a very stressful situation.
 
If you require any additional support, or would like to ask a question before you wipe your data, please comment below or open a support ticket at https://hydrovps.com/
 
Regards,
