How to replace soft RAID1 hard drive (Hetzner)
Out Of Date Warning
This article was published on 10/07/2014, this means the content may be out of date or no longer relevant.
You should verify that the technical information in this article is still up to date before relying upon it for your own purposes.
Running your own unmanaged (bare-metal) server means that, to some degree, fixing hardware failures is your responsibility. We have been using Hetzner as the host for Empfehlungsbund.de for almost 2 years now and have already experienced 2 separate hard drive failures. Neither was a real problem, because both drives were part of a RAID1 and could easily be replaced. This time, I want to document the steps I took, in the hope of saving myself and other customers time in the future.
Disclaimer: In case of problems, I take no responsibility for any damage. If you don't know what to do, take a managed option or ask a real sysadmin.
Receiving DegradedArray Event e-Mails
Normally, you will receive an email at your admin/root account:
This is an automatically generated mail message from mdadm running on server.example.com.
A DegradedArray event had been detected on md device /dev/md0.
The first thing to do is log in as root and check which hard drives and RAID arrays are affected:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1]
723658368 blocks [2/1] [_U]
md1 : active raid1 sdb2[1] sda2[0]
524224 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
8388544 blocks [2/2] [UU]
Things we see:
- There are 3 RAID arrays (md0, md1 and md2), all running as raid1. md0 runs on sda1 and sdb1, md1 runs on sda2 and sdb2, and both are operational ([UU]).
- Partition sda3 is no longer listed in md2, so one member is missing from that array ([_U], denoted by the underscore).
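The same check can be scripted. The following is a minimal sketch, not part of the original procedure: the function name degraded_arrays is made up for illustration, and it assumes the mdstat layout shown above (an array line followed by a "blocks" line with the [UU] status).

```shell
#!/bin/sh
# List md arrays whose status string (e.g. [_U]) contains an underscore,
# i.e. arrays with at least one missing member.
# $1 = file in /proc/mdstat format (default: /proc/mdstat).
# Hypothetical helper for illustration, not an mdadm feature.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { array = $1 }                     # remember the current array name
        /blocks/ && /\[[U_]*_[U_]*\]/ { print array }    # status line with a "_" slot
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays without arguments reads /proc/mdstat directly; with the output above it would print md2.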
So, for the rest of the guide, we assume: sda is the broken drive, md2 the broken RAID array.
You can get more information about the RAID with:
mdadm --detail /dev/md2
A quick SMART check showed a lot of errors in our case:
smartctl -a /dev/sda
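If you only want the overall verdict, the health line can be pulled out of the smartctl output. This is a rough sketch under the assumption that the drive is an ATA disk, for which smartctl -H prints a line starting with "SMART overall-health self-assessment test result"; the function name smart_health is hypothetical.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin and print the overall health verdict.
# Prints UNKNOWN if the expected ATA health line is not present
# (for example on SCSI/SAS drives, which use a different wording).
# Hypothetical helper for illustration.
smart_health() {
    awk -F': *' '
        /SMART overall-health self-assessment test result/ { print $2; found = 1 }
        END { if (!found) print "UNKNOWN" }
    '
}
```

Usage: smartctl -H /dev/sda | smart_health. Anything other than PASSED deserves a closer look at the full log.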
Preparing the change
Backups! Also prepare for a failover if you have the resources. The server has to be shut down for at least a couple of minutes. In the worst case, the server might not boot right away and you will have to book a rescue console.
Remove broken hard drive completely from all Arrays
If only one array is degraded, removing the hard drive only works if you also mark its partitions as failed in the other arrays:
mdadm --manage /dev/md1 --fail /dev/sda2
mdadm --manage /dev/md0 --fail /dev/sda1
# not needed, because sda3 had already dropped out of md2 for us
# mdadm --manage /dev/md2 --fail /dev/sda3
Now you can remove it:
mdadm /dev/md0 -r /dev/sda1
mdadm /dev/md1 -r /dev/sda2
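With more arrays, typing these commands by hand gets error-prone. The sketch below only prints the fail/remove commands for every partition of one drive that is still listed in mdstat, so you can review them before running anything. The function name print_removal_commands is hypothetical, and the matching assumes classic sdX naming (drive name followed by a partition number).

```shell
#!/bin/sh
# Print (do not run) the mdadm commands that fail and remove every
# partition of one drive from the arrays it is still a member of.
# $1 = drive name without /dev/ (e.g. sda)
# $2 = file in /proc/mdstat format (default: /proc/mdstat)
# Hypothetical helper for illustration; assumes sdX-style partition names.
print_removal_commands() {
    awk -v drive="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {        # members look like sda2[0] or sdb1[1](F)
                member = $i
                sub(/\[.*/, "", member)        # strip the "[0]" index and any flags
                if (index(member, drive) == 1 &&
                    substr(member, length(drive) + 1) ~ /^[0-9]+$/) {
                    printf "mdadm --manage /dev/%s --fail /dev/%s\n", $1, member
                    printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, member
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

Run print_removal_commands sda, check the output against /proc/mdstat, and only then paste the commands into a root shell.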
Install GRUB
For us, /dev/sda was broken, so we installed the GRUB boot loader onto sdb:
sudo grub-install /dev/sdb
This seemed to work, because after the change the server came back without problems.
Changing hard drive
Hetzner has a special support form for hard drive replacement. They ask for 2 things:
- A full SMART log
- The serial number of the broken drive, or the serial number of the functional one (if the broken drive's serial number can't be retrieved).
1. SMART log
smartctl -x /dev/sda > smart.log
# Or send yourself a mail if you have sendmail/nullmailer/..
smartctl -x /dev/sda | mail -s 'SMART Log' myself@server.com
2. Serial Number
/sbin/udevadm info --query=property --name=sda | grep ID_SERIAL
## or
hdparm -i /dev/sda | grep SerialNo
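Note that grep ID_SERIAL matches both ID_SERIAL (usually model plus serial) and ID_SERIAL_SHORT (serial only). A small sketch that picks the short form when it is present; the function name serial_from_udev_properties is hypothetical, and it assumes the plain KEY=VALUE output of udevadm info --query=property.

```shell
#!/bin/sh
# Extract the drive serial from `udevadm info --query=property` output on stdin.
# Prefers ID_SERIAL_SHORT (serial only) and falls back to ID_SERIAL.
# Prints nothing if neither property is present.
# Hypothetical helper for illustration.
serial_from_udev_properties() {
    awk -F= '
        $1 == "ID_SERIAL_SHORT" { short = $2 }
        $1 == "ID_SERIAL"       { full = $2 }
        END {
            if (short != "")     print short
            else if (full != "") print full
        }
    '
}
```

Usage: /sbin/udevadm info --query=property --name=sda | serial_from_udev_properties. Run it for sdb as well, so you can tell support which drive must stay in the server.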
3. Do the replacement
- Fill out form, make an appointment
- Hope the server will come back
After server restart
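Copy the partition table from the healthy drive (sdb) to the new hard drive (sda). Double-check the direction before running this, because it overwrites the partition table of the target drive: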
sfdisk -d /dev/sdb | sfdisk /dev/sda
Put the drive back in the RAID arrays:
mdadm /dev/md0 -a /dev/sda1
mdadm /dev/md1 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda3
grub-mkdevicemap -n
Wait for resync - took 6 hours for us. NERD-Cinema:
watch cat /proc/mdstat
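If you would rather be notified than watch, a small polling loop does the job. This is a sketch under the assumption that a running rebuild shows up in mdstat as a "recovery = ..." or "resync = ..." progress line (or as resync=DELAYED); the function names and the POLL_SECONDS variable are made up for illustration.

```shell
#!/bin/sh
# Succeed (exit status 0) while any array in the mdstat file is rebuilding.
# $1 = file in /proc/mdstat format (default: /proc/mdstat).
# Hypothetical helper for illustration.
resync_running() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}

# Block until no array reports recovery/resync progress any more.
# POLL_SECONDS (hypothetical variable) sets the polling interval, default 60.
wait_for_resync() {
    while resync_running "$@"; do
        sleep "${POLL_SECONDS:-60}"
    done
}
```

For example: wait_for_resync && echo "resync finished" | mail -s 'RAID resync done' myself@server.com. For a single array, mdadm --wait /dev/md2 should achieve much the same.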
Resources
- http://wiki.hetzner.de/index.php/Festplattenaustausch_im_Software-RAID (German)
- http://anton.dollmaier.name/2013/03/17/hdd-tausch-mit-software-raid1-bei-hetzner/ (German)
- http://www.joachim-neu.de/post/140/software-raid-mdadm-festplattenwechsel/ (German)