Live Server Hard Disc Drive Upgrade Process
This documents the procedure for replacing the hard discs in a live dedicated server that uses RAID 1, in order to increase total system capacity.
Also see Ext3 to LVM conversion process
Prerequisites
Commands needed:
- mdadm
- rsync
- sshd
- pivot_root
- chroot
Procedure
- Obtain new hard discs of appropriate type, capacity and speed. When calculating the capacity needed, take into account existing disc usage: user data and the OS (software plus swap space).
- Assign Asset ID to new discs. Label discs with 'AID' and the asset ID on top of the drive in texter only.
- HOT SWAP DISCS ONLY: Find spare drive cages that match the dedicated server that the new discs are for. Place new discs into the drive cages.
- Format and check new discs: On a spare managed server:
- Place new discs in dedicated server and power-on server.
- SCSI discs:
- For Adaptec SCSI BIOS, press Control-A on bootup to go into SCSI BIOS.
- For each new drive:
- Perform low-level drive format.
- Perform surface scan.
- IDE discs:
- For each drive: badblocks -vw /dev/hdX
- Create emergency floppy boot disc for dedicated server: Refer to Bootdisc_creation procedure. Set root password to something.
- Contact client at least 24 hours prior to altering hardware configuration and do not perform any work until work is approved. Work MUST be done outside business hours.
- Start a temporary text file on your workstation. Copy current partitioning layout, MD device layout, and mount points.
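One way to capture this information is a small script run on the server, with the output then copied into the text file on the workstation. This is only a sketch: the function name snapshot_layout and the output path are made up for illustration, and it reads /proc rather than running fdisk or mdadm so that it needs no special privileges.

```shell
#!/bin/sh
# snapshot_layout (hypothetical helper): write the current partition table,
# MD device layout and mount points to the file named in $1.
snapshot_layout() {
    out=$1
    {
        echo "== partitions (/proc/partitions) =="
        cat /proc/partitions 2>/dev/null || echo "(unavailable)"
        echo "== MD devices (/proc/mdstat) =="
        cat /proc/mdstat 2>/dev/null || echo "(unavailable)"
        echo "== mount points (/proc/mounts) =="
        cat /proc/mounts 2>/dev/null || echo "(unavailable)"
    } > "$out"
}

# Assumed output location; any writable path will do.
snapshot_layout "${TMPDIR:-/tmp}/layout-before-upgrade.txt"
```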
- SWAP on RAID 1 is not entirely stable. First check if there is sufficient free RAM on the machine, eg by running free:
             total       used       free     shared    buffers     cached
Mem:        523864     520752       3112          0      59788     395192
-/+ buffers/cache:      65772     458092
Swap:      1991928       1088    1990840
In this example there is 458092 KiB of memory that can be used if there is no caching/buffers. There is 1088 KiB of swap currently being used. The amount of free memory MUST be at least twice the amount of swap being used. If there is sufficient free memory then swap must be disabled to ensure machine stability during the disc enlargement:
swapoff -a
vi /etc/fstab
- (Hot swap SCSI only) KILL smartd. KILL IT AGAIN. If smartd is running when you try to remove the SCSI device from the kernel, nothing will happen, everything will think you still have the old device in, and you'll think you've swapped disks on the wrong machine.
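The free-memory rule from the swap step above (usable memory at least twice the swap in use) could be checked with something like the following. It is a sketch only: swap_check is a made-up helper name, and it assumes the older free output format shown in the example, with a "-/+ buffers/cache" line. Newer versions of free print an "available" column instead and would need a different parser.

```shell
#!/bin/sh
# Sample output copied from the example in this procedure; on a live
# machine this text would come from running `free`.
sample_free='             total       used       free     shared    buffers     cached
Mem:        523864     520752       3112          0      59788     395192
-/+ buffers/cache:      65772     458092
Swap:      1991928       1088    1990840'

# swap_check (hypothetical helper): reads free-style output on stdin and
# prints "ok" when usable memory is at least twice the swap in use.
swap_check() {
    awk '
        $1 == "-/+"   { usable = $4 }      # free column of the buffers/cache line
        $1 == "Swap:" { swap_used = $3 }   # used column of the Swap line
        END {
            if (usable == "" || swap_used == "") { print "unknown"; exit 2 }
            if (usable >= 2 * swap_used) print "ok"; else print "insufficient"
        }'
}

printf '%s\n' "$sample_free" | swap_check
```

With the example figures, 458092 KiB is far more than twice 1088 KiB, so the check prints ok.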
- Replace 1 hard disc. Start with the last hard disc (eg hdb, SCSI ID # 1). First ensure that the disc chosen is in a good state:
badblocks -vv /dev/X
Hot swap SCSI only:
- Set RAID members on second disc as failed and remove from array.
DRIVE=sdb
MD_DEVICES=`awk '/^md[0-9]+/ { print $1 }' /proc/mdstat`
for md_device in $MD_DEVICES
do
    # Find the member partition of $DRIVE in this array (eg sdb1).
    partition=`grep "^$md_device " /proc/mdstat | sed -n -e "s/.*\(${DRIVE}[0-9]\+\).*/\1/p"`
    # Skip arrays that have no member on $DRIVE.
    [ -n "$partition" ] || continue
    mdadm --fail /dev/$md_device /dev/$partition
    mdadm --remove /dev/$md_device /dev/$partition
done
Confirm that drive is not being used:
grep $DRIVE /proc/mdstat
mount | grep $DRIVE
- Find out SCSI controller, channel, device, and logical unit number (LUN) via kernel boot up messages (/var/log/dmesg) and /proc/scsi/scsi
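The four numbers can usually be read straight off the "Host:" lines of /proc/scsi/scsi. The sketch below shows one way to extract them for a given SCSI ID. The helper name scsi_address and the sample device entries are made up for illustration, and it assumes the usual "Host: scsiN Channel: NN Id: NN Lun: NN" line layout.

```shell
#!/bin/sh
# Illustrative /proc/scsi/scsi contents (vendor and model strings are made up).
sample_scsi='Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: DISK-A           Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: EXAMPLE  Model: DISK-B           Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03'

# scsi_address (hypothetical helper): reads /proc/scsi/scsi-style text on
# stdin and prints "controller channel device lun" for the SCSI ID in $1.
scsi_address() {
    awk -v want="$1" '
        $1 == "Host:" && $6 + 0 == want {
            host = $2
            sub(/^scsi/, "", host)               # "scsi0" -> "0"
            print host + 0, $4 + 0, $6 + 0, $8 + 0
        }'
}

printf '%s\n' "$sample_scsi" | scsi_address 1
```

On the server itself the input would be /proc/scsi/scsi rather than the sample text, and the four values would be used as $controller $channel $device $lun. Check them against dmesg before relying on them.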
Tell Linux to remove the SCSI disc from the list of drives on the bus:
blockdev --flushbufs /dev/X
echo "scsi remove-single-device $controller $channel $device $lun" >/proc/scsi/scsi
Confirm that drive is removed from list of devices:
cat /proc/scsi/scsi
- Physically remove disc now. Label old drive. Place old hard disc in anti-static bag and place in equipment storage area for wiping/testing queue.
- Place new hard disc in machine.
- KILL smartd . KILL IT AGAIN. If smartd is running when you try to remove the scsi device from the kernel, nothing will happen, everything will think you still have the old device in, and you'll think you've swapped disks on the wrong machine.
Tell Linux that disc is attached to SCSI bus now.
echo "scsi add-single-device $controller $channel $device $lun" >/proc/scsi/scsi
(This will take a while, 1-2 minutes, to return.)
Confirm that drive is now in list of devices:
cat /proc/scsi/scsi
- Set RAID members on second disc as failed and remove from array.
- Partition new disc to needs of customer. Consult partitioning procedures. (NB: Remember to use partition type fd for RAID autodetect.)
fdisk /dev/whatever
cat /proc/partitions
- Create failed RAID array from new partitions
- Check /proc/mdstat and find available MD device minor number
Create new MD device with a failed member:
mdadm --create --level=raid1 --raid-devices=NUMBER_OF_DISCS \
    /dev/md${FREE_MINOR} $partition [$other_partition..] missing
NUMBER_OF_DISCS is what the final value will be (typically 2).
- Confirm with /proc/mdstat that new MD device is active
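Finding the available MD minor number by eye is easy to get wrong when several arrays exist. A rough sketch of doing it from /proc/mdstat follows. The helper name first_free_md_minor and the sample mdstat text are made up for illustration, and it simply picks the lowest unused number, which may not match local numbering conventions.

```shell
#!/bin/sh
# Illustrative /proc/mdstat contents (made up for this sketch).
sample_mdstat='Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      1991936 blocks [2/2] [UU]
md3 : active raid1 sdb3[1] sda3[0]
      15727552 blocks [2/2] [UU]
unused devices: <none>'

# first_free_md_minor (hypothetical helper): reads mdstat-style text on
# stdin and prints the lowest minor number not used by any mdN line.
first_free_md_minor() {
    awk '
        /^md[0-9]+ / { used[substr($1, 3) + 0] = 1 }   # "md3" -> 3
        END {
            n = 0
            while (n in used) n++
            print n
        }'
}

printf '%s\n' "$sample_mdstat" | first_free_md_minor
```

With the sample text, md0, md1 and md3 are taken, so the function prints 2. On the server the input would be /proc/mdstat and the result would be used as FREE_MINOR.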
Make new filesystems/swap
for md_device in $new_md_devices
do
    mke2fs -j -L $label /dev/$md_device
    # or mkswap /dev/whatever
done
NB: Ensure labels are different to old ones if using labels in /etc/fstab or /boot/grub/grub.conf. Take note of new labels.
- Mount new MD partitions
- mkdir /newroot
Mount new root:
mount /dev/md${MINOR_OF_NEWROOT} /newroot
Mount all other points:
cd /newroot
for mount_point in $new_mount_points
do
    mkdir /newroot/$mount_point
    # Check permissions are correct on mount point! (eg chmod 1777 tmp)
    mount /dev/md${WHATEVER} /newroot/$mount_point
done
Copy data across
cd /
for mount_point in $old_mount_points
do
    cp -ax $mount_point /newroot/
done
Shutdown all services except SSH
echo "Doing drive upgrade" > /etc/nologin
Manually shut down all services except SSH via:
ps aux
and
service whatever stop
Check with:
netstat -ln
and
ps aux
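When checking the netstat output it can help to filter out the SSH listener so that anything left over stands out. The following is a sketch only: non_ssh_listeners is a made-up helper name, the sample output is illustrative, and it assumes sshd is listening on port 22.

```shell
#!/bin/sh
# Illustrative `netstat -ln` output (made up for this sketch).
sample_netstat='Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
udp        0      0 0.0.0.0:123             0.0.0.0:*'

# non_ssh_listeners (hypothetical helper): reads netstat-style text on stdin
# and prints protocol and local address of every TCP/UDP socket whose port
# is not 22 (assumed to be the SSH port).
non_ssh_listeners() {
    awk '
        $1 ~ /^(tcp|udp)/ {
            n = split($4, parts, ":")      # port is the last colon-separated part
            if (parts[n] != 22) print $1, $4
        }'
}

printf '%s\n' "$sample_netstat" | non_ssh_listeners
```

Any line printed is a service that is still listening and probably still needs stopping. On the server the input would be the output of netstat -ln.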
rsync all data
for mount_point in $old_mount_points
do
    # Dry run first (-n).
    rsync -avnxX --delete $mount_point/ /newroot/$mount_point/
    # Check that it does what you think it is doing
    rsync -avxX --delete $mount_point/ /newroot/$mount_point/
done
pivot_root to new root:
mkdir /newroot/oldroot
cd /newroot
pivot_root . oldroot
exec chroot . /bin/bash <dev/console >dev/console 2>&1
If pivot_root is not available, you need to do the boot loader step and reboot.
Removal of processes in old root
telinit u
(make exec chroot if pivot_root if 2.2.x kernel)
service sshd restart
Log in via SSH and log out of the old window.
# Obsolete
echo $hex_value_of_new_root_dev > /proc/sys/kernel/real-root-dev
Unmount old root
cd /oldroot
# To find any strays. There will probably be a whole bunch of kernel threads.
fuser -mv /oldroot
cat /proc/mounts    # for info
mdadm -S /dev/$old_md_devices
mount /dev/whatever /oldroot -o remount,ro    # If can't unmount oldroot
- Fix /etc/fstab to match new devices being used. Fix /etc/raidtab.
- fix /etc/mtab to show new devices as being mounted (needed to fool mkinitrd)
- Swap to new drive. If you were able to unmount /oldroot, then remove the old drive using step 9; otherwise you will need to fix the boot loader and reboot in order to remove the drive.
- boot loader
If grub being used
cd /boot/grub
vi device.map
vi grub.conf
$ grub
grub> device (hd0) /dev/X
# Set first BIOS drive to Linux device /dev/X. SCSI BIOS assigns order based on SCSI ID.
# Not sure if there is a standard for IDE BIOS.
grub> root (hd0,0)
# /boot is on first partition of first BIOS disk.
grub> setup (hd0)
# Install the boot loader into MBR of first BIOS disk.
grub> quit
Check boot loader is installed:
dd if=/dev/X bs=512 count=1 | strings
# this will have GRUB in the output
- If LILO is being used: grrrr. Much pain. Need to play with root, boot, and device/bios directives.
Check boot loader is installed:
dd if=/dev/X count=1 | strings
(this will have GRUB or LILO in the output)
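The dd and strings check can be wrapped so that it gives a one-word answer instead of a screen of output. This is a sketch under assumptions: mbr_boot_loader is a made-up helper name, it uses tr rather than strings so that it does not depend on binutils being installed, and the demonstration runs against a scratch file rather than a real disc.

```shell
#!/bin/sh
# mbr_boot_loader (hypothetical helper): print GRUB, LILO or none depending
# on which string appears in the first 512 bytes of the device or file $1.
mbr_boot_loader() {
    # Replace every non-printable byte with a space so the shell can hold the sector.
    sector=$(dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C tr -c '[:print:]' ' ')
    case $sector in
        *GRUB*) echo GRUB ;;
        *LILO*) echo LILO ;;
        *)      echo none ;;
    esac
}

# Demonstration on a scratch file standing in for a disc; on the server the
# argument would be the real device (/dev/X in this procedure).
img=$(mktemp)
printf '\353\110\220\000\000GRUB \000Geom\000' > "$img"
mbr_boot_loader "$img"
rm -f "$img"
```

A result of none on the disc that the BIOS boots from means the boot loader step needs repeating before any reboot.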
- mkinitrd: Modern versions of RHL should have no need to modify initrd. With old versions, and with /tmp being tmpfs, mkinitrd says 'All of your loopback devices are in use!'. Run script from post-install of kernel.
copy partition table:
sfdisk -d /dev/sdb | sfdisk /dev/sda
hotadd new partitions
mdadm -a /dev/mdWHATEVER /dev/sdWHATEVER
echo 10000000 > /proc/sys/dev/raid/speed_limit_max
Boot loader again: NB Wait until /boot partition RAID has finished syncing.
Mkinitrd
Reboot for a final check
Fix up bootdisc
rm /etc/nologin
rmdir /oldroot /newroot
- If kernel upgrades necessary, upgrade, as per instructions in procedures/kernel-upgrade-checklist.
- If the partition layout has changed, and the machine is being backed up to Ark, correct the disklist to reflect the new partitioning layout for this machine.
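The wait for the RAID to finish syncing before redoing the boot loader, mentioned in the hot-add step above, could be scripted roughly as follows. The helper names md_sync_in_progress and wait_for_md_sync are made up for illustration. Note that this sketch waits for every array to finish, not only the one holding /boot, which is a stricter condition than the procedure asks for.

```shell
#!/bin/sh
# md_sync_in_progress (hypothetical helper): reads mdstat-style text on stdin
# and succeeds (exit 0) if any array is still resyncing or recovering.
md_sync_in_progress() {
    grep -E -q 'resync|recovery'
}

# wait_for_md_sync (hypothetical helper): poll the file $1 (normally
# /proc/mdstat) every $2 seconds until no sync is in progress.
wait_for_md_sync() {
    while md_sync_in_progress < "$1"
    do
        sleep "$2"
    done
}

# Illustrative mdstat text for an array that has finished syncing (made up).
printf '%s\n' 'md0 : active raid1 sda1[0] sdb1[1]' '      104320 blocks [2/2] [UU]' |
    md_sync_in_progress || echo "sync finished"
```

On the server this would be called as wait_for_md_sync /proc/mdstat 30, with the polling interval chosen to taste.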
