2006-08-20

got grub?

found this little gem in my inbox this morning:
This is an automatically generated mail message from mdadm
running on obiwan

A Fail event had been detected on md device /dev/md0.

Faithfully yours, etc.

oh yay! failed drives! luckily obiwan is still the "sandbox system" for now - it was supposed to be turned into my main externally-facing server once i was done with openwrt/dmz setup/etc. so much for good intentions - i'll never get this shit done.

so, i at least had the forethought to mirror the drives - it's dual 60GB ATA100 drives - good ol' hda and hdb. on each drive, i created two partitions - the first partition is /boot and the other is half of md0 - a raid1 device. i then built on md0 some logical volumes with LVM2, i usually name them /dev/linux/root, /dev/centos/usr, /dev/obiwan/home, or something like that. as far as the other partition, i thought i was doing the right thing by performing:
rsync -av --delete /boot /boot2
... to sync the kernel/initrd after a yum update included a kernel update, but that's only 1/2 of it. in today's failed case, it was hda that failed, which brings us to the crux of the problem - where's your bootloader now, eh? basically, nowhere. i'm screwed. so, i broke out the knoppix dvd and get to installing a bootloader on the second drive so i could bring the system up. how could i have prevented this from happening?

well, i think i have it worked out:
  1. edit /boot/grub/device.map. make sure there's an entry for the second device there. in my case, it would be:
    (hd1) /dev/hdb
  2. since grub-install likes to install in /boot of the grub root (very different from the system root - "/"), i gave it a little symlink hack:
    cd /boot; ln -s . boot
  3. clean-up! get rid of all those old kernels that were installed with yum update:
    rpm -e kernel-old-version-blah
  4. re-sync everything:
    rsync -av --delete /boot /boot2
  5. now install grub on the second drive:
    grub-install --root-directory=/boot2 /dev/hdb


i think that should do it. i'm going to see if there's a way i can test this - maybe i'll pull some of the really 2GB drives out of the closet and get them in the test system to simulate failure.

Update: so much for that... i just got:
This is an automatically generated mail message from mdadm
running on obiwan

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.


ding-dong, the system's dead. if i'm gonna be using knoppix so much, maybe i should re-download the latest dvd. sigh

Labels:

0 Comments:

Post a Comment

Links to this post:

Create a Link

<< Home