Hi there, I had a bit of a nasty experience this week with ataraid(4). I thought I would summarize the issues I ran into so that others can hopefully benefit from it, should they experience catastrophic failure of a pseudo-RAID. I was surprised that I was unable to find much in the way of similar experience in forums. Luckily, I didn't lose any data, thanks to ports/sysutils/scan_ffs.

Bog-simple setup: Gigabyte VM900M motherboard, Intel Core Duo T4300. JMicron JMB363 controller, two SATA ports, RAID mode. Two Seagate 160GB drives.

I'll skip the strange loop I got into with having to recreate the MBR and disklabels after they got trashed -- suffice to say, BACK THEM UP... BACK THEM UP...

---

1. atacontrol rebuild.

There are a few issues here. I'm partly to blame for not reading the documentation in the Handbook about the rebuild process -- however -- hilarity ensued.

Following the rebuild procedure in the Handbook, if you try to run "atacontrol rebuild" from the FreeBSD 7.1 LiveFS, it'll break. I ran it thinking that it had some kind of magic in it which I couldn't achieve using dd alone, which is partly true, but also partly not true.

It has a hardcoded path to /usr/bin/nice, which it runs using the system() libc call, and unfortunately, the LiveFS is rooted at /mnt2. It does this after it issues an ioctl() to tell the ATA driver to copy and rewrite the metadata to the new "spare" drive. Oops. At this point the state of the array is inconsistent: "atacontrol status" will report the array as REBUILDING, despite the fact that the dd job never kicked off. Because the metadata has now been rewritten, and the ataraid driver has entered a REBUILDING state, you can't stop it, and it won't rebuild.

I also found that the default dd block size it uses, 1m, didn't work with my drives -- I had to dd manually with a 64KB block size to get things to work, otherwise I got lots and lots of ATA read/write errors related to trying to write beyond the last part of the disk. The drives themselves are fine, though. (A rough sketch of the manual copy is below.)

HOMEWORK: "atacontrol rebuild" needs to be taught to play nice when run in a catastrophic recovery situation; the path stuff needs to be fixed, and perhaps some magic should be added to allow the metadata to be zapped when Really Bad Stuff happens.
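For anyone in the same boat, this is roughly the manual copy I ended up doing from the LiveFS. It's reconstructed from memory, so treat it as a sketch, not gospel: the device names (ad4 as the surviving disk, ad6 as the replacement) are examples, and you should double-check which is which before running anything.

    # First, identify the disks hanging off the controller.
    atacontrol list

    # Copy the surviving disk onto the replacement, 64KB at a time.
    # conv=noerror,sync keeps going past soft read errors instead of
    # aborting the whole copy.
    dd if=/dev/ad4 of=/dev/ad6 bs=64k conv=noerror,sync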
---

2. RAID metadata, and drive sizes.

OK, the tradeoff with ataraid is that it is pseudo-RAID. That's understood; however, it's easy for the metadata to end up downright out of sync with the actual state of the drives.

After my bad experience with "atacontrol rebuild" from the LiveFS, to trick FreeBSD back into understanding that the array was in fact degraded, I had to read the ataraid driver code to figure out which LBA it was expecting to see the metadata at, and then wipe that out with dd. It doesn't help that the drives themselves are of different sizes. So, imagine the hilarity when I just swap the drives and try to rebuild the array. Oops.

HOMEWORK: Is there a way to use the system partition stuff, e.g. ATA SET MAX ADDRESS, to get around this? Obviously it would mean losing a few sectors at the end of the drive, but that's a small price to pay for sanity with pseudo-RAID.

---

3. RAID BIOS.

I have been using a JMicron 36x PCI-e controller. Unfortunately, when stuff like the MBR is broken, it says nothing informative -- it just skips to the next INT 19h handler. This is more something which should be thrown at the BIOS vendors -- I don't believe there isn't enough space in there to print a message which says "The drive geometry is invalid".

HOMEWORK: Someone needs to throw a wobbly at the vendors.

---

4. fdisk and drive geometry.

The root cause of my boot failure, it turned out, was that the drive geometry was "wrong" for the JMicron SATA RAID BIOS. It turns out sysinstall's guess at the drive geometry is "right" for LBA mode (C/H/S n/255/63), and thus "right" for my SATA controller BIOS. fdisk, however, wants to use a C/H/S of n/16/63 by default. Profanity ensues. (See the P.P.S. below for the workaround I used.)

HOMEWORK: Why does fdisk still assume 16 heads...? Perhaps we should have a switch to tell it to use the LBA-style C/H/S converted geometry?

---

Redux

I understand why organisations pay good money for hardware RAID controllers, but given that time has moved on, this is kinda effed up. In an ideal world, I shouldn't have to bust out hours of ninja admin moves just to get a single RAID-ed home server back up and running.

I also now understand that I can't rely on RAID alone to keep the integrity of my own data -- there is no substitute for backups. I just wish there were realistic backup solutions for individuals trying to do things with technology right now, without paying over the odds, or being ripped off. A "good enough" cheap solution is what individuals often end up using, to get things going in life with minimal resources or wastage.

I hope others benefit from my experience.

cheers
BMS

P.S. Big thanks to Doug White for helping me to get /etc recreated after bits of it got trashed.
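P.P.S. For anyone hitting the same geometry mismatch: fdisk(8) can take its values from a config file via -f, including a "g" line for the BIOS geometry. Something along these lines is what I mean -- but note the cylinder count and slice length below are purely illustrative for a 160GB disk; compute your own from the media size (cylinders = total sectors / (255 * 63)) before trusting any of it:

    # geom.conf -- force LBA-style geometry (255 heads, 63 sectors/track)
    g c19457 h255 s63
    # slice 1: type 165 (FreeBSD), starting at sector 63; the length
    # here is illustrative -- use the real size of your slice
    p 1 165 63 312576642
    a 1

    fdisk -f geom.conf ar0

(You may want -i as well if sector 0 needs reinitialising from scratch.)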
Bruce M Simpson wrote:
> Following the rebuild procedure in the Handbook, if you try to run
> "atacontrol rebuild" from the FreeBSD 7.1 LiveFS, it'll break. I ran it
> thinking that it had some kind of magic in it which I couldn't achieve
> using dd alone, which is partly true, but also partly not true.
>
> It has a hardcoded path to /usr/bin/nice, which it runs using the
> system() libc call, and unfortunately, the LiveFS is rooted at /mnt2. It
> does this after it issues an ioctl() to tell the ATA driver to copy and
> rewrite the metadata to the new "spare" drive.

I always use the LiveFS cdrom in the following way:

    chroot /mnt2
    set -o emacs
    mount -t devfs devfs /dev
    export PAGER=more

so that I am exactly in the same situation as on a real machine.

> HOMEWORK: Why does fdisk still assume 16 heads...? Perhaps we should
> have a switch to tell it to use the LBA-style C/H/S converted geometry?

FreeBSD fdisk is a calamity.

> I also now understand that I can't rely on RAID alone to keep the
> integrity of my own data -- there is no substitute for backups. I just
> wish there were realistic backup solutions for individuals trying to do
> things with technology right now, without paying over the odds, or
> being ripped off.

A solution to keep backups without "paying over the odds" is to back up your data to another hard disk. This doesn't mean using a RAID mirror; it means using rsync or similar to copy data regularly.
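For example, something as simple as this run from cron -- assuming, purely for illustration, that the backup disk is mounted at /backup:

    # Mirror /home onto the backup disk; -a preserves permissions,
    # ownership and timestamps. Deliberately no --delete: files you
    # remove by mistake survive on the backup until you prune it.
    rsync -a /home/ /backup/home/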
It is preferable that this occurs on another machine, and even better in another geographic location. But even if the backup disk is on the same machine, this protects against inadvertent deletions of a file or RAID misbehaviour. The remaining risk is that some hardware problem simultaneously corrupts the main storage and the backup. Modern features such as UFS snapshots, or better, ZFS snapshots, make it possible to produce better backups.

--
Michel TALON

Bruce M Simpson wrote:
> [...]
> I also now understand that I can't rely on RAID alone to keep the
> integrity of my own data -- there is no substitute for backups,

That's 100% true. RAID -- even true hardware RAID -- is *never* a substitute for backup. Consider:

 - Fire.
 - Theft.
 - Lightning strike or power surge.
 - The cleaning woman knocks the tower over.
 - ...

In all of those cases, chances are that both disks in the RAID mirror die. Also, as you mentioned, it doesn't protect against human errors ("rm *" in the wrong directory and similar things). Backups should always be made to media that can be taken offline and stored in a safe place: tape, optical disks, hard disks in swappable drive trays, or external drives.

> I just wish there were realistic backup solutions for individuals
> trying to do things with technology right now, without paying over
> the odds, or being ripped off.

I think the best solution is to use standard hard disks in external enclosures (USB, FireWire, eSATA). They're fairly cheap these days, easy to handle, and quite fast. For a very simple backup solution I recommend buying at least two USB disks and using them in alternation, so you still have a good backup on the shelf if something catastrophic happens while the other one is connected to your computer.

As for the backup software, I simply use "cpdup" (from ports/sysutils/cpdup) to duplicate the file systems. It's fast (copies only changed files), and you can easily restore files after an "rm *" by simply copying them back with cp. (The only gotcha is that cpdup doesn't preserve "holes" in sparse files, but cases where you need that are rare.)
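In practice it boils down to a one-liner per file system. The paths here are just an example, assuming the external disk is mounted at /backup:

    # Mirror /home onto the external disk; cpdup only copies files
    # that have changed since the last run. -i0 turns off the
    # interactive confirmation prompts so it can run from a script.
    cpdup -i0 /home /backup/home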
Of course you can also treat the disks like tapes and dump(8) to the raw device (or tar, cpio, whatever), if you prefer. YMMV, of course.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsführung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"We will perhaps eventually be writing only small modules which are identi-
fied by name as they are used to build larger ones, so that devices like
indentation, rather than delimiters, might become feasible for expressing
local structure in the source language."
        -- Donald E. Knuth, 1974

Bruce Simpson
2009-Jul-23 18:57 UTC
ataraid's revenge! (Was: Re: A nasty ataraid experience.)
Six months on, ataraid(4) did it again. This time I was lucky -- I caught it in time, but the damage to the filesystem meant having to use fsdb to NULL out the affected inodes; mounting read-only, tarring, and untarring across the network after a newfs let me save the affected partition. (The rough sequence is sketched below.)

All I was doing at the time was srm'ing a few sensitive files; all the processes wedged in WCHAN getblk. It seems ataraid(4) is not robust against temporary drive/controller problems. The SMART logs on the affected array drives all check out just fine, and there are no bad block remaps.

So, time to either buy a hardware RAID controller, or move to ZFS...
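For the archives, the salvage dance went roughly like this -- reconstructed from memory, with example names throughout (ar0s1a as the damaged partition, "otherbox" as the machine with spare space), so adjust before use:

    # On the damaged machine: mount read-only, stream the data out.
    mount -o ro /dev/ar0s1a /mnt
    tar -cf - -C /mnt . | ssh otherbox 'cat > /spare/partition.tar'
    umount /mnt

    # Recreate the filesystem, then pull everything back.
    newfs /dev/ar0s1a
    mount /dev/ar0s1a /mnt
    ssh otherbox 'cat /spare/partition.tar' | tar -xpf - -C /mnt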