thr3ads.net - freebsd stable - confusing status/log messages for degraded array [Dec 2004]

Joe Rhett

2004-Dec-12 21:26 UTC

confusing status/log messages for degraded array

Soren, I have a quick question.  Am I misreading this, or is it misleading?

During testing of a sil3114 controller (sorry, I know you hate this) 
I pulled out drive 4, which was part of a mirror:

	sandbox# atacontrol list
	ATA channel 0:
	    Master:      no device present
	    Slave:       no device present
	ATA channel 1:
	    Master: acd0 <CD-540E/1.0A> ATA/ATAPI revision 0
	    Slave:       no device present
	ATA channel 2:
	    Master:  ad4 <ST380013AS/3.19> Serial ATA v1.0
	    Slave:       no device present
	ATA channel 3:
	    Master:  ad6 <ST380013AS/3.19> Serial ATA v1.0
	    Slave:       no device present
	ATA channel 4:
	    Master:  ad8 <ST380013AS/3.18> Serial ATA v1.0
	    Slave:       no device present
	ATA channel 5:
	    Master:      no device present
	    Slave:       no device present

Okay, ad10 is offline.  However atacontrol reports:

	sandbox# atacontrol status 0
	ar0: ATA RAID1 subdisks: ad6 DOWN status: DEGRADED

Does this mean ad6 is up, and the other disk is down?  That would make
sense, but it would be nice to know which disk that is in case one has a
faulty memory or just plain too many systems to manage ;-)

But it gets worse.  So I reinstall the disk and run:

	sandbox# atacontrol detach 5
	sandbox# atacontrol attach 5

	Master: ad10 <ST380013AS/3.18> Serial ATA v1.0
	Slave:       no device present

	sandbox# atacontrol rebuild 0

Check the logs and find:

	Dec 12 21:13:39 sandbox kernel: ad10: 76319MB <ST380013AS/3.18> [155061/16/63] at ata5-master SATA150
	Dec 12 21:13:39 sandbox kernel: Opened disk ad10 -> 6

Again, this doesn't tell me which array, and also reads to me as if it is 
rebuilding from ad10 to ad6...  am I reading this wrong?

Just to shorten the thread, I think that clarity would be achieved with:

	sandbox# atacontrol status 0
	ar0: ATA RAID1 subdisks: ad6 ad10(DOWN) status: DEGRADED

	Dec 12 21:13:39 sandbox kernel: Rebuilding array ar0: ad6 -> ad10

-- 
Joe Rhett
Senior Geek
Meer.net

Joe Rhett

2004-Dec-12 21:42 UTC

drive failure during rebuild causes page fault

And another, I can now confirm that it is fairly easy to kill 5.3-release
during the rebuilding process.  The following steps will cause a kernel
page fault consistently:

atacontrol create RAID0 ad6 ad10
atacontrol detach 5
	log: ad10 deleted from ar0 disk1
	log: ad10 WARNING - removed from configuration
atacontrol addspare 0 ad8
	log: ad8 inserted into ar0 disk1 as spare
atacontrol rebuild 0
atacontrol detach 4
	log: ad8 deleted from ar0 disk1
	log: ad8 WARNING - removed from configuration

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x10
....
current process = 1063 (rebuilding ar0 1%)
trap number = 12
panic: page fault

(tell me if you want or need anything I skipped above.  Got lazy cause I
had to type it in by hand...)

-- 
Joe Rhett
Senior Geek
Meer.net

Doug White

2004-Dec-12 21:59 UTC

drive failure during rebuild causes page fault

On Sun, 12 Dec 2004, Joe Rhett wrote:
> And another, I can now confirm that it is fairly easy to kill 5.3-release
> during the rebuilding process.  The following steps will cause a kernel
> page fault consistently:
>
> atacontrol create RAID0 ad6 ad10
> atacontrol detach 5
> 	log: ad10 deleted from ar0 disk1
> 	log: ad10 WARNING - removed from configuration
> atacontrol addspare 0 ad8
> 	log: ad8 inserted into ar0 disk1 as spare
> atacontrol rebuild 0
> atacontrol detach 4
> 	log: ad8 deleted from ar0 disk1
> 	log: ad8 WARNING - removed from configuration
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x10

That's a nice shotgun you have there.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org

asym

2004-Dec-15 16:16 UTC

drive failure during rebuild causes page fault

At 18:57 12/15/2004, Gianluca wrote:
>actually all the data I plan to keep on that server is gonna be backed up, 
>either to cdr/dvdr or in the original audio cds that I still have. what I 
>meant by integrity is trying to avoid having to go back to the backups to 
>restore 120G (or more in this case) that were on a dead drive. I've done
>that before, and even if it's no mission-critical data, it remains a huge
>PITA :)

That's true.  Restoring is always a pain in the ass, no matter the media 
you use.

>thanks for the detailed explanation of how RAID5 works, somehow I didn't
>really catch the distinction between the normal and degraded operations on 
>the array.
>
>what would be your recommendations for this particular (and very limited) 
>application?

Honestly I'd probably go for a RAID1+0 setup.  It wastes half the space in 
total for mirroring, but it has none of the performance penalties of 
RAID-5, and up to half the drives in the array (one from each mirror pair) 
can fail without anything but speed being degraded.  You can sort of think 
of this as having a second dedicated array for 'backups' if you want, with 
the normal caveats, namely that "destroyed" data cannot be recovered, such 
as things purposely deleted.

RAID5 sacrifices write speed and redundancy for the sake of space.  Since 
you're using IDE and the drives are pretty cheap, I don't see the need for 
such a sacrifice.

Just make sure the controller can do "real" 1+0.  Several vendors are
confused about what the differences are between 1+0, 0+1, and 10 -- they 
mistakenly call their raid 0+1 support "RAID-10".

The difference is pretty important though.  If you have say 8 drives, in 
RAID 1+0 (aka 10) you would first create 4 RAID-1 mirrors with 2 disks 
each, and then use these 4 virtual disks in a RAID-0 stripe setup.  This 
would be optimal, as any 4 drives could fail provided they all came from 
different RAID-1 pairs.

In 0+1, you first create two 4-disk RAID-0 arrays and then use one as a 
mirror of the other to create one large RAID-1 disk.  In this setup, which 
has *no* benefits over 1+0, if any drive fails the entire 4-disk RAID-0 
stripe set that the disk is in goes offline and you are left with no 
redundancy -- the entire array is degraded running off the remaining 4-disk 
RAID-0 array, and if any of the drives in that array fail, you're smoked.
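
The difference can be checked by brute force.  A minimal Python sketch, 
assuming the 8-drive example above with drives numbered 0-7 (the numbering 
and the grouping into pairs and stripe sets are illustrative assumptions, 
not anything a controller reports), counts how many combinations of failed 
drives each layout survives:

```python
from itertools import combinations

DRIVES = range(8)
# Hypothetical layouts for the 8-drive example; the numbering is arbitrary.
MIRROR_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]   # 1+0: four mirrors, striped
STRIPE_SETS = [(0, 1, 2, 3), (4, 5, 6, 7)]        # 0+1: two stripes, mirrored


def raid10_survives(failed):
    """1+0 survives as long as no mirror pair has lost both drives."""
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)


def raid01_survives(failed):
    """0+1 survives only while at least one stripe set is fully intact."""
    return any(all(d not in failed for d in stripe) for stripe in STRIPE_SETS)


def surviving_combinations(survives, n_failed):
    """Count the n_failed-drive failure combinations the layout survives."""
    return sum(
        1 for failed in combinations(DRIVES, n_failed) if survives(set(failed))
    )


for n in (1, 2, 4):
    print(
        n,
        surviving_combinations(raid10_survives, n),
        surviving_combinations(raid01_survives, n),
    )
# prints:
# 1 8 8
# 2 24 12
# 4 16 2
```

Out of the 28 possible two-drive failures, 1+0 survives 24 and 0+1 only 12, 
which is the gap described above.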

If you want redundancy to avoid having to possibly restore data, and you 
can afford more disks, go 1+0.  If you can't afford more disks, then one of 
the striped+parity solutions (-3, -4, -5) is all you can do, but be ready 
to see write performance anywhere from "ok" on a $1500 controller, to 
"annoying" on a sub-$500 controller, to "downright retardedly slow" on 
anything down in the cheap end, including most IDE controllers.  Look up 
the controller, find out what I/O chip it's using (most are Intel-based, 
either StrongARM or i960) and see if the chip supports hardware XOR.  If it 
doesn't, you'll really wish it did.
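
To see why XOR matters so much, here is a minimal Python sketch of striped 
parity under simplifying assumptions: three data blocks plus one parity 
block per stripe, with tiny made-up block contents.  The function names are 
hypothetical; this illustrates the arithmetic only, not any controller's 
firmware.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_data, new_data, old_parity):
    """Small write: the old data and old parity must be read back first."""
    return xor_blocks(old_parity, old_data, new_data)


# Normal operation: parity is the XOR of the data blocks in the stripe.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = xor_blocks(*data)

# Degraded operation: the drive holding data[1] is gone, so every read of
# that block has to XOR all surviving blocks with the parity block.
rebuilt = xor_blocks(data[0], data[2], parity)
assert rebuilt == data[1]

# Write penalty: changing a single block costs two reads (old data, old
# parity) and two writes (new data, new parity).
new_block = b"\xff\x00"
new_parity = updated_parity(data[1], new_block, parity)
assert new_parity == xor_blocks(data[0], new_block, data[2])
```

Every write and every degraded read goes through an XOR pass like this, 
which is the work a controller without hardware XOR has to do the slow way.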

Søren Schmidt

2005-May-22 16:24 UTC

drive failure during rebuild causes page fault

On 22/05/2005, at 18:11, Joe Rhett wrote:
>>>> You need to overwrite the metadata (see above) which are located in
>>>> different places again depending on metadata format.
>>>>
>>>
>>> So where is it located with the sil3114 controller?
>>> (same as 3112, but with 4 ports...)
>>>
>
> On Sun, May 22, 2005 at 12:45:05AM +0200, Søren Schmidt wrote:
>
>> Depends on what BIOS you have on there, several exist for the SiI
>> chips, -current or mkIII would tell you which. Just null out the last
>> 63 sectors on the disks and you should be fine since all possible
>> formats are in that range...
>>
>
> I know how to do this using dd from the start of the disk.  How do I do
> this at the end of the disk?

man dd ? :)

you need to get the size of the disk in sectors (hint atacontrol)

then you do dd if=/dev/zero of=/dev/adN oseek=(size-63)

- Søren
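
The oseek arithmetic can be sketched as follows.  This is a minimal Python 
helper, not part of atacontrol or dd: the function name is hypothetical, 
512-byte sectors are assumed (which matches dd's default block size), and 
the sector count in the example is derived purely for illustration from the 
[155061/16/63] geometry in the ad10 log line earlier in the thread.  The 
real figure should come from atacontrol, as hinted above.  The helper only 
builds the command string; it does not touch any disk.

```python
SECTOR_SIZE = 512       # assumption: 512-byte sectors (dd's default block size)
METADATA_SECTORS = 63   # the range said above to cover all metadata formats


def wipe_tail_command(device, total_sectors, tail_sectors=METADATA_SECTORS):
    """Build a dd command that zeroes the last tail_sectors of device."""
    if total_sectors <= tail_sectors:
        raise ValueError("disk is smaller than the region to wipe")
    oseek = total_sectors - tail_sectors   # first sector to overwrite
    return (
        f"dd if=/dev/zero of={device} bs={SECTOR_SIZE} "
        f"oseek={oseek} count={tail_sectors}"
    )


# Illustrative sector count only: cylinders * heads * sectors per track.
print(wipe_tail_command("/dev/ad10", 155061 * 16 * 63))
# prints: dd if=/dev/zero of=/dev/ad10 bs=512 oseek=156301425 count=63
```

Adding count=63 is a choice made here so that dd stops after the last 
sector instead of running into the end of the device; the command as 
originally given omits it.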