thr3ads.net - CentOS - [CentOS] Question about RAID 5 array rebuild with mdadm [Apr 2008]

Mark Hennessy

2008-Apr-17 17:01 UTC

[CentOS] Question about RAID 5 array rebuild with mdadm

I'm using CentOS 4.5 right now, and I had a RAID 5 array stop because
two drives became unavailable.  After adjusting the cables on several  
occasions and shutting down and restarting, I was able to see the  
drives again.  This is when I snatched defeat from the jaws of  
victory.  Please, someone with vast knowledge of how RAID 5 with mdadm  
works, tell me if I have any chance at all that this array will pull  
through with most or all of my data.

Background info about the machine:
/dev/md0 is a RAID1 consisting of /dev/sda1 and /dev/sdb1
/dev/md1 is a RAID1 consisting of /dev/sda2 and /dev/sdb2
/dev/md2 (our special friend) is a RAID5 consisting of /dev/sd[c-j]

/dev/sdi and /dev/sdj were the drives that detached from the array and  
were marked as faulty.

I did the following things that, in hindsight, were probably VERY BAD:

Step 1 (Misassign drives to wrong array):
I could probably have had things going again in a tenth of a second if  
I hadn't typed this:
mdadm --manage --add /dev/md0 /dev/sdi
mdadm --manage --add /dev/md0 /dev/sdj

This clobbered the superblock and replaced it with that of /dev/md0, yes?
Well, that's what mdadm --misc --examine /dev/sdi and sdj said, anyhow.
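Before any --add, --zero-superblock or --create, a read-only snapshot of every member's superblock would have preserved the chunk size, layout and device order that the later steps had to reconstruct from memory. A minimal sketch, assuming the md2 members are /dev/sdc through /dev/sdj as described above; the function name snapshot_superblocks and the output directory in the usage comment are hypothetical, and the only mdadm mode it uses is --examine, which only reads the disks.

```shell
#!/bin/sh
# Sketch: save each member's md superblock report before changing anything.
# Assumption: md2's members are /dev/sdc../dev/sdj (as in this thread).
# snapshot_superblocks is a hypothetical helper, not part of mdadm.

snapshot_superblocks() {  # usage: snapshot_superblocks OUTDIR DEVICE...
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # mdadm --examine only reads the superblock. The report includes the
        # array UUID, Chunk Size, Layout and this disk's RaidDevice slot,
        # all of which a later --create would have to reproduce exactly.
        mdadm --examine "$dev" > "$outdir/$(basename "$dev").examine.txt" 2>&1
    done
}

# Example (as root):
#   snapshot_superblocks /root/md2-before /dev/sd[c-j]
#   grep -H 'Chunk Size' /root/md2-before/*.examine.txt
```

The reports are only useful if they are stored somewhere other than the array they describe.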

Ok, so what next?
Step 2 (rebuild the array but make sure the params are right!):
I wipe out the superblocks on all of the drives in the array and
rebuild with --assume-clean:
for i in c d e f g h i j ; do mdadm --zero-superblock /dev/sd$i ; done
mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
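Retyping the array geometry by hand is how a wrong chunk size slips in. One alternative is to generate the --create line from a saved mdadm --examine report and print it for review instead of running it. A sketch under stated assumptions: the report uses the `Key : value` lines that old-style (0.90) superblocks print for Raid Level, Raid Devices, Layout and Chunk Size; the members are passed in their original RaidDevice order; and field and planned_create are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: build (but do not run) an mdadm --create line from a saved
# `mdadm --examine` report. field and planned_create are hypothetical helpers.

field() {  # usage: field REPORT KEY -> value of the "KEY : value" line
    awk -F' : ' -v key="$2" '$1 ~ key { sub(/^ +/, "", $2); print $2; exit }' "$1"
}

planned_create() {  # usage: planned_create REPORT DEVICE... (in RaidDevice order)
    report=$1
    shift
    level=$(field "$report" "Raid Level")     # e.g. raid5
    ndev=$(field "$report" "Raid Devices")    # e.g. 8
    layout=$(field "$report" "Layout")        # e.g. left-symmetric
    chunk=$(field "$report" "Chunk Size")     # e.g. 256K
    if [ -z "$chunk" ]; then
        echo "no Chunk Size in $report" >&2
        return 1
    fi
    # Printed only; /dev/md2 is the array name used in this thread.
    echo mdadm --create /dev/md2 --assume-clean --level="${level#raid}" \
        --raid-devices="$ndev" --layout="$layout" --chunk="${chunk%K}" "$@"
}

# Example: planned_create /root/md2-before/sdc.examine.txt /dev/sd[c-j]
```

Device order matters as much as chunk size: --create assigns slots in the order the devices are listed, so the list should follow the RaidDevice column of the saved reports.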

Ok, now it says that the array is recovering and will take about 10
hours to rebuild.
/dev/sd[c-i] say that they are "active sync" and /dev/sdj says it's a
spare that's rebuilding.
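/proc/mdstat shows, for each array, the chunk size md is actually using and the progress of any recovery, so a quick look right after --create is a cheap sanity check. A minimal sketch, assuming the 2.6-kernel /proc/mdstat layout in which a RAID 5 stanza carries a `level 5, NNNk chunk` line and a `recovery = N.N%` line; md_status is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: report the chunk size md is using for an array and the progress of
# any recovery, read from /proc/mdstat on stdin.
# Assumption: 2.6-era format, i.e. "... blocks level 5, 256k chunk, ..." and
# "[=>....]  recovery =  N.N% (...)". md_status is a hypothetical helper.

md_status() {  # usage: md_status MDNAME < /proc/mdstat
    awk -v md="$1" '
        $1 == md && $2 == ":" { inmd = 1; next }   # start of this array stanza
        inmd && NF == 0 { exit }                   # blank line ends the stanza
        inmd && match($0, /[0-9]+k chunk/) {
            chunk = substr($0, RSTART, RLENGTH - 6)   # drop the word chunk
        }
        inmd && match($0, /recovery = *[0-9.]+%/) {
            rec = substr($0, RSTART, RLENGTH)
            sub(/recovery = */, "", rec)
        }
        END { print (chunk != "" ? chunk : "?"), (rec != "" ? rec : "none") }'
}

# Example: md_status md2 < /proc/mdstat
```

It prints the chunk size followed by the recovery percentage (or `none`), so running it right after a --create shows the chunk size long before a rebuild reaches 8%.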
But now I scroll back in my history and see that, oops, the chunk size
is WRONG. Not only that, but I don't stop the array until the rebuild
is at around 8%.
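Why a wrong chunk size is so damaging: md maps each logical byte of the array to a member disk and an offset using the chunk size, so re-creating with a different chunk size makes the same byte of /dev/md2 come from a different disk. The sketch below computes that mapping, assuming md's default RAID 5 "left-symmetric" layout, 8 members in the order given to --create, and data starting at offset 0 on each member. The 64 KiB value is only an example of a different chunk size; the thread does not say which value the first --create actually used. md_locate is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: map a byte offset on /dev/md2 to (member slot, offset on that member).
# Assumptions: RAID 5 "left-symmetric" layout (md's default), members in
# --create order (slot 0 = /dev/sdc ... slot 7 = /dev/sdj), data starting at
# offset 0 on each member (0.90 superblocks live at the end of the device).
# md_locate is a hypothetical helper.

md_locate() {  # usage: md_locate OFFSET_BYTES CHUNK_KIB NDISKS
    offset=$1
    chunk=$(( $2 * 1024 ))
    n=$3
    chunk_no=$(( offset / chunk ))       # which data chunk of the array
    stripe=$(( chunk_no / (n - 1) ))     # n-1 data chunks per stripe
    d=$(( chunk_no % (n - 1) ))          # data position within the stripe
    parity=$(( (n - 1) - stripe % n ))   # parity slot rotates leftwards
    slot=$(( (parity + 1 + d) % n ))     # data follows parity, wrapping round
    echo "$slot $(( stripe * chunk + offset % chunk ))"
}

md_locate 1048576 256 8   # 1 MiB into md2 with --chunk=256
md_locate 1048576 64 8    # same byte after re-creating with 64 KiB chunks
```

With these inputs the first call prints `4 0` (the start of slot 4, /dev/sdg) and the second prints `0 131072` (128 KiB into slot 0, /dev/sdc): the same logical byte, read from a different disk. That is why the filesystem sees garbage whenever the chunk size differs from the one the data was written with.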

Ok, I stop the array and rebuild with
mdadm --create /dev/md2 --assume-clean --level=5 --chunk --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj

Now it says it's going to take another 10 hours to rebuild.

How likely is it that my data is irretrievable/gone, and at what step
would it have happened if so?

Mark Hennessy

2008-Apr-17 17:06 UTC

[CentOS] Question about RAID 5 array rebuild with mdadm

Sorry about that; my previous e-mail had just '--chunk' toward the
bottom. It should have been '--chunk=256'. Please see the quoted
snippet for details.

On Apr 17, 2008, at 1:01 PM, Mark Hennessy wrote:
> Ok, I stop the array and rebuild with
> mdadm --create /dev/md2 --assume-clean --level=5 --chunk=256 --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj

Ross S. W. Walker

2008-Apr-17 17:50 UTC

[CentOS] Question about RAID 5 array rebuild with mdadm

Mark Hennessy wrote:
>
> I'm using CentOS 4.5 right now, and I had a RAID 5 array stop because
> two drives became unavailable.  After adjusting the cables on several  
> occasions and shutting down and restarting, I was able to see the  
> drives again.  This is when I snatched defeat from the jaws of  
> victory.  Please, someone with vast knowledge of how RAID 5 with mdadm  
> works, tell me if I have any chance at all that this array will pull  
> through with most or all of my data.
It may be possible...
> Background info about the machine:
> /dev/md0 is a RAID1 consisting of /dev/sda1 and /dev/sdb1
> /dev/md1 is a RAID1 consisting of /dev/sda2 and /dev/sdb2
> /dev/md2 (our special friend) is a RAID5 consisting of /dev/sd[c-j]
> 
> /dev/sdi and /dev/sdj were the drives that detached from the array and  
> were marked as faulty.
> 
> I did the following things that, in hindsight, were probably VERY BAD:
> 
> Step 1 (Misassign drives to wrong array):
> I could probably have had things going again in a tenth of a second if  
> I hadn't typed this:
> mdadm --manage --add /dev/md0 /dev/sdi
> mdadm --manage --add /dev/md0 /dev/sdj
> 
> This clobbered the superblock and replaced it with that of /dev/md0, yes?
> Well, that's what mdadm --misc --examine /dev/sdi and sdj said, anyhow.
Hmm, not good, but we will mark this drive 'sdi' as bad.
> Ok, so what next?
> Step 2 (rebuild the array but make sure the params are right!):
> I wipe out the superblocks on all of the drives in the array and
> rebuild with --assume-clean:
> for i in c d e f g h i j ; do mdadm --zero-superblock /dev/sd$i ; done
> mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
Nooo, you need to make sure sdi is marked as 'bad' and kept offline.
You are going to need to assemble the array degraded, then add sdi as a
replacement and let it rebuild sdi from the parity.
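The procedure described here, sketched as a dry run: with DRY_RUN=1 (the default in this sketch) the commands are only printed. It follows the reading in which only /dev/sdi was clobbered by the stray md0 add and the other seven members still carry valid md2 superblocks; it does not cover the case where /dev/sdj lost its md2 superblock as well, and it only applies to the state after Step 1, before the superblocks were zeroed. The run and recover_md2 wrappers are hypothetical; the mdadm options are standard ones.

```shell
#!/bin/sh
# Sketch of the degraded-assemble-then-rebuild procedure, as a dry run.
# With DRY_RUN=1 (the default) commands are printed, not executed.
# Assumptions: only /dev/sdi lost its md2 superblock (to the stray md0 add),
# the other seven members still carry valid md2 superblocks, and nothing has
# been zeroed yet. run and recover_md2 are hypothetical helpers.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

recover_md2() {
    # 1. Take sdi back out of md0, where it was added as a spare by mistake.
    run mdadm --manage /dev/md0 --remove /dev/sdi
    # 2. Clear the stray md0 superblock; sdi's contents will be rebuilt.
    run mdadm --zero-superblock /dev/sdi
    # 3. Start md2 degraded from the seven remaining members. --force may
    #    also be needed, since sdj was marked faulty and may be out of date.
    run mdadm --assemble --run /dev/md2 \
        /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdj
    # 4. Add sdi as the replacement; md rebuilds it from data and parity.
    run mdadm --manage /dev/md2 --add /dev/sdi
    # 5. Watch the recovery.
    run cat /proc/mdstat
}

recover_md2
```

Only after the printed sequence has been checked against the actual state of the disks would it make sense to rerun it with DRY_RUN=0.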
> Ok, now it says that the array is recovering and will take about 10
> hours to rebuild.
> /dev/sd[c-i] say that they are "active sync" and /dev/sdj says it's a
> spare that's rebuilding.
> But now I scroll back in my history and see that, oops, the chunk size
> is WRONG. Not only that, but I don't stop the array until the rebuild
> is at around 8%.
Well, now I think it's all messed up.
> Ok, I stop the array and rebuild with
> mdadm --create /dev/md2 --assume-clean --level=5 --chunk --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
> 
> Now it says it's going to take another 10 hours to rebuild.
It's truly hosed now.
> How likely is it that my data is irretrievable/gone, and at what step
> would it have happened if so?
I hope you have backups, because you're going to need them.

If only you had posted to the list BEFORE you tried to recover it
without knowing what to do.

-Ross
