hi all!

back in Aug several of you assisted me in solving a problem where one
of my drives had dropped out of (or been kicked out of) the raid1 array.

something vaguely similar appears to have happened just a few mins ago,
upon rebooting after a small update. I received four emails like this,
one for /dev/md0, one for /dev/md1, one for /dev/md125 and one for
/dev/md126:

        Subject: DegradedArray event on /dev/md125:fcshome.stoneham.ma.us
        X-Spambayes-Classification: unsure; 0.24
        Status: RO
        Content-Length: 564
        Lines: 23

        This is an automatically generated mail message from mdadm
        running on fcshome.stoneham.ma.us

        A DegradedArray event had been detected on md device /dev/md125.

        Faithfully yours, etc.

        P.S. The /proc/mdstat file currently contains the following:

        Personalities : [raid1]
        md0 : active raid1 sda1[0]
              104320 blocks [2/1] [U_]

        md126 : active raid1 sdb1[1]
              104320 blocks [2/1] [_U]

        md125 : active raid1 sdb2[1]
              312464128 blocks [2/1] [_U]

        md1 : active raid1 sda2[0]
              312464128 blocks [2/1] [U_]

        unused devices: <none>

firstly, what the heck are md125 and md126? previously there was
only md0 and md1.... ????

secondly, I'm not sure what it's trying to tell me. it says there was a
"degradedarray event" but at the bottom it says there are no unused devices.

there are also some messages in /var/log/messages from the time of the
boot earlier today, but they do NOT say anything about "kicking out"
any of the md member devices (as they did in the event back in August):

        Oct 19 18:29:41 fcshome kernel: device-mapper: dm-raid45: initialized v0.2594l
        Oct 19 18:29:41 fcshome kernel: md: Autodetecting RAID arrays.
        Oct 19 18:29:41 fcshome kernel: md: autorun ...
        Oct 19 18:29:41 fcshome kernel: md: considering sdb2 ...
        Oct 19 18:29:41 fcshome kernel: md:  adding sdb2 ...
        Oct 19 18:29:41 fcshome kernel: md: sdb1 has different UUID to sdb2
        Oct 19 18:29:41 fcshome kernel: md: sda2 has same UUID but different superblock to sdb2
        Oct 19 18:29:41 fcshome kernel: md: sda1 has different UUID to sdb2
        Oct 19 18:29:41 fcshome kernel: md: created md125
        Oct 19 18:29:41 fcshome kernel: md: bind<sdb2>
        Oct 19 18:29:41 fcshome kernel: md: running: <sdb2>
        Oct 19 18:29:41 fcshome kernel: raid1: raid set md125 active with 1 out of 2 mirrors
        Oct 19 18:29:41 fcshome kernel: md: considering sdb1 ...
        Oct 19 18:29:41 fcshome kernel: md:  adding sdb1 ...
        Oct 19 18:29:41 fcshome kernel: md: sda2 has different UUID to sdb1
        Oct 19 18:29:41 fcshome kernel: md: sda1 has same UUID but different superblock to sdb1
        Oct 19 18:29:41 fcshome kernel: md: created md126
        Oct 19 18:29:41 fcshome kernel: md: bind<sdb1>
        Oct 19 18:29:41 fcshome kernel: md: running: <sdb1>
        Oct 19 18:29:41 fcshome kernel: raid1: raid set md126 active with 1 out of 2 mirrors
        Oct 19 18:29:41 fcshome kernel: md: considering sda2 ...
        Oct 19 18:29:41 fcshome kernel: md:  adding sda2 ...
        Oct 19 18:29:41 fcshome kernel: md: sda1 has different UUID to sda2
        Oct 19 18:29:41 fcshome kernel: md: created md1
        Oct 19 18:29:41 fcshome kernel: md: bind<sda2>
        Oct 19 18:29:41 fcshome kernel: md: running: <sda2>
        Oct 19 18:29:41 fcshome kernel: raid1: raid set md1 active with 1 out of 2 mirrors
        Oct 19 18:29:41 fcshome kernel: md: considering sda1 ...
        Oct 19 18:29:41 fcshome kernel: md:  adding sda1 ...
        Oct 19 18:29:41 fcshome kernel: md: created md0
        Oct 19 18:29:41 fcshome kernel: md: bind<sda1>
        Oct 19 18:29:41 fcshome kernel: md: running: <sda1>
        Oct 19 18:29:41 fcshome kernel: raid1: raid set md0 active with 1 out of 2 mirrors
        Oct 19 18:29:41 fcshome kernel: md: ... autorun DONE.
and here's /etc/mdadm.conf:

        # cat /etc/mdadm.conf

        # mdadm.conf written out by anaconda
        DEVICE partitions
        MAILADDR fredex
        ARRAY /dev/md0 level=raid1 num-devices=2 uuid=4eb13e45:b5228982:f03cd503:f935bd69
        ARRAY /dev/md1 level=raid1 num-devices=2 uuid=5c79b138:e36d4286:df9cf6f6:62ae1f12

which doesn't say anything about md125 or md126,... might they be some kind
of detritus or fragments left over from whatever kind of failure caused the
array to become degraded?

do ya suppose a boot from power-off might somehow give it a whack upside the
head so it'll reassemble itself according to mdadm.conf?

I'm not sure which devices need to be failed and re-added to make it clean
again (which is all I had to do when I had the aforementioned earlier
problem.)

Thanks in advance for any advice!

Fred

-- 
---- Fred Smith -- fredex at fcshome.stoneham.ma.us -----------------------------
  The Lord is like a strong tower. Those who do what is right can run to him
  for safety.
--------------------------- Proverbs 18:10 (niv) -----------------------------
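P.S. in case it helps, here's what I was planning to run to compare the two
halves before touching anything -- it should all be read-only, but please say
so if I'm looking at the wrong things (device names are the ones from the
mdstat output above):

        # look at the md superblock on each member partition
        mdadm --examine /dev/sda1 /dev/sdb1
        mdadm --examine /dev/sda2 /dev/sdb2

        # and the kernel's current view of each assembled array
        mdadm --detail /dev/md0 /dev/md1 /dev/md125 /dev/md126

        # see which of the degraded arrays are actually mounted
        grep md /proc/mounts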
fred smith wrote:
> hi all!
>
> back in Aug several of you assisted me in solving a problem where one
> of my drives had dropped out of (or been kicked out of) the raid1 array.
>
> something vaguely similar appears to have happened just a few mins ago,
> upon rebooting after a small update. I received four emails like this,
> one for /dev/md0, one for /dev/md1, one for /dev/md125 and one for
> /dev/md126:
>
> [snip: mdadm notification and /proc/mdstat]
>
> there are also some messages in /var/log/messages from the time of the
> boot earlier today, but they do NOT say anything about "kicking out"
> any of the md member devices (as they did in the event back in August):
>
> Oct 19 18:29:41 fcshome kernel: md: considering sdb2 ...
> Oct 19 18:29:41 fcshome kernel: md:  adding sdb2 ...
> Oct 19 18:29:41 fcshome kernel: md: sdb1 has different UUID to sdb2
> Oct 19 18:29:41 fcshome kernel: md: sda2 has same UUID but different superblock to sdb2

This appears to be the cause.

> Oct 19 18:29:41 fcshome kernel: md: sda1 has different UUID to sdb2
> Oct 19 18:29:41 fcshome kernel: md: created md125

This one was auto-created - I've not experienced that myself, and I run half
a dozen of these arrays on different machines.

> Oct 19 18:29:41 fcshome kernel: md: bind<sdb2>
> Oct 19 18:29:41 fcshome kernel: md: running: <sdb2>
> Oct 19 18:29:41 fcshome kernel: raid1: raid set md125 active with 1 out of 2 mirrors

Now it has mounted it separately.

> Oct 19 18:29:41 fcshome kernel: md: considering sdb1 ...
> Oct 19 18:29:41 fcshome kernel: md:  adding sdb1 ...
> Oct 19 18:29:41 fcshome kernel: md: sda2 has different UUID to sdb1
> Oct 19 18:29:41 fcshome kernel: md: sda1 has same UUID but different superblock to sdb1

And now for the second one:

> Oct 19 18:29:41 fcshome kernel: md: created md126
> Oct 19 18:29:41 fcshome kernel: md: bind<sdb1>
> Oct 19 18:29:41 fcshome kernel: md: running: <sdb1>
> Oct 19 18:29:41 fcshome kernel: raid1: raid set md126 active with 1 out of 2 mirrors
>
> [snip: rest of the boot log]
>
> and here's /etc/mdadm.conf:
>
>         # mdadm.conf written out by anaconda
>         DEVICE partitions
>         MAILADDR fredex
>         ARRAY /dev/md0 level=raid1 num-devices=2 uuid=4eb13e45:b5228982:f03cd503:f935bd69
>         ARRAY /dev/md1 level=raid1 num-devices=2 uuid=5c79b138:e36d4286:df9cf6f6:62ae1f12
>
> which doesn't say anything about md125 or md126,... might they be some kind
> of detritus or fragments left over from whatever kind of failure caused the
> array to become degraded?

Now you need to decide which one is the correct master (by looking at each
device; you may need to mount it first). Remove the other one and add it back
to the original array - it will then rebuild. If these are SATA drives, also
check the cables - I have one machine where they work loose and cause
failures.

> do ya suppose a boot from power-off might somehow give it a whack upside the
> head so it'll reassemble itself according to mdadm.conf?

Doubt it - see the above dmesg.

> I'm not sure which devices need to be failed and re-added to make it clean
> again (which is all I had to do when I had the aforementioned earlier
> problem.)
>
> Thanks in advance for any advice!
>
> Fred
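For the "which one is the correct master" step, something like the following
is one way to compare the two halves without writing to either of them
(untested here; device names are the ones from the mdstat in the quoted
message, so adjust as needed):

        # the event count and update time show which copy is more recent
        mdadm --examine /dev/sda2 | egrep 'Update Time|Events'
        mdadm --examine /dev/sdb2 | egrep 'Update Time|Events'

        # or mount the stray array read-only somewhere and eyeball the files
        mkdir -p /mnt/check
        mount -o ro /dev/md125 /mnt/check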
On Tue, Oct 19, 2010 at 7:59 PM, fred smith
<fredex at fcshome.stoneham.ma.us> wrote:
>
> back in Aug several of you assisted me in solving a problem where one
> of my drives had dropped out of (or been kicked out of) the raid1 array.
>
> something vaguely similar appears to have happened just a few mins ago,
> upon rebooting after a small update. I received four emails like this,
> one for /dev/md0, one for /dev/md1, one for /dev/md125 and one for
> /dev/md126:
>
>        A DegradedArray event had been detected on md device /dev/md125.
>
>        P.S. The /proc/mdstat file currently contains the following:
>
>        Personalities : [raid1]
>        md0 : active raid1 sda1[0]
>              104320 blocks [2/1] [U_]
>
>        md126 : active raid1 sdb1[1]
>              104320 blocks [2/1] [_U]
>
>        md125 : active raid1 sdb2[1]
>              312464128 blocks [2/1] [_U]
>
>        md1 : active raid1 sda2[0]
>              312464128 blocks [2/1] [U_]
>
>        unused devices: <none>
>
> firstly, what the heck are md125 and md126? previously there was
> only md0 and md1.... ????
>
> [snip: boot log and mdadm.conf]

The superblocks on sdb1 and sdb2 are different from the superblocks on sda1
and sda2, so mdadm assembled sdb1 and sdb2 into separate arrays. I'd have
expected them to come up as md126 and md127 rather than md125 and md126, but
that's normal.

Your problem is that all four arrays are degraded. Which ones are mounted?
Assuming that you're running off the drives with the most recent changes and
updates, you'll have to stop the two unused arrays, zero their superblocks,
and add them to the running arrays.
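In concrete terms that would be roughly the following (a sketch, assuming the
sda halves in md0/md1 are the copies you're actually running from -- verify
that first, because --zero-superblock is destructive):

        # stop the stray arrays so their member partitions are free
        mdadm --stop /dev/md125
        mdadm --stop /dev/md126

        # wipe the stale md metadata on the sdb halves
        mdadm --zero-superblock /dev/sdb1
        mdadm --zero-superblock /dev/sdb2

        # add them back to the arrays that are still running; a full resync follows
        mdadm /dev/md0 --add /dev/sdb1
        mdadm /dev/md1 --add /dev/sdb2

        # watch the rebuild
        watch cat /proc/mdstat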
fred smith wrote:
> hi all!
>
> back in Aug several of you assisted me in solving a problem where one
> of my drives had dropped out of (or been kicked out of) the raid1 array.
>
> something vaguely similar appears to have happened just a few mins ago,
> upon rebooting after a small update. I received four emails like this,
> one for /dev/md0, one for /dev/md1, one for /dev/md125 and one for
> /dev/md126:
>
> [snip: mdadm notification, /proc/mdstat, boot log and mdadm.conf]
>
> I'm not sure which devices need to be failed and re-added to make it clean
> again (which is all I had to do when I had the aforementioned earlier
> problem.)
>
> Thanks in advance for any advice!
>
> Fred

I've seen this kind of thing happen when the autodetection stuff misbehaves.
I'm not sure why it does this or how to prevent it. Anyway, to recover, I
would use something like:

        mdadm --stop /dev/md125
        mdadm --stop /dev/md126

If for some reason the above commands fail, check and make sure it has not
automounted the file systems from md125 and md126. Hopefully this won't
happen. Then use:

        mdadm /dev/md0 -a /dev/sdXX

to add back the drive which belongs in md0, and similarly for md1. In
general, it won't let you add the wrong drive, but if you want to check, use:

        mdadm --examine /dev/sda1 | grep UUID

and so forth for all your drives, and find the ones with the same UUID.

When I create my RAID arrays, I always use the option --bitmap=internal. With
this option set, a bitmap is used to keep track of which pages on the drive
are out of date, so you only resync the pages which need updating instead of
recopying the whole drive when this happens.

In the past I once added a bitmap to an existing raid1 array using something
like this. This may not be the exact command, but I know it can be done:

        mdadm /dev/mdN --bitmap=internal

Adding the bitmap is very worthwhile and saves time and risk of data loss by
not having to recopy the whole partition.

Nataraj
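For what it's worth, on the mdadm versions I've used, adding an internal
write-intent bitmap to an existing array goes through the --grow mode,
roughly like this (check the man page on your version before running it):

        # add an internal write-intent bitmap to an existing array
        mdadm --grow /dev/md0 --bitmap=internal

        # confirm it took
        mdadm --detail /dev/md0 | grep -i bitmap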