Thanks for all the responses. A little more digging revealed:

md0 is made up of two 250G disks on which the OS and a very large /var
partition reside for a number of virtual machines. md1 is made up of
two 2T disks on which /home resides.

The challenge is that disk 0 of md0 is the problem, and it has a 524M
/boot partition outside of the raid partition.

My plan is to back up /home (md1) and, at a minimum, /etc/libvirt and
/var/lib/libvirt (md0) before I do anything else.

Here are the log entries for 'raid':

Dec  1 20:50:15 desk4 kernel: md/raid1:md1: not clean -- starting background reconstruction
Dec  1 20:50:15 desk4 kernel: md/raid1:md1: active with 2 out of 2 mirrors
Dec  1 20:50:15 desk4 kernel: md/raid1:md0: active with 1 out of 2 mirrors

This is a desktop, not a server. We've had several short (<20 sec)
power outages over the last month; the last one was on 1 Dec. I suspect
the sudden loss and restoration of power could have trashed a portion
of disk 0 in md0. I finally obtained an APC UPS (BX1500G), then
installed, configured, and tested it. In the future, it will carry me
through these short outages.

I'll obtain a new 250G (or larger) drive and start rooting around for
guidance on how to replace a drive with the MBR and /boot on it.

On Wed, 2014-12-03 at 22:11 +0100, Leon Fauster wrote:
> Hi David,
>
> On 03.12.2014 at 02:14, David McGuffey <davidmcguffey at verizion.net> wrote:
> > This is an automatically generated mail message from mdadm
> > running on desk4
> >
> > A DegradedArray event had been detected on md device /dev/md0.
> >
> > Faithfully yours, etc.
> >
> > P.S. The /proc/mdstat file currently contains the following:
> >
> > Personalities : [raid1]
> > md0 : active raid1 dm-2[1]
> >       243682172 blocks super 1.1 [2/1] [_U]
> >       bitmap: 2/2 pages [8KB], 65536KB chunk
> >
> > md1 : active raid1 dm-3[0] dm-0[1]
> >       1953510268 blocks super 1.1 [2/2] [UU]
> >       bitmap: 3/15 pages [12KB], 65536KB chunk
>
> The reason why one drive was kicked out (the [_U] above) will be in
> /var/log/messages. If that disk is also part of md1, it should be
> manually removed from md1 before replacing the hd.
>
> --
> LF
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
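Leon's point above can be made concrete: the kick-out reason is logged,
and a disk that serves both arrays has to come out of md1 before it is
pulled. A minimal sketch of that check and removal, assuming the md
members are plain partitions on a failing /dev/sda (illustrative names;
on this system the members are dm devices, so the actual paths will
differ):

  grep -i raid /var/log/messages    # why was the md0 member kicked out?
  mdadm --detail /dev/md0           # shows which member is missing or faulty
  # Only if a partition of the failing disk is also a member of md1:
  mdadm /dev/md1 --fail /dev/sda3 --remove /dev/sda3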
On 12/04/2014 05:45 AM, David McGuffey wrote:
> md0 is made up of two 250G disks on which the OS and a very large /var
> partition reside for a number of virtual machines.
> ...
> The challenge is that disk 0 of md0 is the problem, and it has a 524M
> /boot partition outside of the raid partition.

Assuming that you have an unused drive port, you can fix that pretty
easily. Attach a new replacement disk to the unused port. Let's say
that it comes up as /dev/sde.

Copy the partition table to it (unless it's GPT, in which case use
parted):

  sfdisk -d /dev/sda | sfdisk /dev/sde

Unmount /boot and copy that partition (assuming that it is sda1):

  umount /boot
  dd if=/dev/sda1 of=/dev/sde1 bs=1M

Install grub on the new drive:

  grub-install /dev/sde

At that point, you should also be able to add the new partition to the
md array:

  mdadm /dev/md0 --add /dev/sde2

Once it rebuilds, shut down. Remove the bad drive. Put the new drive in
its place. In theory, the system will boot and be whole.

In practice, however, there's a bunch of information you didn't
provide, so some of those steps may be wrong.

I'm not sure what dm-0, dm-2, and dm-3 are, but they're indicated in
your mdstat. I'm guessing that you made partitions, then made LVM or
crypto devices on them, and then did RAID on top of those. If either of
those guesses is correct, that's completely the wrong way to build RAID
sets. You risk either bad performance from doing crypto more often than
is required, or possibly corruption as a result of LVM not mapping
blocks the way you expect.

If you build software RAID, I really strongly recommend that you keep
it as simple as possible. That means a) build software RAID sets from
raw partitions and b) use as few partitions as possible.

Typically, I'll create two partitions on all disks. The first is a
small partition for /boot, which may be part of a RAID1 set or may be
unused. The second partition covers the rest of the drive and will be
used in whatever arrangement is suitable for that system, whether it's
RAID1, RAID5, or RAID10. All of the drives are consistent, so there's
always a place to copy /boot, and just one script or process to set up
new disks regardless of their position in the array. md0 is used for
/boot, and md1 is an LVM PV. All of the filesystems other than /boot
are LVs.

Hopefully btrfs will become the default fs in the near future and all
of this will be vastly simplified.
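Gordon points to parted for the GPT case; a common alternative is
sgdisk from the gdisk package. A sketch, assuming the same /dev/sda
(source) and /dev/sde (new disk) pair as above:

  sgdisk -R /dev/sde /dev/sda   # replicate sda's GPT onto sde
  sgdisk -G /dev/sde            # new random GUIDs so the two disks don't collide

After the mdadm --add step, the rebuild can be watched with:

  watch cat /proc/mdstat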
On Thu, 2014-12-04 at 16:46 -0800, Gordon Messmer wrote:
> On 12/04/2014 05:45 AM, David McGuffey wrote:
>
> In practice, however, there's a bunch of information you didn't
> provide, so some of those steps may be wrong.
>
> I'm not sure what dm-0, dm-2, and dm-3 are, but they're indicated in
> your mdstat. I'm guessing that you made partitions, then made LVM or
> crypto devices on them, and then did RAID on top of those. If either
> of those guesses is correct, that's completely the wrong way to build
> RAID sets. You risk either bad performance from doing crypto more
> often than is required, or possibly corruption as a result of LVM not
> mapping blocks the way you expect.
>
> If you build software RAID, I really strongly recommend that you keep
> it as simple as possible. That means a) build software RAID sets from
> raw partitions and b) use as few partitions as possible.

Gordon,

Agreed. I've probably made it too complicated. It is a workstation with
sensitive data on it, so I've encrypted the partitions.

md1 is fairly simple...two large disks in raid1, encrypted, and mounted
as /home. md0 is probably way too complicated and not a good way to go.
The sensitive data in md0 is in /var (virtual machines).

I've backed up both /home and /var/lib/libvirt/images, so I think I'll
start over on md0 with a new disk and a fresh install.

Dave
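If md0 is rebuilt from scratch as planned, one layout in the spirit of
Gordon's advice is RAID1 assembled from raw partitions, with LUKS
applied once on top of the md device and LVM inside that. A rough
sketch with illustrative device and volume names, not a tested
procedure:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  cryptsetup luksFormat /dev/md0           # encrypt once, on top of the mirror
  cryptsetup luksOpen /dev/md0 md0_crypt
  pvcreate /dev/mapper/md0_crypt           # LVM PV; /var and friends become LVs
  # /boot stays on a small unencrypted partition or RAID1 set, as before

This way each write is encrypted once, rather than once per mirror
member as happens when the crypto layer sits underneath the RAID.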