I am running software RAID1 on a somewhat critical server. Today I
noticed one drive is giving errors. Good thing I had RAID. I planned
on upgrading this server in the next month or so. Just wondering if there
was an easy way to fix this to avoid rushing the upgrade? Having a
single drive is slowing down reads as well, I think.

Thanks.


Feb  7 15:28:28 server smartd[2980]: Device: /dev/sdb [SAT], 1
Currently unreadable (pending) sectors
Feb  7 15:28:28 server smartd[2980]: Device: /dev/sdb [SAT], 1 Offline
uncorrectable sectors
Feb  7 15:58:29 server smartd[2980]: Device: /dev/sdb [SAT], 1
Currently unreadable (pending) sectors
Feb  7 15:58:29 server smartd[2980]: Device: /dev/sdb [SAT], 1 Offline
uncorrectable sectors


[root@server ~]# smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@server ~]# smartctl -H /dev/sdb
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@server ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              1.4T  142G  1.2T  12% /
/dev/md0               99M   37M   58M  39% /boot
tmpfs                 7.9G   20K  7.9G   1% /dev/shm

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sdb3[2](F) sda3[0]
      1456645568 blocks [2/1] [U_]

unused devices: <none>

[root@server ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Tue Jan  4 05:39:36 2011
     Raid Level : raid1
     Array Size : 1456645568 (1389.17 GiB 1491.61 GB)
  Used Dev Size : 1456645568 (1389.17 GiB 1491.61 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri Feb  7 15:21:45 2014
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

         Events : 0.758203

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed

       2       8       19        -      faulty spare   /dev/sdb3

[root@server ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Jan  4 05:39:36 2011
     Raid Level : raid1
     Array Size : 8385856 (8.00 GiB 8.59 GB)
  Used Dev Size : 8385856 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb  7 14:29:36 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Events : 0.460

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

[root@server ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue Jan  4 05:48:17 2011
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Feb  5 11:02:25 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Events : 0.460

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
Sure, replace the bad drive and rebuild the mirror. Or add two drives,
using one to replace the bad one and the other as a hot spare. Then if
one of the two drives in the mirror fails again, the hot spare will take
over for it.

Chris

On Fri, Feb 7, 2014 at 3:47 PM, Matt <matt.mailinglists at gmail.com> wrote:
> I am running software RAID1 on a somewhat critical server. Today I
> noticed one drive is giving errors. Good thing I had RAID. I planned
> on upgrading this server in the next month or so. Just wondering if there
> was an easy way to fix this to avoid rushing the upgrade? Having a
> single drive is slowing down reads as well, I think.
> [snip]
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos

--
Chris Stone
AxisInternet, Inc.
www.axint.net
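Against the arrays shown above, the replace-and-hot-spare idea might look like the sketch below. This is only an illustration: the names /dev/sdc3 and /dev/sdd3 for partitions on the two new drives are assumptions, and the new drives must already carry a partition table matching /dev/sda.

    # Drop the failed member from md2 (it is already marked faulty).
    mdadm /dev/md2 --remove /dev/sdb3

    # Add a partition from the first new drive; md rebuilds the mirror onto it.
    mdadm /dev/md2 --add /dev/sdc3

    # Add a partition from the second new drive; with both raid-devices
    # already active, this extra member sits idle as a hot spare and is
    # pulled in automatically if either mirror half fails.
    mdadm /dev/md2 --add /dev/sdd3

    # Watch the rebuild progress.
    cat /proc/mdstat

The same --add step would be repeated for md0 and md1 with the matching partitions.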
On Fri, Feb 7, 2014 at 5:47 PM, Matt <matt.mailinglists at gmail.com> wrote:
> I am running software RAID1 on a somewhat critical server. Today I
> noticed one drive is giving errors. Good thing I had RAID. I planned
> on upgrading this server in the next month or so. Just wondering if there
> was an easy way to fix this to avoid rushing the upgrade? Having a
> single drive is slowing down reads as well, I think.
> [snip]

Maybe it is slowing things down, but I would recommend you fix your
RAID1 mirror to avoid losing all your data. Hopefully the information
below helps you...

If you have hotswap drives/caddies, then you should be able to remove
the drive while the server continues running.

First, hot fail and hot remove [0] all RAID members on drive /dev/sdb
from any software RAID arrays you have. The next step is to remove the
drive from the SCSI subsystem [1]. Then physically remove the drive and
replace it with a healthy one, and make the OS detect the new drive [2].
From there, you can use sfdisk to clone the partition structure from the
working drive to the new one. Then add the new partitions to your
software RAID arrays (and watch /proc/mdstat as it rebuilds).

  -f or --fail
  -r or --remove
  -a or --add

  mdadm /dev/mdX -f /dev/sdbY
  mdadm /dev/mdX -r /dev/sdbY
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  mdadm /dev/mdX -a /dev/sdbY
  watch cat /proc/mdstat

[0] http://www.ducea.com/2009/03/08/mdadm-cheat-sheet/
[1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/removing_devices.html
[2] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/adding_storage-device-or-path.html

--
---~~.~~---
Mike
// SilverTip257 //
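The SCSI hot-remove and rescan steps linked in [1] and [2] above come down to a pair of sysfs writes. A sketch, run as root, assuming the failing drive is /dev/sdb and its controller appears as host1 (check /sys/class/scsi_host/ for the real host name on your box):

    # Fail and remove the still-active sdb members first
    # (md2's sdb3 is already faulty, so it only needs removing).
    mdadm /dev/md0 -f /dev/sdb1 && mdadm /dev/md0 -r /dev/sdb1
    mdadm /dev/md1 -f /dev/sdb2 && mdadm /dev/md1 -r /dev/sdb2
    mdadm /dev/md2 -r /dev/sdb3

    # Delete the disk from the SCSI subsystem before pulling it,
    # so the kernel flushes its state and drops the device node.
    echo 1 > /sys/block/sdb/device/delete

    # After inserting the replacement, rescan the host so the kernel
    # detects it; "- - -" is a wildcard for channel/target/LUN.
    echo "- - -" > /sys/class/scsi_host/host1/scan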
On 02/07/2014 11:47 PM, Matt wrote:
> Having a single drive is slowing down reads as well, I think.

This depends upon how the RAID is set up. A standard Linux RAID1 setup
does not give better read performance than a single disk when reading
large files. I don't know if the RAID system is clever enough to save
some seek time. To get better read performance you'll have to set it up
as RAID10 with far copies.

Mogens

--
Mogens Kjaer, mk at lemo.dk
http://www.lemo.dk
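For the record, Linux md RAID10 supports the far-copies layout even on just two drives, which is what makes this suggestion workable on a two-disk box. A sketch, with /dev/sdX1 and /dev/sdY1 standing in for two empty partitions (placeholder names, not from this thread):

    # Create a two-drive RAID10 with the "far 2" (f2) layout: data is still
    # fully mirrored, but copies are placed so that sequential reads can be
    # striped across both spindles, roughly like RAID0 read performance.
    mdadm --create /dev/md3 --level=10 --layout=f2 --raid-devices=2 \
        /dev/sdX1 /dev/sdY1

    # Verify the layout took effect.
    mdadm --detail /dev/md3 | grep -i layout

Note this layout has to be chosen at array creation time; an existing RAID1 cannot simply be flipped to far copies in place.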