I am running software RAID1 on a somewhat critical server. Today I
noticed one drive is giving errors. Good thing I had RAID. I planned
on upgrading this server in the next month or so. Just wondering if there
was an easy way to fix this to avoid rushing the upgrade? Having a
single drive is slowing down reads as well, I think.

Thanks.


Feb  7 15:28:28 server smartd[2980]: Device: /dev/sdb [SAT], 1
Currently unreadable (pending) sectors
Feb  7 15:28:28 server smartd[2980]: Device: /dev/sdb [SAT], 1 Offline
uncorrectable sectors
Feb  7 15:58:29 server smartd[2980]: Device: /dev/sdb [SAT], 1
Currently unreadable (pending) sectors
Feb  7 15:58:29 server smartd[2980]: Device: /dev/sdb [SAT], 1 Offline
uncorrectable sectors


[root@server ~]# smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@server ~]# smartctl -H /dev/sdb
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@server ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              1.4T  142G  1.2T  12% /
/dev/md0               99M   37M   58M  39% /boot
tmpfs                 7.9G   20K  7.9G   1% /dev/shm

[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

md2 : active raid1 sdb3[2](F) sda3[0]
      1456645568 blocks [2/1] [U_]

unused devices: <none>

[root@server ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Tue Jan  4 05:39:36 2011
     Raid Level : raid1
     Array Size : 1456645568 (1389.17 GiB 1491.61 GB)
  Used Dev Size : 1456645568 (1389.17 GiB 1491.61 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri Feb  7 15:21:45 2014
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

         Events : 0.758203

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed

       2       8       19        -      faulty spare   /dev/sdb3

[root@server ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Jan  4 05:39:36 2011
     Raid Level : raid1
     Array Size : 8385856 (8.00 GiB 8.59 GB)
  Used Dev Size : 8385856 (8.00 GiB 8.59 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb  7 14:29:36 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Events : 0.460

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

[root@server ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue Jan  4 05:48:17 2011
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Feb  5 11:02:25 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Events : 0.460

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
Sure, replace the bad drive and rebuild the mirror. Or add two drives,
using one to replace the bad one and the other as a hot spare. Then if
one of the two drives in the mirror fails again, the hot spare will take
over for it.

Chris

On Fri, Feb 7, 2014 at 3:47 PM, Matt <matt.mailinglists at gmail.com> wrote:
> I am running software RAID1 on a somewhat critical server. Today I
> noticed one drive is giving errors. Good thing I had RAID. I planned
> on upgrading this server in the next month or so. Just wondering if there
> was an easy way to fix this to avoid rushing the upgrade? Having a
> single drive is slowing down reads as well, I think.
> [snip]
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos

--
Chris Stone
AxisInternet, Inc.
www.axint.net
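Against the arrays shown above, the replace-and-hot-spare idea might look like the sketch below. This is only an illustration: the names /dev/sdc3 and /dev/sdd3 for partitions on the two new drives are assumptions, and the new drives must already carry a partition table matching /dev/sda.

    # Drop the failed member from md2 (it is already marked faulty).
    mdadm /dev/md2 --remove /dev/sdb3

    # Add a partition from the first new drive; md rebuilds the mirror onto it.
    mdadm /dev/md2 --add /dev/sdc3

    # Add a partition from the second new drive; with both raid-devices
    # already active, this extra member sits idle as a hot spare and is
    # pulled in automatically if either mirror half fails.
    mdadm /dev/md2 --add /dev/sdd3

    # Watch the rebuild progress.
    cat /proc/mdstat

The same --add step would be repeated for md0 and md1 with the matching partitions.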
On Fri, Feb 7, 2014 at 5:47 PM, Matt <matt.mailinglists at gmail.com> wrote:
> I am running software RAID1 on a somewhat critical server. Today I
> noticed one drive is giving errors. Good thing I had RAID. I planned
> on upgrading this server in the next month or so. Just wondering if there
> was an easy way to fix this to avoid rushing the upgrade? Having a
> single drive is slowing down reads as well, I think.
> [snip]

Maybe it is slowing things down, but I would recommend you fix your
RAID1 mirror to avoid losing all your data. Hopefully the information
below helps you...

If you have hotswap drives/caddies, then you should be able to remove
the drive while the server continues running.

First, hot fail and hot remove [0] all RAID members on drive /dev/sdb
from any software RAID arrays you have. The next step is to remove the
drive from the SCSI subsystem [1]. Then physically remove the drive and
replace it with a healthy one, and make the OS detect the new drive [2].
From there, you can use sfdisk to clone the partition structure from the
working drive to the new one. Then add the new partitions to your
software RAID arrays (and watch /proc/mdstat as it rebuilds).

  -f or --fail
  -r or --remove
  -a or --add

  mdadm /dev/mdX -f /dev/sdbY
  mdadm /dev/mdX -r /dev/sdbY
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  mdadm /dev/mdX -a /dev/sdbY
  watch cat /proc/mdstat

[0] http://www.ducea.com/2009/03/08/mdadm-cheat-sheet/
[1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/removing_devices.html
[2] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/adding_storage-device-or-path.html

--
---~~.~~---
Mike
// SilverTip257 //
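The SCSI hot-remove and rescan steps linked in [1] and [2] above come down to a pair of sysfs writes. A sketch, run as root, assuming the failing drive is /dev/sdb and its controller appears as host1 (check /sys/class/scsi_host/ for the real host name on your box):

    # Fail and remove the still-active sdb members first
    # (md2's sdb3 is already faulty, so it only needs removing).
    mdadm /dev/md0 -f /dev/sdb1 && mdadm /dev/md0 -r /dev/sdb1
    mdadm /dev/md1 -f /dev/sdb2 && mdadm /dev/md1 -r /dev/sdb2
    mdadm /dev/md2 -r /dev/sdb3

    # Delete the disk from the SCSI subsystem before pulling it,
    # so the kernel flushes its state and drops the device node.
    echo 1 > /sys/block/sdb/device/delete

    # After inserting the replacement, rescan the host so the kernel
    # detects it; "- - -" is a wildcard for channel/target/LUN.
    echo "- - -" > /sys/class/scsi_host/host1/scan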
On 02/07/2014 11:47 PM, Matt wrote:
> Having a single drive is slowing down reads as well, I think.

This depends upon how the RAID is set up. A standard Linux RAID1 setup
does not give better read performance than a single disk when reading
large files. I don't know if the RAID system is clever enough to save
some seek time. To get better read performance you'll have to set it up
as RAID10 with far copies.

Mogens

--
Mogens Kjaer, mk at lemo.dk
http://www.lemo.dk
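For the record, Linux md RAID10 supports the far-copies layout even on just two drives, which is what makes this suggestion workable on a two-disk box. A sketch, with /dev/sdX1 and /dev/sdY1 standing in for two empty partitions (placeholder names, not from this thread):

    # Create a two-drive RAID10 with the "far 2" (f2) layout: data is still
    # fully mirrored, but copies are placed so that sequential reads can be
    # striped across both spindles, roughly like RAID0 read performance.
    mdadm --create /dev/md3 --level=10 --layout=f2 --raid-devices=2 \
        /dev/sdX1 /dev/sdY1

    # Verify the layout took effect.
    mdadm --detail /dev/md3 | grep -i layout

Note this layout has to be chosen at array creation time; an existing RAID1 cannot simply be flipped to far copies in place.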