I got the email that a drive in my 4-drive RAID10 setup failed. What are my
options?
Drives are WD1000FYPS (Western Digital 1 TB 3.5" SATA).
mdadm.conf:
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/root level=raid10 num-devices=4
UUID=942f512e:2db8dc6c:71667abc:daf408c3
/proc/mdstat:
Personalities : [raid10]
md127 : active raid10 sdf1[2](F) sdg1[3] sde1[1] sdd1[0]
1949480960 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
bitmap: 15/15 pages [60KB], 65536KB chunk
smartctl reports this for sdf:
197 Current_Pending_Sector   0x0012   200   200   000    Old_age   Always    -       1
198 Offline_Uncorrectable    0x0010   200   200   000    Old_age   Offline   -       6
So it's got 6 uncorrectable sectors and 1 sector pending remapping.
Can I clear the error and rebuild? (It's not clear what commands would do
that.) Or should I buy a replacement drive? I'm considering a WDS100T1R0A
(2.5" 1 TB WD Red drive), which Amazon has for $135, plus a 3.5"
adapter.
The system serves primarily as a home mail server (it fetchmails from an
outside VPS serving as my domain's MX) and archival file server.
> Can I clear the error and rebuild? (It's not clear what commands would
> do that.) Or should I buy a replacement drive?

Hi,

mdadm --remove /dev/md127 /dev/sdf1

and then the same with --add should hot-remove and re-add the device.

If it rebuilds fine it may again work for a long time.

Simon
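Spelled out with the array and member names from the /proc/mdstat output
above, the sequence looks roughly like this (a sketch; the --manage long
form is just the equivalent of Simon's one-liners):

# The kernel has already marked sdf1 failed ((F) in /proc/mdstat), so it
# can be pulled from the array directly.
mdadm --manage /dev/md127 --remove /dev/sdf1

# Adding the same partition back re-uses its existing superblock; mdadm
# will report a "re-add", and the write-intent bitmap may shorten the
# resync.
mdadm --manage /dev/md127 --add /dev/sdf1

# Watch the rebuild until the array is back to [UUUU].
cat /proc/mdstat
mdadm --detail /dev/md127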
--On Friday, September 18, 2020 10:53 PM +0200 Simon Matter
<simon.matter at invoca.ch> wrote:

> mdadm --remove /dev/md127 /dev/sdf1
>
> and then the same with --add should hot-remove and re-add the device.
>
> If it rebuilds fine it may again work for a long time.

Thanks. That reminds me: if I need to replace it, is there some easy way to
figure out which drive bay is sdf? It's an old Supermicro rack chassis with
6 drive bays. Perhaps a way to blink the drive light?
On Fri, Sep 18, 2020 at 3:20 PM Kenneth Porter <shiva at sewingwitch.com> wrote:

> Thanks. That reminds me: if I need to replace it, is there some easy way
> to figure out which drive bay is sdf? It's an old Supermicro rack chassis
> with 6 drive bays. Perhaps a way to blink the drive light?

It's easy enough with dd. Be sure it's the drive you want to find, then put

dd if=/dev/sdf of=/dev/null

into a shell, but don't run it. Look at the drives, hit Enter, and watch for
which one lights up. Then ^C while watching, to be sure the light turns off
exactly when you hit it.
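In command form the trick is simply this (assuming the suspect disk is still
visible as /dev/sdf and readable enough to generate activity):

# Continuous reads keep that bay's activity LED lit.
dd if=/dev/sdf of=/dev/null bs=1M
# Watch the chassis for the LED that goes solid, then press Ctrl-C and
# check that the same LED goes idle again.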
--On Friday, September 18, 2020 10:53 PM +0200 Simon Matter
<simon.matter at invoca.ch> wrote:

> mdadm --remove /dev/md127 /dev/sdf1
>
> and then the same with --add should hot-remove and re-add the device.
>
> If it rebuilds fine it may again work for a long time.

This worked like a charm. When I added it back, it told me it was
"re-adding" the drive, so it recognized the drive I'd just removed. I
checked /proc/mdstat and it showed rebuilding. It took about 90 minutes to
finish and is now running fine.
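To double-check the result afterwards, something like this should do (same
device names as above; the attribute names match the smartctl output
earlier in the thread):

# The array should show all four members active again ([UUUU]).
cat /proc/mdstat
mdadm --detail /dev/md127

# Keep watching the suspect drive's sector counts; if they keep climbing,
# replacing the drive is still the safer call.
smartctl -A /dev/sdf | grep -E 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable'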