Alessandro Baggi wrote:
> Il 30/01/19 14:02, mark ha scritto:
>> On 01/30/19 03:45, Alessandro Baggi wrote:
>>> Il 29/01/19 20:42, mark ha scritto:
>>>> Alessandro Baggi wrote:
>>>>> Il 29/01/19 18:47, mark ha scritto:
>>>>>> Alessandro Baggi wrote:
>>>>>>> Il 29/01/19 15:03, mark ha scritto:
>>>>>>>
>>>>>>>> I've no idea what happened, but the box I was working on last
>>>>>>>> week has a *second* bad drive. Actually, I'm starting to wonder
>>>>>>>> about that particular hot-swap bay.
>>>>>>>>
>>>>>>>> Anyway, mdadm --detail shows /dev/sdb1 removed. I've added
>>>>>>>> /dev/sdi1... but see both /dev/sdh1 and /dev/sdi1 as spare, and
>>>>>>>> have yet to find a reliable way to make either one active.
>>>>>>>>
>>>>>>>> Actually, I would have expected the Linux RAID to replace a
>>>>>>>> failed one with a spare....
>>>>>>>
>>>>>>> Can you report your raid configuration, like raid level and raid
>>>>>>> devices, and the current status from /proc/mdstat?
>>>>>>
>>>>>> Well, nope. I got to the point of rebooting the system (xfs had the
>>>>>> RAID volume, and wouldn't let go); I also commented out the RAID
>>>>>> volume.
>>>>>>
>>>>>> It's RAID 5, and /dev/sdb *also* appears to have died. If I do
>>>>>>    mdadm --assemble --force -v /dev/md0 /dev/sd[cefgdh]1
>>>>>> mdadm: looking for devices for /dev/md0
>>>>>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
>>>>>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
>>>>>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
>>>>>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
>>>>>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
>>>>>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
>>>>>> mdadm: no uptodate device for slot 1 of /dev/md0
>>>>>> mdadm: added /dev/sde1 to /dev/md0 as 2
>>>>>> mdadm: added /dev/sdf1 to /dev/md0 as 3
>>>>>> mdadm: added /dev/sdg1 to /dev/md0 as 4
>>>>>> mdadm: no uptodate device for slot 5 of /dev/md0
>>>>>> mdadm: added /dev/sdd1 to /dev/md0 as -1
>>>>>> mdadm: added /dev/sdh1 to /dev/md0 as -1
>>>>>> mdadm: added /dev/sdc1 to /dev/md0 as 0
>>>>>> mdadm: /dev/md0 assembled from 4 drives and 2 spares - not enough
>>>>>> to start the array.
>>>>>>
>>>>>> --examine shows me /dev/sdd1 and /dev/sdh1, but that both are
>>>>>> spares.
>>>>>
>>>>> Hi Mark,
>>>>> please post the result from
>>>>>
>>>>> cat /sys/block/md0/md/sync_action
>>>>
>>>> There is none. There is no /dev/md0. mdadm refuses, saying that it's
>>>> lost too many drives.
>>>>
>>>>      mark
>>>
>>> I suppose that your config is 5 drives and 1 spare, with 1 drive
>>> failed. It's strange that your spare was not used for resync. Then you
>>> added a new drive, but it does not start because it marks the new disk
>>> as spare and you have a raid5 with 4 devices and 2 spares.
>>>
>>> First, I hope that you have a backup of all your data; don't run some
>>> exotic command before backing it up. If you can't back up your data,
>>> it's a problem.
>>
>> This is at work. We have automated nightly backups, and I do offline
>> backups of the backups every two weeks.
>>
>>> Have you tried to remove the last added device sdi1, restart the raid,
>>> and force it to start a resync?
>>
>> The thing is, it had one? two? spares when /dev/sdb1 started dying, and
>> it didn't use them.
>>
>>> Have you tried to remove these 2 devices and re-add only the device
>>> that will be useful for resync? Maybe you can set 5 devices for your
>>> raid and not 6; if it works (after resync) you can add your spare
>>> device, growing your raid set.
>>
>> I tried, and that's when I lost it (again), and it refuses to
>> assemble/start the RAID: "not enough devices".
>>
>>> Reading on Google, many users use --zero-superblock before re-adding
>>> the device.
>>
>> I can take one out and re-add, but I think I'm going to have to
>> recreate the RAID again, and again restore from backup.
>>
>>> Other users reassemble the raid using --assume-clean, but I don't know
>>> what effect it will produce.
>
> Hope that someone gives you better help for this.
>
> Update here if you got the solution.

Not that I'm into American football, but I seem to have pulled off what I
understand is called a hail-mary: *without* zeroing the superblocks, I did
a create with all six good drives, excluding /dev/sdb1, and explicitly told
it one spare.

And the array is there, complete with data, with *one* spare, five good
drives, and it's currently rebuilding the spare.

The last resort worked, though we'll see how long.

mark
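For anyone in the same spot: before a last-resort --create over existing
members, it is worth recording what each surviving member's superblock says
about its slot and event count, since a recreate only preserves data if the
level, device order, chunk size, and metadata version match the original
array. A minimal sketch, assuming v1.x metadata and the device names used
in this thread (this is not a transcript of Mark's actual session):

  # Record the metadata of every surviving member before touching anything.
  mdadm --examine /dev/sd[cdefgh]1 > /root/md0-examine-before.txt

  # The fields that matter for reconstructing the --create invocation:
  grep -E 'Array UUID|Device Role|Array State|Events|Chunk Size' \
      /root/md0-examine-before.txt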
Il 30/01/19 16:33, mark ha scritto:
> Not that I'm into American football, but I seem to have pulled off what I
> understand is called a hail-mary: *without* zeroing the superblocks, I
> did a create with all six good drives, excluding /dev/sdb1, and
> explicitly told it one spare.
>
> And the array is there, complete with data, with *one* spare, five good
> drives, and it's currently rebuilding the spare.
>
> The last resort worked, though we'll see how long.
>
> mark

So you have recreated the array without the faulty device?
Alessandro Baggi wrote:
> So you have recreated the array without the faulty device?

Yep.

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 /dev/sd[cdefgh]1

It's currently at 2.2% recovered for the extra drive.

mark
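For anyone following along, the rebuild and the resulting layout can be
watched with the usual tools; a short sketch (the mdadm.conf path below is
the customary CentOS location, and regenerating it only matters if the
recreated array ended up with a new UUID):

  # Watch recovery progress and speed.
  cat /proc/mdstat

  # Full view of member states, the rebuilding spare, and the array UUID.
  mdadm --detail /dev/md0

  # Once the array is healthy, record it so it assembles by name at boot.
  mdadm --detail --scan >> /etc/mdadm.conf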