Alessandro Baggi wrote:
> On 29/01/19 18:47, mark wrote:
>> Alessandro Baggi wrote:
>>> On 29/01/19 15:03, mark wrote:
>>>
>>>> I've no idea what happened, but the box I was working on last week
>>>> has a *second* bad drive. Actually, I'm starting to wonder about
>>>> that particular hot-swap bay.
>>>>
>>>> Anyway, mdadm --detail shows /dev/sdb1 removed. I've added
>>>> /dev/sdi1...
>>>> but see both /dev/sdh1 and /dev/sdi1 as spare, and have yet to find
>>>> a reliable way to make either one active.
>>>>
>>>> Actually, I would have expected the linux RAID to replace a failed
>>>> one with a spare....
>>>
>>> can you report your raid configuration like raid level and raid devices
>>> and the current status from /proc/mdstat?
>>>
>> Well, nope. I got to the point of rebooting the system (xfs had the RAID
>> volume, and wouldn't let go); I also commented out the RAID volume.
>>
>> It's RAID 5, /dev/sdb *also* appears to have died. If I do
>> mdadm --assemble --force -v /dev/md0 /dev/sd[cefgdh]1
>> mdadm: looking for devices for /dev/md0
>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
>> mdadm: no uptodate device for slot 1 of /dev/md0
>> mdadm: added /dev/sde1 to /dev/md0 as 2
>> mdadm: added /dev/sdf1 to /dev/md0 as 3
>> mdadm: added /dev/sdg1 to /dev/md0 as 4
>> mdadm: no uptodate device for slot 5 of /dev/md0
>> mdadm: added /dev/sdd1 to /dev/md0 as -1
>> mdadm: added /dev/sdh1 to /dev/md0 as -1
>> mdadm: added /dev/sdc1 to /dev/md0 as 0
>> mdadm: /dev/md0 assembled from 4 drives and 2 spares - not enough to
>> start the array.
>>
>> --examine shows me /dev/sdd1 and /dev/sdh1, but that both are spares.
> Hi Mark,
> please post the result from
>
> cat /sys/block/md0/md/sync_action

There is none. There is no /dev/md0. mdadm refuses, saying that it's lost too many drives.

mark
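
For anyone reading the verbose assemble output above and wondering why mdadm gives up: the log itself carries enough to do the arithmetic. Below is a minimal sketch (not anything mark or Alessandro ran) that tallies the messages. The function name `summarise_assemble` is made up for illustration, it assumes the message wording shown in this thread, and it assumes the highest slot number seen is the last slot of the array.

```shell
# Summarise `mdadm --assemble -v` output read from stdin.
# Hypothetical helper; relies on the message wording quoted in this thread.
summarise_assemble() {
    awk '
        # "... is identified as a member of /dev/md0, slot N."
        # N >= 0 is an active slot, -1 marks a spare.
        /is identified as a member of/ {
            slot = $NF
            sub(/\.$/, "", slot)
            if (slot + 0 >= 0) {
                active++
                if (slot + 0 > max) max = slot + 0
            } else {
                spares++
            }
        }
        # "no uptodate device for slot N of /dev/md0"
        /no uptodate device for slot/ {
            for (i = 1; i < NF; i++) {
                if ($i == "slot") {
                    s = $(i + 1) + 0
                    sep = (missing == "") ? "" : " "
                    missing = missing sep s
                    if (s > max) max = s
                }
            }
        }
        END {
            total = max + 1     # assumption: slots run from 0 to the highest seen
            needed = total - 1  # RAID 5 survives exactly one missing member
            printf "raid_devices=%d active=%d spares=%d missing_slots=%s needed=%d\n",
                   total, active, spares, missing, needed
            if (active >= needed) print "startable"
            else print "not startable"
        }'
}

# Possible use (mdadm writes its verbose messages to stderr):
#   mdadm --assemble --force -v /dev/md0 /dev/sd[cefgdh]1 2>&1 | summarise_assemble
```

Fed the log quoted above, this reports six slots, four active members, two spares, and slots 1 and 5 missing. RAID 5 needs all but one member, so five here, and spares do not count until they have been rebuilt into a slot.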
On 29/01/19 20:42, mark wrote:
> Alessandro Baggi wrote:
>> Hi Mark,
>> please post the result from
>>
>> cat /sys/block/md0/md/sync_action
>
> There is none. There is no /dev/md0. mdadm refuses, saying that it's lost
> too many drives.
>
> mark
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>

I suppose that your config is 5 drives and 1 spare with 1 drive failed. It's strange that your spare was not used for resync. Then you added a new drive, but the array does not start because it marks the new disk as a spare and you have a raid5 with 4 devices and 2 spares.

First, I hope that you have a backup of all your data; don't run any exotic command before backing up your data. If you can't back up your data, it's a problem.

Have you tried to remove the last added device sdi1, restart the raid and force a resync? Have you tried to remove these 2 devices and re-add only the device that will be useful for resync? Maybe you can set 5 devices for your raid and not 6; if it works (after resync) you can add your spare device, growing your raid set.

Reading on Google, many users use --zero-superblock before re-adding the device. Other users reassemble the raid using --assume-clean, but I don't know what effect it will produce.

Hope that this helps.
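
To make the remove / zero-superblock / re-add suggestion above concrete, here is a small sketch that only prints the commands instead of running them, since `--zero-superblock` wipes the md metadata on the device. The function name `readd_plan` is invented for this example, the device names are the ones from the thread, and the `--manage` steps presuppose an array that is actually running, which is not the case for mark right now.

```shell
# Print (do not execute) a remove / wipe / re-add sequence for one member.
# Hypothetical helper; review every line before running anything by hand.
readd_plan() {
    local array="$1" dev="$2"
    printf '%s\n' \
        "mdadm --detail $array" \
        "mdadm --manage $array --remove $dev" \
        "mdadm --zero-superblock $dev" \
        "mdadm --manage $array --add $dev" \
        "cat /proc/mdstat"
}

readd_plan /dev/md0 /dev/sdi1
```

Printing the plan first keeps the destructive step visible and leaves room to check `mdadm --detail` before and `/proc/mdstat` after.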
On 01/30/19 03:45, Alessandro Baggi wrote:
> I suppose that your config is 5 drives and 1 spare with 1 drive failed.
> It's strange that your spare was not used for resync.
> Then you added a new drive, but the array does not start because it marks
> the new disk as a spare and you have a raid5 with 4 devices and 2 spares.
>
> First, I hope that you have a backup of all your data; don't run any
> exotic command before backing up your data. If you can't back up your data,
> it's a problem.

This is at work. We have automated nightly backups, and I do offline backups of the backups every two weeks.

> Have you tried to remove the last added device sdi1, restart the raid and
> force a resync?

The thing is, it had one? two? spares when /dev/sdb1 started dying, and it didn't use them.

> Have you tried to remove these 2 devices and re-add only the device that will
> be useful for resync? Maybe you can set 5 devices for your raid and not 6;
> if it works (after resync) you can add your spare device, growing your raid set.

I tried, and that's when I lost it (again), and it refuses to assemble/start the RAID: "not enough devices".

> Reading on Google, many users use --zero-superblock before re-adding the device.

I can take one out, and re-add, but I think I'm going to have to recreate the RAID again, and again restore from backup.

> Other users reassemble the raid using --assume-clean, but I don't know what
> effect it will produce.
>
> Hope that this helps.

Thanks.

mark