thr3ads.net - CentOS - [CentOS] Problem with softwareraid [Aug 2017]

Mr Typo

2017-Aug-18 19:35 UTC

[CentOS] Problem with softwareraid

Hello all,

I have already discussed this on the software RAID mailing list and
would like to continue the discussion here :)

I am having a really strange problem with my md0 device on
CentOS 7. After a reboot of my server, md0 was gone. While
trying to find the problem I observed the following:

Booting any installed kernel gives me NO md0 device (ls /dev/md*
returns nothing). A 'cat /proc/partitions' shows me now
/dev/sd[a-d]1 partition. partprobe and an mdadm assemble give me "disk
busy".

[root@quad live]# cat mdstat
Personalities : [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

[root@quad ~]# partprobe
device-mapper: remove ioctl on WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1
failed: Device or resource busy
Warning: parted was unable to re-read the partition table on
/dev/mapper/WDC_WD20EFRX-68AX9N0_WD-WMC301255087 (Device or resource
busy).  This means Linux won't know anything about the modifications
you made.
....
....

[root@quad ~]# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sda1 is busy - skipping
mdadm: /dev/sdb1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping


When I boot my CentOS from a USB stick for rescue, everything works: the
md0 device exists and is mounted (rw).

[root@quad usb-rescue]# cat mount  | grep '/data'
/dev/mapper/data-store on /mnt/sysimage/store type xfs
(rw,noatime,seclabel,attr2,largeio,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=256,swidth=768,noquota)
/dev/mapper/data-tm on /mnt/sysimage/var/lib/vdr/video type xfs
(rw,noatime,seclabel,attr2,largeio,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=256,swidth=768,noquota)


Third option: booting the installed rescue kernel from disk.
I get an md0 device, but it is not started. When I stop md0, I
can't assemble it anymore (disk busy).

/dev/md0:
        Version : 1.2
  Creation Time : Wed Aug 20 19:28:52 2014
     Raid Level : raid5
  Used Dev Size : 1953382272 (1862.89 GiB 2000.26 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Thu Aug 17 22:38:14 2017
          State : active, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : quad.core.sartori.at:0  (local to host quad.core.sartori.at)
           UUID : 9d020f27:c0542472:b95a18d2:5741114d
         Events : 25458

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       4       8       49        3      active sync   /dev/sdd1


Does anyone have an idea in which direction the problem could lie? Are more
logs needed? Please help, I have run out of ideas.

regards
Andy

Gordon Messmer

2017-Aug-18 21:56 UTC

[CentOS] Problem with softwareraid

On 08/18/2017 12:35 PM, Mr Typo wrote:
> mdadm: /dev/sda1 is busy - skipping
> mdadm: /dev/sdb1 is busy - skipping
> mdadm: /dev/sdc1 is busy - skipping
> mdadm: /dev/sdd1 is busy - skipping

That's plenty strange.  The output of "lsblk" might tell you why those
devices are busy.
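
A possible complement to lsblk, sketched under assumptions: the kernel lists whatever sits on top of a block device under /sys/class/block/<dev>/holders. The helper name list_holders and its optional second argument (an alternate sysfs root, present only so the function can be tried against a scratch directory) are illustrative and not part of any existing tool.

```shell
#!/bin/sh
# list_holders DEV [SYSROOT]
# Print the names of the devices that currently hold DEV open,
# one per line, by reading SYSROOT/class/block/DEV/holders.
# SYSROOT defaults to /sys; the parameter is a hypothetical convenience
# for trying the function against a test directory.
list_holders() {
    dev=$1
    sysroot=${2:-/sys}
    for h in "$sysroot/class/block/$dev/holders"/*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$h" ] || continue
        basename "$h"
    done
}

# Example on a live system:
#   for d in sda sdb sdc sdd; do echo "$d: $(list_holders "$d")"; done
```

An empty result means nothing in the kernel has claimed the device; a dm-N or mdN entry names the layer that would make a second claim fail as busy.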

mad.scientist.at.large at tutanota.com

2017-Aug-19 00:47 UTC

[CentOS] Problem with softwareraid

18. Aug 2017 13:35 by euroregistrar at gmail.com:

> Hello all,
>
> I have already discussed this on the software RAID mailing list and
> would like to continue the discussion here :)
>
> I am having a really strange problem with my md0 device on
> CentOS 7. After a reboot of my server, md0 was gone. While
> trying to find the problem I observed the following:
>
> Booting any installed kernel gives me NO md0 device (ls /dev/md*
> returns nothing). A 'cat /proc/partitions' shows me now
> /dev/sd[a-d]1 partition. partprobe and an mdadm assemble give me "disk
> busy".
>
> [root@quad live]# cat mdstat
> Personalities : [raid6] [raid5] [raid4] [raid10]
> unused devices: <none>
>
> [root@quad ~]# partprobe
> device-mapper: remove ioctl on WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1
> failed: Device or resource busy
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>snip



Are you definitely using cables rated for SATA III? Have you checked the power
connections? Have you checked the power supply voltages during spin up/later?

Is there tension or major twisting force on the SATA cables? I've seen
this cause intermittent problems, and it was solved by using a longer cable that
reduced the stress at the connector.

Are the drives getting hot (your model shouldn't have a heat issue
under normal conditions)? Are the drives bolted into a system? Drives can be
sensitive to vibration, and identical, unmounted drives will tend to shake each
other and can produce rotational torque as well (especially when they are the
same model, as they'll all have the same resonances in that case). Either can
cause problems with keeping the heads over the track reliably.




I'd definitely run all the SMART tests. Start with the conveyance test and
then the short self-test, and possibly the long test. Do check the drive
temperatures immediately after each test to make sure they aren't getting
too hot.
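
The test sequence above could look roughly like this with smartctl from smartmontools. This is a sketch only: the device name /dev/sda is an example, not every drive supports the conveyance test, and the helper smart_temp is a made-up name that assumes the drive reports an attribute called Temperature_Celsius with its raw value in the tenth column of "smartctl -A".

```shell
#!/bin/sh
# smart_temp: read "smartctl -A" output on stdin and print the raw value
# of the Temperature_Celsius attribute (column 10 of the attribute table).
# Prints nothing if the drive does not report that attribute.
smart_temp() {
    awk '$2 == "Temperature_Celsius" { print $10 }'
}

# Example on a live system (run as root; /dev/sda is only an example):
#   smartctl -t conveyance /dev/sda   # start the conveyance self-test
#   smartctl -t short /dev/sda        # start the short self-test
#   smartctl -t long /dev/sda         # start the extended self-test
#   smartctl -l selftest /dev/sda     # read the self-test log afterwards
#   smartctl -A /dev/sda | smart_temp # temperature after each test
```

Each self-test runs in the background on the drive, so the log should be read only after the estimated duration that smartctl prints when the test is started.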





I assume you've done an fsck on the file systems? If not it might be good
to check.

Are you using the motherboard's SATA interfaces or an add-on card? If using a
card I'd check the firmware version on the card and what the manufacturer is
offering for updates.

Are the drives still under warranty? If so try WD tech support. Also check
that all the RAID tools are properly installed with their dependencies
met. Could be other hardware/drivers interfering. Might reset the BIOS to
"optimized settings". Which software RAID package are you using?

Other than that I'd possibly suspect a software problem; I'm not familiar with
software RAID myself (I haven't used one, but know what they are). Or possibly a
problem with the drive that is intermittent or complex in how it fails.


Mr Typo

2017-Aug-19 19:06 UTC

[CentOS] Problem with softwareraid

Hello Gordon,

Yeah, it is really strange. From one boot to the next, everything is
f** up (with 2 months in between).

Any idea?

[root@quad live]#  lsblk
NAME                                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                          8:0    0   1.8T  0 disk
├─sda1                                       8:1    0   1.8T  0 part
└─WDC_WD20EFRX-68AX9N0_WD-WMC1T2547260     253:3    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68AX9N0_WD-WMC1T2547260p1 253:8    0   1.8T  0 part
sdb                                          8:16   0   1.8T  0 disk
├─sdb1                                       8:17   0   1.8T  0 part
└─WDC_WD20EFRX-68AX9N0_WD-WMC301255087     253:4    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1 253:9    0   1.8T  0 part
sdc                                          8:32   0   1.8T  0 disk
├─sdc1                                       8:33   0   1.8T  0 part
└─WDC_WD20EFRX-68EUZN0_WD-WCC4M2668622     253:5    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68EUZN0_WD-WCC4M2668622p1 253:7    0   1.8T  0 part
sdd                                          8:48   0   1.8T  0 disk
├─sdd1                                       8:49   0   1.8T  0 part
└─WDC_WD20EFRX-68EUZN0_WD-WMC4M2878723     253:2    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68EUZN0_WD-WMC4M2878723p1 253:6    0   1.8T  0 part
sde                                          8:64   0 119.2G  0 disk
├─sde1                                       8:65   0   500M  0 part  /boot
└─sde2                                       8:66   0 118.8G  0 part
  ├─centos-swap                            253:0    0     2G  0 lvm   [SWAP]
  ├─centos-root                            253:1    0    50G  0 lvm   /
  └─centos-home                            253:10   0  66.8G  0 lvm   /home
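
In the listing above, each of sda to sdd carries a device-mapper map of TYPE mpath next to its partition, which is the kind of holder that could make mdadm report the members as busy. That is a reading of the output, not a confirmed diagnosis. A small sketch for pulling such entries out of lsblk follows; the function name filter_by_type is made up for illustration.

```shell
#!/bin/sh
# filter_by_type TYPE
# Read "NAME TYPE" pairs on stdin, as produced by
#   lsblk -rno NAME,TYPE
# and print the NAME of every entry whose TYPE equals the argument.
filter_by_type() {
    awk -v want="$1" '$2 == want { print $1 }'
}

# Example on a live system (lsblk comes with util-linux):
#   lsblk -rno NAME,TYPE | filter_by_type mpath
#   lsblk -rno NAME,TYPE | filter_by_type raid5
```

The raw output mode (-r) is used here because it drops the tree-drawing characters, so the first column is the plain device name.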

On Fri, Aug 18, 2017 at 11:56 PM, Gordon Messmer
<gordon.messmer at gmail.com> wrote:
> On 08/18/2017 12:35 PM, Mr Typo wrote:
>>
>> mdadm: /dev/sda1 is busy - skipping
>> mdadm: /dev/sdb1 is busy - skipping
>> mdadm: /dev/sdc1 is busy - skipping
>> mdadm: /dev/sdd1 is busy - skipping
>
> That's plenty strange.  The output of "lsblk" might tell you why those
> devices are busy.
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos

Mr Typo

2017-Aug-19 19:19 UTC

[CentOS] Problem with softwareraid

> Are you definitely using cables rated for SATA III?  Have you checked the
> power connections?  Have you checked the power supply voltages during spin
> up/later?

Yeah. The setup has been running for years now. As I said: booting from a
USB stick -> everything works.

> Is there tension or major twisting force on the SATA cables?  I've
> seen this cause intermittent problems, and it was solved by using a longer
> cable that reduced the stress at the connector.

Nope. Checked, and it works.

> Are the drives getting hot (your model shouldn't have a heat issue
> under normal conditions)?  Are the drives bolted into a system?  Drives can
> be sensitive to vibration, and identical, unmounted drives will tend to shake
> each other and can produce rotational torque as well (especially when they
> are the same model, as they'll all have the same resonances in that case).
> Either can cause problems with keeping the heads over the track reliably.

Nope. The issue first arose after the box had been down for several
hours. The box is in my cellar, so it is in a good environment.

> I'd definitely run all the SMART tests.  Start with the conveyance test
> and then the short self-test, and possibly the long test.  Do check the drive
> temperatures immediately after each test to make sure they aren't getting
> too hot.

Output of the tests after my reply.

> I assume you've done an fsck on the file systems?  If not it might be
> good to check.

No, I did not. I am running XFS, and the filesystem is not corrupt, so
no repair is needed. I can access the data when booting from USB.

> Are you using the motherboard's SATA interfaces or an add-on card?  If
> using a card I'd check the firmware version on the card and what the
> manufacturer is offering for updates.

Motherboard SATA. HP MicroServer Gen8.

> Are the drives still under warranty?  If so try WD tech support.  Also
> check that all the RAID tools are properly installed with their
> dependencies met.  Could be other hardware/drivers interfering.  Might reset
> the BIOS to "optimized settings".  Which software RAID package are you
> using?

mdadm has no dependencies, but I reinstalled the package anyway. Version
3.4-14.el7_3.1.

> Other than that I'd possibly suspect a software problem; I'm not familiar
> with software RAID myself (I haven't used one, but know what they are).  Or
> possibly a problem with the drive that is intermittent or complex in how it
> fails.

A software problem sounds great. I would like to find out why it's not
working. I could reinstall the complete box, but that is not my
intention: it takes lots of time and I would not learn anything new :)

regards
Andy
