Hello all,

I have already had a discussion on the software RAID mailing list and I want to switch to this one :)

I am having a really strange problem with my md0 device running CentOS 7. After a restart of my server, md0 was gone. While trying to find the problem I observed the following:

Booting any installed kernel gives me NO md0 device (ls /dev/md* doesn't return anything). A 'cat /proc/partitions' now shows me the /dev/sd[a-d]1 partitions. partprobe and an mdadm assemble both give me "disk busy":

[root@quad live]# cat mdstat
Personalities : [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

[root@quad ~]# partprobe
device-mapper: remove ioctl on WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1 failed: Device or resource busy
Warning: parted was unable to re-read the partition table on /dev/mapper/WDC_WD20EFRX-68AX9N0_WD-WMC301255087 (Device or resource busy). This means Linux won't know anything about the modifications you made.
....
....

[root@quad ~]# mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sda1 is busy - skipping
mdadm: /dev/sdb1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping

Booting from a USB stick to rescue my CentOS, everything works. The md0 device exists and is mounted (rw).

[root@quad usb-rescue]# cat mount | grep '/data'
/dev/mapper/data-store on /mnt/sysimage/store type xfs (rw,noatime,seclabel,attr2,largeio,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=256,swidth=768,noquota)
/dev/mapper/data-tm on /mnt/sysimage/var/lib/vdr/video type xfs (rw,noatime,seclabel,attr2,largeio,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=256,swidth=768,noquota)

3rd option: when I boot the installed rescue kernel from disk, I get an md0 device, but it is not started. When I stop md0 I can't assemble it anymore (disk busy).

/dev/md0:
        Version : 1.2
  Creation Time : Wed Aug 20 19:28:52 2014
     Raid Level : raid5
  Used Dev Size : 1953382272 (1862.89 GiB 2000.26 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Thu Aug 17 22:38:14 2017
          State : active, Not Started
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : quad.core.sartori.at:0  (local to host quad.core.sartori.at)
           UUID : 9d020f27:c0542472:b95a18d2:5741114d
         Events : 25458

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       4       8       49        3      active sync   /dev/sdd1

Does anyone have an idea in which direction the problem could lie? Are more logs needed?

Please help, I have run out of ideas.

regards
Andy
On 08/18/2017 12:35 PM, Mr Typo wrote:
> mdadm: /dev/sda1 is busy - skipping
> mdadm: /dev/sdb1 is busy - skipping
> mdadm: /dev/sdc1 is busy - skipping
> mdadm: /dev/sdd1 is busy - skipping

That's plenty strange. The output of "lsblk" might tell you why those devices are busy.
mad.scientist.at.large at tutanota.com
2017-Aug-19 00:47 UTC
[CentOS] Problem with softwareraid
18. Aug 2017 13:35 by euroregistrar at gmail.com:

> Hello all,
>
> i have already had a discussion on the software raid mailinglist and i
> want to switch to this one :)
>
> I am having a really strange problem with my md0 device running
> centos7. after a new start of my server the md0 was gone. now after
> trying to find the problem i detected the following:
>
> Booting any installed kernel gives me NO md0 device. (ls /dev/md*
> doesnt give anything). a 'cat /proc/partitions show me now
> /dev/sd[a-d]1 partition. partprobe and a mdadm assemble gives me "disk
> busy"
>
> [root at quad live]# cat mdstat
> Personalities : [raid6] [raid5] [raid4] [raid10]
> unused devices: <none>
>
> [root at quad ~]# partprobe
> device-mapper: remove ioctl on WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1
> failed: Device or resource busy
>
> [snip]

Are you definitely using cables rated for SATA III? Have you checked the power connections? Have you checked the power supply voltages during spin-up and later?

Is there tension or a major twisting force on the SATA cables? I've seen this cause intermittent problems, and it was solved by using a longer cable that reduced the stress at the connector.

Are the drives getting hot? (Your model shouldn't have a heat issue under normal conditions.) Are the drives bolted into a system? Drives can be sensitive to vibration, and identical, unmounted drives will tend to shake each other and can produce rotational torque as well (especially when they are the same model, as they'll all have the same resonances in that case). Either can cause problems with keeping the heads over the track reliably.

I'd definitely run all the SMART tests. Start with the conveyance test, then the short self-test, and possibly the long test. Do check the drive temperatures immediately after each test to make sure they aren't getting too hot.

I assume you've done an fsck on the file systems? If not, it might be good to check.

Are you using the motherboard's SATA interfaces or an add-on card? If using a card, I'd check the firmware version on the card and what the manufacturer is offering for updates.

Are the drives still under warranty? If so, try WD tech support. Also check that all the RAID tools are properly installed with their dependencies met. It could be other hardware or drivers interfering. You might reset the BIOS to "optimized settings". Which software RAID package are you using?

Other than that, I'd possibly suspect a software problem. I'm not familiar with software RAIDs myself (I haven't used one, but I know what they are). Or possibly a problem with the drive that is intermittent or complex in how it fails.
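For anyone wanting to follow the SMART suggestion above, here is a minimal sketch that only builds and prints the smartctl command lines for the three tests mentioned; it does not run anything. It assumes smartmontools is installed and that the array members are /dev/sda to /dev/sdd as in the original post. The helper name smart_test_plan is hypothetical, and not every drive supports the conveyance test.

```python
def smart_test_plan(devices, tests=("conveyance", "short", "long")):
    """Return smartctl command lines (as argument lists) for each device.

    Hypothetical helper for illustration: it only builds the commands,
    it does not execute them.
    """
    plan = []
    for dev in devices:
        for test in tests:
            # Start the self-test; the drive runs it in the background.
            plan.append(["smartctl", "-t", test, dev])
            # To be run once the test has finished: the self-test log ...
            plan.append(["smartctl", "-l", "selftest", dev])
            # ... and the attribute table, which usually includes temperature.
            plan.append(["smartctl", "-A", dev])
    return plan


if __name__ == "__main__":
    # Device names taken from the original post (assumed unchanged).
    for cmd in smart_test_plan(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]):
        print(" ".join(cmd))
```

The printed commands are meant to be run by hand as root, waiting for each self-test to complete before reading the log and the attributes.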
Hello Gordon,

Yeah, it is really strange. From one boot to the next (2 months between), everything is f** up.

Any idea?

[root@quad live]# lsblk
NAME                                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                        8:0      0   1.8T  0 disk
├─sda1                                     8:1      0   1.8T  0 part
└─WDC_WD20EFRX-68AX9N0_WD-WMC1T2547260     253:3    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68AX9N0_WD-WMC1T2547260p1 253:8    0   1.8T  0 part
sdb                                        8:16     0   1.8T  0 disk
├─sdb1                                     8:17     0   1.8T  0 part
└─WDC_WD20EFRX-68AX9N0_WD-WMC301255087     253:4    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68AX9N0_WD-WMC301255087p1 253:9    0   1.8T  0 part
sdc                                        8:32     0   1.8T  0 disk
├─sdc1                                     8:33     0   1.8T  0 part
└─WDC_WD20EFRX-68EUZN0_WD-WCC4M2668622     253:5    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68EUZN0_WD-WCC4M2668622p1 253:7    0   1.8T  0 part
sdd                                        8:48     0   1.8T  0 disk
├─sdd1                                     8:49     0   1.8T  0 part
└─WDC_WD20EFRX-68EUZN0_WD-WMC4M2878723     253:2    0   1.8T  0 mpath
  └─WDC_WD20EFRX-68EUZN0_WD-WMC4M2878723p1 253:6    0   1.8T  0 part
sde                                        8:64     0 119.2G  0 disk
├─sde1                                     8:65     0   500M  0 part  /boot
└─sde2                                     8:66     0 118.8G  0 part
  ├─centos-swap                            253:0    0     2G  0 lvm   [SWAP]
  ├─centos-root                            253:1    0    50G  0 lvm   /
  └─centos-home                            253:10   0  66.8G  0 lvm   /home

On Fri, Aug 18, 2017 at 11:56 PM, Gordon Messmer <gordon.messmer at gmail.com> wrote:
> On 08/18/2017 12:35 PM, Mr Typo wrote:
>>
>> mdadm: /dev/sda1 is busy - skipping
>> mdadm: /dev/sdb1 is busy - skipping
>> mdadm: /dev/sdc1 is busy - skipping
>> mdadm: /dev/sdd1 is busy - skipping
>
> That's plenty strange. The output of "lsblk" might tell you why those
> devices are busy.
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
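One thing the lsblk output above does show: each of the four RAID member disks carries a device of TYPE mpath, so device-mapper multipath has mapped them, while the boot disk sde has no such entry. Whether that is what keeps /dev/sd[a-d]1 busy is not confirmed anywhere in this thread, so treat it as a lead to check rather than a diagnosis. As a minimal sketch, the following parses lsblk output in its default column layout and reports which disks have an mpath map on top; the function name and the shortened sample are illustrative only.

```python
import re

# Shortened sample in the default lsblk column layout (rows from the post).
SAMPLE = """\
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
├─sda1 8:1 0 1.8T 0 part
└─WDC_WD20EFRX-68AX9N0_WD-WMC1T2547260 253:3 0 1.8T 0 mpath
  └─WDC_WD20EFRX-68AX9N0_WD-WMC1T2547260p1 253:8 0 1.8T 0 part
sde 8:64 0 119.2G 0 disk
├─sde1 8:65 0 500M 0 part /boot
"""


def disks_claimed_by_multipath(lsblk_text):
    """Map each disk name to the mpath device listed beneath it, if any."""
    claimed = {}
    current_disk = None
    for line in lsblk_text.splitlines():
        # Drop indentation and tree-drawing characters (or '??' debris).
        line = re.sub(r"^[^A-Za-z0-9]+", "", line)
        fields = line.split()
        if len(fields) < 6:
            continue
        name, dev_type = fields[0], fields[5]
        if dev_type == "disk":
            current_disk = name
        elif dev_type == "mpath" and current_disk is not None:
            claimed[current_disk] = name
    return claimed


if __name__ == "__main__":
    for disk, mpath in disks_claimed_by_multipath(SAMPLE).items():
        print(f"{disk} is mapped by multipath device {mpath}")
```

The parser relies on lsblk listing children directly below their parent disk, which is how the default tree output is ordered.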
> Are you definitely using cables rated for SATA III? Have you checked the power connections? Have you checked the power supply voltages during spin-up and later?

Yeah. The setup has been running for years now. As I said: booting from the USB stick -> everything works.

> Is there tension or a major twisting force on the SATA cables? I've seen this cause intermittent problems, and it was solved by using a longer cable that reduced the stress at the connector.

Nope. Checked, and it works.

> Are the drives getting hot? (Your model shouldn't have a heat issue under normal conditions.) Are the drives bolted into a system? Drives can be sensitive to vibration, and identical, unmounted drives will tend to shake each other and can produce rotational torque as well (especially when they are the same model, as they'll all have the same resonances in that case). Either can cause problems with keeping the heads over the track reliably.

Nope. The issue first arose after the box had been down for several hours. The box is in my cellar, so it is in a good environment.

> I'd definitely run all the SMART tests. Start with the conveyance test, then the short self-test, and possibly the long test. Do check the drive temperatures immediately after each test to make sure they aren't getting too hot.

Output of the tests will follow after my reply.

> I assume you've done an fsck on the file systems? If not, it might be good to check.

No, I did not. I am running XFS, and the filesystem is not corrupt, so no repair is needed. I can access the data when booting from USB.

> Are you using the motherboard's SATA interfaces or an add-on card? If using a card, I'd check the firmware version on the card and what the manufacturer is offering for updates.

Motherboard SATA. HP MicroServer Gen8.

> Are the drives still under warranty? If so, try WD tech support. Also check that all the RAID tools are properly installed with their dependencies met. It could be other hardware or drivers interfering. You might reset the BIOS to "optimized settings". Which software RAID package are you using?

mdadm has no dependencies, but I reinstalled the package anyway. Version 3.4-14.el7_3.1.

> Other than that, I'd possibly suspect a software problem. I'm not familiar with software RAIDs myself (I haven't used one, but I know what they are). Or possibly a problem with the drive that is intermittent or complex in how it fails.

A software problem sounds great. I would like to find out why it's not working. I could reinstall the complete box, but that is not my intention. It takes lots of time and I would not learn anything new :)

regards
Andy