Keith Keller
2013-Feb-11 04:39 UTC
[CentOS] mdadm: hot remove failed for /dev/sdg: Device or resource busy
Hello all,

I have run into a sticky problem with a failed device in an md array. I asked about it on the linux-raid mailing list, but since the problem may not be md-specific, I am hoping to find some insight here. (If you are on the MD list and are seeing this twice, I humbly apologize.)

The summary is that during a reshape of a RAID6 on an up-to-date CentOS 6.3 box, one disk failed and was marked as such in the array, but md is not allowing me to remove it:

# mdadm /dev/md127 --fail /dev/sdg
mdadm: set /dev/sdg faulty in /dev/md127
# mdadm /dev/md127 --remove /dev/sdg
mdadm: hot remove failed for /dev/sdg: Device or resource busy

And in dmesg, I get an error like so:

md: cannot remove active disk sdg from md127 ...

More details, including mdadm -D output and other diagnostics, are at http://www.spinics.net/lists/raid/msg41928.html . As I note there, the array seems fine otherwise, but it is not currently in active use (so perhaps my options are greater than if I wished to keep it deployed). As the other messages in that thread show, I think I've already done the "obvious" steps to try to remove the device from the array.

Checking things out further, I found that udev may not have completely removed the disk, even though the controller no longer believes that the exported unit exists. (udevadm output is here: http://www.spinics.net/lists/raid/msg41950.html ) So my hypothesis is that if I can somehow force udev to drop all references to the disk, perhaps I can remove sdg from the array and start a rebuild with the spare that is already available.

I found these docs for Fedora:

https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/removing_devices.html

But of course I can't do step 3, since md is refusing to give up sdg. Then again, sdg is already gone, so I don't really care about outstanding IO, and it's a bit too late to worry about a 100% clean removal.
So my questions are: will step 7 actually clean up the references to sdg, and how likely is it that doing so would let me remove it from the array? And finally, if the above is not a wise way to go, are there better things to try? If other diagnostic output is desired, please let me know. Thanks!

--keith

--
kkeller at wombat.san-francisco.ca.us
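[For readers following along: the cleanup Keith describes, plus mdadm's own keywords for dropping members that are already gone, can be sketched roughly as below. This is a hedged sketch, not a tested fix: /dev/md127 and sdg are taken from the thread, the DRY_RUN guard is my own addition so nothing destructive runs by default, and the `failed`/`detached` keywords to `--remove` require a reasonably recent mdadm.]

```shell
#!/bin/sh
# Sketch of removal steps for a disk that is already physically gone.
# Names (md127, sdg) are assumed from the thread. With DRY_RUN=1 (the
# default here) the commands are only printed, never executed.
MD=md127
DEV=sdg

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        eval "$*"
    fi
}

# Step 3 of the Fedora guide (flush outstanding IO) is moot for a dead
# disk, so it is skipped. Ask mdadm to drop any members it considers
# failed or detached (keywords documented in mdadm(8)):
run "mdadm /dev/$MD --remove failed"
run "mdadm /dev/$MD --remove detached"

# Step 7: ask the SCSI layer to delete the device node entirely:
run "echo 1 > /sys/block/$DEV/device/delete"
```

Whether step 7 helps depends on whether the kernel still counts the member as active in the array, which is exactly the open question in the thread.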
Vincent Li
2013-Feb-11 05:13 UTC
[CentOS] mdadm: hot remove failed for /dev/sdg: Device or resource busy
Hi Keith,

It seems that the mdadm -D output indicates the root cause of "device busy":

> 5 8 96 5 faulty spare rebuilding /dev/sdg

Is there any clue in /proc/mdstat and /var/log/messages?

On 02/11/2013 12:39 PM, Keith Keller wrote:
> Hello all,
>
> I have run into a sticky problem with a failed device in an md array,
> and I asked about it on the linux raid mailing list, but since the
> problem may not be md-specific, I am hoping to find some insight here.
> (If you are on the MD list, and are seeing this twice, I humbly
> apologize.)
>
> The summary is that during a reshape of a raid6 on an up to date CentOS
> 6.3 box, one disk failed, and was marked as such in the array, but is
> not allowing me to remove it:
>
> # mdadm /dev/md127 --fail /dev/sdg
> mdadm: set /dev/sdg faulty in /dev/md127
> # mdadm /dev/md127 --remove /dev/sdg
> mdadm: hot remove failed for /dev/sdg: Device or resource busy
>
> And in dmesg, I get an error like so:
>
> md: cannot remove active disk sdg from md127 ...
>
> More details, including mdadm -D output and other diagnostics, are at
> http://www.spinics.net/lists/raid/msg41928.html . As I note there, the
> array seems fine otherwise, but is not currently in active use (so
> perhaps my options are greater than if I wished to keep it deployed).
> As the other messages in that thread show, I think I've already done
> the "obvious" steps to try to remove the device from the array.
>
> Checking things out further, I found that it may be that udev did not
> completely remove the disk, even though the controller no longer
> believes that the exported unit exists. (udevadm output is here:
> http://www.spinics.net/lists/raid/msg41950.html ) So my hypothesis is
> that if I can somehow force udev to drop the references to the disk
> completely, perhaps I can remove sdg from the array and start a rebuild
> with the spare already available.
>
> I found these docs for Fedora:
>
> https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/removing_devices.html
>
> But of course I can't do step 3, since md is refusing to give up sdg.
> But sdg is already gone, so I really don't care about outstanding IO,
> and it's a bit too late to worry about a 100% clean removal. So my
> questions are, will step 7 actually clean up references to sdg, and how
> likely is it that doing so would let me remove it from the array?
>
> And finally, if the above is not a wise way to go, are there better
> things to try? If other diagnostic output is desired please let me
> know. Thanks!
>
> --keith

--
Vincent Li
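[Vincent's pointers can be checked from sysfs as well as /proc/mdstat. A minimal sketch, assuming the names md127 and sdg from the thread; the `echo remove` write is left commented out because it is destructive:]

```shell
#!/bin/sh
# Where to look for the state of the "busy" member.
# Array and device names are assumed from the thread.
MD=md127
DEV=sdg
STATE=/sys/block/$MD/md/dev-$DEV/state

# Overall array status; faulty members show up with an (F) flag.
[ -r /proc/mdstat ] && cat /proc/mdstat

# Per-member state as the kernel sees it, e.g. "faulty" or "in_sync".
[ -r "$STATE" ] && cat "$STATE" || :

# If the member really is marked faulty but still stuck, writing
# "remove" here is roughly what "mdadm --remove" does under the hood
# (destructive, hence commented out):
#   echo remove > "$STATE"
```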