thr3ads.net - CentOS - [CentOS] mdadm: hot remove failed for /dev/sdg: Device or resource busy [Feb 2013]


Keith Keller

2013-Feb-11 04:39 UTC

[CentOS] mdadm: hot remove failed for /dev/sdg: Device or resource busy

Hello all,

I have run into a sticky problem with a failed device in an md array,
and I asked about it on the linux raid mailing list, but since the
problem may not be md-specific, I am hoping to find some insight here.
(If you are on the MD list, and are seeing this twice, I humbly
apologize.)

The summary is that during a reshape of a raid6 on an up-to-date CentOS
6.3 box, one disk failed and was marked as such in the array, but md
will not let me remove it:

# mdadm /dev/md127 --fail /dev/sdg
mdadm: set /dev/sdg faulty in /dev/md127
# mdadm /dev/md127 --remove /dev/sdg
mdadm: hot remove failed for /dev/sdg: Device or resource busy

And in dmesg, I get an error like so:

md: cannot remove active disk sdg from md127 ...
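To see what the kernel itself thinks of the member, the md sysfs tree exposes a per-device state file. The following is only a sketch: the function name md_member_state and the MD_SYSFS override are made up for illustration (the override exists purely so the function can be tried against a scratch directory), while the path layout /sys/block/<array>/md/dev-<member>/state is the one described in the kernel's md documentation.

```shell
# Print the kernel's state flags for one member of an md array.
# MD_SYSFS is a hypothetical override for dry runs; it defaults to /sys/block.
md_member_state() {
    array="$1"    # array name without /dev/, e.g. md127
    member="$2"   # member name without /dev/, e.g. sdg
    # The state file holds comma-separated flags such as faulty, in_sync or spare.
    cat "${MD_SYSFS:-/sys/block}/$array/md/dev-$member/state"
}
```

On the box in question this would be called as "md_member_state md127 sdg"; which flags it reports for sdg is not something this thread shows.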

More details, including mdadm -D output and other diagnostics, are at 
http://www.spinics.net/lists/raid/msg41928.html .  As I note there, the
array seems fine otherwise, but is not currently in active use (so
perhaps my options are greater than if I wished to keep it deployed).
As the other messages in that thread show, I think I've already done
the "obvious" steps to try to remove the device from the
array.

Checking things out further, I found that it may be that udev did not
completely remove the disk, even though the controller no longer
believes that the exported unit exists.  (udevadm output is here:
http://www.spinics.net/lists/raid/msg41950.html )  So my hypothesis is
that if I can somehow force udev to drop the references to the disk
completely, perhaps I can remove sdg from the array and start a rebuild
with the spare already available.  I found these docs for Fedora:

https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/removing_devices.html

But of course I can't do step 3, since md is refusing to give up sdg.
But sdg is already gone, so I really don't care about outstanding IO,
and it's a bit too late to worry about a 100% clean removal.  So my
questions are, will step 7 actually clean up references to sdg, and how
likely is it that doing so would let me remove it from the array?
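For concreteness, here is a sketch of the kind of operation in question, on the assumption that step 7 of that guide is the usual write of 1 to the SCSI device's sysfs delete node. The function name drop_scsi_disk and the SYSFS_ROOT override are invented for illustration (the override only allows a dry run against a scratch directory). Whether md lets go of sdg afterwards is exactly the open question, so this is not a confirmed fix.

```shell
# Ask the kernel's SCSI layer to drop a disk, e.g. "drop_scsi_disk sdg".
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
drop_scsi_disk() {
    dev="$1"
    delete_node="${SYSFS_ROOT:-/sys}/block/$dev/device/delete"
    if [ ! -e "$delete_node" ]; then
        echo "no delete node for $dev at $delete_node" >&2
        return 1
    fi
    # Writing 1 requests removal of the device (needs root on a real system).
    echo 1 > "$delete_node"
}
```

Pointing SYSFS_ROOT at a scratch directory first is a cheap way to check that the path is built as expected before touching the real /sys.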

And finally, if the above is not a wise way to go, are there better
things to try?  If other diagnostic output is desired please let me
know.  Thanks!

--keith

-- 
kkeller at wombat.san-francisco.ca.us

Vincent Li

2013-Feb-11 05:13 UTC



Hi Keith,

The mdadm -D output seems to point to the root cause of the "device busy" error:

 >5 8 96 5 faulty spare rebuilding /dev/sdg

Are there any clues in /proc/mdstat or /var/log/messages?
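One quick way to pull the relevant bit out of /proc/mdstat is to list the members carrying the (F) flag, which is how mdstat marks faulty devices. This is just a sketch; the function name faulty_members is made up, and the optional file argument is there only so it can be run against a saved copy of mdstat.

```shell
# List members flagged (F) (faulty) in mdstat-formatted text.
# Usage: faulty_members [file]   (defaults to /proc/mdstat)
faulty_members() {
    mdstat_file="${1:-/proc/mdstat}"
    # Member tokens look like "sdg[5](F)"; keep those with the (F) flag,
    # then strip everything from the first "[" to leave the bare name.
    grep -o '[^ ]*\[[0-9]*\](F)' "$mdstat_file" | sed 's/\[.*//'
}
```

For the log side, something like grep -E 'md127|sdg' /var/log/messages should narrow things down to the array and the failed disk.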

-- 

Vincent Li
