IHAC who is asking the following; any thoughts would be appreciated.

Take two drives and zpool them to make a mirror. Remove a drive, and the
server HANGS. Power off and reboot the server, and everything comes up
cleanly.

Take the same two drives (still Solaris 10). Install Veritas Volume
Manager (4.1). Mirror the two drives. Remove a drive: everything is
still running. Replace the drive: everything is still working. No outage.

So the big questions to Tech Support:

1. Is this a "known property" of ZFS, i.e. that the server hangs when a
   drive is removed from a hot-swap system? (We were attempting to
   simulate a drive failure.)
2. Or is this just because it was an E450? I.e., would removing a ZFS
   mirror disk (unexpected hardware removal, as opposed to using ZFS to
   remove the disk) on a V240 or V480 cause the same problem?
3. What could we expect if a drive "mysteriously failed" during
   operation of a server with a ZFS mirror? Would the server hang like
   it did during testing? How can we test this?
4. If it is a "known property" of ZFS, is there a date when it is
   expected to be fixed (if ever)?

Peter

PS: I may not be on this alias, so please respond to me directly.

--
Peter Wilk - OS/Security Support
Sun Microsystems
1 Network Drive, P.O. Box 4004
Burlington, Massachusetts 01803-0904
1-800-USA-4SUN, opt 1, opt 1, <case number>#
Email: peter.wilk at sun.com
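[For context, the test described above can be sketched roughly as follows. The pool name "tank" and the device names c1t0d0/c1t1d0 are placeholders, not from the original report; substitute the disks on your own system.]

```shell
# Create a two-way ZFS mirror from two spare disks (device names are
# placeholders; check 'format' for the real ones on your system).
zpool create tank mirror c1t0d0 c1t1d0
zpool status tank

# ... at this point the customer physically pulled one disk, which is
# the step that reportedly hung the E450 ...

# A software-only approximation of a disk failure, without touching
# hardware: take one side of the mirror offline, then bring it back.
zpool offline tank c1t1d0
zpool status -x            # pool reports DEGRADED, one device OFFLINE
zpool online tank c1t1d0   # returning the disk triggers a resilver
```

Note that "zpool offline" only approximates an administrative removal; it does not exercise the driver's hotplug path the way a physical pull does, which is exactly the distinction the questions above are about.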
The current behavior depends on the implementation of the driver and its
support for hotplug events. When a drive is yanked, one of two things
can happen:

- I/Os will fail, and any attempt to re-open the device will also fail.
- I/Os will fail, but the device can continue to be opened by its
  existing path.

ZFS currently handles case #1: it marks the device faulted, generating
an FMA fault in the process. Future ZFS/FMA integration will address
case #2, and it is on the short list of features to address. In the
meantime, you can "zpool offline" the bad device to prevent ZFS from
trying to access it.

That being said, the server should never hang, only proceed arbitrarily
slowly. When you say "hang", what exactly does that mean?

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock
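[The recovery path Eric describes might look like the following on an affected system. The pool name "tank" and device c1t1d0 are illustrative placeholders.]

```shell
# Identify which device ZFS considers faulted.
zpool status -x tank

# Stop ZFS from retrying I/O against the dead disk (case #2 above,
# where the stale device path can still be opened).
zpool offline tank c1t1d0

# Inspect FMA error telemetry for the event, if one was generated.
fmdump -eV | tail

# After the drive is physically replaced, return it to service;
# ZFS resilvers the mirror automatically.
zpool online tank c1t1d0
zpool status tank
```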
Peter,

Are you sure your customer is not hitting this:

6456939 sd_send_scsi_SYNCHRONIZE_CACHE_biodone() can issue TUR which
        calls biowait() and deadlocks/hangs the host

I have a fix that you could have your customer try.

Thanks,
George

Peter Wilk wrote:
> IHAC who is asking the following; any thoughts would be appreciated.
>
> Take two drives and zpool them to make a mirror. Remove a drive, and
> the server HANGS. Power off and reboot the server, and everything
> comes up cleanly.
>
> [...]