I'm looking at bringing up a new Solaris 10 based file server running on an older UltraSPARC-IIi 360MHz with 512MB of RAM. I've installed the 11/06 release from scratch; no patches are installed at this time. I have four externally attached 36GB SCSI devices on the host's SCSI bus. I set up a few different zpool scenarios with mirrors, raidz, and raidz2 to familiarize myself with the commands, and created some home-directory-like filesystems off the pool.

I'm trying to simulate a drive failure by either powering down a single drive or physically removing it from its enclosure so as not to interrupt the SCSI bus. However, each time I do this and then attempt to access my ZFS pool, the system hangs and I get flooded with errors:

Jan 23 14:49:13 foo scsi: WARNING: /pci@1f,0/pci@1/scsi@1,1/sd@a,0 (sd24):
Jan 23 14:49:13 foo     disk not responding to selection

Eventually the system freezes and I have to go down to the eeprom level and issue a boot command to restart the host.

Is this type of failure something it should be able to handle, or am I doing something wrong and my expectations are too high here? Is this an issue with ZFS, or more with the host system not being able to cope with a device being removed in this fashion?

Also, does anyone have an opinion, based on the system I'm using, on whether it would be sufficient to go into production, assuming the errors I'm having can be addressed? The system would simply be an NFS server for home shares for approximately 100 users.

Thanks,
-Jeff
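[For reference, a minimal sketch of the kind of test setup described above, assuming hypothetical device names c1t8d0 through c1t11d0 for the four external disks; substitute the targets your system actually reports:]

    # create a raidz pool across the four external 36GB disks
    # (device names here are assumptions, not from the original post)
    zpool create tank raidz c1t8d0 c1t9d0 c1t10d0 c1t11d0

    # carve out home-directory-style filesystems
    zfs create tank/home
    zfs create tank/home/jeff

    # check pool health before starting failure tests
    zpool status tank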
Andrea Soliva
2007-Jan-25 10:46 UTC
[zfs-discuss] Re: Some questions I had while testing ZFS.
Hi,

I did exactly the same test and have the same issue. I also posted a message; no answer so far.

Andrea
Anders Odberg
2007-Jan-25 12:39 UTC
[zfs-discuss] Re: Some questions I had while testing ZFS.
[Jeffrey Scott]
| I'm trying to simulate a drive failure by either powering down a single
| drive or physically removing it from its enclosure so as not to
| interrupt the SCSI bus, however each time I do this and then attempt to
| access my zfs pool the system hangs and I get flooded with errors:
| Jan 23 14:49:13 foo scsi: WARNING: /pci@1f,0/pci@1/scsi@1,1/sd@a,0 (sd24):
| Jan 23 14:49:13 foo     disk not responding to selection
|
| eventually the system will freeze and I will have to go down to the
| eeprom level and issue a boot command to restart the host.
|
| Is this type of failure something it should be able to handle or am I
| doing something wrong and my expectations are too high here?
|
| Is this an issue with ZFS or more with the host system not being able
| to cope with a device being removed in this fashion.

I've seen similar problems with a T2000. If I create a mirror of the internal SAS disks with "zpool create foo mirror c0t2d0 c0t3d0" and physically remove one of those disks, the system will hang completely after a short while, and I have to break the system from ALOM and reboot. If I do a "zpool offline" of the disk first, there are no problems when removing the disk.

If I create a DiskSuite mirror, or a HW-RAID mirror, on the disks instead, and then create a single-disk zpool on top of this mirror, there are no problems with the system or ZFS when I physically remove one of the disks in the mirror.

I opened a support case with Sun about this, and after a while I received a test patch (IDR125057-01) which so far seems to have solved all my problems with this issue. If you have a support contract with Sun, you could probably ask for this test patch. I've not been told when it will make it into an official patch.

Regards,
-Anders.

--
Anders Odberg, <anders.odberg@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
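[A sketch of the offline-before-pull procedure Anders describes, assuming his pool name "foo" and device c0t2d0; the replace/online steps are standard ZFS administration, not quoted from his post:]

    # take the disk out of service before physically pulling it
    zpool offline foo c0t2d0

    # ... physically remove or swap the disk ...

    # if the same disk goes back in, return it to the mirror
    zpool online foo c0t2d0

    # if a new disk was inserted at the same target instead:
    zpool replace foo c0t2d0

    # verify the mirror resilvers and returns to ONLINE
    zpool status foo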
Jeremy Teo
2007-Jan-25 12:58 UTC
[zfs-discuss] Re: Some questions I had while testing ZFS.
This is 6456939: sd_send_scsi_SYNCHRONIZE_CACHE_biodone() can issue TUR which calls biowait() and deadlock/hangs host

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6456939

(Thanks to Tpenta for digging this up)

--
Regards,
Jeremy
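[If you obtain the test patch mentioned earlier in the thread, one way to confirm it landed is to check the installed patch list; this assumes the IDR shows up in showrev output, which can vary for interim releases:]

    # list installed patches and look for the IDR / eventual official fix
    showrev -p | grep 125057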