Last Friday, one of our V880s kernel panicked with the following
message. This is a SAN-connected ZFS pool attached to one LUN. From
this, it appears that the SAN 'disappeared' and then there was a panic
shortly after.

Am I reading this correctly?

Is this normal behavior for ZFS?

This is a mostly patched Solaris 10 6/06 install. Before patching this
system we did have a couple of NFS-related panics, always on Fridays!
This is the fourth panic, and the first time with a ZFS error. There
are no errors in zpool status.

Dec  1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:21 foobar  SCSI transport failed: reason 'incomplete': retrying command
Dec  1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:21 foobar  SCSI transport failed: reason 'incomplete': retrying command
Dec  1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:21 foobar  disk not responding to selection
Dec  1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:21 foobar  disk not responding to selection
Dec  1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:21 foobar  disk not responding to selection
Dec  1 20:30:21 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:21 foobar  disk not responding to selection
Dec  1 20:30:22 foobar scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/fibre-channel@1/sd@1,1 (sd17):
Dec  1 20:30:22 foobar  disk not responding to selection
Dec  1 20:30:22 foobar unix: [ID 836849 kern.notice]
Dec  1 20:30:22 foobar panic[cpu2]/thread=2a100aedcc0:
Dec  1 20:30:22 foobar unix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio 3004c0ce540 [L0 unallocated] 20000L/20000P DVA[0]=<0:2ae1900000:20000> fletcher2 uncompressed BE contiguous birth=586818 fill=0 cksum=102297a2db39dfc:cc8e38087da7a38f:239520856ececf15:c2fd369cea9db4a1): error 5
Dec  1 20:30:22 foobar unix: [ID 100000 kern.notice]
Dec  1 20:30:22 foobar genunix: [ID 723222 kern.notice] 000002a100aed740 zfs:zio_done+284 (3004c0ce540, 0, a8, 70513bf0, 0, 60001374940)
Dec  1 20:30:22 foobar genunix: [ID 179002 kern.notice]   %l0-3: 000003006319fc80 0000000070513800 0000000000000005 0000000000000005
Dec  1 20:30:22 foobar   %l4-7: 000000007b224278 0000000000000002 000000000008f442 0000000000000005
Dec  1 20:30:22 foobar genunix: [ID 723222 kern.notice] 000002a100aed940 zfs:zio_vdev_io_assess+178 (3004c0ce540, 8000, 10, 0, 0, 10)
Dec  1 20:30:22 foobar genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000002 0000000000000001 0000000000000000 0000000000000005
Dec  1 20:30:22 foobar   %l4-7: 0000000000000010 0000000035a536bc 0000000000000000 00043d7293172cfc
Dec  1 20:30:22 foobar genunix: [ID 723222 kern.notice] 000002a100aeda00 genunix:taskq_thread+1a4 (600012a0c38, 600012a0be0, 50001, 43d72c8bfb810, 2a100aedaca, 2a100aedac8)
Dec  1 20:30:22 foobar genunix: [ID 179002 kern.notice]   %l0-3: 0000000000010000 00000600012a0c08 00000600012a0c10 00000600012a0c12
Dec  1 20:30:22 foobar   %l4-7: 0000030060946320 0000000000000002 0000000000000000 00000600012a0c00
Dec  1 20:30:22 foobar unix: [ID 100000 kern.notice]
Dec  1 20:30:22 foobar genunix: [ID 672855 kern.notice] syncing file systems...
Douglas Denny wrote:
> Last Friday, one of our V880s kernel panicked with the following
> message. This is a SAN-connected ZFS pool attached to one LUN. From
> this, it appears that the SAN 'disappeared' and then there was a
> panic shortly after.
>
> Am I reading this correctly?

Yes.

> Is this normal behavior for ZFS?

Yes. You have no redundancy (from ZFS' point of view at least), so ZFS
has no option except panicking in order to maintain the integrity of
your data.

> This is a mostly patched Solaris 10 6/06 install. Before patching
> this system we did have a couple of NFS-related panics, always on
> Fridays! This is the fourth panic, and the first time with a ZFS
> error. There are no errors in zpool status.

Without data, it is difficult to suggest what might have caused your
NFS panics.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
On 12/4/06, James C. McPherson <James.C.McPherson@gmail.com> wrote:
>> Is this normal behavior for ZFS?
>
> Yes. You have no redundancy (from ZFS' point of view at least),
> so ZFS has no option except panicking in order to maintain the
> integrity of your data.

This is interesting from an implementation point of view. Any singly
attached SAN connection that has a disconnect from its switch/back end
will cause ZFS to panic; why would it not wait and see if the device
came back? Should all SAN-connected ZFS pools have redundancy built
in, with dual HBAs to dual SAN switches/controllers?

-Doug
Douglas Denny wrote:
> On 12/4/06, James C. McPherson <James.C.McPherson@gmail.com> wrote:
>>> Is this normal behavior for ZFS?
>>
>> Yes. You have no redundancy (from ZFS' point of view at least),
>> so ZFS has no option except panicking in order to maintain the
>> integrity of your data.
>
> This is interesting from an implementation point of view. Any singly
> attached SAN connection that has a disconnect from its switch/back
> end will cause ZFS to panic; why would it not wait and see if the
> device came back? Should all SAN-connected ZFS pools have redundancy
> built in, with dual HBAs to dual SAN switches/controllers?

If you look into your /var/adm/messages file, you should see more than
a few seconds' worth of IO retries, indicating that there was a delay
before panicking while waiting for the device to return.

Answering your second question: all ZFS pools should be configured
with redundancy from ZFS' point of view.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
On 12/4/06, James C. McPherson <James.C.McPherson@gmail.com> wrote:
> If you look into your /var/adm/messages file, you should see
> more than a few seconds' worth of IO retries, indicating that
> there was a delay before panicking while waiting for the device
> to return.

My original post contains all the warnings. The first error happened
at 20:30:21 and the system panicked at 20:30:22. It makes me wonder if
there is something else going on here.

> Answering your second question: all ZFS pools should be configured
> with redundancy from ZFS' point of view.

I am sure this is the right answer, but it is not obvious to me how I
would do this the way I do with UFS file systems, using the SAN as the
redundant backing store. Thanks for the feedback.

-Doug
Douglas Denny wrote:
> On 12/4/06, James C. McPherson <James.C.McPherson@gmail.com> wrote:
>> If you look into your /var/adm/messages file, you should see
>> more than a few seconds' worth of IO retries, indicating that
>> there was a delay before panicking while waiting for the device
>> to return.
>
> My original post contains all the warnings. The first error happened
> at 20:30:21 and the system panicked at 20:30:22. It makes me wonder
> if there is something else going on here.

That's surprising. My experience of non-redundant pools (root pools,
no less :>) is that there would be several minutes of retries, when
all the sd and lower layers' retries were added up.

>> Answering your second question: all ZFS pools should be configured
>> with redundancy from ZFS' point of view.
>
> I am sure this is the right answer, but it is not obvious to me how I
> would do this the way I do with UFS file systems, using the SAN as
> the redundant backing store. Thanks for the feedback.

Create two LUNs on your SAN, zone them so your host can see them, then:

  zpool create poolname mirror vdev1 vdev2
  zfs create poolname/fsname

For my Ultra 20, I have / + /usr + /var and some of /opt mirrored
using SVM, and then I have an uber-pool to contain everything else:

$ zdb -C sink
    version=3
    name='sink'
    state=0
    txg=4
    pool_guid=6548940762722570489
    vdev_tree
        type='root'
        id=0
        guid=6548940762722570489
        children[0]
                type='mirror'
                id=0
                guid=5106440632267737007
                metaslab_array=13
                metaslab_shift=31
                ashift=9
                asize=307077840896
                children[0]
                        type='disk'
                        id=0
                        guid=9432259574297221550
                        path='/dev/dsk/c1d0s3'
                        devid='id1,cmdk@AST3320620AS=____________4QF01RZE/d'
                        whole_disk=0
                children[1]
                        type='disk'
                        id=1
                        guid=7176220706626775710
                        path='/dev/dsk/c2d0s3'
                        devid='id1,cmdk@AST3320620AS=____________3QF0EAFP/d'
                        whole_disk=0

which I created by first slicing the disks, then running

  # zpool create sink mirror c1d0s3 c2d0s3

Under the "sink" zpool, I have a few zfs:

$ zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
sink                     119G   161G  24.5K  /sink
sink/hole                574M   161G   574M  /opt/csw
sink/home               2.22G   161G  2.22G  /export/home
sink/scratch            96.1G   161G  96.1G  /scratch
sink/src                6.66G   161G  6.66G  /opt/gate
sink/swim                555M   161G   555M  /opt/local
sink/zones              12.9G   161G  27.5K  /zones
sink/zones/kitchensink  10.6G   161G  10.6G  /zones/kitchensink

which I created with

  # zfs create sink/hole
  # zfs create sink/home

etc. etc.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson
If you take a look at these messages, the somewhat unusual condition
that may lead to unexpected behaviour (i.e. the fast give-up) is that,
whilst this is a SAN connection, it is achieved through a
non-Leadville config; note the fibre-channel and sd references. In a
Leadville-compliant installation this would be the ssd driver, hence
you'd have to investigate the specific semantics and driver tweaks
that this system has applied to sd in this instance. Maybe the sd
retries have been `tuned` down...?

More info (i.e. an explorer output) would be useful before we jump to
any incorrect conclusions.

Craig

On 4 Dec 2006, at 14:47, Douglas Denny wrote:

> Last Friday, one of our V880s kernel panicked with the following
> message. This is a SAN-connected ZFS pool attached to one LUN. From
> this, it appears that the SAN 'disappeared' and then there was a
> panic shortly after.
>
> Am I reading this correctly?
>
> Is this normal behavior for ZFS?
>
> [quoted panic log trimmed; see the original post above]
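Craig's "tuned down" hypothesis can be checked directly. A minimal
sketch, assuming the stock Solaris 10 sd driver; the tunable names
below should be verified against your build (the mdb read will simply
fail if a symbol does not exist there):

  # Inspect the sd retry/timeout tunables on the running kernel
  echo "sd_retry_count/D" | mdb -k    # per-command retry count (default 5)
  echo "sd_io_time/D"     | mdb -k    # seconds allowed per I/O (default 60)

  # If they have been lowered, they can be pinned back in /etc/system
  # (takes effect after a reboot):
  #   set sd:sd_retry_count = 5
  #   set sd:sd_io_time = 60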
Douglas Denny wrote:
> On 12/4/06, James C. McPherson <James.C.McPherson@gmail.com> wrote:
>>> Is this normal behavior for ZFS?
>>
>> Yes. You have no redundancy (from ZFS' point of view at least),
>> so ZFS has no option except panicking in order to maintain the
>> integrity of your data.
>
> This is interesting from an implementation point of view. Any singly
> attached SAN connection that has a disconnect from its switch/back
> end will cause ZFS to panic; why would it not wait and see if the
> device came back? Should all SAN-connected ZFS pools have redundancy
> built in, with dual HBAs to dual SAN switches/controllers?

UFS will panic on EIO also. Most other file systems will, too. You can
put UFS on top of SVM, but unless SVM is configured for redundancy, it
(UFS) would still panic in such situations. ZFS doesn't bring anything
new here, but I sense a change in expectations that I can't quite
reconcile.
 -- richard
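For comparison, the "UFS on top of SVM configured for redundancy"
setup Richard mentions looks roughly like this. A sketch only; the
slice names (c2t0d0s0, c3t0d0s0, and the s7 state-database slices) are
placeholders for two LUNs on separate paths:

  # State database replicas, then a two-way SVM mirror under UFS
  metadb -a -f c2t0d0s7 c3t0d0s7
  metainit d10 1 1 c2t0d0s0
  metainit d20 1 1 c3t0d0s0
  metainit d0 -m d10
  metattach d0 d20
  newfs /dev/md/rdsk/d0
  mount /dev/md/dsk/d0 /export/data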
Hi all,

Having experienced this, it would be nice if there were an option to
take the filesystem offline instead of kernel panicking, on a
per-zpool basis. If it's a system-critical partition like a database,
I'd prefer it to kernel panic and thereby trigger a fail-over of the
application. However, if it's a zpool hosting some file shares, I'd
prefer it to stay online. Putting that level of control in would
alleviate a lot of the complaints, it seems to me... or at least give
less of a leg to stand on. ;-)

A nasty little notice that tells you the system will kernel panic if a
vdev becomes unavailable wouldn't be bad either when you're creating a
striped zpool. Even the best of us forgets these things.

Best Regards,
Jason

On 12/4/06, Richard Elling <Richard.Elling@sun.com> wrote:
> Douglas Denny wrote:
> [...]
>
> UFS will panic on EIO also. Most other file systems will, too.
> You can put UFS on top of SVM, but unless SVM is configured for
> redundancy, it (UFS) would still panic in such situations. ZFS
> doesn't bring anything new here, but I sense a change in
> expectations that I can't quite reconcile.
> -- richard
Jason J. W. Williams wrote:
> Hi all,
>
> Having experienced this, it would be nice if there were an option to
> take the filesystem offline instead of kernel panicking, on a
> per-zpool basis. If it's a system-critical partition like a
> database, I'd prefer it to kernel panic and thereby trigger a
> fail-over of the application. However, if it's a zpool hosting some
> file shares, I'd prefer it to stay online. Putting that level of
> control in would alleviate a lot of the complaints, it seems to
> me... or at least give less of a leg to stand on. ;-)

Agreed, and we are working on this.

--matt
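The control Matt refers to eventually shipped as the pool-level
"failmode" property in later OpenSolaris/Solaris releases. A minimal
sketch of how it is used, with 'tank' as a placeholder pool name:

  # failmode governs behavior on catastrophic pool failure:
  #   wait     - block I/O until the device returns (the later default)
  #   continue - return EIO to new synchronous writes, keep serving reads
  #   panic    - the behavior discussed in this thread
  zpool set failmode=continue tank
  zpool get failmode tank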
> If you take a look at these messages, the somewhat unusual condition
> that may lead to unexpected behaviour (i.e. the fast give-up) is
> that, whilst this is a SAN connection, it is achieved through a
> non-Leadville config; note the fibre-channel and sd references. In a
> Leadville-compliant installation this would be the ssd driver, hence
> you'd have to investigate the specific semantics and driver tweaks
> that this system has applied to sd in this instance.

If only it were possible to use the Leadville drivers... We've seen
the same problems here (an *instant* panic from ZFS if the FC switch
reboots). I wouldn't mind if it kept on retrying a tad bit longer,
preferably configurably. And to panic? How can that in any sane way be
a good way to "protect" the application? *BANG* - no chance at all for
the application to handle the problem...

Btw, in our case we have also wrapped the raw FC-attached "disks" with
SVM metadevices first, because if a disk in an A3500FC unit goes bad
then we hit the _other_ failure mode of ZFS - a total hang - until I
noticed that wrapping the device with a layer of SVM metadevices
insulated ZFS from that problem. Now it correctly notices that the
disk is "gone/dead" and displays that when doing "zpool status" etc.

We (Lysator ACS - a students' computer club) can't use the Leadville
driver, since the 'ifp' driver (and hence use of the 'ssd' disks) for
the Qlogic QLA2100 HBA boards is based on an older Qlogic firmware
that only supports a maximum of 16 LUNs per target, and we want
more... So we use the Qlogic qla2100 driver instead, which works
really nicely, but then it uses the 'sd' disk devices instead. Being a
computer club with limited funds means one finds ways to use old
hardware in new and interesting ways :-)

Hardware in use: Primary file server: Sun Ultra 450, two Qlogic
QLA2100 HBAs. One connected via an 8-port FC-AL *hub* to two Sun A5000
JBOD boxes (filled with 9 and 18 GB FC disks), the other via a Brocade
2400 8-port switch (running in "QuickLoop" mode) to a Compaq
StorageWorks RA8000 RAID and two A3500FC systems.

Now... What can *possibly* go wrong with that setup? :-)

I'll tell you a couple:

1. When the server entered multiuser and started serving NFS to all
the users' $HOME - many, many disks in the A5000 started resetting
themselves again and again and again... Solution: tune down the
maximum number of tagged commands sent to the disks in
/kernel/drv/qla2100.conf:

     hba1-max-iocb-allocation=7;   # was 256
     hba1-execution-throttle=7;    # was 31

   (This problem wasn't there with the old Sun 'ifp' driver, probably
   because it has less aggressive limits - but since that driver is
   totally nonconfigurable, it's impossible to tell.)

2. The power cord got slightly loose on the Brocade switch, causing it
to reboot, sending the server into an *instant PANIC, thanks to ZFS*.
Any chance we might get a short refresher warning when creating a
striped zpool? O:-)

Best Regards,
Jason

On 12/4/06, Matthew Ahrens <Matthew.Ahrens@sun.com> wrote:
> Jason J. W. Williams wrote:
> [...]
>
> Agreed, and we are working on this.
>
> --matt
Peter Eriksson wrote:
>> If you take a look at these messages, the somewhat unusual
>> condition that may lead to unexpected behaviour (i.e. the fast
>> give-up) is that, whilst this is a SAN connection, it is achieved
>> through a non-Leadville config; note the fibre-channel and sd
>> references. In a Leadville-compliant installation this would be the
>> ssd driver, hence you'd have to investigate the specific semantics
>> and driver tweaks that this system has applied to sd in this
>> instance.
>
> If only it were possible to use the Leadville drivers... We've seen
> the same problems here (an *instant* panic from ZFS if the FC switch
> reboots). I wouldn't mind if it kept on retrying a tad bit longer,
> preferably configurably. And to panic? How can that in any sane way
> be a good way to "protect" the application? *BANG* - no chance at
> all for the application to handle the problem...

The *application* should not be worrying about handling error
conditions in the kernel. That's the kernel's job, and in this case,
ZFS' job. ZFS protects *your data* by preventing any more writes from
occurring when it cannot guarantee the integrity of your data.

> Btw, in our case we have also wrapped the raw FC-attached "disks"
> with SVM metadevices first, because if a disk in an A3500FC unit
> goes bad then we hit the _other_ failure mode of ZFS - a total hang
> - until I noticed that wrapping the device with a layer of SVM
> metadevices insulated ZFS from that problem. Now it correctly
> notices that the disk is "gone/dead" and displays that when doing
> "zpool status" etc.

Hm. An extra layer of complexity. Kinda defeats one of the stated
goals of ZFS.

> We (Lysator ACS - a students' computer club) can't use the Leadville
> driver, since the 'ifp' driver (and hence use of the 'ssd' disks)
> for the Qlogic QLA2100 HBA boards is based on an older Qlogic
> firmware that only supports a maximum of 16 LUNs per target, and we
> want more... So we use the Qlogic qla2100 driver instead, which
> works really nicely, but then it uses the 'sd' disk devices instead.
> Being a computer club with limited funds means one finds ways to use
> old hardware in new and interesting ways :-)

Ebay.se?

> Hardware in use: Primary file server: Sun Ultra 450, two Qlogic
> QLA2100 HBAs. One connected via an 8-port FC-AL *hub* to two Sun
> A5000 JBOD boxes (filled with 9 and 18 GB FC disks), the other via a
> Brocade 2400 8-port switch (running in "QuickLoop" mode) to a Compaq
> StorageWorks RA8000 RAID and two A3500FC systems.
>
> Now... What can *possibly* go wrong with that setup? :-)

Hmmm.... let's start with the mere existence of the EOL'd A3500FC
hardware in your config. Kinda goes downhill from there :)

> I'll tell you a couple:
>
> 1. When the server entered multiuser and started serving NFS to all
> the users' $HOME - many, many disks in the A5000 started resetting
> themselves again and again and again... Solution: tune down the
> maximum number of tagged commands sent to the disks in
> /kernel/drv/qla2100.conf:
>
>      hba1-max-iocb-allocation=7;   # was 256
>      hba1-execution-throttle=7;    # was 31
>
>    (This problem wasn't there with the old Sun 'ifp' driver,
>    probably because it has less aggressive limits - but since that
>    driver is totally nonconfigurable, it's impossible to tell.)

Ebay.se

> 2. The power cord got slightly loose on the Brocade switch, causing
> it to reboot, sending the server into an *instant PANIC, thanks to
> ZFS*.

Yes, as noted, this is by design in order to *protect your data*.

James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
Matthew Ahrens wrote:
> Jason J. W. Williams wrote:
>> Having experienced this, it would be nice if there were an option
>> to take the filesystem offline instead of kernel panicking, on a
>> per-zpool basis. [...]
>
> Agreed, and we are working on this.

Similar to UFS's onerror mount option, I take it?

/dale
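For readers unfamiliar with it: UFS's onerror mount option takes the
values panic (the default), lock, and umount, controlling what UFS
does when it detects an internal inconsistency. A quick sketch, with a
hypothetical device and mount point:

  # Unmount the filesystem on error instead of panicking the machine
  mount -F ufs -o onerror=umount /dev/dsk/c0t0d0s6 /export/data

  # Or persistently, in /etc/vfstab:
  # /dev/dsk/c0t0d0s6  /dev/rdsk/c0t0d0s6  /export/data  ufs  2  yes  onerror=umount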
> And to panic? How can that in any sane way be a good
> way to "protect" the application?
> *BANG* - no chance at all for the application to
> handle the problem...

I agree -- a disk error should never be fatal to the system; at worst,
the file system should appear to have been forcibly unmounted (and
"worst" really means that critical metadata, like the
superblock/uberblock, can't be updated on any of the disks in the
pool). That at least gives other applications which aren't using the
file system the chance to keep going.

An I/O error detected when writing a file can be reported at write()
time, fsync() time, or close() time. Any application which doesn't
check all three of these won't handle all I/O errors properly; and
applications which care about knowing that their data is on disk must
either use synchronous writes (O_SYNC/O_DSYNC) or call fsync before
closing the file. ZFS should report back these errors in all cases and
avoid panicking (obviously).

That said, it also appears that the device drivers (either the
FibreChannel or SCSI disk drivers in this case) are misbehaving. The
FC driver appears to be reporting back an error which is interpreted
as fatal by the SCSI disk driver, when one or the other should be
retrying the I/O. (It also appears that either the FC driver, the SCSI
disk driver, or ZFS is misbehaving in the observed hang.)

So ZFS should be more resilient against write errors, and the SCSI
disk or FC drivers should be more resilient against LIPs (the most
likely cause of your problem) or other transient errors.
(Alternatively, the ifp driver could be updated to support the maximum
number of targets on a loop, which might also solve your second
problem.)
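Anton's three reporting points are easy to get wrong in application
code. A minimal C sketch checking all of them; the function name and
error handling are illustrative only:

  #include <sys/types.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* Returns 0 on success, -1 if the data may not have reached disk. */
  int write_all_checked(const char *path, const char *buf, size_t len)
  {
      int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (fd == -1)
          return (-1);

      /* 1. write() may report EIO immediately... */
      if (write(fd, buf, len) != (ssize_t)len) {
          (void) close(fd);
          return (-1);
      }

      /* 2. ...or the error may only surface when cached dirty pages
       *    are flushed at fsync() time. */
      if (fsync(fd) == -1) {
          (void) close(fd);
          return (-1);
      }

      /* 3. close() is the last chance to learn of a deferred failure. */
      if (close(fd) == -1)
          return (-1);

      return (0);
  }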
Anton B. Rang wrote:
>> Peter Eriksson wrote:
>>> And to panic? How can that in any sane way be a good way to
>>> "protect" the application? *BANG* - no chance at all for the
>>> application to handle the problem...
>
> I agree -- a disk error should never be fatal to the system; at
> worst, the file system should appear to have been forcibly unmounted
> (and "worst" really means that critical metadata, like the
> superblock/uberblock, can't be updated on any of the disks in the
> pool). That at least gives other applications which aren't using the
> file system the chance to keep going.

But it's still not the application's problem to handle the underlying
device failure.

[...]

> That said, it also appears that the device drivers (either the
> FibreChannel or SCSI disk drivers in this case) are misbehaving. The
> FC driver appears to be reporting back an error which is interpreted
> as fatal by the SCSI disk driver, when one or the other should be
> retrying the I/O. (It also appears that either the FC driver, the
> SCSI disk driver, or ZFS is misbehaving in the observed hang.)

In this case it is most likely the qla2x00 driver which is at fault.
The Leadville drivers do the appropriate retries. The sd driver and
ZFS also do the appropriate retries.

> So ZFS should be more resilient against write errors, and the SCSI
> disk or FC drivers should be more resilient against LIPs (the most
> likely cause of your problem) or other transient errors.
> (Alternatively, the ifp driver could be updated to support the
> maximum number of targets on a loop, which might also solve your
> second problem.)

Your alternative option isn't going to happen. The ifp driver and the
card it supports have both long since been EOL'd.

James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
Anton B. Rang wrote:
>> And to panic? How can that in any sane way be a good
>> way to "protect" the application?
>> *BANG* - no chance at all for the application to
>> handle the problem...
>
> I agree -- a disk error should never be fatal to the system; at
> worst, the file system should appear to have been forcibly unmounted
> (and "worst" really means that critical metadata, like the
> superblock/uberblock, can't be updated on any of the disks in the
> pool). That at least gives other applications which aren't using the
> file system the chance to keep going.

This is not always the desired behavior. In particular, for a high
availability cluster, if one node is having difficulty and another is
not, then we'd really like to have the services relocated to the good
node ASAP. I think this case is different, though...

> An I/O error detected when writing a file can be reported at write()
> time, fsync() time, or close() time. Any application which doesn't
> check all three of these won't handle all I/O errors properly; and
> applications which care about knowing that their data is on disk
> must either use synchronous writes (O_SYNC/O_DSYNC) or call fsync
> before closing the file. ZFS should report back these errors in all
> cases and avoid panicking (obviously).

From what I recall of previous discussions on this topic (search the
archives), the difficulty is attributing a failure temporally, given
that you want a file system to have better performance by caching.

> That said, it also appears that the device drivers (either the
> FibreChannel or SCSI disk drivers in this case) are misbehaving. The
> FC driver appears to be reporting back an error which is interpreted
> as fatal by the SCSI disk driver, when one or the other should be
> retrying the I/O. (It also appears that either the FC driver, the
> SCSI disk driver, or ZFS is misbehaving in the observed hang.)

Agree 110%. When debugging layered software/firmware, it is essential
to understand all of the assumptions made at all interfaces.
Currently, ZFS assumes that a fatal write error is in fact fatal.

> So ZFS should be more resilient against write errors, and the SCSI
> disk or FC drivers should be more resilient against LIPs (the most
> likely cause of your problem) or other transient errors.
> (Alternatively, the ifp driver could be updated to support the
> maximum number of targets on a loop, which might also solve your
> second problem.)

NB: LIPs are a normal part of everyday life for fibre channel; they
are not an error. But I think Anton is right here: the way that the
driver deals with incurred exceptions is key to the upper layers being
stable. This can be tuned, but remember that tuning may lead to
instability. We might be dealing with an instability case here, not a
functional spec problem.
 -- richard
Dale Ghent wrote:
> Matthew Ahrens wrote:
>> Jason J. W. Williams wrote:
>> [...]
>>
>> Agreed, and we are working on this.
>
> Similar to UFS's onerror mount option, I take it?

Actually, it would be interesting to see how many customers change the
onerror setting. We have some data, just need more days in the hour.
 -- richard
Richard Elling wrote:
> Actually, it would be interesting to see how many customers change
> the onerror setting. We have some data, just need more days in the
> hour.

I'm pretty sure you'd find that info in over 6 years of submitted
Explorer output :) I imagine that stuff is sandboxed away in a far-off
department, though.

/dale
> But it's still not the application's problem to handle the
> underlying device failure.

But it is the application's problem to handle an error writing to the
file system -- that's why the file system is allowed to return errors.
;-)

Some applications might not check them, some applications might not
have anything reasonable to do (though they can usually at least
output a useful message to stderr), but other applications may be more
robust. It's not particularly uncommon for an application to encounter
an error writing to volume X and then choose to write to volume Y
instead, or to report the error back to another component or the end
user.
>> So ZFS should be more resilient against write errors, and the SCSI
>> disk or FC drivers should be more resilient against LIPs (the most
>> likely cause of your problem) or other transient errors.
>> (Alternatively, the ifp driver could be updated to support the
>> maximum number of targets on a loop, which might also solve your
>> second problem.)
>
> NB: LIPs are a normal part of everyday life for fibre channel; they
> are not an error.

Right. I don't think it's the LIPs that are the problem but rather (a
guess, not verified) the fact that the HBA loses "light" on its fiber
interface when the switch reboots... I think I also saw the same
ZFS-induced panic when I (stupid, I know, but...) moved a fiber cable
from one GBIC in the switch to another "on the run". I also saw this
with the 'ifp' driver, btw. And as someone wrote, the ifp driver will
never be updated, since it's for EOL'd hardware :-)
Hmm... I just noticed this qla2100.conf option:

  # During link down conditions enable/disable the reporting of
  # errors.
  # 0 = disabled, 1 = enable
  hba0-link-down-error=1;
  hba1-link-down-error=1;

I _wonder_ what might possibly happen if I change that 1 to a 0
(zero)... :-)
On 12/5/06, Peter Eriksson <peter@ifm.liu.se> wrote:
> Hmm... I just noticed this qla2100.conf option:
>
>   # During link down conditions enable/disable the reporting of
>   # errors.
>   # 0 = disabled, 1 = enable
>   hba0-link-down-error=1;
>   hba1-link-down-error=1;

This is the driver that we are using in this configuration. Excellent
insight; this will be added to the testing. Although, we are moving
away from these Qlogic cards, back to the Sun-branded Qlogic cards, to
use MPxIO, which works flawlessly with UFS and raw drives. I wonder if
using MPxIO, with dual HBAs on dual paths, would also reduce these
errors.
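For reference, a minimal sketch of turning on MPxIO for Sun-branded
(qlc/fp, i.e. Leadville) HBAs on Solaris 10; device names shown by the
commands will of course vary per system:

  # Enable MPxIO on all fp ports; rewrites /etc/vfstab device names
  # and requires a reboot
  stmsboot -e

  # After the reboot, list the mapping from per-path device names to
  # the multipathed scsi_vhci names
  stmsboot -L

The manual equivalent is setting mpxio-disable="no" in
/kernel/drv/fp.conf and rebooting.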
Hello Richard,

Tuesday, December 5, 2006, 7:01:17 AM, you wrote:

RE> Dale Ghent wrote:
>>> Similar to UFS's onerror mount option, I take it?

RE> Actually, it would be interesting to see how many customers change
RE> the onerror setting. We have some data, just need more days in the
RE> hour.

Sometimes we do.

--
Best regards,
 Robert                          mailto:rmilkowski@task.gda.pl
                                 http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Richard,
>
> Tuesday, December 5, 2006, 7:01:17 AM, you wrote:
>
> RE> Dale Ghent wrote:
>>>> Similar to UFS's onerror mount option, I take it?
>
> RE> Actually, it would be interesting to see how many customers
> RE> change the onerror setting. We have some data, just need more
> RE> days in the hour.
>
> Sometimes we do.

A preliminary look at a sample of the data shows that 1.6% do change
this to something other than the default (panic). Though this is a
statistically significant sample, it is skewed towards the high-end
systems. A more detailed study would look at the instances where we
had a problem and the system did not panic.
 -- richard
> UFS will panic on EIO also. Most other file systems, too.

In which cases will UFS panic on an I/O error? A quick browse through
the UFS code shows several cases where we can panic if we have bad
metadata on disk, but none where a disk read (or write) fails
altogether.

If UFS fails to read a block, it returns EIO (in most cases;
occasionally a different error, depending on the context) to its
caller. (In a few cases it can continue past the error; for instance,
if it can't read a cylinder group header and wants to allocate a block
there, it will go on to a different cylinder group.) If UFS fails to
write a block, the buffer cache or page cache will just keep retrying.

QFS won't even panic on bad metadata, unless that is enabled with an
/etc/system variable; it will just return errors to its caller. (It
won't panic on I/O errors at all.)

---

As for why expectations with ZFS are higher? I suspect that it's
primarily because ZFS has been sold (deservedly) as being very good at
dealing with hardware problems. This means that it should not only
detect the problems, but continue on past them whenever possible.
Ditto blocks are a first step in this direction. Bringing down the
machine when a read or write fails is so 1980s; ZFS needs a bit of
fine-tuning here.

We don't need to be defensive. ZFS is a new file system. It will take
some time to work all the quirks out, and it will take some time to
eliminate all the panic cases. But we will.
At a minimum, use the QLA2200 HBAs, as they were only recently EOL'd.
If you tried to give me a QLA2100-series HBA, I would not accept it.
It's 5 generations behind the current FC hardware. At least with a
QLA2200 HBA you will get qlc support and MPxIO.

Lyle