Hi All,
this morning one of our edge FC switches died. Thanks to multipathing, all
the nodes using that switch kept running except the ONLY ONE still using
ZFS (we went back to UFS on all our production servers)!
On that one we had one path fail and then a CPU panic :(

Here are the logs:

Feb 23 06:39:05 server141 panic[cpu1]/thread=fffffe8000c7bc80:
Feb 23 06:39:05 server141 genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio ffffffff914b0540 [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:5b91cc4800:1800> zilog uncompressed LE contiguous birth=7194171 fill=0 cksum=db3a8e0a817108b0:9a407c634a1a728a:406:3773): error 5
Feb 23 06:39:05 server141 unix: [ID 100000 kern.notice]
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bac0 zfs:zfsctl_ops_root+2fdf1f02 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bad0 zfs:zio_next_stage+72 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb00 zfs:zio_wait_for_children+49 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb10 zfs:zio_wait_children_done+15 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb20 zfs:zio_next_stage+72 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb60 zfs:zio_vdev_io_assess+82 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb70 zfs:zio_next_stage+72 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bbd0 zfs:vdev_mirror_io_done+c1 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bbe0 zfs:zio_vdev_io_done+14 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bc60 genunix:taskq_thread+bc ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bc70 unix:thread_start+8 ()
Feb 23 06:39:05 server141 unix: [ID 100000 kern.notice]
Feb 23 06:39:05 server141 genunix: [ID 672855 kern.notice] syncing file systems...
Feb 23 06:39:05 server141 genunix: [ID 733762 kern.notice] 1
Feb 23 06:39:06 server141 genunix: [ID 904073 kern.notice] done
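For reference, "error 5" in the panic string is EIO: the ZIL write came back
as a hard I/O failure rather than being retried on the surviving path, and
ZFS of this vintage panics on an unrecoverable write error. After a path
event like this, the state one would typically check looks something like
the following (a minimal sketch for Solaris 10 with MPxIO; names and output
will vary):

  fcinfo hba-port       # link state (online/offline) of each HBA port
  mpathadm list lu      # operational path count per multipathed LUN
  zpool status -x       # any pools reporting errors
  fmdump -eV | tail     # recent FMA ereports for the failed I/Os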
Hi Gino,

We've noticed similar strangeness with ZFS on MPxIO. If you actively fail
over the path, everything works hunky dory. However, if one of the paths
disappears unexpectedly (i.e. an FC switch dies... or an array controller
conks out) then ZFS will panic. UFS on MPxIO in a similar situation
doesn't. If you use RAID-Z/Z2 or RAID-1/10 you won't have this problem.
It's a bit strange, though.

-J

On 2/23/07, Gino Ruopolo <ginoruopolo at hotmail.com> wrote:
> Hi All,
> this morning one of our edge FC switches died. Thanks to multipathing,
> all the nodes using that switch kept running except the ONLY ONE still
> using ZFS (we went back to UFS on all our production servers)!
> On that one we had one path fail and then a CPU panic :(
> [panic log snipped]
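For what it's worth, the "active" failover that behaves can be driven by
hand with mpathadm on arrays that support explicit failover; a sketch,
where the long device name is a placeholder for one of your multipathed
LUNs:

  mpathadm list lu                          # note the path count per LUN
  mpathadm failover /dev/rdsk/c4tXXXXd0s2   # orderly handoff to the standby path

The panic case is the opposite one: a path vanishes with no orderly
handoff, and the in-flight writes come back as errors.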
Hi Jason,

on Saturday we ran some tests and found that disabling an FC port under
heavy load (MPxIO enabled) often leads to a panic. (Using a RAID-Z!)
No problems with UFS...

later,
Gino
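In case anyone wants to reproduce this, the test boils down to sustained
writes against the pool while one fabric port goes away. A rough sketch,
assuming a pool named "tank" (a placeholder); the switch-side step is
vendor-specific:

  zpool iostat tank 5 &                         # watch throughput during the test
  dd if=/dev/zero of=/tank/bigfile bs=1024k &   # sustained write load
  # ...then disable the switch port carrying one path (e.g. "portdisable <port>"
  # on Brocade FOS) and watch the console for the panic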
Hi Gino,

Was there more than one LUN in the RAID-Z using the port you disabled?

-J

On 2/26/07, Gino Ruopolo <ginoruopolo at hotmail.com> wrote:
> Hi Jason,
>
> on Saturday we ran some tests and found that disabling an FC port under
> heavy load (MPxIO enabled) often leads to a panic. (Using a RAID-Z!)
> No problems with UFS...
>
> later,
> Gino
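One way to answer that is to check which HBA port each LUN's paths
actually traverse (the device name below is a placeholder):

  mpathadm show lu /dev/rdsk/c4tXXXXd0s2   # initiator/target port WWNs per path
  fcinfo hba-port                          # maps initiator WWNs to physical HBA ports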
Hi Jason,

we did the tests using S10U2, two FC cards, and MPxIO, with 5 LUNs in a
RAID-Z group. Each LUN was visible to both FC cards.

Gino

> Hi Gino,
>
> Was there more than one LUN in the RAID-Z using the
> port you disabled?
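For completeness, a setup like that is typically put together along these
lines (a sketch; "tank" and the cXtYdZ device names are placeholders, and
under MPxIO the target portion is usually a long scsi_vhci WWN):

  stmsboot -e           # enable MPxIO for the FC HBAs (requires a reboot)
  mpathadm list lu      # each LUN should now show two operational paths
  zpool create tank raidz c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0
                        # the five multipathed LUNs in a single raidz vdev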