Hi All,
this morning one of our edge FC switches died. Thanks to multipathing, all
the nodes using that switch kept running except the ONLY ONE still using
ZFS (we went back to UFS on all our production servers)!
On that one we had one path fail and then a CPU panic :(

Here are the logs:

Feb 23 06:39:05 server141 panic[cpu1]/thread=fffffe8000c7bc80:
Feb 23 06:39:05 server141 genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio ffffffff914b0540 [L0 ZIL intent log] 1000L/1000P DVA[0]=<0:5b91cc4800:1800> zilog uncompressed LE contiguous birth=7194171 fill=0 cksum=db3a8e0a817108b0:9a407c634a1a728a:406:3773): error 5
Feb 23 06:39:05 server141 unix: [ID 100000 kern.notice]
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bac0 zfs:zfsctl_ops_root+2fdf1f02 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bad0 zfs:zio_next_stage+72 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb00 zfs:zio_wait_for_children+49 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb10 zfs:zio_wait_children_done+15 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb20 zfs:zio_next_stage+72 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb60 zfs:zio_vdev_io_assess+82 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bb70 zfs:zio_next_stage+72 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bbd0 zfs:vdev_mirror_io_done+c1 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bbe0 zfs:zio_vdev_io_done+14 ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bc60 genunix:taskq_thread+bc ()
Feb 23 06:39:05 server141 genunix: [ID 655072 kern.notice] fffffe8000c7bc70 unix:thread_start+8 ()
Feb 23 06:39:05 server141 unix: [ID 100000 kern.notice]
Feb 23 06:39:05 server141 genunix: [ID 672855 kern.notice] syncing file systems...
Feb 23 06:39:05 server141 genunix: [ID 733762 kern.notice] 1
Feb 23 06:39:06 server141 genunix: [ID 904073 kern.notice] done
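For reference, "error 5" in the panic string is EIO: the ZIL write came back
as a hard I/O failure rather than being retried on the surviving path, and
ZFS of this vintage panics on an unrecoverable write error. After a path
event like this, the state one would typically check looks something like
the following (a minimal sketch for Solaris 10 with MPxIO; names and output
will vary):

  fcinfo hba-port       # link state (online/offline) of each HBA port
  mpathadm list lu      # operational path count per multipathed LUN
  zpool status -x       # any pools reporting errors
  fmdump -eV | tail     # recent FMA ereports for the failed I/Os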
Hi Gino,

We've noticed similar strangeness with ZFS on MPxIO. If you actively fail
over the path, everything works hunky dory. However, if one of the paths
disappears unexpectedly (i.e. an FC switch dies... or an array controller
conks out) then ZFS will panic. UFS on MPxIO in a similar situation
doesn't. If you use RAID-Z/Z2 or RAID-1/10 you won't have this problem.
It's a bit strange, though.

-J

On 2/23/07, Gino Ruopolo <ginoruopolo at hotmail.com> wrote:
> Hi All,
> this morning one of our edge FC switches died. Thanks to multipathing,
> all the nodes using that switch kept running except the ONLY ONE still
> using ZFS (we went back to UFS on all our production servers)!
> On that one we had one path fail and then a CPU panic :(
> [panic log snipped]
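For what it's worth, the "active" failover that behaves can be driven by
hand with mpathadm on arrays that support explicit failover; a sketch,
where the long device name is a placeholder for one of your multipathed
LUNs:

  mpathadm list lu                          # note the path count per LUN
  mpathadm failover /dev/rdsk/c4tXXXXd0s2   # orderly handoff to the standby path

The panic case is the opposite one: a path vanishes with no orderly
handoff, and the in-flight writes come back as errors.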
Hi Jason,

on Saturday we ran some tests and found that disabling an FC port under
heavy load (MPxIO enabled) often leads to a panic. (Using a RAID-Z!)
No problems with UFS...

later,
Gino
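In case anyone wants to reproduce this, the test boils down to sustained
writes against the pool while one fabric port goes away. A rough sketch,
assuming a pool named "tank" (a placeholder); the switch-side step is
vendor-specific:

  zpool iostat tank 5 &                         # watch throughput during the test
  dd if=/dev/zero of=/tank/bigfile bs=1024k &   # sustained write load
  # ...then disable the switch port carrying one path (e.g. "portdisable <port>"
  # on Brocade FOS) and watch the console for the panic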
Hi Gino,

Was there more than one LUN in the RAID-Z using the port you disabled?

-J

On 2/26/07, Gino Ruopolo <ginoruopolo at hotmail.com> wrote:
> Hi Jason,
>
> on Saturday we ran some tests and found that disabling an FC port under
> heavy load (MPxIO enabled) often leads to a panic. (Using a RAID-Z!)
> No problems with UFS...
>
> later,
> Gino
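One way to answer that is to check which HBA port each LUN's paths
actually traverse (the device name below is a placeholder):

  mpathadm show lu /dev/rdsk/c4tXXXXd0s2   # initiator/target port WWNs per path
  fcinfo hba-port                          # maps initiator WWNs to physical HBA ports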
Hi Jason,

we did the tests using S10U2, two FC cards, and MPxIO, with 5 LUNs in a
RAID-Z group. Each LUN was visible to both FC cards.

Gino

> Hi Gino,
>
> Was there more than one LUN in the RAID-Z using the
> port you disabled?
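For completeness, a setup like that is typically put together along these
lines (a sketch; "tank" and the cXtYdZ device names are placeholders, and
under MPxIO the target portion is usually a long scsi_vhci WWN):

  stmsboot -e           # enable MPxIO for the FC HBAs (requires a reboot)
  mpathadm list lu      # each LUN should now show two operational paths
  zpool create tank raidz c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0
                        # the five multipathed LUNs in a single raidz vdev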