Hi All,
yesterday we ran some tests with ZFS on a new server and a new JBOD that are going into
production this week.
Here is what we found:
1) Solaris seems unable to recognize as a "disk" any FC disk that was previously
labeled by a storage processor; cfgadm reports them as "unknown".
We had to boot Linux and wipe the partition table before Solaris would recognize
the disks... :(
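A quick read-only check can show whether a disk still carries an old partition table before it is handed to Solaris. This is a hypothetical Python sketch, not something we ran: it assumes 512-byte sectors and only looks for the standard MBR boot signature (0x55AA at bytes 510-511) and the GPT header signature ("EFI PART" at the start of LBA 1). A vendor-specific storage-processor label would not be detected.

```python
import sys

SECTOR = 512  # assumption: 512-byte logical sectors


def detect_label(path: str) -> str:
    """Return "gpt", "mbr" or "none" for the first sectors of a disk or image.

    Read-only; only the standard MBR/GPT signatures are recognized.
    """
    with open(path, "rb") as f:
        head = f.read(2 * SECTOR)
    # A GPT disk also carries a protective MBR, so test for GPT first.
    if head[SECTOR:SECTOR + 8] == b"EFI PART":
        return "gpt"
    if head[510:512] == b"\x55\xaa":
        return "mbr"
    return "none"


if __name__ == "__main__":
    # Usage (hypothetical whole-disk device): python detect_label.py /dev/rdsk/c2t0d0p0
    for dev in sys.argv[1:]:
        print(dev, detect_label(dev))
```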
2) Our test server was connected to the JBOD through a dual FC adapter and dual FC
switches, with MPXIO enabled.
We had MANY PANICS doing any of the following while the pool was loaded with a dd:
- disconnecting and reconnecting one of the FC links a few times;
- enabling/disabling an FC link port on one FC switch;
- powering off one of the two FC switches.
Sometimes we got a panic and nothing in the logs!
Just a few examples:
Mar 3 18:38:54 TESTSVR offlining lun=0 (trace=0), target=cd (trace=2800004)
Mar 3 18:38:55 TESTSVR unix: [ID 836849 kern.notice]
Mar 3 18:38:55 TESTSVR panic[cpu0]/thread=fffffe8000d1cc80:
Mar 3 18:38:55 TESTSVR genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe8322055280 [L0 unallocated] 20000L/20000P DVA[0]=<1:575a0000:20000> fletcher2 uncompressed LE contiguous birth=9 fill=0 cksum=0:0:0:0): error 14
Mar 3 18:38:55 TESTSVR unix: [ID 100000 kern.notice]
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cac0 zfs:zfsctl_ops_root+2f9c8b42 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cad0 zfs:zio_next_stage+72 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cb00 zfs:zio_wait_for_children+49 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cb10 zfs:zio_wait_children_done+15 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cb20 zfs:zio_next_stage+72 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cb60 zfs:zio_vdev_io_assess+82 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cb70 zfs:zio_next_stage+72 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cbd0 zfs:vdev_mirror_io_done+c1 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cbe0 zfs:zio_vdev_io_done+14 ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cc60 genunix:taskq_thread+bc ()
Mar 3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fffffe8000d1cc70 unix:thread_start+8 ()
Mar 3 18:38:55 TESTSVR unix: [ID 100000 kern.notice]
Mar 3 18:38:55 TESTSVR genunix: [ID 672855 kern.notice] syncing file systems...
Mar 3 18:51:52 TESTSVR savecore: [ID 570001 auth.error] reboot after panic: ZFS: I/O failure (write on <unknown> off 0: zio fffffe8322055280 [L0 unallocated] 20000L/20000P DVA[0]=<1:575a0000:20000> fletcher2 uncompressed LE contiguous birth=9 fill=0 cksum=0:0:0:0): error 14
PANIC
Nothing in the log!
Mar 4 19:08:21 TESTSVR savecore: [ID 570001 auth.error] reboot after panic: ZFS: I/O failure (write on <unknown> off 0: zio fffffe8322055280 [L0 unallocated] 20000L/20000P DVA[0]=<1:575a0000:20000> fletcher2 uncompressed LE contiguous birth=9 fill=0 cksum=0:0:0:0): error 14
PANIC
Nothing in the log!
Mar 4 19:11:20 TESTSVR savecore: [ID 570001 auth.error] reboot after panic: ZFS: I/O failure (write on <unknown> off 0: zio fffffe8322055280 [L0 unallocated] 20000L/20000P DVA[0]=<1:575a0000:20000> fletcher2 uncompressed LE contiguous birth=9 fill=0 cksum=0:0:0:0): error 14
Mar 4 19:25:37 TESTSVR genunix: [ID 834635 kern.info] /scsi_vhci/disk@g20000004cfd87b7b (sd13) multipath status: degraded, path /pci@1,0/pci1022,7450@a/pci1011,26@e/pci1077,2@4/fp@0,0 (fp2) to target address: w22000004cfd87b7b,0 is offline Load balancing: round-robin
Mar 4 19:25:37 TESTSVR unix: [ID 836849 kern.notice]
Mar 4 19:25:37 TESTSVR panic[cpu3]/thread=fffffe80002e1c80:
Mar 4 19:25:37 TESTSVR genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe811bdb7800 [L0 unallocated] 20000L/20000P DVA[0]=<3:56260000:20000> fletcher2 uncompressed LE contiguous birth=22 fill=0 cksum=0:0:0:0): error 14
Mar 4 19:25:37 TESTSVR unix: [ID 100000 kern.notice]
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1ac0 zfs:zfsctl_ops_root+2f9c8b42 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1ad0 zfs:zio_next_stage+72 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1b00 zfs:zio_wait_for_children+49 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1b10 zfs:zio_wait_children_done+15 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1b20 zfs:zio_next_stage+72 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1b60 zfs:zio_vdev_io_assess+82 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1b70 zfs:zio_next_stage+72 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1bd0 zfs:vdev_mirror_io_done+c1 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1be0 zfs:zio_vdev_io_done+14 ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1c60 genunix:taskq_thread+bc ()
Mar 4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fffffe80002e1c70 unix:thread_start+8 ()
Mar 4 19:25:37 TESTSVR unix: [ID 100000 kern.notice]
Mar 4 19:25:37 TESTSVR genunix: [ID 672855 kern.notice] syncing file systems...
Mar 4 19:25:37 TESTSVR genunix: [ID 904073 kern.notice] done
Mar 4 19:34:24 TESTSVR genunix: [ID 403854 kern.notice] assertion failed: vdev_config_sync(rvd, txg) == 0, file: ../../common/fs/zfs/spa.c, line: 2801
Mar 4 19:34:24 TESTSVR unix: [ID 100000 kern.notice]
Mar 4 19:34:24 TESTSVR genunix: [ID 802836 kern.notice] fffffe80007e0b60 fffffffffb9b49f3 ()
Mar 4 19:34:24 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007e0bd0 zfs:spa_sync+39c ()
Mar 4 19:34:24 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007e0c60 zfs:txg_sync_thread+115 ()
Mar 4 19:34:24 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007e0c70 unix:thread_start+8 ()
Mar 4 19:34:24 TESTSVR unix: [ID 100000 kern.notice]
Mar 4 19:34:24 TESTSVR genunix: [ID 672855 kern.notice] syncing file systems...
Mar 4 19:34:24 TESTSVR genunix: [ID 904073 kern.notice] done
Mar 4 20:33:35 TESTSVR genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe83170da300 [L0 unallocated] 20000L/20000P DVA[0]=<6:70660000:20000> fletcher2 uncompressed LE contiguous birth=462 fill=0 cksum=0:0:0:0): error 14
Mar 4 20:33:35 TESTSVR unix: [ID 100000 kern.notice]
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9ac0 zfs:zfsctl_ops_root+2f9c8b42 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9ad0 zfs:zio_next_stage+72 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9b00 zfs:zio_wait_for_children+49 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9b10 zfs:zio_wait_children_done+15 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9b20 zfs:zio_next_stage+72 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9b60 zfs:zio_vdev_io_assess+82 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9b70 zfs:zio_next_stage+72 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9bd0 zfs:vdev_mirror_io_done+c1 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9be0 zfs:zio_vdev_io_done+14 ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9c60 genunix:taskq_thread+bc ()
Mar 4 20:33:35 TESTSVR genunix: [ID 655072 kern.notice] fffffe80007b9c70 unix:thread_start+8 ()
Mar 4 20:33:35 TESTSVR unix: [ID 100000 kern.notice]
Mar 4 20:33:35 TESTSVR genunix: [ID 672855 kern.notice] syncing file systems...
Mar 4 20:33:36 TESTSVR genunix: [ID 904073 kern.notice] done
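To compare the panics above it can help to pull the block-pointer fields out of each "ZFS: I/O failure" message. This is a hypothetical Python helper of our own, not a Solaris tool, written under an assumption about the message layout: the DVA is printed as <vdev:offset:asize> with the top-level vdev index in decimal and offset/asize in hex, and the sizes as <logical>L/<physical>P in hex. Under that reading the three distinct panics hit vdevs 1, 3 and 6. We do not try to interpret "error 14" here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout of the panic text quoted above (one line, as syslog writes it).
PANIC_RE = re.compile(
    r"ZFS: I/O failure \((?P<op>\w+) on (?P<dev>\S+) off (?P<off>\w+): "
    r"zio (?P<zio>[0-9a-f]+) .*? (?P<lsize>[0-9a-f]+)L/(?P<psize>[0-9a-f]+)P "
    r"DVA\[0\]=<(?P<vdev>\d+):(?P<dva_off>[0-9a-f]+):(?P<asize>[0-9a-f]+)>"
    r".*?birth=(?P<birth>\d+).*?error (?P<errno>\d+)"
)


@dataclass
class IoFailure:
    op: str
    zio: int
    lsize: int
    psize: int
    vdev: int        # top-level vdev index (decimal in the message)
    dva_offset: int  # hex in the message
    asize: int       # hex in the message
    birth: int
    errno: int


def parse_io_failure(text: str) -> Optional[IoFailure]:
    """Return the parsed fields, or None if the text is not such a message."""
    m = PANIC_RE.search(text)
    if m is None:
        return None

    def hexint(key: str) -> int:
        return int(m.group(key), 16)

    return IoFailure(
        op=m.group("op"),
        zio=hexint("zio"),
        lsize=hexint("lsize"),
        psize=hexint("psize"),
        vdev=int(m.group("vdev")),
        dva_offset=hexint("dva_off"),
        asize=hexint("asize"),
        birth=int(m.group("birth")),
        errno=int(m.group("errno")),
    )


sample = (
    "ZFS: I/O failure (write on <unknown> off 0: zio fffffe8322055280 "
    "[L0 unallocated] 20000L/20000P DVA[0]=<1:575a0000:20000> fletcher2 "
    "uncompressed LE contiguous birth=9 fill=0 cksum=0:0:0:0): error 14"
)

if __name__ == "__main__":
    f = parse_io_failure(sample)
    print(f.vdev, hex(f.dva_offset), f.asize, f.errno)  # 1 0x575a0000 131072 14
```

Feeding every panic line from /var/adm/messages through it would make it easy to see whether failures cluster on particular vdevs or always involve 128 KB (0x20000) writes.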
3) Zpool throughput isn't stable. Sometimes we get 150+ MB/s write
performance with 1 link, sometimes only around 40 MB/s.
Using only one FC link always gives us stable performance at 160 MB/s.
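The run-to-run variance can be quantified by repeating the dd-style load and recording the spread. A minimal Python sketch under our own assumptions: the file path is hypothetical, the figures are MiB/s and include a final fsync, so they will not match dd's numbers exactly, and writing zeros only makes sense if compression is off on the dataset (as the "uncompressed" blocks in the panics suggest).

```python
import os
import sys
import time

MIB = 1024 * 1024


def write_throughput(path: str, total_mib: int = 1024, block_kib: int = 1024) -> float:
    """Write total_mib MiB of zeros to path in block_kib KiB writes; return MiB/s.

    Roughly what `dd if=/dev/zero of=path bs=1024k` measures, plus an fsync
    at the end so data still cached in memory is not counted as written.
    """
    block = b"\0" * (block_kib * 1024)
    blocks = (total_mib * 1024) // block_kib
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    start = time.monotonic()
    try:
        for _ in range(blocks):
            view = memoryview(block)
            while view:  # os.write may write fewer bytes than requested
                view = view[os.write(fd, view):]
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - start, 1e-9)
    return blocks * len(block) / MIB / elapsed


def sample_runs(path: str, runs: int = 5, total_mib: int = 1024) -> list[float]:
    """Repeat the write test to expose variance; removes the file afterwards."""
    try:
        return [write_throughput(path, total_mib) for _ in range(runs)]
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file on the pool under test): python ddtest.py /tank/ddtest.bin
    rates = sample_runs(sys.argv[1], runs=5, total_mib=4096)
    print(", ".join(f"{r:.0f} MiB/s" for r in rates))
    print(f"min {min(rates):.0f}, max {max(rates):.0f} MiB/s")
```

Running it once with MPXIO over both links and once over a single link would show whether the 40 vs 150+ MB/s spread follows the multipath setup.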
Conclusion:
After a day of tests we are starting to think that ZFS doesn't work well
with MPXIO.
Awaiting your comments,
Gino
This message posted from opensolaris.org