Armin Ollig
2008-Oct-27 15:23 UTC
[zfs-discuss] [ha-clusters-discuss] (ZFS) file corruption with HAStoragePlus
Hi Venku and all others,

thanks for your suggestions. I wrote a script to do some I/O from both hosts (in non-cluster mode) to the FC LUNs in question and checked the md5sums of all files afterwards. As expected, there was no corruption.

After recreating the cluster resource and a few failovers I found the HASP resource in this state, with the vb1 zpool concurrently mounted on *both* nodes:

# clresource status vb1-storage

=== Cluster Resources ===

Resource Name    Node Name    State      Status Message
-------------    ---------    -----      --------------
vb1-storage      siegfried    Offline    Offline
                 voelsung     Starting   Unknown - Starting

siegfried# zpool status
  pool: vb1
 state: ONLINE
 scrub: none requested
config:
        NAME                                       STATE     READ WRITE CKSUM
        vb1                                        ONLINE       0     0     0
          c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0

errors: No known data errors

voelsung# zpool status
  pool: vb1
 state: ONLINE
 scrub: none requested
config:
        NAME                                       STATE     READ WRITE CKSUM
        vb1                                        ONLINE       0     0     0
          c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0

In this state filesystem corruption can occur easily.

The zpool was created using the cluster-wide did device:
zpool create vb1 /dev/did/dsk/d12s0

There was no FC path failure to the LUNs, and both interconnects are normal. After some minutes in this state a kernel panic is triggered and both nodes reboot.

Oct 27 16:09:10 voelsung Cluster.RGM.fed: [ID 922870 daemon.error] tag vb1.vb1-storage.10: unable to kill process with SIGKILL
Oct 27 16:09:10 voelsung Cluster.RGM.rgmd: [ID 904914 daemon.error] fatal: Aborting this node because method <hastorageplus_prenet_start> on resource <vb1-storage> for node <voelsung> is unkillable

Best wishes,
Armin
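For readers who want to run the same kind of check, a minimal sketch of such a cross-host write/verify script is below. This is not Armin's actual script; the mount point /vb1/iotest, the file count and the file size are assumptions, and digest(1) is used in place of md5sum since it ships with Solaris.

#!/bin/ksh
# Sketch: run "write" on host A, move the pool to host B (export/import or
# failover), then run "verify" there and compare checksums.
DIR=/vb1/iotest          # assumed mount point inside the vb1 pool
SUMS=$DIR/MD5SUMS

case "$1" in
write)
        mkdir -p $DIR
        i=1
        while [ $i -le 100 ]; do
                # 10 MB of pseudo-random data per file
                dd if=/dev/urandom of=$DIR/file$i bs=1024k count=10 2>/dev/null
                i=`expr $i + 1`
        done
        # record reference checksums alongside the data
        digest -v -a md5 $DIR/file* > $SUMS
        ;;
verify)
        # recompute on the other host and compare against the recorded checksums
        digest -v -a md5 $DIR/file* > /tmp/check.$$
        if diff $SUMS /tmp/check.$$ >/dev/null; then
                echo "no corruption detected"
        else
                echo "CHECKSUM MISMATCH"
        fi
        rm -f /tmp/check.$$
        ;;
*)
        echo "usage: $0 write|verify"
        ;;
esac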
Victor Latushkin
2008-Oct-27 15:45 UTC
[zfs-discuss] [ha-clusters-discuss] (ZFS) file corruption with HAStoragePlus
Armin Ollig wrote:
> Hi Venku and all others,
>
> thanks for your suggestions. I wrote a script to do some I/O from both
> hosts (in non-cluster mode) to the FC LUNs in question and checked the
> md5sums of all files afterwards. As expected, there was no corruption.
>
> After recreating the cluster resource and a few failovers I found the
> HASP resource in this state, with the vb1 zpool concurrently mounted on
> *both* nodes:

Have you imported your pool manually on one of the hosts? Do you have the file /etc/zfs/zpool.cache on these boxes? If yes, could you please provide it?

> # clresource status vb1-storage
>
> === Cluster Resources ===
>
> Resource Name    Node Name    State      Status Message
> -------------    ---------    -----      --------------
> vb1-storage      siegfried    Offline    Offline
>                  voelsung     Starting   Unknown - Starting
>
> siegfried# zpool status
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                                       STATE     READ WRITE CKSUM
>         vb1                                        ONLINE       0     0     0
>           c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
> voelsung# zpool status
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                                       STATE     READ WRITE CKSUM
>         vb1                                        ONLINE       0     0     0
>           c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0
>
> In this state filesystem corruption can occur easily.

Yes, this is bad and can lead to corruption.

> The zpool was created using the cluster-wide did device:
> zpool create vb1 /dev/did/dsk/d12s0

But this differs from the status reported by 'zpool status' above - it shows the pool is made from device c4t600D0230000000000088824BC4228807d0s0, while you say it was created from /dev/did/dsk/d12s0.

victor

> There was no FC path failure to the LUNs, and both interconnects are normal.
> After some minutes in this state a kernel panic is triggered and both nodes reboot.
>
> Oct 27 16:09:10 voelsung Cluster.RGM.fed: [ID 922870 daemon.error] tag vb1.vb1-storage.10: unable to kill process with SIGKILL
> Oct 27 16:09:10 voelsung Cluster.RGM.rgmd: [ID 904914 daemon.error] fatal: Aborting this node because method <hastorageplus_prenet_start> on resource <vb1-storage> for node <voelsung> is unkillable
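For anyone following along, two quick ways to answer Victor's questions on a node are to look at the pool's command history and at the cached configuration. This is only a sketch; exact output varies by build, and the pool name vb1 is taken from the thread.

siegfried# zpool history vb1           # shows every create/import/export ever run against the pool
siegfried# ls -l /etc/zfs/zpool.cache  # does a cache file exist on this node?
siegfried# zdb -C vb1                  # dump the configuration ZFS has cached for vb1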
Armin Ollig
2008-Oct-27 16:32 UTC
[zfs-discuss] [ha-clusters-discuss] (ZFS) file corruption with HAStoragePlus
Hi Victor,

it was initially created from c4t600D0230000000000088824BC4228807d0s0, then destroyed and recreated from /dev/did/dsk/d12s0. You are right: it still shows the old device.

There also seems to be a problem with device reservation and the cluster framework, since I get a kernel panic once in a while when a node leaves the cluster:

Oct 27 17:14:12 siegfried genunix: NOTICE: CMM: Quorum device /dev/did/rdsk/d11s2: owner set to node 1.
Oct 27 17:14:12 siegfried genunix: NOTICE: CMM: Node voelsung (nodeid = 2) is down.
Oct 27 17:14:12 siegfried genunix: NOTICE: CMM: Cluster members: siegfried.
Oct 27 17:14:12 siegfried genunixN: NOTICE: CMM:otnode reconfigurtiion #5 completfying cluster that this node is panicking

panic[cpu1]/thread=ffffff000f5a3c80: Reservation Conflict
Disk: /scsi_vhci/disk@g600d02300000000000888275cd1cc500

ffffff000f5a3a00 sd:sd_panic_for_res_conflict+4f ()
ffffff000f5a3a40 sd:sd_pkt_status_reservation_conflict+a8 ()
ffffff000f5a3a90 sd:sdintr+44e ()
ffffff000f5a3b30 scsi_vhci:vhci_intr+6ac ()
ffffff000f5a3b50 fcp:fcp_post_callback+1e ()
ffffff000f5a3b90 fcp:fcp_cmd_callback+4b ()
ffffff000f5a3bd0 emlxs:emlxs_iodone+b1 ()
ffffff000f5a3c20 emlxs:emlxs_iodone_server+15d ()
ffffff000f5a3c60 emlxs:emlxs_thread+15e ()
ffffff000f5a3c70 unix:thread_start+8 ()

syncing file systems... 5 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 done (not all i/o completed)
dumping to /dev/dsk/c4t600D0230000000000088824BC4228803d0s1, offset 1719074816, content: kernel
100% done: 192956 pages dumped, compression ratio 4.32, dump succeeded
rebooting...

I will update to nv99 and try to reproduce the issue.

Best wishes,
Armin

[Attachments scrubbed by the list archive: voelsung-zpool.cache (1072 bytes), siegfried-zpool.cache (1076 bytes)
 <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081027/dff42919/attachment.obj>
 <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081027/dff42919/attachment-0001.obj>]
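A sketch for cross-checking which physical LUN a DID device maps to and which path the pool actually records, assuming a Sun Cluster 3.2 setup like the one in the thread; d12 and vb1 are the names used above.

node# scdidadm -L | grep d12       # map DID device d12 to its c#t#d# paths on each node
node# zpool status vb1             # device path the imported pool is currently using
node# zdb -l /dev/did/rdsk/d12s0   # read the ZFS labels directly off the DID device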
Victor Latushkin
2008-Oct-27 16:49 UTC
[zfs-discuss] [ha-clusters-discuss] (ZFS) file corruption with HAStoragePlus
Armin Ollig wrote:
> Hi Victor,
>
> it was initially created from
> c4t600D0230000000000088824BC4228807d0s0, then destroyed and recreated
> from /dev/did/dsk/d12s0. You are right: it still shows the old
> device.

Your pool is cached in the zpool.cache of both hosts (with the old device path):

bash-3.2# usr/sbin/i86/zdb -U /export/home/vl146290/siegfried-zpool.cache vb1
    version=12
    name='vb1'
    state=0
    txg=192
    pool_guid=880122360910369369
    hostid=872416547
    hostname='siegfried'
    vdev_tree
        type='root'
        id=0
        guid=880122360910369369
        children[0]
            type='disk'
            id=0
            guid=4904058874028023190
            path='/dev/dsk/c4t600D0230000000000088824BC4228807d0s0'
            devid='id1,sd@n600d0230000000000088824bc4228807/a'
            phys_path='/scsi_vhci/disk@g600d0230000000000088824bc4228807:a'
            whole_disk=0
            metaslab_array=23
            metaslab_shift=29
            ashift=9
            asize=107369463808
            is_log=0
bash-3.2# usr/sbin/i86/zdb -U /export/home/vl146290/voelsung-zpool.cache vb1
    version=12
    name='vb1'
    state=0
    txg=185
    pool_guid=880122360910369369
    hostid=872416547
    hostname='voelsung'
    vdev_tree
        type='root'
        id=0
        guid=880122360910369369
        children[0]
            type='disk'
            id=0
            guid=4904058874028023190
            path='/dev/dsk/c4t600D0230000000000088824BC4228807d0s0'
            devid='id1,sd@n600d0230000000000088824bc4228807/a'
            phys_path='/scsi_vhci/disk@g600d0230000000000088824bc4228807:a'
            whole_disk=0
            metaslab_array=23
            metaslab_shift=29
            ashift=9
            asize=107369463808
            is_log=0
bash-3.2#

This is what causes the simultaneous pool import. It is probably a result of manual pool imports in the past. Try to get rid of zpool.cache on both hosts and see if the problem still reproduces.

> There seems to be a problem with device reservation and the cluster
> framework, since I get a kernel panic once in a while when a node
> leaves the cluster:

Well, then it may be better to start from a known clean state first.

victor

> Oct 27 17:14:12 siegfried genunix: NOTICE: CMM: Quorum device /dev/did/rdsk/d11s2: owner set to node 1.
> Oct 27 17:14:12 siegfried genunix: NOTICE: CMM: Node voelsung (nodeid = 2) is down.
> Oct 27 17:14:12 siegfried genunix: NOTICE: CMM: Cluster members: siegfried.
> Oct 27 17:14:12 siegfried genunixN: NOTICE: CMM:otnode reconfigurtiion #5 completfying cluster that this node is panicking
>
> panic[cpu1]/thread=ffffff000f5a3c80: Reservation Conflict
> Disk: /scsi_vhci/disk@g600d02300000000000888275cd1cc500
>
> ffffff000f5a3a00 sd:sd_panic_for_res_conflict+4f ()
> ffffff000f5a3a40 sd:sd_pkt_status_reservation_conflict+a8 ()
> ffffff000f5a3a90 sd:sdintr+44e ()
> ffffff000f5a3b30 scsi_vhci:vhci_intr+6ac ()
> ffffff000f5a3b50 fcp:fcp_post_callback+1e ()
> ffffff000f5a3b90 fcp:fcp_cmd_callback+4b ()
> ffffff000f5a3bd0 emlxs:emlxs_iodone+b1 ()
> ffffff000f5a3c20 emlxs:emlxs_iodone_server+15d ()
> ffffff000f5a3c60 emlxs:emlxs_thread+15e ()
> ffffff000f5a3c70 unix:thread_start+8 ()
>
> syncing file systems... 5 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 done (not all i/o completed)
> dumping to /dev/dsk/c4t600D0230000000000088824BC4228803d0s1, offset 1719074816, content: kernel
> 100% done: 192956 pages dumped, compression ratio 4.32, dump succeeded
> rebooting...
>
> I will update to nv99 and try to reproduce the issue.
>
> Best wishes,
> Armin
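A sketch of the cleanup Victor suggests, under the assumption that vb1 is the only pool listed in the cache on each node and that the HASP resource is disabled while the cache is cleared. Exporting a pool normally removes its entry from zpool.cache; deleting the file is the blunt fallback for stale entries.

# on each node in turn:
node# clresource disable vb1-storage      # keep HASP from touching the pool meanwhile
node# zpool list                          # check what is currently imported on this node
node# zpool export vb1                    # if vb1 is imported here; also drops it from zpool.cache
node# zdb -U /etc/zfs/zpool.cache vb1     # should no longer show a cached config for vb1
node# rm /etc/zfs/zpool.cache             # only if vb1 was the sole pool cached on this node

# then let HAStoragePlus do the import itself:
node# clresource enable vb1-storage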