Ulf Björklund
2007-Sep-17 13:48 UTC
[zfs-discuss] Strange behavior with ZFS and Solaris Cluster
Hi all,

Two- and three-node clusters with SC 3.2 and S10u3 (120011-14). If a node is rebooted while SCSI-3 PGR is in use, the node is not able to take the zpool with HAStoragePlus due to a reservation conflict. SCSI-2 PGRE is okay. Using the same SAN LUNs in a metaset (SVM) under HAStoragePlus works fine with both PGR and PGRE (both SMI- and EFI-labeled disks).

If I use scshutdown and restart all nodes, it works. Also (interesting): if I reboot a node and then run update_drv -f ssd, the node is able to take SCSI-3 PGR zpools.

Is this a storage issue or a Solaris/Cluster issue? What is the difference between SVM and ZFS from the ssd point of view in this case?

/Regards
Ulf
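A quick way to see what ssd is actually tripping over is to dump the SCSI-3 registration and reservation state of one of the affected LUNs before and after the update_drv workaround. This is only a rough sketch, assuming the unsupported /usr/cluster/lib/sc/scsi helper that ships with Sun Cluster 3.x (options from memory) and a placeholder DID device d4; substitute one of your own shared LUNs:

    # list the SCSI-3 registration keys currently held on the LUN
    /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

    # list the active SCSI-3 reservation (owner key and type)
    /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2

    # the workaround from the post above, then re-check the keys
    update_drv -f ssd
    /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

If the rebooted node's key only shows up after update_drv -f ssd has run, that would point at the fencing/ssd interaction rather than at ZFS itself.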
> Two- and three-node clusters with SC 3.2 and S10u3 (120011-14).
> If a node is rebooted while SCSI-3 PGR is in use, the node is not
> able to take the zpool with HAStoragePlus due to a reservation
> conflict. SCSI-2 PGRE is okay. Using the same SAN LUNs in a metaset
> (SVM) under HAStoragePlus works fine with both PGR and PGRE (both
> SMI- and EFI-labeled disks).
>
> If I use scshutdown and restart all nodes, it works. Also
> (interesting): if I reboot a node and then run update_drv -f ssd,
> the node is able to take SCSI-3 PGR zpools.
>
> Is this a storage issue or a Solaris/Cluster issue? What is the
> difference between SVM and ZFS from the ssd point of view in this
> case?

I had a similar problem with a two-node x86 cluster running s10u3 (no extra patches) + SC 3.2 (HA-NFS + ZFS). When I rebooted the node that owned the quorum device, the surviving node would panic with:

panic[cpu0]/thread=ffffffff9cec71a0: CMM: Cluster lost operational quorum; aborting.

fffffe80021f5b50 genunix:vcmn_err+13 ()
fffffe80021f5b60 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+24 ()
fffffe80021f5c40 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+9d ()
fffffe80021f5e20 cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+3bc ()
fffffe80021f5e60 cl_haci:__1cIcmm_implStransitions_thread6M_v_+de ()
fffffe80021f5e70 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+b ()
fffffe80021f5ed0 cl_orb:cllwpwrapper+106 ()
fffffe80021f5ee0 unix:thread_start+8 ()

It appears you have SPARC clusters (based on the patch mentioned above), so this may not be the same problem. Once s10u4 was released, I reinstalled each node and could not reproduce the problem. I didn't try the SVM + ZFS combination.

Rob
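For what it's worth, before rebooting the node that currently owns the quorum device it is worth checking whether the survivor will still hold enough votes. A minimal sketch using the standard SC 3.2 status commands (nothing here is specific to this thread):

    # new-style SC 3.2 command: votes per node and per quorum device
    clquorum status

    # older-style equivalent, still available in 3.2
    scstat -q

If the votes present drop below the votes needed once the other node leaves, CMM aborts the surviving node with exactly the "lost operational quorum" panic shown above.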