Hi,

Maybe someone could elaborate on these recurring OCFS2 errors that always result in a reboot of one or more systems.

Our setup:

3-node cluster
OCFS2 v1.2.1
OpenSUSE 10.1
SAN storage accessed over iSCSI.

Cluster settings:

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=60

Kernel command line parameters: elevator=deadline panic=5
I have tried running both with and without the "deadline" elevator to see whether it makes any difference.

The messages we receive are simply these:

Node 0

Nov 4 10:54:10 atl02010305 kernel: o2net: connection to node atl02010310 (num 0) at 192.168.3.110:7777 has been idle for 10 seconds, shutting it down.
Nov 4 10:54:10 atl02010305 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1162655640.698739 now 1162655650.695937 dr 1162655640.698734 adv 1162655640.698739:1162655640.698739 func (ca3835ec:504) 1162654980.779007:1162654980.779011)
Nov 4 10:54:10 atl02010305 kernel: o2net: no longer connected to node atl02010310 (num 0) at 192.168.3.110:7777

And the corresponding messages on Node 1:

Nov 4 10:54:11 atl02010310 kernel: o2net: connection to node atl02010305 (num 1) at 192.168.3.105:7777 has been idle for 10 seconds, shutting it down.
Nov 4 10:54:11 atl02010310 kernel: (32479,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1162655640.698521 now 1162655650.701661 dr 1162655650.695829 adv 1162655640.698525:1162655640.698525 func (ca3835ec:505) 1162654980.778881:1162654980.778886)
Nov 4 10:54:11 atl02010310 kernel: o2net: no longer connected to node atl02010305 (num 1) at 192.168.3.105:7777

This showed up shortly afterwards and repeated for hours:

Nov 4 11:00:00 atl02010310 kernel: (32540,1):dlm_send_remote_convert_request:398 ERROR: status = -107
Nov 4 11:00:00 atl02010310 kernel: (32540,1):dlm_wait_for_node_death:371 32E007178FA24E87B45ECDDE6F7D5D52: waiting 5000ms for notification of death of node 1
Nov 4 11:00:04 atl02010310 sshd[5242]: Accepted publickey for nagios from 192.168.3.102 port 44292 ssh2

The third node saw nothing.

So I wonder why neither node rebooted from a kernel panic, and what happened here in general. Weren't they supposed to fence?

Randy Ramsdell
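
P.S. For reference, our /etc/ocfs2/cluster.conf is along these lines. I am reconstructing it here from the node numbers and addresses visible in the log lines above, so treat it as an approximation; the entry for the third node is only a placeholder, since its address does not appear in these messages:

node:
        ip_port = 7777
        ip_address = 192.168.3.110
        number = 0
        name = atl02010310
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.3.105
        number = 1
        name = atl02010305
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = <third node's IP>
        number = 2
        name = <third node's hostname>
        cluster = ocfs2

cluster:
        node_count = 3
        name = ocfs2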