Sjon Stigter
2006-May-26 14:05 UTC
[Ocfs2-users] Another node is heartbeating in our slot!
All,

We are having some problems getting OCFS2 to run. We are using kernel 2.6.15 with OCFS2 1.2.1. Compiling the OCFS2 sources went fine and all modules load perfectly. However, we can only mount the OCFS2 volume on one machine at a time: when we try to mount the volume on the 2 other machines, we get an error stating that another node is heartbeating in our slot. When we mount the volume on the 2 other machines and look at the dmesg of the first machine, which has the volume mounted, nothing else appears, not even a message of the other nodes joining the cluster.

The cluster.conf is the same on all 3 nodes:

    cluster:
        node_count = 3
        name = ocfs2

    node:
        ip_port = 7777
        ip_address = 172.28.100.27
        number = 1
        name = tilmysql1
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 172.28.100.28
        number = 2
        name = tilmysql2
        cluster = ocfs2

    node:
        ip_port = 7777
        ip_address = 172.28.100.29
        number = 3
        name = tilmysql3
        cluster = ocfs2

Dmesg output:

Mounting FS on node1 succeeds:

    OCFS2 1.2.1 Fri May 26 11:27:14 CEST 2006 (build bd2f25ba0af9677db3572e3ccd92f739)
    ocfs2_dlm: Nodes in domain ("38F7643CACA64C0A932E3B03419BBC62"): 1
    kjournald starting. Commit interval 5 seconds
    ocfs2: Mounting device (8,17) on (node 1, slot 0)

Mounting FS on node2 fails when node1 has FS mounted:

    (4159,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (4159,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (4159,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (4159,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (4159,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (4159,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (3257,0):o2net_connect_expired:1444 ERROR: no connection established with node 1 after 10 seconds, giving up and returning errors.
    (4157,1):dlm_request_join:786 ERROR: status = -107
    (4157,1):dlm_try_to_join_domain:934 ERROR: status = -107
    (4157,1):dlm_join_domain:1186 ERROR: status = -107
    (4157,1):dlm_register_domain:1379 ERROR: status = -107
    (4157,1):ocfs2_dlm_init:1996 ERROR: status = -107
    (4157,1):ocfs2_mount_volume:1062 ERROR: status = -107
    ocfs2: Unmounting device (8,17) on (node 2)

Mounting FS on node3 fails when node1 has FS mounted:

    (4340,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (4340,0):o2hb_do_disk_heartbeat:962 ERROR: Device "sdb1": another node is heartbeating in our slot!
    (3363,0):o2net_connect_expired:1444 ERROR: no connection established with node 1 after 10 seconds, giving up and returning errors.
    (4338,0):dlm_request_join:786 ERROR: status = -107
    (4338,0):dlm_try_to_join_domain:934 ERROR: status = -107
    (4338,0):dlm_join_domain:1186 ERROR: status = -107
    (4338,1):dlm_register_domain:1379 ERROR: status = -107
    (4338,1):ocfs2_dlm_init:1996 ERROR: status = -107
    (4338,1):ocfs2_mount_volume:1062 ERROR: status = -107
    ocfs2: Unmounting device (8,17) on (node 3)

Also, ocfs2-tools 1.2.1 fails to build on Debian Sarge, which we are using. We checked the dependencies and have these in place:

    libglib2.0-dev (>= 2.2.3), libreadline5-dev, comerr-dev, uuid-dev, libblkid-dev (>= 1.36), debhelper (>= 3.0.5)

Building ocfs2-tools fails with an error while building fsck.ocfs2:

    /usr/lib/libc_nonshared.a(elf-init.oS)(.gnu.linkonce.t.__i686.get_pc_thunk.bx+0x0): In function `__i686.get_pc_thunk.bx':
    : multiple definition of `__i686.get_pc_thunk.bx'
    ../libocfs2/libocfs2.a(alloc.o)(.gnu.linkonce.t.__i686.get_pc_thunk.bx+0x0): first defined here
    collect2: ld returned 1 exit status
    make[2]: *** [fsck.ocfs2] Error 1
    make[2]: Leaving directory `/usr/src/ocfs2-tools-1.2.1/fsck.ocfs2'
    make[1]: *** [fsck.ocfs2] Error 2
    make[1]: Leaving directory `/usr/src/ocfs2-tools-1.2.1'
    make: *** [build-stamp] Error 2
    debuild: fatal error at line 1219: debian/rules build failed

Because we cannot build ocfs2-tools 1.2.1, we are currently using the Debian packages of ocfs2-tools, which are version 1.1.5. Could the outdated ocfs2-tools be causing the 'another node is heartbeating in our slot' errors?
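As a quick sanity check on the cluster.conf shown above, here is a minimal Python sketch that parses the stanzas and looks for internal inconsistencies (duplicate node numbers, names or addresses, and a node_count that does not match the number of node stanzas). It is only an illustration: the function names parse_cluster_conf and check_cluster_conf are hypothetical and not part of ocfs2-tools, the parser is more lenient about whitespace than the real tools may be, and treating lines starting with "#" as comments is an assumption. It says nothing about whether the heartbeat error itself comes from the configuration.

```python
# Hypothetical helper, not part of ocfs2-tools: a lenient reader for the
# "stanza:" / "key = value" layout used in the cluster.conf posted above.

SAMPLE = """\
cluster:
    node_count = 3
    name = ocfs2

node:
    ip_port = 7777
    ip_address = 172.28.100.27
    number = 1
    name = tilmysql1
    cluster = ocfs2

node:
    ip_port = 7777
    ip_address = 172.28.100.28
    number = 2
    name = tilmysql2
    cluster = ocfs2

node:
    ip_port = 7777
    ip_address = 172.28.100.29
    number = 3
    name = tilmysql3
    cluster = ocfs2
"""


def parse_cluster_conf(text):
    """Return a list of (stanza_type, {key: value}) tuples."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # comment syntax is an assumption
            continue
        if line.endswith(":") and "=" not in line:
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(text):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    stanzas = parse_cluster_conf(text)
    clusters = [fields for kind, fields in stanzas if kind == "cluster"]
    nodes = [fields for kind, fields in stanzas if kind == "node"]

    # Each node should have its own number, name and address.
    for field in ("number", "name", "ip_address"):
        seen = set()
        for node in nodes:
            value = node.get(field)
            if value is None:
                problems.append(f"node stanza is missing {field}")
            elif value in seen:
                problems.append(f"duplicate {field} {value!r}")
            else:
                seen.add(value)

    # node_count should match the number of node stanzas for that cluster.
    for cluster in clusters:
        members = [n for n in nodes if n.get("cluster") == cluster.get("name")]
        if cluster.get("node_count") != str(len(members)):
            problems.append(
                f"cluster {cluster.get('name')!r}: node_count is "
                f"{cluster.get('node_count')!r} but {len(members)} node stanzas found"
            )
    return problems


if __name__ == "__main__":
    for problem in check_cluster_conf(SAMPLE) or ["no inconsistencies found"]:
        print(problem)
```

Running the same check against the actual /etc/ocfs2/cluster.conf on each of the three machines (and diffing the files) would at least rule out a mismatch between nodes, which a by-eye comparison can miss.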