quanta
2012-Dec-22 03:04 UTC
[Ocfs2-users] Transport endpoint is not connected while mounting?
I have replaced a dead node that was running in dual-primary mode with OCFS2. All the steps work:

`/proc/drbd`

    version: 8.3.13 (api:88/proto:86-96)
    GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbuild at builder10.centos.org, 2012-05-07 11:56:36
     1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
        ns:81 nr:407832 dw:106657970 dr:266340 al:179 bm:6551 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

until I try to mount the volume:

    mount -t ocfs2 /dev/drbd1 /data/webroot/
    mount.ocfs2: Transport endpoint is not connected while mounting /dev/drbd1 on /data/webroot/. Check 'dmesg' for more information on this error.

`/var/log/kern.log`

    kernel: (o2net,11427,1):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
    kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):dlm_try_to_join_domain:1210 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):dlm_join_domain:1488 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):dlm_register_domain:1754 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):ocfs2_dlm_init:2808 ERROR: status = -107
    kernel: (mount.ocfs2,12037,1):ocfs2_mount_volume:1447 ERROR: status = -107
    kernel: ocfs2: Unmounting device (147,1) on (node 1)

I'm sure `/etc/ocfs2/cluster.conf` is identical on both nodes:

`/etc/ocfs2/cluster.conf`

    node:
        ip_port = 7777
        ip_address = 192.168.3.145
        number = 0
        name = SVR233NTC-3145.localdomain
        cluster = cpc

    node:
        ip_port = 7777
        ip_address = 192.168.2.93
        number = 1
        name = SVR022-293.localdomain
        cluster = cpc

    cluster:
        node_count = 2
        name = cpc

and they are connected fine:

    # nc -z 192.168.3.145 7777
    Connection to 192.168.3.145 7777 port [tcp/cbt] succeeded!

but the O2CB heartbeat is not active on the new node (192.168.2.93):

`/etc/init.d/o2cb status`

    Driver for "configfs": Loaded
    Filesystem "configfs": Mounted
    Driver for "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster cpc: Online
    Heartbeat dead threshold = 31
      Network idle timeout: 30000
      Network keepalive delay: 2000
      Network reconnect delay: 2000
    Checking O2CB heartbeat: Not active

Here are the results of running `tcpdump` on node 0 while starting `ocfs2` on node 1:

    1 0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
    2 0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
    3 0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
    4 0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
    5 0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
    6 0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181

An `RST` is sent after every 6 packets.

What else can I do to debug this case?

PS: OCFS2 versions on node 0:
- ocfs2-tools-1.4.4-1.el5
- ocfs2-2.6.18-274.12.1.el5-1.4.7-1.el5

OCFS2 versions on node 1:
- ocfs2-tools-1.4.4-1.el5
- ocfs2-2.6.18-308.el5-1.4.7-1.el5
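
For reference, the `status = -107` values in the kernel log are negated errno codes. On Linux, errno 107 is `ENOTCONN`, which is the same "Transport endpoint is not connected" that `mount.ocfs2` prints, so every DLM line appears to be reporting the one failed connection to node 0 rather than separate faults. A minimal sketch for decoding such lines, assuming Python on a Linux host; the names `STATUS_RE`, `decode_status` and `decode_log` are hypothetical helpers, not part of ocfs2-tools:

```python
import errno
import os
import re

# Matches lines shaped like the ones quoted from /var/log/kern.log, e.g.
# "(mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107"
STATUS_RE = re.compile(
    r"\):(?P<func>\w+):(?P<line>\d+) ERROR: status = (?P<status>-?\d+)"
)


def decode_status(status):
    """Return (symbolic name, message) for a negated-errno kernel status."""
    code = abs(status)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def decode_log(text):
    """Yield (function, status, errno name) for each 'status = N' entry."""
    for match in STATUS_RE.finditer(text):
        status = int(match.group("status"))
        yield match.group("func"), status, decode_status(status)[0]


if __name__ == "__main__":
    sample = (
        "kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 "
        "ERROR: status = -107"
    )
    for func, status, name in decode_log(sample):
        print(func, status, name, os.strerror(abs(status)))
```

The errno numbering is platform-specific, so the decoding is only meaningful when run on the same kind of system that produced the log.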