Sebastian Reitenbach
2007-Aug-23 04:27 UTC
[Ocfs2-users] Transport endpoint not connected after crash of one node
Hi,

I am on SLES 10 SP1, x86_64, running the distribution rpm's of ocfs2:

ocfs2console-1.2.3-0.7
ocfs2-tools-1.2.3-0.7

I have a two-node ocfs2 cluster configured. One node died (manual reset), and the second immediately started to have problems accessing the file systems, with the following reason in the logs: Transport endpoint not connected.

Running mounted.ocfs2 on the surviving machine showed that both machines had the filesystems mounted. Even after a umount of all the filesystems on the surviving node, mounted.ocfs2 still claimed it had some of the ocfs2 partitions mounted:

ppsdb101:~ # mounted.ocfs2 -f
Device     FS     Nodes
/dev/sda1  ocfs2  ppsdb102
/dev/sdb1  ocfs2  ppsdb102
/dev/sdc1  ocfs2  ppsdb102
/dev/sdd1  ocfs2  ppsdb102
/dev/sde1  ocfs2  ppsdb102
/dev/sdf1  ocfs2  ppsdb102
/dev/sdg1  ocfs2  ppsdb102
/dev/sdh1  ocfs2  ppsdb102
/dev/sdi1  ocfs2  ppsdb102
/dev/sdj1  ocfs2  ppsdb102
/dev/sdk1  ocfs2  ppsdb102
/dev/sdl1  ocfs2  ppsdb102, ppsdb101
/dev/sdm1  ocfs2  ppsdb102
/dev/sdn1  ocfs2  ppsdb102
/dev/sdo1  ocfs2  ppsdb102
/dev/sdp1  ocfs2  ppsdb102, ppsdb101
/dev/sdq1  ocfs2  ppsdb102, ppsdb101
/dev/sdr1  ocfs2  ppsdb102, ppsdb101
/dev/sds1  ocfs2  ppsdb102, ppsdb101
/dev/sdt1  ocfs2  ppsdb102
/dev/sdu1  ocfs2  ppsdb102

In the listing above, ppsdb102 is the dead machine and ppsdb101 is the one still alive. An ordinary mount command shows that none of the listed partitions are mounted, but mounted.ocfs2 still thinks some of them are.

o2cb was configured like this:

Load O2CB driver on boot (y/n) [y]:
Cluster to start on boot (Enter "none" to clear) [ppscluster]:
Specify heartbeat dead threshold (>=7) [61]:
Use user-space driven heartbeat? (y/n) [n]:
Cluster keepalive delay (ms) [5000]:
Cluster reconnect dealy (ms) [2000]:
Cluster idle timeout (ms) [10000]:
Writing O2CB configuration: OK
O2CB cluster ppscluster already online

Two questions:
1. Shouldn't the surviving machine recognize the death of the other node after 61 seconds?
2. Shouldn't mounted.ocfs2 show the same locally mounted ocfs2 partitions as mount -t ocfs2 does?

kind regards
Sebastian
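On question 1, note that in mainline ocfs2-tools of this vintage the heartbeat dead threshold is a count of 2-second disk-heartbeat iterations, not a number of seconds. Assuming the SLES build behaves like mainline, the wait is roughly double what the number suggests:

timeout (seconds) = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
                  = (61 - 1) * 2
                  = 120 seconds

On question 2, mount -t ocfs2 reads the local mount table, while mounted.ocfs2 -f does a "full detect" by reading the slot map stored on each device; a node that dies without unmounting keeps its slots until recovery clears them, so the two views can legitimately disagree after a crash. One way to compare the two views and inspect a slot map directly (assuming your debugfs.ocfs2 build supports the slotmap command; /dev/sdl1 is just one of the devices from the listing above):

ppsdb101:~ # mount -t ocfs2                        # local view, from /etc/mtab
ppsdb101:~ # mounted.ocfs2 -f                      # cluster view, from on-disk slot maps
ppsdb101:~ # debugfs.ocfs2 -R "slotmap" /dev/sdl1  # show which slots are still held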
Sunil Mushran
2007-Aug-24 14:01 UTC
[Ocfs2-users] Transport endpoint not connected after crash of one node
You could be encountering Novell bugzilla 296606. It is specific to SLES10 (and SP1). Novell owns the bug.

Sebastian Reitenbach wrote:
> Hi,
>
> I am on SLES 10 SP1, x86_64, running the distribution rpm's of ocfs2:
> ocfs2console-1.2.3-0.7
> ocfs2-tools-1.2.3-0.7
>
> [...]
>
> Two questions:
> 1. Shouldn't the surviving machine recognize the death of the other node
>    after 61 seconds?
> 2. Shouldn't mounted.ocfs2 show the same locally mounted ocfs2 partitions
>    as mount -t ocfs2 does?
>
> kind regards
> Sebastian
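Whether or not the Novell bug applies, the stale ppsdb102 entries live in the on-disk slot maps, so they persist until the dead node's slots are recovered. Normally the surviving node's recovery should clear them by itself once the heartbeat declares the peer dead; if they remain stuck, a possible manual cleanup, sketched under the assumption that the volume is unmounted on every node before fsck.ocfs2 runs:

ppsdb101:~ # debugfs.ocfs2 -R "slotmap" /dev/sda1  # confirm the dead node still holds a slot
ppsdb101:~ # fsck.ocfs2 -f /dev/sda1               # forced check; replays the stale slot's
                                                   # journal, which should also release the slot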