Sebastian Reitenbach
2007-Aug-23 04:27 UTC
[Ocfs2-users] Transport endpoint not connected after crash of one node
Hi,

I am on SLES 10 SP1, x86_64, running the distribution rpm's of ocfs2:

ocfs2console-1.2.3-0.7
ocfs2-tools-1.2.3-0.7

I have a two-node ocfs2 cluster configured. One node died (manual reset), and the second immediately started to have problems accessing the file systems, with the following reason in the logs: Transport endpoint not connected.

Running mounted.ocfs2 on the surviving machine showed that both machines had the filesystems mounted. Even after a umount of all the filesystems on the surviving node, mounted.ocfs2 still claimed it had some of the ocfs2 partitions mounted:

ppsdb101:~ # mounted.ocfs2 -f
Device     FS     Nodes
/dev/sda1  ocfs2  ppsdb102
/dev/sdb1  ocfs2  ppsdb102
/dev/sdc1  ocfs2  ppsdb102
/dev/sdd1  ocfs2  ppsdb102
/dev/sde1  ocfs2  ppsdb102
/dev/sdf1  ocfs2  ppsdb102
/dev/sdg1  ocfs2  ppsdb102
/dev/sdh1  ocfs2  ppsdb102
/dev/sdi1  ocfs2  ppsdb102
/dev/sdj1  ocfs2  ppsdb102
/dev/sdk1  ocfs2  ppsdb102
/dev/sdl1  ocfs2  ppsdb102, ppsdb101
/dev/sdm1  ocfs2  ppsdb102
/dev/sdn1  ocfs2  ppsdb102
/dev/sdo1  ocfs2  ppsdb102
/dev/sdp1  ocfs2  ppsdb102, ppsdb101
/dev/sdq1  ocfs2  ppsdb102, ppsdb101
/dev/sdr1  ocfs2  ppsdb102, ppsdb101
/dev/sds1  ocfs2  ppsdb102, ppsdb101
/dev/sdt1  ocfs2  ppsdb102
/dev/sdu1  ocfs2  ppsdb102

In the listing above, ppsdb102 is the dead machine and ppsdb101 is the one still alive. An ordinary mount command shows that none of the listed partitions are mounted, but mounted.ocfs2 still thinks some of them are.

o2cb was configured like this:

Load O2CB driver on boot (y/n) [y]:
Cluster to start on boot (Enter "none" to clear) [ppscluster]:
Specify heartbeat dead threshold (>=7) [61]:
Use user-space driven heartbeat? (y/n) [n]:
Cluster keepalive delay (ms) [5000]:
Cluster reconnect dealy (ms) [2000]:
Cluster idle timeout (ms) [10000]:
Writing O2CB configuration: OK
O2CB cluster ppscluster already online

Two questions:
1. Shouldn't the surviving machine recognize the death of the other node after 61 seconds?
2. Shouldn't mounted.ocfs2 show the same locally mounted ocfs2 partitions as mount -t ocfs2 does?

kind regards
Sebastian
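On question 1, note that in mainline ocfs2-tools of this vintage the heartbeat dead threshold is a count of 2-second disk-heartbeat iterations, not a number of seconds. Assuming the SLES build behaves like mainline, the wait is roughly double what the number suggests:

timeout (seconds) = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
                  = (61 - 1) * 2
                  = 120 seconds

On question 2, mount -t ocfs2 reads the local mount table, while mounted.ocfs2 -f does a "full detect" by reading the slot map stored on each device; a node that dies without unmounting keeps its slots until recovery clears them, so the two views can legitimately disagree after a crash. One way to compare the two views and inspect a slot map directly (assuming your debugfs.ocfs2 build supports the slotmap command; /dev/sdl1 is just one of the devices from the listing above):

ppsdb101:~ # mount -t ocfs2                        # local view, from /etc/mtab
ppsdb101:~ # mounted.ocfs2 -f                      # cluster view, from on-disk slot maps
ppsdb101:~ # debugfs.ocfs2 -R "slotmap" /dev/sdl1  # show which slots are still held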
Sunil Mushran
2007-Aug-24 14:01 UTC
[Ocfs2-users] Transport endpoint not connected after crash of one node
You could be encountering Novell bugzilla 296606. It is specific to SLES10 (and SP1). Novell owns the bug.

Sebastian Reitenbach wrote:
> Hi,
>
> I am on SLES 10 SP1, x86_64, running the distribution rpm's of ocfs2:
> ocfs2console-1.2.3-0.7
> ocfs2-tools-1.2.3-0.7
>
> [...]
>
> Two questions:
> 1. Shouldn't the surviving machine recognize the death of the other node
>    after 61 seconds?
> 2. Shouldn't mounted.ocfs2 show the same locally mounted ocfs2 partitions
>    as mount -t ocfs2 does?
>
> kind regards
> Sebastian
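Whether or not the Novell bug applies, the stale ppsdb102 entries live in the on-disk slot maps, so they persist until the dead node's slots are recovered. Normally the surviving node's recovery should clear them by itself once the heartbeat declares the peer dead; if they remain stuck, a possible manual cleanup, sketched under the assumption that the volume is unmounted on every node before fsck.ocfs2 runs:

ppsdb101:~ # debugfs.ocfs2 -R "slotmap" /dev/sda1  # confirm the dead node still holds a slot
ppsdb101:~ # fsck.ocfs2 -f /dev/sda1               # forced check; replays the stale slot's
                                                   # journal, which should also release the slot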