thr3ads.net - Ocfs2 users - [Ocfs2-users] How to force node [a] to consider node [b] dead? [Jan 2009]

Karim Alkhayer

2009-Jan-26 17:42 UTC

[Ocfs2-users] How to force node [a] to consider node [b] dead?

Hi All,

 

We have O2CB_HEARTBEAT_THRESHOLD set to 601 because the SAN sometimes gets
overloaded, which causes the nodes to panic.

 

This value has proven to be more stable than 31. However, there are
times when one of the nodes, for instance node [b], crashes for
whatever reason. When the troublesome node starts up again, auto
mount is enabled but doesn't succeed; "Transport endpoint is not
connected" is usually displayed.

 

My guess is that the mount doesn't succeed because node [a] still thinks
that node [b] is alive.

 

We're talking about a restart that can take around 15 minutes, so basically
the threshold has passed by then.
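
As a rough check of the numbers above, here is a minimal sketch that converts a heartbeat threshold into a timeout. It assumes the relation given in the OCFS2 FAQ, timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds; whether that relation applies unchanged to OCFS2 1.2.1 is an assumption, and the function and constant names are hypothetical.

```python
# Assumption: the o2cb disk heartbeat runs once every 2 seconds, and a node
# is declared dead after (threshold - 1) missed iterations (per the OCFS2 FAQ).
HEARTBEAT_INTERVAL_SECS = 2


def heartbeat_timeout_secs(threshold: int) -> int:
    """Return the seconds of missed heartbeats before a node is declared dead."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


# The two values mentioned in this thread.
for threshold in (31, 601):
    secs = heartbeat_timeout_secs(threshold)
    print(f"O2CB_HEARTBEAT_THRESHOLD={threshold}: {secs} s ({secs / 60:.0f} min)")
```

If that relation holds here, a threshold of 601 corresponds to about 20 minutes, which is longer than the 15-minute restart, so node [a] may not yet have declared node [b] dead when the mount is attempted.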

 

I was wondering if there is a workaround to kick node [b] out of the cluster
so that it can join it again. The incident has happened once, a month ago,
and what I did then was restart the cluster services on both machines.
This was a very expensive solution, as all database instances had to go down.

 

OCFS2 1.2.1, SLES9 SP3 2.6.5-7.257-default, RAC 10.1.0.5, 5 DBs

 

Thanks

Karim

  

Sunil Mushran

2009-Jan-26 17:52 UTC

[Ocfs2-users] How to force node [a] to consider node [b] dead?

You are running a 3-year-old version of the fs. Please upgrade
to something more current, like sles9 sp4 or sles10 sp1 that
bundles ocfs2 1.2.9, or sles10 sp2 that ships ocfs2 1.4.1.

Karim Alkhayer wrote:
> Hi All,
>
> We have O2CB_HEARTBEAT_THRESHOLD set to 601 as the SAN gets overloaded
> sometimes and hence causing the nodes to panic
>
> This value has proven to be more stable than 31. However, there are
> sometimes where one of the nodes, for instance node [b] crashes, for
> whatever reason. While attempting to startup the troublesome node,
> auto mount is enabled but doesn't succeed, "Transport endpoint is not
> connected" is usually displayed.
>
> My opinion is this: the mount doesn't succeed because node [a] still
> thinks that node [b] is alive
>
> We're talking about a restart that can take around 15 minutes, so
> basically, the threshold is passed
>
> I was wondering if there is a workaround to kick node [b] out of the
> cluster so that it can join it again. What I've done so far, the
> incident happened once - a month ago, is to restart the cluster
> services on both machines. This was very expensive solution as all
> database instances had to go down
>
> OCFS2 1.2.1, SLES9 SP3 2.6.5-7.257-default, RAC 10.1.0.5, 5 DBs
>
> Thanks
>
> Karim
