thr3ads.net - Ocfs2 users - [Ocfs2-users] Nodes keep Fencing [Nov 2007]

Byron Albert

2007-Nov-06 07:40 UTC

[Ocfs2-users] Nodes keep Fencing

Hello group,

I have set up a 3-node cluster running SLES 10 SP1. The nodes all run in
VMware and share a physical LUN from our fiber storage backend. About twice
a day all of the systems fence, panic, and lock up. The message logged is
"took 58007ms to do waiting for write completion", and then the node fences.
We have tried raising all the timeouts to the values found on the site.

O2CB_HEARTBEAT_THRESHOLD = 31
O2CB_IDLE_TIMEOUT_MS = 30000
O2CB_KEEPALIVE_DELAY_MS = 2000
O2CB_RECONNECT_DELAY_MS = 2000

Today I added elevator=deadline to see if that fixes it.
Here are the versions we are running:

pbywebadmin1:~ # cat /proc/fs/ocfs2/version 
OCFS2 1.2.5-2-SLES-r3027 Tue Mar 27 16:33:19 EDT 2007 (build sles) 
pbywebadmin1:~ # uname -a 
Linux pbywebadmin1 2.6.16.53-0.16-default #1 Tue Oct 2 16:57:49 UTC 2007
i686 i686 i386 GNU/Linux 
pbywebadmin1:~ # rpm -qa | grep ocfs 
ocfs2-tools-1.2.3-0.7


Any suggestions on what is needed to get this to work without issues? There
is no heavy I/O on this system; the crashes happen with just random light I/O.

Byron

Byron Albert
Prolifics
balbert@prolifics.com
Office: 646-201-4981 Mobile: 203-512-7456

2007 IBM Award Winner for Overall Technical Excellence
SOA. Building the Future into Your Business 

Sunil Mushran

2007-Nov-07 17:22 UTC

[Ocfs2-users] Nodes keep Fencing

The question is: why is the I/O not completing within 60 secs? You could
try increasing the heartbeat threshold to, say, 46 (90 secs).
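
For reference, the figures above (31 giving 60 secs, 46 giving 90 secs) are
consistent with the usual o2cb rule of thumb,
timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 secs, assuming the default
2-second disk heartbeat interval. A small sketch of that arithmetic follows;
the function names are illustrative only and are not part of ocfs2-tools.

```python
# Assumption: the o2cb disk heartbeat runs once every 2 seconds (the default).
HEARTBEAT_INTERVAL_SECS = 2


def threshold_to_timeout_secs(threshold: int) -> int:
    """Seconds of missed disk heartbeats tolerated before a node fences."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def timeout_secs_to_threshold(timeout_secs: int) -> int:
    """O2CB_HEARTBEAT_THRESHOLD value needed for a desired timeout."""
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


if __name__ == "__main__":
    for threshold in (31, 46):
        secs = threshold_to_timeout_secs(threshold)
        print(f"O2CB_HEARTBEAT_THRESHOLD = {threshold} -> {secs} secs")
```

By the same rule, tolerating an I/O stall of 120 secs would need a threshold
of 61. The value should be the same on every node in the cluster.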

It could be that while that system is not doing any heavy I/O,
some other system using the same storage is.
Has that been ruled out?

In the end, we first have to determine whether it is a setup issue
or an environment issue. Only once we have ruled those out can
we explore software bugs in the fs and/or kernel.
