Good afternoon all;

I'm planning on implementing a shared-storage solution for a primary and backup Oracle server in the near future. We can't afford RAC, and we don't have performance or growth issues; we just want a second system to be able to start up and run if the primary fails. Both servers will be connected to a dual-host external RAID system.

I've set up OCFS2 on a couple of test systems and everything appears to work fine -- until one of the systems loses network connectivity. When the systems can no longer talk to each other but the disk heartbeat is still alive, the higher-numbered node goes catatonic. Under SLES 9 it fenced itself off with a kernel panic; under SLES 10 it simply stops responding to the network or the console. A power cycle is required to bring it back up.

The desired behavior would be for the higher-numbered node to lose access to the OCFS2 file system(s). I don't much care whether access would simply time out, a la stale NFS mounts, or error immediately, like access to non-existent files.

I'm running the latest SuSE-packaged version of the ocfs2-tools package:

saltlake:/proc/fs # cat /proc/fs/ocfs2/version
OCFS2 1.2.1-SLES Tue Apr 25 14:46:36 PDT 2006 (build sles)
saltlake:/proc/fs #

I'm using the stock 10.0 release kernel:

saltlake:/proc/fs # uname -a
Linux saltlake 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux
saltlake:/proc/fs #

Is there a solution to this? Is this expected behavior?

Thanks,

--- David
In OCFS2 1.2.1, the default network timeouts are too low. The patch that makes these timeouts configurable is available in SLES10 SP1.

David Miller wrote:
> When the systems can't talk to each other anymore, but the disk
> heartbeat is still alive, the high numbered node goes catatonic.
> Under SLES 9 it fenced itself off with a kernel panic; under 10 it
> simply stops responding to network or console. A power cycling is
> required to bring it back up.
> [...]
> Is there a solution to this? Is this expected behavior?
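For reference, in the later 1.2.x tools the O2CB init script reads its timeouts from /etc/sysconfig/o2cb. The variable names below are the ones used by those later ocfs2-tools releases; treat this as a sketch of what becomes tunable, and verify the exact names against the SP1 package:

```shell
# /etc/sysconfig/o2cb -- O2CB cluster timeout settings
# (variable names assumed from later ocfs2-tools 1.2.x; check your package).

# Disk heartbeat: a node is declared dead after roughly
# (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds without heartbeat writes.
O2CB_HEARTBEAT_THRESHOLD=31

# Network idle timeout in milliseconds -- the value to raise so a brief
# network interruption doesn't immediately trigger fencing.
O2CB_IDLE_TIMEOUT_MS=30000

# Network keepalive and reconnect delays, in milliseconds.
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000
```

All nodes in the cluster must agree on these values, so change them everywhere and restart the o2cb service.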
Did you check the /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops system variables?

----- Original Message -----
From: "David Miller" <syslog@d.sparks.net>
To: <ocfs2-users@oss.oracle.com>
Sent: Monday, April 02, 2007 9:01 AM
Subject: [Ocfs2-users] Catatonic nodes under SLES10

> When the systems can't talk to each other anymore, but the disk
> heartbeat is still alive, the high numbered node goes catatonic. Under
> SLES 9 it fenced itself off with a kernel panic; under 10 it simply
> stops responding to network or console. A power cycling is required to
> bring it back up.
> [...]
> Is there a solution to this? Is this expected behavior?
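Those two knobs control what happens after a fence-induced panic: kernel.panic is the number of seconds to wait before automatically rebooting (0 means hang forever, which looks exactly like a dead console), and kernel.panic_on_oops escalates an oops to a full panic. A quick way to inspect and set them, assuming a standard /proc layout:

```shell
# Show the current settings. A kernel.panic of 0 means a panicked node
# just hangs until someone power-cycles it.
cat /proc/sys/kernel/panic
cat /proc/sys/kernel/panic_on_oops

# As root, make a fenced node reboot itself 30 seconds after panicking:
#   echo 30 > /proc/sys/kernel/panic
#   echo 1  > /proc/sys/kernel/panic_on_oops
#
# To persist across reboots, add to /etc/sysctl.conf:
#   kernel.panic = 30
#   kernel.panic_on_oops = 1
```

With kernel.panic set to a positive value, the self-fencing behavior seen under SLES 9 becomes a self-reboot instead of a hang.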
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users