John McNulty
2009-Aug-13 13:44 UTC
[Ocfs2-users] Shutdown to single user mode causes SysRq Reset
Hello,

I've got a 2 node HP DL580 cluster supported by a Fibrechannel SAN with dual FC cards, dual switches and an HP EVA on the back end. All SAN disks are multipathed. Installed software is:

Redhat 5.3
ocfs2-2.6.18-128.1.14.el5-1.4.2-1.el5
ocfs2-tools-1.4.2-1.el5
ocfs2console-1.4.2-1.el5
Oracle RAC 11g ASM
Oracle RAC 11g Clusterware
Oracle RAC 10g databases

OCFS2 isn't being used by RAC, we're using ASM for that, but OCFS2 is used to provide a shared /usr/local, /home and /apps.

Yesterday I discovered something very unexpected. I shut down node B to single user mode, and immediately node A crashed. The only message on the console was SysRq Resetting. Node A then rebooted normally. When I exited single user mode on node B to jump back up to run level 3, the system started up OK, but no sooner had I got the login prompt on the console than it too crashed with SysRq Resetting.

I repeated the steps a second time and it did exactly the same thing all over again. It appears to be repeatable.

The only thing that jumped out at me watching the consoles while this was going on was that node B fails to stop the OCFS2 service on shutdown, even going so far as to tell me after the fact with an "eeeeeee" message. I assume that's bad!

There were no other console messages to give me a clue, so this is my starting point. Anyone got any ideas?

Oh, there's one other thing that may or may not be relevant. On this cluster, and another identical cluster, mounted.ocfs2 -f always shows the node B cluster member as "Unknown" instead of the system name. As far as I'm aware I've followed the OCFS2 setup to the letter (it's not complicated), and "o2cb_ctl -It node" on either node shows both systems with all the correct details. Both nodes mount the cluster filesystems OK and work just fine.

I've not had a chance to try my single user test on the other identical cluster yet as I've not been able to get a downtime window for it. If I do, then I will.

Rgds,

John
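[Editor's sketch: one way to cross-check the "Unknown" node name is to compare what the cluster stack knows about the nodes with what mounted.ocfs2 reads from the on-disk slot map. The device path below is a placeholder for one of the multipathed OCFS2 LUNs; the rest are standard o2cb/ocfs2-tools commands.]

    # Cluster layout as o2cb sees it on this node
    o2cb_ctl -It node
    cat /etc/ocfs2/cluster.conf

    # Node numbers registered with the running cluster stack (configfs)
    ls /sys/kernel/config/cluster/*/node/

    # Which nodes currently hold slots on the shared volume
    mounted.ocfs2 -f /dev/mapper/ocfs2lv    # placeholder device

If the node numbers or names differ between the two nodes' cluster.conf files, that would explain mounted.ocfs2 reporting "Unknown" for one member.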
Sunil Mushran
2009-Aug-13 17:52 UTC
[Ocfs2-users] Shutdown to single user mode causes SysRq Reset
This is a feature. ;)

If you have mounted a volume on two or more nodes, the expectation is that the private interconnect will always remain up. If you shut down the network on a node, the cluster stack will have to kill a node. It does so in order to prevent hangs in cluster operations.

In a 2 node setup, the higher node number will fence. I would imagine Node A is the higher number. But I am not sure why Node B fenced on restart. The "eeeeeee" message does not ring a bell.

If you want to get to the bottom of this, set up a netconsole server to capture the logs. Or, remember to shut down the cluster before switching to single user mode.

Sunil
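[Editor's sketch of both suggestions. The mount points match the poster's setup; the IP address, MAC address, interface name and port are placeholders, and netcat flag syntax varies between builds.]

    # On the node being taken down: stop the cluster stack cleanly first
    umount /usr/local /home /apps   # release the shared OCFS2 mounts
    service ocfs2 stop              # unmounts any remaining OCFS2 volumes
    service o2cb stop               # takes the o2cb cluster stack offline
    init 1                          # now drop to single user mode

    # Netconsole: stream kernel messages to another host before testing
    # (format: src-port@src-ip/dev,dst-port@dst-ip/dst-mac; values are placeholders)
    modprobe netconsole netconsole=@/eth0,6666@192.168.1.50/00:16:3e:aa:bb:cc

    # On the receiving host, capture the UDP stream
    nc -u -l 6666                   # some netcat builds want: nc -u -l -p 6666

With netconsole in place, the fencing node's final kernel messages (including the o2cb/o2net timeout lines) should reach the remote host even though nothing survives the reset locally.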