Colin.Farley@ecarecenters.com
2006-Nov-12 16:25 UTC
[Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem
I'm testing a 2-node cluster in a VMware ESX environment for use as a high-availability FTP server to support a CRM application. Both nodes run Unbreakable 2.0 x86_64. They access a 300GB OCFS2 volume on an RDM LUN on an HP EVA. All disk connectivity is fine and I haven't seen any problems there. The problem comes when doing some IP failover testing. The IP failover is done using UCARP, so to test failover I tried unplugging one node's virtual network cable to see what happens.

If I unplug node 1, everything is fine: node 1 eventually panics and reboots while node 0 chugs along. The problem comes when unplugging node 0. When node 0 loses network connectivity it does not panic, and eventually node 1 panics and reboots. Is there a reason why the lower node does not panic if it loses network connectivity?

Heartbeat thresholds are the same on each node at 31 and both nodes are set to reboot on panic; node 0 just never panics. All installed software is the version that comes with Unbreakable 2.0.

I didn't do the config on these boxes, so the first thing I'm going to do on Tuesday when I work on this is rebuild both nodes from scratch, but I figured I would ask first to see if it was an easy question for someone on the list to answer.

Thanks,

Colin Farley
Network Administrator
E-Care Contact Center Services
Phone:(204) 940-6244
Fax:(204) 940-7394
Considering o2net only cares whether it is connected to the other node or not, it should not make a difference whether one unplugs node 0 or node 1. The result should be the same: node 1 should fence in both cases.

Do you see messages indicating that the node(s) have lost connectivity? If so, could you share them? It would be easiest if you could file a bug on oss.oracle.com/bugzilla with the messages file and a list of the course of events, as in, unplugged cable on node 0 at time x, etc.
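For readers of the archive, the behaviour described in the reply (node 1 fences no matter which cable is pulled) can be illustrated with a small sketch. It is modelled on the quorum rule as I understand it from the comments in the OCFS2 cluster code: with an even number of heartbeating nodes, a node that can reach exactly half of them keeps quorum only if it can still reach the lowest-numbered heartbeating node. The function name `has_quorum` and its arguments are hypothetical and exist only for this illustration; this is not the kernel implementation.

```python
def has_quorum(self_node, heartbeating, connected):
    """Illustrative quorum check (hypothetical helper, not the kernel code).

    self_node:    this node's number
    heartbeating: node numbers seen heartbeating on the shared disk
    connected:    node numbers this node can reach over the network
    """
    # A node always counts itself as heartbeating and reachable.
    alive = set(heartbeating) | {self_node}
    reachable = (set(connected) | {self_node}) & alive

    half = len(alive) // 2
    if len(alive) % 2 == 1:
        # Odd number of nodes: need a strict majority.
        return len(reachable) > half

    # Even number of nodes: more than half is always enough.
    if len(reachable) > half:
        return True
    # Exactly half: the tie goes to the side holding the lowest node number.
    return len(reachable) == half and min(alive) in reachable


# Two-node cluster, both still heartbeating on disk, network link lost.
# It does not matter which node's cable was pulled: each node sees only itself.
print(has_quorum(0, {0, 1}, {0}))  # node 0 keeps quorum
print(has_quorum(1, {0, 1}, {1}))  # node 1 loses quorum and would fence
```

Under this rule the outcome depends only on the node numbers, not on which side caused the network split, which matches the reply's expectation that node 1 fences in both tests.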