Displaying 20 results from an estimated 10000 matches similar to: "2 node cluster, Heartbeat2, self-fencing"
2008 Jul 01
5
ocfs2 fencing problem
Hi, Sunil or Tao,
I have a 4-node OCFS2 cluster running OCFS2 1.2.8 on SuSE 9 SP4. When I
tried to do failover testing (shutting down one node), the whole cluster
hung (I could not even log in to any server in the cluster). I had to
bring all of the nodes back up before the system was usable again. What kind of
behavior is this? Is it OCFS2 fencing? Below is my configuration.
aopcer13:~ #
2007 Dec 01
1
Good tutorial about using heartbeat2, ocfs2 and evms with xen 3.x
Hi all
Can somebody point me to a good tutorial about using high-availability
clusters with xen using heartbeat2, ocfs2 and evms under rhel/centos, debian or
sles?
I have done various searches without result ... (google shows me a lot of
references, mailing lists, etc., but not a good doc)
Many thanks.
--
CL Martinez
carlopmart {at} gmail {d0t} com
2008 Mar 05
3
cluster with 2 nodes - heartbeat problem fencing
Hi to all, this is my first time on this mailing list.
I have a problem with OCFS2 on Debian Etch 4.0.
When a node goes down or freezes without unmounting the OCFS2 partition,
I would like the heartbeat not to fence (kernel panic) the server that is still working.
I would like to disable either the heartbeat or the fencing, so that we can keep
working with only one node.
Thanks
2010 Dec 09
1
[PATCH] Call userspace script when self-fencing
Hi,
According to the comment in fs/ocfs2/cluster/quorum.c:70 about the
self-fencing operations:
/* It should instead flip the file
 * system RO and call some userspace script. */
So, I tried to add it (but I didn't find a way to flip the fs to RO).
Here is a proposal for this functionality, based on ocfs2-1.4.7.
This patch adds an entry 'fence_cmd' in /sys to specify an
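The snippet is cut off above. Purely as an illustration of what the proposal describes (the 'fence_cmd' attribute comes from this out-of-tree patch, and the exact configfs path and script name below are assumptions, not part of mainline o2cb), usage might look like:

# hypothetical: run a local script when the node would otherwise self-fence
echo /usr/local/sbin/ocfs2-fence-action > /sys/kernel/config/cluster/ocfs2cluster/fence_cmd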
2010 Nov 02
1
disabling self-fencing
Hi all,
on my nodes I have another cluster manager running that takes care of
fencing the node. Is it safe to disable ocfs2 self-fencing?
Details:
I'm using ocfs2 over drbd in master/master aka primary/primary mode.
In case of loss of network connectivity I would like to disconnect the
drbd device, invalidate it and unmount the filesystem but ocfs2
reboots the node....
Is it possible to
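As a minimal sketch of the recovery sequence described above, using standard drbdadm commands (the resource name r0 and mount point /data are assumptions; drbd usually requires the device to be demoted to secondary before it can be invalidated):

umount /data            # release the ocfs2 filesystem first
drbdadm disconnect r0   # drop the replication link
drbdadm secondary r0    # demote the local copy
drbdadm invalidate r0   # mark local data outdated; a full resync follows on reconnect

Whether skipping ocfs2 self-fencing is safe still depends on the other cluster manager reliably fencing the peer.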
2006 Jul 28
3
Private Interconnect and self fencing
I have an OCFS2 filesystem on a Coraid AoE device.
It mounts fine, but under heavy I/O the server self-fences, claiming a
write timeout:
(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device
etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all
active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
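For context, the 12000 milliseconds in the log is the default o2cb disk-heartbeat window; it is derived from the heartbeat threshold as:

timeout_ms = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2000
           = (7 - 1) * 2000 = 12000 ms    # with the default threshold of 7

Raising the threshold lengthens the window before a node self-fences on slow or congested storage.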
2007 Sep 04
3
Ocfs2 and debian
Hi.
I'm pretty new to ocfs2 and clusters.
I'm trying to get ocfs2 running over a drbd device.
I know it's not the best solution, but for now I must deal with this.
I set up drbd and it works perfectly.
I set up ocfs2 and I'm not able to make it work.
/etc/init.d/o2cb status:
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module
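The status output is cut off above. For context, on Debian the o2cb init script is normally driven like this (the cluster name mycluster is an assumption; the defaults usually live in /etc/default/o2cb):

# /etc/init.d/o2cb configure          # enable the stack at boot and set timeouts interactively
# /etc/init.d/o2cb online mycluster   # bring the cluster defined in /etc/ocfs2/cluster.conf online
# /etc/init.d/o2cb status             # should then report the modules loaded and the cluster online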
2006 Apr 18
1
Self-fencing issues (RHEL4)
Hi.
I'm running RHEL4 for my test system, Adaptec Firewire controllers,
Maxtor One Touch III shared disk (see the details below),
100Mb/s dedicated interconnect. It panics with no load about every
20 minutes (error message from netconsole attached).
Any clues?
Yegor
---
[root@rac1 ~]# cat /proc/fs/ocfs2/version
OCFS2 1.2.0 Tue Mar 7 15:51:20 PST 2006 (build
2007 Mar 13
3
OCFSv2 in a mail cluster environment
Hello,
First, thanks to all the people who helped to license this software under
the GPL. This is a very important piece of work for Free Software in the
enterprise market.
Just a few newbie questions. We've been thinking about implementing a mail
cluster with a Fiber SAN (IBM DS-4000) and OCFSv2 as the storage
backend. The information on this volume will essentially be Postfix
Maildirs,
2009 Nov 17
1
[PATCH 1/1] ocfs2/cluster: Make fence method configurable
By default, o2cb fences the box by calling emergency_restart(). While this
scheme works well in production, it comes in the way during testing as it
does not let the tester take stack/core dumps for analysis.
This patch allows the user to dynamically change the fence method to panic() by:
# echo "panic" > /sys/kernel/config/cluster/<clustername>/fence_method
Signed-off-by: Sunil
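Assuming the patch above is applied (the mechanism was later merged into mainline o2cb), checking and switching the fence method looks roughly like this; the cluster name ocfs2cluster is an assumption:

# cat /sys/kernel/config/cluster/ocfs2cluster/fence_method            # "reset" by default
# echo panic > /sys/kernel/config/cluster/ocfs2cluster/fence_method   # panic() so stack/core dumps can be taken
# echo reset > /sys/kernel/config/cluster/ocfs2cluster/fence_method   # restore the production default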
2009 Oct 20
1
Fencing OCFS
Hi people, I installed ocfs2 on my virtual machines, with CentOS 5.3 and Xen.
But when I turn off machine1, ocfs2 starts fencing
machine2. I read the docs on oracle.com but could not solve the
problem. Can someone help?
My config and my package versions:
cluster.conf
node:
ip_port = 7777
ip_address = 192.168.1.35
number = 0
name = x1
cluster
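The configuration is cut off above. For comparison, a complete two-node /etc/ocfs2/cluster.conf normally looks like the following (the second node's name and address are illustrative, not the poster's; attribute lines are tab-indented):

node:
	ip_port = 7777
	ip_address = 192.168.1.35
	number = 0
	name = x1
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.36
	number = 1
	name = x2
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2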
2008 Jul 14
1
Node fence on RHEL4 machine running 1.2.8-2
Hello,
We have a four-node RHEL4 RAC cluster running OCFS2 version 1.2.8-2 and
the 2.6.9-67.0.4hugemem kernel. The cluster has been really stable since
we upgraded to 1.2.8-2 early this year, but this morning, one of the
nodes fenced and rebooted itself, and I wonder if anyone could glance at
the below remote syslogs and offer an opinion as to why.
First, here's the output of
2010 Jan 18
1
Getting Closer (was: Fencing options)
One more follow-on:
The combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to
have netted the culprit:
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137
ERROR: Heartbeat write timeout to device dm-12 after 60000
milliseconds
E01-netconsole.log:Jan 18 09:45:10 E01
(10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all
active regions.
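For anyone reproducing this, both settings are ordinary sysctls; they can be made persistent in /etc/sysctl.conf (the values shown are the ones from this post) and applied with sysctl -p:

kernel.panic = 60          # reboot 60 s after a panic, leaving time to capture console output
kernel.printk = 7 4 1 7    # raise the console log level so the panic path reaches netconsole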
2009 Sep 24
1
strange fencing behavior
I have 10 servers in a cluster running Debian Etch with kernel 2.6.26-bpo.2
and a backport of ocfs2-tools 1.4.1-1.
I'm using AoE to export the drives from a Debian Lenny server in the
cluster.
My problem is that if I mount the ocfs2 partition on the server that is
exporting it via AoE, it fences the entire cluster. Looking at the logs
on the server exporting the ocfs2 partition doesn't give much information...
2009 Jun 04
2
OCFS2 v1.4 hangs
I have four database servers in a high-availability, load-balancing
configuration. Each machine has a mount to a common data source which is
an OCFS2 v1.4 file-system. While working on three of the servers, I
restarted the IP network and found afterwards that the fourth machine had hung.
I could not reboot it and could not unmount the ocfs2 partitions. I am
pretty sure this was all caused by my taking down
2007 Feb 06
2
Network 10 sec timeout setting?
Hello!
Hey, didn't a setting for the 10-second network timeout get into the
2.6.20 kernel?
If so, how do we set it?
I am getting
OCFS2 1.3.3
(2201,0):o2net_connect_expired:1547 ERROR: no connection established
with node 1 after 10.0 seconds, giving up and returning errors.
(2458,0):dlm_request_join:802 ERROR: status = -107
(2458,0):dlm_try_to_join_domain:950 ERROR: status = -107
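The 10-second figure is o2net's idle timeout. In later o2cb releases it is exposed as a configfs attribute and an init-script variable; as a rough sketch, assuming a cluster named ocfs2cluster and a tools version that supports configurable timeouts (the value must match on every node and is normally changed while the cluster is offline):

# cat /sys/kernel/config/cluster/ocfs2cluster/idle_timeout_ms        # current timeout in ms
# echo 30000 > /sys/kernel/config/cluster/ocfs2cluster/idle_timeout_ms

Alternatively, set O2CB_IDLE_TIMEOUT_MS in /etc/sysconfig/o2cb and restart o2cb.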
2010 May 26
1
Failover testing problem and a heartbeat question
We have a setup with 15 hosts fibre-attached via a switch to a common SAN. Each host has a single fibre port; the SAN has two controllers, each with two ports. The SAN exposes four OCFS2 v1.4.2 volumes. While performing a failover test, we observed 8 hosts fence and 2 reboot _without_ fencing. The OCFS2 FAQ recommends a default disk heartbeat of 31-61 loops for multipath I/O users. Our initial
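For reference, the FAQ's "loops" figure maps onto the o2cb disk-heartbeat threshold, where the dead window is (threshold - 1) * 2000 ms, so 31 loops is roughly 60 s. On most distributions it is set in the o2cb sysconfig file, for example:

O2CB_HEARTBEAT_THRESHOLD=31    # in /etc/sysconfig/o2cb; (31 - 1) * 2000 ms = 60 s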
2007 Sep 30
4
Question about increasing node slots
We have a test 10gR2 RAC cluster using ocfs2 filesystems for the
Clusterware files and the Database files.
We need to increase the node slots to accommodate new RAC nodes. Is it
true that we will need to unmount these filesystems for the upgrade (i.e.
the Database and Clusterware filesystems as well)?
We are planning to use the following command format to perform the node
slot increase:
# tunefs.ocfs2 -N 3
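For illustration, the full invocation takes the device as its final argument (the device path below is a placeholder); with the 1.2-era tools the volume generally has to be unmounted on every node before the slot count can be changed:

# tunefs.ocfs2 -N 3 /dev/sdX1                        # raise the number of node slots to 3
# debugfs.ocfs2 -R stats /dev/sdX1 | grep -i slots   # verify the new slot count afterwards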
2009 Feb 12
2
mount point is not unique among all nodes
Hi list,
Here is a bug report on Novell bugzilla (https://bugzilla.novell.com/show_bug.cgi?id=456280):
a directory used as a mount point on node A can be removed from node B.
The problem is, node B does not know an empty dir is being used as a mount point on another node. Is
there any solution to return -EBUSY when a dir is being used as a mount point on another node?
Thanks in advance.
--
Coly Li
SuSE Labs
2008 Oct 17
1
Preventing DomU corruption in case of Split-Brain of heartbeat
Hi Xen-Users!
We run a large HA Xen system based on heartbeat2.
The storage base is an InfiniBand storage cluster exporting iSCSI devices
to the frontend HA Xen machines. The iSCSI devices are used as physical
devices for the domUs using the block-iscsi mechanism (by the way, thanks for
this cool script).
Recently we had a split brain in our heartbeat system. This caused both
of our Xen servers to