thr3ads.net - similar to: "Failover testing problem and a heartbeat question"

2010 Jan 18

Getting Closer (was: Fencing options)

One more follow-on: the combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to have netted the culprit: E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137 ERROR: Heartbeat write timeout to device dm-12 after 60000 milliseconds E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all active regions.

2007 Jul 29

6 node cluster with unexplained reboots

We just installed a new cluster with 6 HP DL380g5, dual single-port Qlogic 24xx HBAs connected via two HP 4/16 Storageworks switches to a 3Par S400. We are using the 3Par recommended config for the Qlogic driver and device-mapper-multipath, giving us 4 paths to the SAN. We do see some SCSI errors where DM-MP is failing a path after getting a 0x2000 error from the SAN controller, but the path gets puts

2009 Aug 08

Heartbeat Timeout Threshold

I've been using OCFS2 on a 3 way Centos 5.2 Xen cluster for a while now using it to share the VM disk images. In this way I can have live and transparent VM migration. I'd been having intermittent (every 2-3 weeks) incidents where a server would self fence. After configuring netconsole I managed to see that the fencing was due to a heartbeat threshold timeout so I have now increased

2009 Sep 24

strange fencing behavior

I have 10 servers in a cluster running Debian Etch with 2.6.26-bpo.2 with a backport of ocfs2-tools-1.4.1-1 I'm using AoE to export the drives from a Debian Lenny server in the cluster. My problem is if I mount the ocfs2 partition on the server that is exporting it via AoE it fences the entire cluster. Looking at the logs exporting the ocfs2 partition doesn't give much information...

2007 Feb 06

Network 10 sec timeout setting?

Hello! Hey, didn't a setting for the 10 second network timeout get into the 2.6.20 kernel? If so, how do we set this? I am getting OCFS2 1.3.3 (2201,0):o2net_connect_expired:1547 ERROR: no connection established with node 1 after 10.0 seconds, giving up and returning errors. (2458,0):dlm_request_join:802 ERROR: status = -107 (2458,0):dlm_try_to_join_domain:950 ERROR: status = -107

2006 Nov 03

Newbie questions -- is OCFS2 what I even want?

Dear Sirs and Madams, I run a small visual effects production company, Hammerhead Productions. We'd like to have an easily extensible inexpensive relatively high-performance storage network using open-source components. I was hoping that OCFS2 would be that system. I have a half-dozen 2 TB fileservers I'd like the rest of the network to see as a single 12 TB disk, with the aggregate

2008 Mar 05

cluster with 2 nodes - heartbeat problem fencing

Hi all, this is my first time on this mailing list. I have a problem with OCFS2 on Debian Etch 4.0. When a node goes down or freezes without unmounting the OCFS2 partition, I'd like the heartbeat not to fence the server that is still working well (kernel panic). I'd like to disable either heartbeat or fencing, so that we can also work with only 1 node. Thanks

2008 Jul 01

ocfs2 fencing problem

Hi, Sunil or Tao, I have a 4-node OCFS2 cluster running OCFS2 1.2.8 on SuSE 9 SP4. When I tried to do failover testing (shutting down one node), the whole cluster hung (I could not even log in to any server in the cluster). I had to bring all of them up before I was able to use the system. What kind of behavior is this? Is it OCFS2 fencing? Below is my configuration. aopcer13:~ #

2009 Nov 13

Cannot set heartbeat dead threshold

Hi I have: SLES 10 SP2 (2.6.16.60-0.21-smp) ocfs2-tools-1.4.0-0.3 ocfs2console-1.4.0-0.3 and I can't change "heartbeat dead threshold" value. Content of /etc/sysconfig/o2cb: # O2CB_ENABLED: 'true' means to load the driver on boot. O2CB_ENABLED=true # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start. O2CB_BOOTCLUSTER=ocfs2 # O2CB_HEARTBEAT_THRESHOLD:

2009 Aug 13

Shutdown to single user mode causes SysRq Reset

Hello, I've got a 2 node HP DL580 cluster supported by a Fibrechannel SAN with dual FC cards, dual switches and an HP EVA on the back end. All SAN disks are multipathed. Installed software is: Redhat 5.3 ocfs2-2.6.18-128.1.14.el5-1.4.2-1.el5 ocfs2-tools-1.4.2-1.el5 ocfs2console-1.4.2-1.el5 Oracle RAC 11g ASM Oracle RAC 11g Clusterware Oracle RAC 10g databases OCFS2 isn't being used by

2009 Jul 29

Error message while booting system

Hi, while the system is booting I get the error message "modprobe: FATAL: Module ocfs2_stackglue not found" in message. Some nodes reboot without any error message. ------------------------------------------------- Jul 27 10:02:19 alf3 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team Jul 27 10:02:19 alf3 kernel: Netfilter messages via NETLINK v0.30. Jul 27 10:02:19 alf3 kernel:

2011 Mar 04

node eviction

Hello... I wonder if someone has had a similar problem: a node is evicted on an almost weekly basis and I have not found the root cause yet.... Mar 2 10:20:57 xirisoas3 kernel: ocfs2_dlm: Node 1 joins domain 129859624F7042EAB9829B18CA65FC88 Mar 2 10:20:57 xirisoas3 kernel: ocfs2_dlm: Nodes in domain ("129859624F7042EAB9829B18CA65FC88"): 1 2 3 4 Mar 3 16:18:02 xirisoas3 kernel:

2009 Jun 24

Unexplained reboots in DRBD82 + OCFS2 setup

We're trying to set up a dual-primary DRBD environment, with a shared disk with either OCFS2 or GFS. The environment is Centos 5.3 with DRBD82 (but we also tried DRBD83 from testing). Setting up a single primary disk and running bonnie++ on it works. Setting up a dual-primary disk, only mounting it on one node (ext3) and running bonnie++ works. When setting up ocfs2 on the /dev/drbd0

2008 Sep 10

mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted".

Hi, I am trying to configure a two node cluster on SLES10SP2 using user level heartbeat. Here is my configuration. ocfs2-tools-1.4.0-0.3 **user level heartbeat** -> lsmod | grep ocfs ocfs2_user_heartbeat 20992 1 ocfs2_dlmfs 37776 1 ocfs2_dlm 204456 1 ocfs2_dlmfs ocfs2_nodemanager 223384 6 ocfs2_user_heartbeat,ocfs2_dlmfs,ocfs2_dlm configfs 44700 3 ocfs2_user_heartbeat,ocfs2_nodemanager

2005 Jul 12

problem mounting ocfs2: heartbeat

When attempting to mount the OCFS2 file system I'm getting the following error message: ocfs2_hb_ctl: Internal logic failure while starting heartbeat mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted" I followed the steps given in the users_guide: modprobe ocfs2_dlmfs mount -t configfs none /config mount -t ocfs2_dlmfs none /dlm o2cb_ctl

2005 Sep 23

ocfs2 <-> 10G (10.2.01) Clusterware

RHEL 4 (CENT OS) Am I wasting my time trying to get the 10G Clusterware installer to use OCFS2 volumes for the voting and OCR disks? The ocfs2 setup seems happy on both nodes, but the 10G installer says the location entered for the oracle cluster registry (OCR) is not shared across all the nodes in the cluster. Do the volumes need to be mounted? I did, with no change. [root@green rc5.d]#

2006 Aug 01

AW: ocfs2_search_chain: Group Descriptor has bad signature

I'm using ocfs2 and all modules from Suse (SLES9), no self compilations. Here are the details: * 32-bit machine (writing to ocfs2 partition/LUN and where the corruption was reported): Kernel: 2.6.5-7.257-bigsmp #1 SMP i686 i386 GNU/Linux OCFS2 rpms: ocfs2console-1.2.1-4.2 ocfs2-tools-1.2.1-4.2 o2cb_ctl -V: o2cb_ctl version 1.2.1 /etc/init.d/o2cb status: Module "configfs":

2005 Oct 12

Unable to access cluster service

Hello, I'm running Ubuntu Breezy with the OCFS2 modules in the standard kernel. I installed ocfs2console and ocfs2-tools. I've formatted a partition with ocfs2, but I can't add any node or mount the device (with the ocfs2console) because I get an "Unable to access cluster service" error. I can't find the cause nor the solution to this. root@lenaeja:~# /etc/init.d/o2cb status

2006 Apr 18

Self-fencing issues (RHEL4)

Hi. I'm running RHEL4 for my test system, Adaptec Firewire controllers, Maxtor One Touch III shared disk (see the details below), 100Mb/s dedicated interconnect. It panics with no load about every 20 minutes (error message from netconsole attached). Any clues? Yegor --- [root@rac1 ~]# cat /proc/fs/ocfs2/version OCFS2 1.2.0 Tue Mar 7 15:51:20 PST 2006 (build