thr3ads.net - search: "o2hb_write

Displaying 14 results from an estimated 14 matches for "o2hb_write_timeout".

2006 Jul 28

Private Interconnect and self fencing

I have an OCFS2 filesystem on a Coraid AoE device. It mounts fine, but under heavy I/O the server self-fences, claiming a write timeout:
(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
It is my understanding that OCFS is expect...
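The write-timeout message quoted throughout these results has a regular shape, so the device and timeout can be pulled out of a log mechanically. The following is a minimal Python sketch, assuming the message format seen in the snippets on this page; the function name `parse_write_timeouts` and the dictionary keys are hypothetical, and the leading parenthesized pair is matched but deliberately not interpreted.

```python
import re

# Assumed format, taken from the log lines quoted on this page:
#   (16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout
#   to device etherd/e0.1p1 after 12000 milliseconds
WRITE_TIMEOUT = re.compile(
    r"\(\d+,\d+\):o2hb_write_timeout:(?P<source_line>\d+) ERROR: "
    r"Heartbeat write timeout to device (?P<device>\S+) "
    r"after (?P<timeout_ms>\d+) milliseconds"
)


def parse_write_timeouts(text):
    """Return one dict per o2hb_write_timeout message found in text."""
    return [
        {
            "device": m.group("device"),
            "timeout_ms": int(m.group("timeout_ms")),
            "source_line": int(m.group("source_line")),
        }
        for m in WRITE_TIMEOUT.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout "
        "to device etherd/e0.1p1 after 12000 milliseconds "
        "(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat "
        "on all active regions."
    )
    for event in parse_write_timeouts(sample):
        print(event["device"], event["timeout_ms"])
```

Because the pattern uses `finditer`, it works whether the log entries are on separate lines or run together, as they are in several of the snippets here.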

2011 Feb 28

ocfs2 crash with bugs reports (dlmmaster.c)

...sent versions are: kernel 2.6.32-bpo.5-amd64 (from Debian lenny-backports), ocfs2-tools 1.4.4-3 (from Debian squeeze). We didn't notice any problems in the logs until last Friday, when the whole ocfs2 cluster crashed. We know that it started with some problems on node 7 (esiprap01). It reported an o2hb_write_timeout error and rebooted automatically. Could you please explain what happened with the other nodes? Some of them reported a bug: kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241! One of them (es1prap03 - node 4) reported: kernel BUG at /tmp/buildd...

2006 Apr 18

Self-fencing issues (RHEL4)

...:388 node 1
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:384 Nodes in my domain ("8BD4774D69C44FDC8FD8EC5E13EA9996"):
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 0
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 1
Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda5 after 30000 milliseconds
Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_stop_all_regions:1727 ERROR: stopping heartbeat on all active regions.
Apr 18 15:56:43 rac2/rac2 Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
A...

2006 Jun 09

RHEL 4 U2 / OCFS 1.2.1 weekly crash?

Hello, I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2 1.2.1 RPMs. About once a week, one of the nodes crashes itself (self-fencing) and I get a full vmcore on my netdump server. The netdump log file shows the shared filesystem LUN (/dev/dm-6) did not respond within 12000ms. I have not changed the default heartbeat values in /etc/sysconfig/o2cb. There was no other IO
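The 12000 ms figure in this report is consistent with the relation usually given for O2CB, in which the disk heartbeat write timeout equals (O2CB_HEARTBEAT_THRESHOLD - 1) multiplied by a 2-second heartbeat interval, so that a threshold of 7 yields 12 seconds. A minimal Python sketch of that arithmetic follows; the relation and the 2000 ms interval are assumptions not stated in the post, and both function names are hypothetical.

```python
# Assumption: timeout_ms = (threshold - 1) * interval_ms, with a
# 2000 ms heartbeat interval. Neither value is stated in the post.
HEARTBEAT_INTERVAL_MS = 2000


def heartbeat_timeout_ms(threshold, interval_ms=HEARTBEAT_INTERVAL_MS):
    """Write timeout implied by a given O2CB_HEARTBEAT_THRESHOLD."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * interval_ms


def threshold_for_timeout_ms(timeout_ms, interval_ms=HEARTBEAT_INTERVAL_MS):
    """Smallest threshold whose implied timeout is >= timeout_ms."""
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    # Ceiling division, then add one for the (threshold - 1) offset.
    return -(-timeout_ms // interval_ms) + 1


if __name__ == "__main__":
    # Timeouts that appear in the log snippets on this page.
    for observed in (12000, 30000, 60000, 120000):
        print(observed, threshold_for_timeout_ms(observed))
```

Under these assumptions, the timeouts quoted in the results on this page would correspond to thresholds of 7, 16, 31 and 61 respectively.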

2006 Mar 14

problems with ocfs2

An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060314/b38f73eb/attachment.html

2010 Jan 18

Getting Closer (was: Fencing options)

One more follow-on: the combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to have netted the culprit:
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137 ERROR: Heartbeat write timeout to device dm-12 after 60000 milliseconds
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all active regions.
E01-netconsole.log:Jan 18 09:45:10 E01 ocfs2 is very sorry to be fencing this system by restarting d...

2006 Nov 03

Newbie questions -- is OCFS2 what I even want?

Dear Sirs and Madams, I run a small visual effects production company, Hammerhead Productions. We'd like to have an easily extensible inexpensive relatively high-performance storage network using open-source components. I was hoping that OCFS2 would be that system. I have a half-dozen 2 TB fileservers I'd like the rest of the network to see as a single 12 TB disk, with the aggregate

2009 Jun 24

Unexplained reboots in DRBD82 + OCFS2 setup

We're trying to set up a dual-primary DRBD environment, with a shared disk using either OCFS2 or GFS. The environment is CentOS 5.3 with DRBD82 (we also tried DRBD83 from testing). Setting up a single-primary disk and running bonnie++ on it works. Setting up a dual-primary disk, mounting it on only one node (ext3), and running bonnie++ works. When setting up ocfs2 on the /dev/drbd0

2009 Jun 24

Unexplained reboots in DRBD82 + OCFS2 setup

2007 Jul 29

6 node cluster with unexplained reboots

We just installed a new cluster with 6 HP DL380g5, dual single-port QLogic 24xx HBAs connected via two HP 4/16 Storageworks switches to a 3Par S400. We are using the 3Par recommended config for the QLogic driver and device-mapper-multipath, giving us 4 paths to the SAN. We do see some SCSI errors where DM-MP is failing a path after getting a 0x2000 error from the SAN controller, but the path gets puts

2008 Jul 14

Node fence on RHEL4 machine running 1.2.8-2

...ERROR: status = -107
Jul 14 05:37:04 node3 (9283,2):dlm_wait_for_node_death:365 0E8DC7044BA147F68D1407509F9AF3F3: waiting 5000ms for notification of death of node 0
Things went along like this until:
Jul 14 05:55:59 node1 Index 9: took 0 ms to do bio add page write
Jul 14 05:55:59 node1 (13,3):o2hb_write_timeout:269 ERROR: Heartbeat write timeout to device sdh1 after 120000 milliseconds
Jul 14 05:55:59 node1 Index 3: took 0 ms to do allocating bios for read
Jul 14 05:55:59 node1 Index 4: took 0 ms to do bio alloc read
Jul 14 05:55:59 node1 Heartbeat thread (13) printing last 24 blocking operations (cur...

2006 Jan 09

[PATCH 01/11] ocfs2: event-driven quorum

...eat.c linux-2.6.15-staging2/fs/ocfs2/cluster/heartbeat.c
--- linux-2.6.15-staging1/fs/ocfs2/cluster/heartbeat.c 2006-01-08 18:23:29.376721976 -0500
+++ linux-2.6.15-staging2/fs/ocfs2/cluster/heartbeat.c 2006-01-08 18:15:23.647564032 -0500
@@ -158,6 +158,7 @@ struct o2hb_bio_wait_ctxt {
 static void o2hb_write_timeout(void *arg)
 {
 	struct o2hb_region *reg = arg;
+	struct o2nm_node *node = o2nm_get_node_by_num(o2nm_this_node());
 	mlog(ML_ERROR, "Heartbeat write timeout to device %s after %u "
 	     "milliseconds\n", reg->hr_dev_name,
@@ -588,6 +589,7 @@ static void o2hb_queue_node_eve...

2010 Oct 08

O2CB global heartbeat - hopefully final drop!

All, This is hopefully the final drop of the patches for adding global heartbeat to the o2cb stack. The diff from the previous set is here: http://oss.oracle.com/~smushran/global-hb-diff-2010-10-07 Implemented most of the suggestions provided by Joel and Wengang. The most important one was to activate the feature only at the end. Also, got a mostly clean run with checkpatch.pl. Sunil

2008 Oct 22

Another node is heartbeating in our slot! errors with LUN removal/addition

Greetings, Last night I manually unpresented and deleted a LUN (a SAN snapshot) that was presented to one node in a four node RAC environment running OCFS2 v1.4.1-1. The system then rebooted with the following error:
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device dm-24 after 120000 milliseconds
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active regions.
I'm assuming that dm-24 was the LUN that was deleted. Looking back in the syslog, I see many of th...
