search for: o2hb_write_timeout

Displaying 14 results from an estimated 14 matches for "o2hb_write_timeout".

2006 Jul 28
3
Private Interconnect and self fencing
I have an OCFS2 filesystem on a coraid AOE device. It mounts fine, but with heavy I/O the server self fences claiming a write timeout: (16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 12000 milliseconds (16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions. Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing It is my understanding that OCFS is expect...
2011 Feb 28
2
ocfs2 crash with bugs reports (dlmmaster.c)
...sent versions are: kernel 2.6.32-bpo.5-amd64 (from Debian lenny-backports), ocfs2-tools 1.4.4-3 (from Debian squeeze). We didn't notice any problems in the logs until last Friday, when the whole ocfs2 cluster crashed. We know that it started with some problems on node 7 (esiprap01). It reported an o2hb_write_timeout error and rebooted automatically. Could you please explain what happened with the other nodes? Some of them reported a bug: kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241! One of them (es1prap03 - node 4) reported: kernel BUG at /tmp/buildd...
2006 Apr 18
1
Self-fencing issues (RHEL4)
...:388 node 1 Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:384 Nodes in my domain ("8BD4774D69C44FDC8FD8EC5E13EA9996"): Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 0 Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 1 Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda5 after 30000 milliseconds Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_stop_all_regions:1727 ERROR: stopping heartbeat on all active regions. Apr 18 15:56:43 rac2/rac2 Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing A...
2006 Jun 09
1
RHEL 4 U2 / OCFS 1.2.1 weekly crash?
Hello, I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2 1.2.1 RPMs. About once a week, one of the nodes crashes itself (self-fencing) and I get a full vmcore on my netdump server. The netdump log file shows the shared filesystem LUN (/dev/dm-6) did not respond within 12000ms. I have not changed the default heartbeat values in /etc/sysconfig/o2cb. There was no other IO
2006 Mar 14
1
problems with ocfs2
An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060314/b38f73eb/attachment.html
2010 Jan 18
1
Getting Closer (was: Fencing options)
One more follow-on: the combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to have netted the culprit: E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137 ERROR: Heartbeat write timeout to device dm-12 after 60000 milliseconds E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all active regions. E01-netconsole.log:Jan 18 09:45:10 E01 ocfs2 is very sorry to be fencing this system by restarting d...
2006 Nov 03
2
Newbie questions -- is OCFS2 what I even want?
Dear Sirs and Madams, I run a small visual effects production company, Hammerhead Productions. We'd like to have an easily extensible inexpensive relatively high-performance storage network using open-source components. I was hoping that OCFS2 would be that system. I have a half-dozen 2 TB fileservers I'd like the rest of the network to see as a single 12 TB disk, with the aggregate
2009 Jun 24
3
Unexplained reboots in DRBD82 + OCFS2 setup
We're trying to set up a dual-primary DRBD environment, with a shared disk with either OCFS2 or GFS. The environment is CentOS 5.3 with DRBD82 (but also tried with DRBD83 from testing). Setting up a single primary disk and running bonnie++ on it works. Setting up a dual-primary disk, only mounting it on one node (ext3) and running bonnie++ works. When setting up ocfs2 on the /dev/drbd0
2007 Jul 29
1
6 node cluster with unexplained reboots
We just installed a new cluster with 6 HP DL380g5, dual single port Qlogic 24xx HBAs connected via two HP 4/16 Storageworks switches to a 3Par S400. We are using the 3Par recommended config for the Qlogic driver and device-mapper-multipath giving us 4 paths to the SAN. We do see some SCSI errors where DM-MP is failing a path after getting a 0x2000 error from the SAN controller, but the path gets puts
2008 Jul 14
1
Node fence on RHEL4 machine running 1.2.8-2
...ERROR: status = -107 Jul 14 05:37:04 node3 (9283,2):dlm_wait_for_node_death:365 0E8DC7044BA147F68D1407509F9AF3F3: waiting 5000ms for notification of death of node 0 Things went along like this until: Jul 14 05:55:59 node1 Index 9: took 0 ms to do bio add page write Jul 14 05:55:59 node1 (13,3):o2hb_write_timeout:269 ERROR: Heartbeat write timeout to device sdh1 after 120000 milliseconds Jul 14 05:55:59 node1 Index 3: took 0 ms to do allocating bios for read Jul 14 05:55:59 node1 Index 4: took 0 ms to do bio alloc read Jul 14 05:55:59 node1 Heartbeat thread (13) printing last 24 blocking operations (cur...
2006 Jan 09
0
[PATCH 01/11] ocfs2: event-driven quorum
...eat.c linux-2.6.15-staging2/fs/ocfs2/cluster/heartbeat.c --- linux-2.6.15-staging1/fs/ocfs2/cluster/heartbeat.c 2006-01-08 18:23:29.376721976 -0500 +++ linux-2.6.15-staging2/fs/ocfs2/cluster/heartbeat.c 2006-01-08 18:15:23.647564032 -0500 @@ -158,6 +158,7 @@ struct o2hb_bio_wait_ctxt { static void o2hb_write_timeout(void *arg) { struct o2hb_region *reg = arg; + struct o2nm_node *node = o2nm_get_node_by_num(o2nm_this_node()); mlog(ML_ERROR, "Heartbeat write timeout to device %s after %u " "milliseconds\n", reg->hr_dev_name, @@ -588,6 +589,7 @@ static void o2hb_queue_node_eve...
2010 Oct 08
23
O2CB global heartbeat - hopefully final drop!
All, This is hopefully the final drop of the patches for adding global heartbeat to the o2cb stack. The diff from the previous set is here: http://oss.oracle.com/~smushran/global-hb-diff-2010-10-07 Implemented most of the suggestions provided by Joel and Wengang. The most important one was to activate the feature only at the end. Also, got a mostly clean run with checkpatch.pl. Sunil
2008 Oct 22
2
Another node is heartbeating in our slot! errors with LUN removal/addition
Greetings, Last night I manually unpresented and deleted a LUN (a SAN snapshot) that was presented to one node in a four node RAC environment running OCFS2 v1.4.1-1. The system then rebooted with the following error: Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device dm-24 after 120000 milliseconds Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active regions. I'm assuming that dm-24 was the LUN that was deleted. Looking back in the syslog, I see many of th...
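
Several of the results above quote the same kernel message, "Heartbeat write timeout to device <dev> after <N> milliseconds". As a rough sketch for anyone triaging similar logs, the Python below pulls the device and timeout out of such lines. The function names find_write_timeouts and implied_threshold are hypothetical, made up for this illustration. The threshold calculation assumes the commonly documented o2cb relation timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds; check it against the documentation for your OCFS2 version before relying on it.

```python
import re

# Matches the message format quoted in the results above, e.g.
# "(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to
#  device etherd/e0.1p1 after 12000 milliseconds"
WRITE_TIMEOUT = re.compile(
    r"\(\d+,\d+\):o2hb_write_timeout:\d+ ERROR: "
    r"Heartbeat write timeout to device (?P<device>\S+) "
    r"after (?P<ms>\d+) milliseconds"
)


def find_write_timeouts(text):
    """Return (device, timeout_ms) for every write-timeout message in text."""
    return [(m["device"], int(m["ms"])) for m in WRITE_TIMEOUT.finditer(text)]


def implied_threshold(timeout_ms):
    """Heartbeat threshold implied by a timeout.

    Assumption: timeout = (threshold - 1) * 2 seconds, so
    threshold = timeout_ms / 2000 + 1.
    """
    return timeout_ms // 2000 + 1


if __name__ == "__main__":
    log = (
        "Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_write_timeout:164 ERROR: "
        "Heartbeat write timeout to device sda5 after 30000 milliseconds"
    )
    for device, ms in find_write_timeouts(log):
        print(device, ms, implied_threshold(ms))
```

Under that assumption, the 12000 ms, 60000 ms and 120000 ms timeouts quoted above would correspond to thresholds of 7, 31 and 61 respectively.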