Displaying 14 results from an estimated 14 matches for "o2hb_write_timeout".
2006 Jul 28
3
Private Interconnect and self fencing
I have an OCFS2 filesystem on a Coraid AoE device.
It mounts fine, but under heavy I/O the server self-fences, claiming a
write timeout:
(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device
etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all
active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
system by panicing
It is my understanding that OCFS is expect...
2011 Feb 28
2
ocfs2 crash with bugs reports (dlmmaster.c)
...sent versions are:
kernel 2.6.32-bpo.5-amd64 (from debian lenny-backports)
ocfs2-tools 1.4.4-3 (from debian squeeze)
We didn't notice any problems in the logs until last Friday, when the whole
ocfs2 cluster crashed.
We know that it started with some problems on node 7 (esiprap01). It reported
an o2hb_write_timeout error and rebooted automatically.
Could you please explain what happened with the other nodes?
Some of them reported a bug:
kernel BUG at
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241!
one of them (es1prap03 - node 4) reported:
kernel BUG at
/tmp/buildd...
2006 Apr 18
1
Self-fencing issues (RHEL4)
...:388 node 1
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:384 Nodes in my
domain ("8BD4774D69C44FDC8FD8EC5E13EA9996"):
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 0
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 1
Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_write_timeout:164 ERROR: Heartbeat
write timeout to device sda5 after 30000 milliseconds
Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_stop_all_regions:1727 ERROR:
stopping heartbeat on all active regions.
Apr 18 15:56:43 rac2/rac2 Kernel panic - not syncing: ocfs2 is very
sorry to be fencing this system by panicing
A...
2006 Jun 09
1
RHEL 4 U2 / OCFS 1.2.1 weekly crash?
Hello,
I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2
1.2.1 RPMs. About once a week, one of the nodes crashes itself (self-
fencing) and I get a full vmcore on my netdump server. The netdump log
file shows the shared filesystem LUN (/dev/dm-6) did not respond within
12000ms. I have not changed the default heartbeat values
in /etc/sysconfig/o2cb. There was no other IO
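For context on the 12000 ms figure in this post: by the commonly cited O2CB rule of thumb, the heartbeat write timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so 12000 ms would correspond to a threshold of 7. Below is a minimal Python sketch of that arithmetic. It assumes a 2-second heartbeat interval, and the function names are hypothetical, not part of any OCFS2 tool.

```python
# Sketch only: assumes o2hb writes one heartbeat every 2000 ms and that the
# node fences after (threshold - 1) missed intervals.
HEARTBEAT_INTERVAL_MS = 2000


def timeout_ms_for_threshold(threshold: int) -> int:
    """Return the write timeout implied by an O2CB_HEARTBEAT_THRESHOLD value."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def threshold_for_timeout_ms(timeout_ms: int) -> int:
    """Return the threshold that would produce the given timeout."""
    if timeout_ms <= 0 or timeout_ms % HEARTBEAT_INTERVAL_MS != 0:
        raise ValueError("timeout must be a positive multiple of the interval")
    return timeout_ms // HEARTBEAT_INTERVAL_MS + 1


if __name__ == "__main__":
    # Timeouts that appear in the log excerpts on this page.
    for ms in (12000, 30000, 60000, 120000):
        print(ms, "ms -> threshold", threshold_for_timeout_ms(ms))
```

Under these assumptions, raising the threshold in /etc/sysconfig/o2cb lengthens the window before a slow device triggers self-fencing; the same value would need to be set on every node in the cluster.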
2006 Mar 14
1
problems with ocfs2
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060314/b38f73eb/attachment.html
2010 Jan 18
1
Getting Closer (was: Fencing options)
One more follow on,
The combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to
have netted the culprit:
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137
ERROR: Heartbeat write timeout to device dm-12 after 60000
milliseconds
E01-netconsole.log:Jan 18 09:45:10 E01
(10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all
active regions.
E01-netconsole.log:Jan 18 09:45:10 E01 ocfs2 is very sorry to be
fencing this system by restarting
d...
2006 Nov 03
2
Newbie questions -- is OCFS2 what I even want?
Dear Sirs and Madams,
I run a small visual effects production company, Hammerhead Productions.
We'd like to have an easily extensible, inexpensive, relatively
high-performance storage network using open-source components. I was
hoping that OCFS2 would be that system.
I have a half-dozen 2 TB fileservers I'd like the rest of the network to see
as a single 12 TB disk, with the aggregate
2009 Jun 24
3
Unexplained reboots in DRBD82 + OCFS2 setup
We're trying to set up a dual-primary DRBD environment, with a shared
disk with either OCFS2 or GFS. The environment is CentOS 5.3 with
DRBD82 (but also tried with DRBD83 from testing).
Setting up a single primary disk and running bonnie++ on it works.
Setting up a dual-primary disk, only mounting it on one node (ext3) and
running bonnie++ works.
When setting up ocfs2 on the /dev/drbd0
2007 Jul 29
1
6 node cluster with unexplained reboots
We just installed a new cluster with 6 HP DL380g5, dual single-port Qlogic 24xx HBAs connected via two HP 4/16 Storageworks switches to a 3Par S400. We are using the 3Par recommended config for the Qlogic driver and device-mapper-multipath, giving us 4 paths to the SAN. We do see some SCSI errors where DM-MP is failing a path after getting a 0x2000 error from the SAN controller, but the path gets put
2008 Jul 14
1
Node fence on RHEL4 machine running 1.2.8-2
...ERROR: status = -107
Jul 14 05:37:04 node3 (9283,2):dlm_wait_for_node_death:365
0E8DC7044BA147F68D1407509F9AF3F3: waiting 5000ms for notification of
death of node 0
Things went along like this until:
Jul 14 05:55:59 node1 Index 9: took 0 ms to do bio add page write
Jul 14 05:55:59 node1 (13,3):o2hb_write_timeout:269 ERROR: Heartbeat
write timeout to device sdh1 after 120000 milliseconds
Jul 14 05:55:59 node1 Index 3: took 0 ms to do allocating bios for read
Jul 14 05:55:59 node1 Index 4: took 0 ms to do bio alloc read
Jul 14 05:55:59 node1 Heartbeat thread (13) printing last 24 blocking
operations (cur...
2006 Jan 09
0
[PATCH 01/11] ocfs2: event-driven quorum
...eat.c linux-2.6.15-staging2/fs/ocfs2/cluster/heartbeat.c
--- linux-2.6.15-staging1/fs/ocfs2/cluster/heartbeat.c 2006-01-08 18:23:29.376721976 -0500
+++ linux-2.6.15-staging2/fs/ocfs2/cluster/heartbeat.c 2006-01-08 18:15:23.647564032 -0500
@@ -158,6 +158,7 @@ struct o2hb_bio_wait_ctxt {
static void o2hb_write_timeout(void *arg)
{
struct o2hb_region *reg = arg;
+ struct o2nm_node *node = o2nm_get_node_by_num(o2nm_this_node());
mlog(ML_ERROR, "Heartbeat write timeout to device %s after %u "
"milliseconds\n", reg->hr_dev_name,
@@ -588,6 +589,7 @@ static void o2hb_queue_node_eve...
2010 Oct 08
23
O2CB global heartbeat - hopefully final drop!
All,
This is hopefully the final drop of the patches for adding global heartbeat
to the o2cb stack.
The diff from the previous set is here:
http://oss.oracle.com/~smushran/global-hb-diff-2010-10-07
Implemented most of the suggestions provided by Joel and Wengang.
The most important one was to activate the feature only at the end,
Also, got mostly a clean run with checkpatch.pl.
Sunil
2008 Oct 22
2
Another node is heartbeating in our slot! errors with LUN removal/addition
Greetings,
Last night I manually unpresented and deleted a LUN (a SAN snapshot)
that was presented to one node in a four node RAC environment running
OCFS2 v1.4.1-1. The system then rebooted with the following error:
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR:
Heartbeat write timeout to device dm-24 after 120000 milliseconds
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_stop_all_regions:1873
ERROR: stopping heartbeat on all active regions.
I'm assuming that dm-24 was the LUN that was deleted. Looking back in
the syslog, I see many of th...
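The o2hb_write_timeout lines quoted throughout these results share a common shape, often wrapped across lines by the mail archive. Below is a minimal Python sketch for pulling the device name and timeout out of such text. It assumes the two numbers in parentheses are a process ID and a CPU number, which the excerpts do not state; the function and class names are hypothetical.

```python
import re
from typing import List, NamedTuple


class WriteTimeout(NamedTuple):
    pid: int         # assumed meaning of the first number in "(16,2)"
    cpu: int         # assumed meaning of the second number
    device: str      # e.g. "sda5", "dm-24", "etherd/e0.1p1"
    timeout_ms: int  # e.g. 12000


# \s+ between words lets the pattern match messages wrapped over several lines.
PATTERN = re.compile(
    r"\((\d+),(\d+)\):o2hb_write_timeout:\d+\s+ERROR:\s+Heartbeat\s+write\s+"
    r"timeout\s+to\s+device\s+(\S+)\s+after\s+(\d+)\s+milliseconds"
)


def find_write_timeouts(text: str) -> List[WriteTimeout]:
    """Return every heartbeat write timeout found in a block of log text."""
    return [
        WriteTimeout(int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4)))
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device\n"
        "etherd/e0.1p1 after 12000 milliseconds\n"
    )
    for hit in find_write_timeouts(sample):
        print(hit.device, hit.timeout_ms)
```

Grouping the extracted hits by device is one way to check whether a fence correlates with a single LUN, such as the dm-24 device mentioned in the last post.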