thr3ads.net - search: "o2hb_stop_all

Displaying 17 results from an estimated 17 matches for "o2hb_stop_all_regions".

2006 Jul 28

Private Interconnect and self fencing

I have an OCFS2 filesystem on a Coraid AoE device. It mounts fine, but under heavy I/O the server self-fences, claiming a write timeout:
(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
It is my understanding that OCFS expects the only available heartbeat to be on disk, on the same disk that I am writing to? Is there any way like wi...

[RFC: 2.6 patch] fs/ocfs2/: remove unused exports

2006 Apr 14

This patch removes the following unused EXPORT_SYMBOL_GPL's:
- cluster/heartbeat.c: o2hb_check_node_heartbeating_from_callback
- cluster/heartbeat.c: o2hb_stop_all_regions
- cluster/nodemanager.c: o2nm_get_node_by_num
- cluster/nodemanager.c: o2nm_configured_node_map
- cluster/nodemanager.c: o2nm_get_node_by_ip
- cluster/nodemanager.c: o2nm_node_put
- cluster/nodemanager.c: o2nm_node_get
- dlm/dlmmaster.c: dlm_migrate_lockres
Signed-off-by: Adrian Bunk <bunk at s...

[PATCH 14/14] ocfs2: include disk heartbeat in ocfs2_nodemanager to avoid userspace changes

2006 Feb 21

...-2.6.16-rc4.ocfs2-staging2/fs/ocfs2/cluster/disk_heartbeat.c
--- linux-2.6.16-rc4.ocfs2-staging1/fs/ocfs2/cluster/disk_heartbeat.c 2006-02-21 11:44:46.000000000 -0500
+++ linux-2.6.16-rc4.ocfs2-staging2/fs/ocfs2/cluster/disk_heartbeat.c 2006-02-21 11:44:53.000000000 -0500
@@ -1509,7 +1509,7 @@ void o2hb_stop_all_regions(void)
spin_unlock(&o2hb_live_lock);
}
-static int __init o2hb_disk_heartbeat_init(void)
+int o2hb_disk_heartbeat_init(void)
{
int i;
@@ -1520,14 +1520,10 @@ static int __init o2hb_disk_heartbeat_in
return o2hb_register_heartbeat_group(&disk_heartbeat_group);
}
-static void __...

2 Node cluster crashing

2006 Jul 10

...r connected to node rac1.globoforce.com at 198.87.235.244:7777
Jul 7 14:56:42 rac2 kernel: (10201,0):o2net_check_quorum:1468 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Jul 7 14:56:42 rac2 kernel: (10201,0):o2hb_stop_all_regions:1589 ERROR: stopping heartbeat on all active regions.
Jul 7 14:56:42 rac2 kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
I opened up an SR with Oracle and they recommended that we upgrade to SLES 9 SP3 because they don't support the OCFS version that we are...

Self-fencing issues (RHEL4)

2006 Apr 18

...96"):
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 0
Apr 18 15:54:51 rac1/rac1 (2858,1):__dlm_print_nodes:388 node 1
Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda5 after 30000 milliseconds
Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_stop_all_regions:1727 ERROR: stopping heartbeat on all active regions.
Apr 18 15:56:43 rac2/rac2 Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
Apr 18 15:56:43 rac2/rac2
Apr 18 15:56:45 rac1/rac1 (2903,0):o2net_set_nn_state:411 no longer connected to node rac2 (num 1) at 10...
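Several of the results on this page quote the same o2hb_write_timeout message with different devices and timeouts. As a rough aid for comparing them, the following is a minimal Python sketch that extracts the device and timeout from such lines. The regular expression is derived only from the message text quoted in these snippets, and the function name find_write_timeouts is hypothetical; other OCFS2 versions may word the message differently.

```python
import re

# Pattern based on the message text quoted in the snippets on this page.
# This is an assumption about the format, not taken from the kernel source.
TIMEOUT_RE = re.compile(
    r"o2hb_write_timeout:\d+ ERROR: Heartbeat write timeout to device "
    r"(?P<device>\S+) after (?P<ms>\d+) milliseconds"
)


def find_write_timeouts(log_text):
    """Return a list of (device, timeout_ms) pairs found in log_text."""
    return [
        (match.group("device"), int(match.group("ms")))
        for match in TIMEOUT_RE.finditer(log_text)
    ]


sample = (
    "Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_write_timeout:164 ERROR: "
    "Heartbeat write timeout to device sda5 after 30000 milliseconds\n"
    "Apr 18 15:56:43 rac2/rac2 (3,0):o2hb_stop_all_regions:1727 ERROR: "
    "stopping heartbeat on all active regions.\n"
)
print(find_write_timeouts(sample))  # [('sda5', 30000)]
```

Only the write-timeout line is matched; the o2hb_stop_all_regions line that follows it carries no device or timeout information.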

RHEL 4 U2 / OCFS 1.2.1 weekly crash?

2006 Jun 09

Hello, I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2 1.2.1 RPMs. About once a week, one of the nodes crashes itself (self-fencing) and I get a full vmcore on my netdump server. The netdump log file shows the shared filesystem LUN (/dev/dm-6) did not respond within 12000ms. I have not changed the default heartbeat values in /etc/sysconfig/o2cb. There was no other IO
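The 12000 ms figure above is consistent with the relationship usually given in the OCFS2 documentation between the O2CB_HEARTBEAT_THRESHOLD setting in /etc/sysconfig/o2cb and the disk heartbeat timeout: timeout = (threshold - 1) * 2 seconds. Neither the variable name nor the formula appears in the post itself, so treat the sketch below as an assumption to verify against the documentation for the installed OCFS2 version. The function names are hypothetical.

```python
HEARTBEAT_INTERVAL_MS = 2000  # assumed: one disk heartbeat every 2 seconds


def timeout_ms_from_threshold(threshold):
    """Heartbeat write timeout in ms for a given O2CB_HEARTBEAT_THRESHOLD."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def threshold_from_timeout_ms(timeout_ms):
    """Threshold value that yields the given write timeout in ms."""
    return timeout_ms // HEARTBEAT_INTERVAL_MS + 1


# A threshold of 7 gives the 12000 ms seen in the log above.
print(timeout_ms_from_threshold(7))      # 12000
# Timeouts quoted elsewhere on this page, mapped back to thresholds.
print(threshold_from_timeout_ms(30000))  # 16
print(threshold_from_timeout_ms(60000))  # 31
```

Under this assumption, a longer timeout means raising the threshold on every node, since all nodes in the cluster are expected to use the same value.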

another fencing question

2010 Jan 14

...ode 0 after 35.0 seconds, giving up and returning errors.
Jan 14 07:03:50 nvr1-rc kernel: (31,5):o2quo_make_decision:146 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Jan 14 07:03:50 nvr1-rc kernel: (31,5):o2hb_stop_all_regions:1967 ERROR: stopping heartbeat on all active regions.
I'm sure there is no network connectivity problem, but heavy I/O load is possible. Is this the intended behaviour? Why is the loaded node fenced under heavy load? I'm using ocfs2-1.4.4 on rhel5 kernel-2.6.18-164.6....
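The "half-quorum of 1 out of 2 nodes which doesn't include the lowest active node" wording describes a tie-break rule: with an even number of heartbeating nodes, a node that can reach exactly half of them survives only if that half contains the lowest-numbered node. The following Python sketch models that rule as the message describes it. It is an illustration, not the kernel's o2quo_make_decision code, and the function and parameter names are hypothetical.

```python
def should_fence(heartbeating, connected, lowest_reachable):
    """Decide whether this node should fence itself.

    heartbeating     -- number of nodes currently heartbeating (including self)
    connected        -- number of those nodes this node can reach (including self)
    lowest_reachable -- True if the lowest-numbered heartbeating node is this
                        node or one it is connected to
    """
    if heartbeating % 2 == 1:
        # Odd-sized cluster: fence unless we can reach a strict majority.
        quorum = (heartbeating + 1) // 2
        return connected < quorum

    # Even-sized cluster: the cluster can split into two equal halves.
    quorum = heartbeating // 2
    if connected < quorum:
        return True
    # Exactly half: only the half holding the lowest node survives.
    return connected == quorum and not lowest_reachable


# The situation in the log: 2 nodes heartbeating, this node sees only itself,
# and it is not node 0.
print(should_fence(2, 1, lowest_reachable=False))  # True
# The other node (node 0) in the same split keeps running.
print(should_fence(2, 1, lowest_reachable=True))   # False
```

In a two-node cluster this means that whenever the interconnect between the nodes is lost, the node with the higher number is the one that fences, regardless of which node is under load.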

problems with ocfs2

2006 Mar 14

An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060314/b38f73eb/attachment.html

Node crashed after remove a path

2006 May 18

...7 (6360,0):dlm_wait_for_node_death:285 EDB955CBD81B44C78CD9258B99F91E4C: waiting 5000ms for notification of death of node 0
(6,0):o2quo_make_decision:143 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
(6,0):o2hb_stop_all_regions:1727 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
------------[ cut here ]------------
kernel BUG at kernel/panic.c:74!
invalid operand: 0000 [#1] SMP
Modules linked in: nfs lockd ocfs2(U) debugfs(U) md5 i...

Getting Closer (was: Fencing options)

2010 Jan 18

...re follow on, The combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to have netted the culprit:
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137 ERROR: Heartbeat write timeout to device dm-12 after 60000 milliseconds
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all active regions.
E01-netconsole.log:Jan 18 09:45:10 E01 ocfs2 is very sorry to be fencing this system by restarting
dm-12 maps to my evms volume... iostat for dm-12 doesn't indicate that it's overly taxed. Can we get some ideas from the info provided...

[PATCH 1/1] ocfs2/cluster: Make fence method configurable

2009 Nov 17

...itmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
diff --git a/fs/ocfs2/cluster/quorum.c b/fs/ocfs2/cluster/quorum.c
index bbacf7d..cc6ed4e 100644
--- a/fs/ocfs2/cluster/quorum.c
+++ b/fs/ocfs2/cluster/quorum.c
@@ -74,8 +74,18 @@ static void o2quo_fence_self(void)
* threads can still schedule, etc, etc */
o2hb_stop_all_regions();
- printk("ocfs2 is very sorry to be fencing this system by restarting\n");
- emergency_restart();
+ switch (o2nm_single_cluster->cl_fence_method) {
+ case O2NM_FENCE_PANIC:
+ panic("*** ocfs2 is very sorry to be fencing this system by "
+ "panicing ***\n"...

Newbie questions -- is OCFS2 what I even want?

2006 Nov 03

Dear Sirs and Madams, I run a small visual effects production company, Hammerhead Productions. We'd like to have an easily extensible inexpensive relatively high-performance storage network using open-source components. I was hoping that OCFS2 would be that system. I have a half-dozen 2 TB fileservers I'd like the rest of the network to see as a single 12 TB disk, with the aggregate

Unexplained reboots in DRBD82 + OCFS2 setup

2009 Jun 24

We're trying to set up a dual-primary DRBD environment, with a shared disk with either OCFS2 or GFS. The environment is CentOS 5.3 with DRBD82 (but also tried with DRBD83 from testing). Setting up a single primary disk and running bonnie++ on it works. Setting up a dual-primary disk, only mounting it on one node (ext3) and running bonnie++ works. When setting up ocfs2 on the /dev/drbd0

Unexplained reboots in DRBD82 + OCFS2 setup

2009 Jun 24

[PATCH 01/11] ocfs2: event-driven quorum

2006 Jan 09

..._CONN_DOWN_CB, /* When a TCP connection fails */
+ O2HB_CONN_UP_CB, /* When a TCP connection is made */
O2HB_NUM_CB
};
@@ -78,5 +80,8 @@ int o2hb_check_node_heartbeating(u8 node
int o2hb_check_node_heartbeating_from_callback(u8 node_num);
int o2hb_check_local_node_heartbeating(void);
void o2hb_stop_all_regions(void);
+void o2hb_notify(enum o2hb_callback_type type, struct o2nm_node *node,
+ int node_num);
+
#endif /* O2CLUSTER_HEARTBEAT_H */
diff -ruNpX dontdiff linux-2.6.15-staging1/fs/ocfs2/cluster/nodemanager.c linux-2.6.15-staging2/fs/ocfs2/cluster/nodemanager.c
--- linux-2.6.15-sta...

O2CB global heartbeat - hopefully final drop!

2010 Oct 08

All, This is hopefully the final drop of the patches for adding global heartbeat to the o2cb stack. The diff from the previous set is here: http://oss.oracle.com/~smushran/global-hb-diff-2010-10-07 Implemented most of the suggestions provided by Joel and Wengang. The most important one was to activate the feature only at the end. Also, got a mostly clean run with checkpatch.pl. Sunil

Another node is heartbeating in our slot! errors with LUN removal/addition

2008 Oct 22

...o one node in a four node RAC environment running OCFS2 v1.4.1-1. The system then rebooted with the following error:
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device dm-24 after 120000 milliseconds
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active regions.
I'm assuming that dm-24 was the LUN that was deleted. Looking back in the syslog, I see many of these errors since the time the snapshot was taken until the reboot:
Oct 21 16:42:54 ausracdb03 kernel: (6624,2):o2hb_do_disk_heartbeat:770 E...