Displaying 20 results from an estimated 200 matches similar to: "problems with ocfs2"
2006 Apr 18 (1 reply): Self-fencing issues (RHEL4)
Hi.
I'm running RHEL4 on my test system: Adaptec FireWire controllers, a
Maxtor One Touch III shared disk (see the details below), and a
100Mb/s dedicated interconnect. It panics about every 20 minutes even
with no load (error message from netconsole attached).
Any clues?
Yegor
---
[root at rac1 ~]# cat /proc/fs/ocfs2/version
OCFS2 1.2.0 Tue Mar 7 15:51:20 PST 2006 (build
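For context, the panic output quoted above was captured over netconsole. A minimal sketch of how such a capture is typically set up (the interface name and log-host address below are placeholders, not values taken from this report):
# modprobe netconsole netconsole=@/eth1,6666@192.168.0.10/
This streams kernel messages as UDP packets to port 6666 on the log host, where any UDP listener (netcat, syslog-ng, etc.) can write them to a file. Raising the console log level (see the kernel.printk discussion further down) controls how much of the panic actually makes it onto the wire.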
2006 Jul 10 (1 reply): 2 Node cluster crashing
Hi,
We have a two-node cluster running SLES 9 SP2, connected directly to an
EMC CX300 for storage.
We are using OCFS (OCFS2 DLM 0.99.15-SLES) for the voting disk etc., and
ASM for data files.
The system had been running fine until last Friday, when the whole cluster
went down with the following error messages in the /var/log/messages
files:
rac1:
Jul 7 14:56:23 rac1 kernel:
2006 Jul 28 (3 replies): Private Interconnect and self fencing
I have an OCFS2 filesystem on a Coraid AoE device.
It mounts fine, but under heavy I/O the server self-fences, reporting a
write timeout:
(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device
etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all
active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
2006 Jan 09 (no replies): [PATCH 01/11] ocfs2: event-driven quorum
This patch separates o2net and o2quo from knowing about one another as much
as possible. This is the first in a series of patches that will allow
userspace cluster interaction. Quorum is separated out first, and will
ultimately only be associated with the disk heartbeat as a separate module.
To do so, this patch performs the following changes:
* o2hb_notify() is added to handle injection of
2008 Feb 04 (no replies): [PATCH] o2net: Reconnect after idle time out.
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and on net timeout.
Disconnecting on net timeout is fine, but it should then attempt to
reconnect. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
If we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang
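For reference, the idle timeout this patch is reacting to is a cluster-wide tunable. On later tool releases it is exposed through /etc/sysconfig/o2cb (and mirrored in configfs); the values below are what I believe the 1.4 tools ship as defaults, so treat this as a hedged sketch and check your own install:
# Network idle time before o2net declares the link dead (ms)
O2CB_IDLE_TIMEOUT_MS=30000
# Delay before a keepalive packet is sent on an idle link (ms)
O2CB_KEEPALIVE_DELAY_MS=2000
# Delay between reconnect attempts (ms)
O2CB_RECONNECT_DELAY_MS=2000
Changing any of these requires taking the cluster offline and back online on every node, since all nodes must agree on the timeouts.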
2006 May 18 (no replies): Node crashed after remove a path
Hi,
I have a 2-node cluster on 2 Dell PowerEdge 2650s.
When a device path was removed, both nodes crashed.
Any help would be appreciated.
Thanks!
Roger
---
Configuration:
Oracle: 10.2.0.1.0 x86
Oracle home: on OCFS2 shared with multipath
Oracle datafiles: OCFS2 shared with multipath
cat redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 2)
uname -a
Linux sqa-pe2650-40
2010 Jan 18 (1 reply): Getting Closer (was: Fencing options)
One more follow-on:
The combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to
have netted the culprit:
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137
ERROR: Heartbeat write timeout to device dm-12 after 60000
milliseconds
E01-netconsole.log:Jan 18 09:45:10 E01
(10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all
active regions.
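For anyone wanting to reproduce this setup: both settings mentioned above are ordinary sysctls, so a hedged sketch of making them persistent in /etc/sysctl.conf would be:
# Reboot automatically 60 seconds after a panic instead of hanging
kernel.panic = 60
# Console log level 7: everything down to KERN_INFO reaches the console (and netconsole)
kernel.printk = 7 4 1 7
They can also be applied at runtime with sysctl -w, which avoids waiting for the next boot.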
2008 Feb 13 (2 replies): [PATCH] o2net: Reconnect after idle time out.V2
Modifications from V1 to V2:
1. Use atomic ops instead of spin_lock in the timer.
2. Add some comments when querying the connect_expired work.
These comments are copied from Zach's mail. ;)
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and on net timeout.
Disconnecting on net timeout is fine, but it should then attempt to
reconnect. This is because sometimes nodes get
2006 Jun 25 (1 reply): Error while Mounting
I am attempting to set up a 2-node ocfs2 cluster. At this point, I have the
latest 1.2.1 version of the tools on both nodes. They are not running
identical kernels (one is 2.6.16.18, the other is 2.6.17.1); both are using
the kernels' built-in OCFS2 modules, not modules built from source.
I can mount my iscsi volume on either node individually, but when I attempt
to mount it on both nodes, I get the following
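In case it is useful for comparison, a two-node setup needs an identical /etc/ocfs2/cluster.conf on both machines and the o2cb cluster online before the second mount will succeed. A minimal sketch, where every name and address is a made-up placeholder (node names must match the hostnames, and the key = value lines are conventionally indented with a tab):
cluster:
        node_count = 2
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 0
        name = node1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.102
        number = 1
        name = node2
        cluster = ocfs2
With that in place on both nodes, bringing the cluster online (/etc/init.d/o2cb online ocfs2) before mounting on either node is the usual sequence.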
2006 Apr 14 (1 reply): [RFC: 2.6 patch] fs/ocfs2/: remove unused exports
This patch removes the following unused EXPORT_SYMBOL_GPL's:
- cluster/heartbeat.c: o2hb_check_node_heartbeating_from_callback
- cluster/heartbeat.c: o2hb_stop_all_regions
- cluster/nodemanager.c: o2nm_get_node_by_num
- cluster/nodemanager.c: o2nm_configured_node_map
- cluster/nodemanager.c: o2nm_get_node_by_ip
- cluster/nodemanager.c: o2nm_node_put
- cluster/nodemanager.c: o2nm_node_get
-
2006 Feb 21 (no replies): [PATCH 14/14] ocfs2: include disk heartbeat in ocfs2_nodemanager to avoid userspace changes
This patch removes disk heartbeat's modularity, which makes it the default.
Without this patch, userspace changes are required.
This patch is not intended for permanent application, just to make it easier
for users not interested in testing the userspace clustering implementation
to use ocfs2.
In order to switch to user clustering, use "o2cb offline" to shut down the
cluster,
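For completeness, the "o2cb offline" mentioned above is part of the usual stop/start sequence driven by the init script shipped with the tools; a hedged sketch of a full cluster-stack restart on one node:
# /etc/init.d/o2cb offline      # stop heartbeat and take the cluster offline
# /etc/init.d/o2cb unload       # unload the o2cb/ocfs2 modules
# /etc/init.d/o2cb load
# /etc/init.d/o2cb online       # bring the (reconfigured) cluster back up
All ocfs2 volumes have to be unmounted on that node before the offline/unload steps will succeed.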
2009 Sep 24 (1 reply): strange fencing behavior
I have 10 servers in a cluster running Debian Etch with 2.6.26-bpo.2
and a backport of ocfs2-tools-1.4.1-1.
I'm using AoE to export the drives from a Debian Lenny server in the
cluster.
My problem is that if I mount the ocfs2 partition on the server that is
exporting it via AoE, it fences the entire cluster. Looking at the logs
on the server exporting the ocfs2 partition doesn't give much information...
2006 Jun 09 (1 reply): RHEL 4 U2 / OCFS 1.2.1 weekly crash?
Hello,
I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2
1.2.1 RPMs. About once a week, one of the nodes crashes itself (self-
fencing) and I get a full vmcore on my netdump server. The netdump log
file shows the shared filesystem LUN (/dev/dm-6) did not respond within
12000ms. I have not changed the default heartbeat values
in /etc/sysconfig/o2cb. There was no other IO
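For what it's worth, the 12000 ms in that message corresponds to the default O2CB_HEARTBEAT_THRESHOLD of 7 in /etc/sysconfig/o2cb: the write timeout works out to roughly (threshold - 1) * 2000 ms, with 2000 ms being the heartbeat interval. If the LUN genuinely stalls for longer than that under load, the usual workaround is to raise the threshold; a hedged example, not a recommendation for any particular array:
# Fence only after ~60 s without a completed heartbeat write: (31 - 1) * 2000 ms = 60000 ms
O2CB_HEARTBEAT_THRESHOLD=31
The new value only takes effect after the cluster is taken offline and brought back online, and it must be identical on every node.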
2011 Feb 28 (2 replies): ocfs2 crash with bugs reports (dlmmaster.c)
Hi,
After the problem described in
http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html
we've upgraded the kernels and ocfs2-tools on every node.
The present versions are:
kernel 2.6.32-bpo.5-amd64 (from debian lenny-backports)
ocfs2-tools 1.4.4-3 (from debian squeeze)
We didn't notice any problems in the logs until last Friday, when the whole
ocfs2 cluster crashed.
We know
2013 Nov 01 (1 reply): How to break out the unstop loop in the recovery thread? Thanks a lot.
Hi everyone,
I have an OCFS2 issue.
The OS is Ubuntu, running Linux kernel 3.2.50.
There are three nodes in the OCFS2 cluster, and all of them use an HP 4330 iSCSI SAN as storage.
When the storage restarted, two of the nodes fenced and restarted because they could no longer write heartbeats to the storage.
But the last node did not restart, and it keeps writing error messages to syslog as below:
2009 Nov 17 (1 reply): [PATCH 1/1] ocfs2/cluster: Make fence method configurable
By default, o2cb fences the box by calling emergency_restart(). While this
scheme works well in production, it gets in the way during testing, as it
does not let the tester take stack/core dumps for analysis.
This patch allows the user to dynamically change the fence method to panic() with:
# echo "panic" > /sys/kernel/config/cluster/<clustername>/fence_method
Signed-off-by: Sunil
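A quick usage sketch to go with that, assuming a cluster actually named ocfs2 (substitute your own cluster name). If I read the patch right, the attribute can be read back to confirm the change, and the original behaviour is restored by writing "reset":
# cat /sys/kernel/config/cluster/ocfs2/fence_method
reset
# echo panic > /sys/kernel/config/cluster/ocfs2/fence_method
# cat /sys/kernel/config/cluster/ocfs2/fence_method
panic
This only changes what happens once a fence is triggered (panic() instead of emergency_restart()); it does not make fencing any less likely.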
2010 Jan 14 (1 reply): another fencing question
Hi,
Periodically, one of the nodes in my two-node cluster gets fenced; here are the logs:
Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-
rc.minint.it (num 0) at 1.1.1.6:7777
Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334 ERROR:
link to 0 went down!
Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR:
status = -112
Jan 14 07:01:44
2008 Oct 22 (2 replies): Another node is heartbeating in our slot! errors with LUN removal/addition
Greetings,
Last night I manually unpresented and deleted a LUN (a SAN snapshot)
that was presented to one node in a four-node RAC environment running
OCFS2 v1.4.1-1. The system then rebooted with the following error:
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR:
Heartbeat write timeout to device dm-24 after 120000 milliseconds
Oct 21 16:45:34 ausracdb03 kernel:
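When chasing "another node is heartbeating in our slot" after LUN shuffling, it can help to confirm what each node actually sees before and after the change; a hedged sketch using the ocfs2-tools utilities (device paths are placeholders):
# mounted.ocfs2 -d /dev/dm-24      # show the device's ocfs2 UUID and label
# mounted.ocfs2 -f /dev/dm-24      # show which cluster nodes believe they have it mounted
A snapshot LUN that carries a clone of a live volume's UUID is one way two different devices can end up heartbeating into the same slot map.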
2006 Nov 03 (2 replies): Newbie questions -- is OCFS2 what I even want?
Dear Sirs and Madams,
I run a small visual effects production company, Hammerhead Productions.
We'd like to have an easily extensible, inexpensive, relatively
high-performance storage network using open-source components. I was
hoping that OCFS2 would be that system.
I have a half-dozen 2 TB fileservers I'd like the rest of the network to see
as a single 12 TB disk, with the aggregate
2009 Jun 24 (3 replies): Unexplained reboots in DRBD82 + OCFS2 setup
We're trying to set up a dual-primary DRBD environment with a shared
disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
DRBD82 (we also tried DRBD83 from testing).
Setting up a single-primary disk and running bonnie++ on it works.
Setting up a dual-primary disk, mounting it on only one node (ext3), and
running bonnie++ works.
When setting up ocfs2 on the /dev/drbd0
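One thing worth double-checking in a setup like this: dual-primary has to be enabled explicitly in the DRBD resource, otherwise the second node can never be promoted while OCFS2 expects both to write. A minimal hedged sketch for DRBD 8.2/8.3, with made-up host names, disks and addresses:
resource r0 {
    protocol C;                    # synchronous replication; required for dual-primary
    net {
        allow-two-primaries;       # let both nodes hold the Primary role at once
    }
    startup {
        become-primary-on both;    # auto-promote both nodes at startup (8.3 syntax)
    }
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
That said, allow-two-primaries only makes the configuration legal; the unexplained reboots above could just as easily come from the o2cb heartbeat or network timeouts as from the resource definition itself.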