Displaying 20 results from an estimated 200 matches similar to: "problems with ocfs2"
2006 Apr 18 (1 reply): Self-fencing issues (RHEL4)
Hi.
I'm running RHEL4 on my test system: Adaptec FireWire controllers, a
Maxtor One Touch III shared disk (see the details below), and a
100Mb/s dedicated interconnect. It panics about every 20 minutes even
with no load (error message from netconsole attached).
Any clues?
Yegor
---
[root at rac1 ~]# cat /proc/fs/ocfs2/version
OCFS2 1.2.0 Tue Mar 7 15:51:20 PST 2006 (build
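For context, the panic output quoted above was captured over netconsole. A minimal sketch of how such a capture is typically set up (the interface name and log-host address below are placeholders, not values taken from this report):
# modprobe netconsole netconsole=@/eth1,6666@192.168.0.10/
This streams kernel messages as UDP packets to port 6666 on the log host, where any UDP listener (netcat, syslog-ng, etc.) can write them to a file. Raising the console log level (see the kernel.printk discussion further down) controls how much of the panic actually makes it onto the wire.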
2006 Jul 10 (1 reply): 2 Node cluster crashing
Hi,
We have a two-node cluster running SLES 9 SP2, connected directly to an
EMC CX300 for storage.
We are using OCFS (OCFS2 DLM 0.99.15-SLES) for the voting disk etc., and
ASM for data files.
The system had been running fine until last Friday, when the whole cluster
went down with the following error messages in the /var/log/messages
files:
rac1:
Jul 7 14:56:23 rac1 kernel:
2006 Jul 28 (3 replies): Private Interconnect and self fencing
I have an OCFS2 filesystem on a Coraid AoE device.
It mounts fine, but under heavy I/O the server self-fences, reporting a
write timeout:
(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device
etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all
active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
2006 Jan 09 (no replies): [PATCH 01/11] ocfs2: event-driven quorum
This patch separates o2net and o2quo from knowing about one another as much
as possible. This is the first in a series of patches that will allow
userspace cluster interaction. Quorum is separated out first, and will
ultimately only be associated with the disk heartbeat as a separate module.
To do so, this patch performs the following changes:
* o2hb_notify() is added to handle injection of
2008 Feb 04 (no replies): [PATCH] o2net: Reconnect after idle time out.
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and on net timeout.
Disconnecting on net timeout is fine, but it should then attempt to
reconnect. This is because sometimes nodes get overloaded
enough that the network connection breaks but the disk hb does not.
If we get into that situation, we either fence (unnecessarily)
or wait for its disk hb to die (and sometimes hang
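For reference, the idle timeout this patch is reacting to is a cluster-wide tunable. On later tool releases it is exposed through /etc/sysconfig/o2cb (and mirrored in configfs); the values below are what I believe the 1.4 tools ship as defaults, so treat this as a hedged sketch and check your own install:
# Network idle time before o2net declares the link dead (ms)
O2CB_IDLE_TIMEOUT_MS=30000
# Delay before a keepalive packet is sent on an idle link (ms)
O2CB_KEEPALIVE_DELAY_MS=2000
# Delay between reconnect attempts (ms)
O2CB_RECONNECT_DELAY_MS=2000
Changing any of these requires taking the cluster offline and back online on every node, since all nodes must agree on the timeouts.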
2006 May 18 (no replies): Node crashed after remove a path
Hi,
I have a 2-node cluster on 2 Dell PowerEdge 2650s.
When a device path was removed, both nodes crashed.
Any help would be appreciated.
Thanks!
Roger
---
Configuration:
Oracle: 10.2.0.1.0 x86
Oracle home: on OCFS2 shared with multipath
Oracle datafiles: OCFS2 shared with multipath
cat redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 2)
uname -a
Linux sqa-pe2650-40
2010 Jan 18 (1 reply): Getting Closer (was: Fencing options)
One more follow-on:
The combination of kernel.panic=60 and kernel.printk=7 4 1 7 seems to
have netted the culprit:
E01-netconsole.log:Jan 18 09:45:10 E01 (10,0):o2hb_write_timeout:137
ERROR: Heartbeat write timeout to device dm-12 after 60000
milliseconds
E01-netconsole.log:Jan 18 09:45:10 E01
(10,0):o2hb_stop_all_regions:1517 ERROR: stopping heartbeat on all
active regions.
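For anyone wanting to reproduce this setup: both settings mentioned above are ordinary sysctls, so a hedged sketch of making them persistent in /etc/sysctl.conf would be:
# Reboot automatically 60 seconds after a panic instead of hanging
kernel.panic = 60
# Console log level 7: everything down to KERN_INFO reaches the console (and netconsole)
kernel.printk = 7 4 1 7
They can also be applied at runtime with sysctl -w, which avoids waiting for the next boot.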
2008 Feb 13 (2 replies): [PATCH] o2net: Reconnect after idle time out.V2
Modifications from V1 to V2:
1. Use atomic ops instead of spin_lock in the timer.
2. Add some comments when querying the connect_expired work.
These comments are copied from Zach's mail. ;)
Currently, o2net connects to a node on hb_up and disconnects on
hb_down and on net timeout.
Disconnecting on net timeout is fine, but it should then attempt to
reconnect. This is because sometimes nodes get
2006 Jun 25 (1 reply): Error while Mounting
I am attempting to set up a 2-node ocfs2 cluster. At this point, I have the
latest 1.2.1 version of the tools on both nodes. They are not running
identical kernels (one is 2.6.16.18, the other is 2.6.17.1); both are using
the kernels' built-in OCFS2 modules, not modules built from source.
I can mount my iscsi volume on either node individually, but when I attempt
to mount it on both nodes, I get the following
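In case it is useful for comparison, a two-node setup needs an identical /etc/ocfs2/cluster.conf on both machines and the o2cb cluster online before the second mount will succeed. A minimal sketch, where every name and address is a made-up placeholder (node names must match the hostnames, and the key = value lines are conventionally indented with a tab):
cluster:
        node_count = 2
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 0
        name = node1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.102
        number = 1
        name = node2
        cluster = ocfs2
With that in place on both nodes, bringing the cluster online (/etc/init.d/o2cb online ocfs2) before mounting on either node is the usual sequence.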
2006 Apr 14 (1 reply): [RFC: 2.6 patch] fs/ocfs2/: remove unused exports
This patch removes the following unused EXPORT_SYMBOL_GPL's:
- cluster/heartbeat.c: o2hb_check_node_heartbeating_from_callback
- cluster/heartbeat.c: o2hb_stop_all_regions
- cluster/nodemanager.c: o2nm_get_node_by_num
- cluster/nodemanager.c: o2nm_configured_node_map
- cluster/nodemanager.c: o2nm_get_node_by_ip
- cluster/nodemanager.c: o2nm_node_put
- cluster/nodemanager.c: o2nm_node_get
-
2006 Feb 21 (no replies): [PATCH 14/14] ocfs2: include disk heartbeat in ocfs2_nodemanager to avoid userspace changes
This patch removes disk heartbeat's modularity, which makes it the default.
Without this patch, userspace changes are required.
This patch is not intended for permanent application, just to make it easier
for users not interested in testing the userspace clustering implementation
to use ocfs2.
In order to switch to user clustering, use "o2cb offline" to shut down the
cluster,
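For completeness, the "o2cb offline" mentioned above is part of the usual stop/start sequence driven by the init script shipped with the tools; a hedged sketch of a full cluster-stack restart on one node:
# /etc/init.d/o2cb offline      # stop heartbeat and take the cluster offline
# /etc/init.d/o2cb unload       # unload the o2cb/ocfs2 modules
# /etc/init.d/o2cb load
# /etc/init.d/o2cb online       # bring the (reconfigured) cluster back up
All ocfs2 volumes have to be unmounted on that node before the offline/unload steps will succeed.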
2009 Sep 24 (1 reply): strange fencing behavior
I have 10 servers in a cluster running Debian Etch with 2.6.26-bpo.2
and a backport of ocfs2-tools-1.4.1-1.
I'm using AoE to export the drives from a Debian Lenny server in the
cluster.
My problem is that if I mount the ocfs2 partition on the server that is
exporting it via AoE, it fences the entire cluster. Looking at the logs
on the server exporting the ocfs2 partition doesn't give much information...
2006 Jun 09 (1 reply): RHEL 4 U2 / OCFS 1.2.1 weekly crash?
Hello,
I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2
1.2.1 RPMs. About once a week, one of the nodes crashes itself (self-
fencing) and I get a full vmcore on my netdump server. The netdump log
file shows the shared filesystem LUN (/dev/dm-6) did not respond within
12000ms. I have not changed the default heartbeat values
in /etc/sysconfig/o2cb. There was no other IO
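For what it's worth, the 12000 ms in that message corresponds to the default O2CB_HEARTBEAT_THRESHOLD of 7 in /etc/sysconfig/o2cb: the write timeout works out to roughly (threshold - 1) * 2000 ms, with 2000 ms being the heartbeat interval. If the LUN genuinely stalls for longer than that under load, the usual workaround is to raise the threshold; a hedged example, not a recommendation for any particular array:
# Fence only after ~60 s without a completed heartbeat write: (31 - 1) * 2000 ms = 60000 ms
O2CB_HEARTBEAT_THRESHOLD=31
The new value only takes effect after the cluster is taken offline and brought back online, and it must be identical on every node.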
2011 Feb 28 (2 replies): ocfs2 crash with bugs reports (dlmmaster.c)
Hi,
After the problem described in
http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html
we've upgraded the kernels and ocfs2-tools on every node.
The present versions are:
kernel 2.6.32-bpo.5-amd64 (from debian lenny-backports)
ocfs2-tools 1.4.4-3 (from debian squeeze)
We didn't notice any problems in the logs until last Friday, when the whole
ocfs2 cluster crashed.
We know
2013 Nov 01 (1 reply): How to break out the unstop loop in the recovery thread? Thanks a lot.
Hi everyone,
I have an OCFS2 issue.
The OS is Ubuntu, running Linux kernel 3.2.50.
There are three nodes in the OCFS2 cluster, and all of them use an HP 4330 iSCSI SAN as storage.
When the storage restarted, two of the nodes fenced and restarted because they could no longer write heartbeats to the storage.
But the last node did not restart, and it keeps writing error messages to syslog as below:
2009 Nov 17 (1 reply): [PATCH 1/1] ocfs2/cluster: Make fence method configurable
By default, o2cb fences the box by calling emergency_restart(). While this
scheme works well in production, it gets in the way during testing, as it
does not let the tester take stack/core dumps for analysis.
This patch allows the user to dynamically change the fence method to panic() with:
# echo "panic" > /sys/kernel/config/cluster/<clustername>/fence_method
Signed-off-by: Sunil
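A quick usage sketch to go with that, assuming a cluster actually named ocfs2 (substitute your own cluster name). If I read the patch right, the attribute can be read back to confirm the change, and the original behaviour is restored by writing "reset":
# cat /sys/kernel/config/cluster/ocfs2/fence_method
reset
# echo panic > /sys/kernel/config/cluster/ocfs2/fence_method
# cat /sys/kernel/config/cluster/ocfs2/fence_method
panic
This only changes what happens once a fence is triggered (panic() instead of emergency_restart()); it does not make fencing any less likely.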
2010 Jan 14 (1 reply): another fencing question
Hi,
Periodically, one of the nodes in my two-node cluster gets fenced; here are the logs:
Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-
rc.minint.it (num 0) at 1.1.1.6:7777
Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334 ERROR:
link to 0 went down!
Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR:
status = -112
Jan 14 07:01:44
2008 Oct 22 (2 replies): Another node is heartbeating in our slot! errors with LUN removal/addition
Greetings,
Last night I manually unpresented and deleted a LUN (a SAN snapshot)
that was presented to one node in a four-node RAC environment running
OCFS2 v1.4.1-1. The system then rebooted with the following error:
Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR:
Heartbeat write timeout to device dm-24 after 120000 milliseconds
Oct 21 16:45:34 ausracdb03 kernel:
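When chasing "another node is heartbeating in our slot" after LUN shuffling, it can help to confirm what each node actually sees before and after the change; a hedged sketch using the ocfs2-tools utilities (device paths are placeholders):
# mounted.ocfs2 -d /dev/dm-24      # show the device's ocfs2 UUID and label
# mounted.ocfs2 -f /dev/dm-24      # show which cluster nodes believe they have it mounted
A snapshot LUN that carries a clone of a live volume's UUID is one way two different devices can end up heartbeating into the same slot map.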
2006 Nov 03 (2 replies): Newbie questions -- is OCFS2 what I even want?
Dear Sirs and Madams,
I run a small visual effects production company, Hammerhead Productions.
We'd like to have an easily extensible, inexpensive, relatively
high-performance storage network using open-source components. I was
hoping that OCFS2 would be that system.
I have a half-dozen 2 TB fileservers I'd like the rest of the network to see
as a single 12 TB disk, with the aggregate
2009 Jun 24 (3 replies): Unexplained reboots in DRBD82 + OCFS2 setup
We're trying to set up a dual-primary DRBD environment with a shared
disk running either OCFS2 or GFS. The environment is CentOS 5.3 with
DRBD82 (we also tried DRBD83 from testing).
Setting up a single-primary disk and running bonnie++ on it works.
Setting up a dual-primary disk, mounting it on only one node (ext3), and
running bonnie++ works.
When setting up ocfs2 on the /dev/drbd0
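One thing worth double-checking in a setup like this: dual-primary has to be enabled explicitly in the DRBD resource, otherwise the second node can never be promoted while OCFS2 expects both to write. A minimal hedged sketch for DRBD 8.2/8.3, with made-up host names, disks and addresses:
resource r0 {
    protocol C;                    # synchronous replication; required for dual-primary
    net {
        allow-two-primaries;       # let both nodes hold the Primary role at once
    }
    startup {
        become-primary-on both;    # auto-promote both nodes at startup (8.3 syntax)
    }
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
That said, allow-two-primaries only makes the configuration legal; the unexplained reboots above could just as easily come from the o2cb heartbeat or network timeouts as from the resource definition itself.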