
Displaying 20 results from an estimated 10000 matches similar to: "6 node cluster with unexplained reboots"

2008 Jan 23
1
OCFS2 DLM problems
Hello everyone, once again. We are running into a problem which has now shown up 2 times, possibly 3 (once the symptoms looked different). The environment is 6 HP DL360/380 G5 servers with eth0 being the public interface, and eth1 and bond0 (eth2 and eth3) used for Clusterware, with bond0 also used for OCFS2. The bond0 interface is in active/passive mode. There are no network error counters showing and
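For context, a minimal sketch of the active/passive (active-backup) bonding setup this post describes, using EL4/EL5-style config files; the interface names and addresses are hypothetical:

    # /etc/modprobe.conf -- bonding driver in active-backup mode (mode=1)
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- private interconnect address
    DEVICE=bond0
    IPADDR=192.168.201.1
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth2 -- enslave eth2 (eth3 is identical)
    DEVICE=eth2
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes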
2010 May 21
2
fsck.ocfs2 using huge amount of memory?
We are setting up 2 new EL5 U4 machines to replace our current database servers running our demo environment. We use 3Par SANs and their snap clone options. The current production system we snap clone from is EL4 U5 with ocfs2 1.2.9; the new servers have ocfs2 1.4.3 installed. Part of the refresh process is to run fsck.ocfs2 on the volume to recover, but right now as I am trying to run it on our
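As a point of reference, a read-only dry run is a common way to scope a check like this before committing to repairs; a minimal sketch, with a hypothetical device path:

    # Read-only pass: -f forces a full check, -n answers no to all repairs
    fsck.ocfs2 -fn /dev/mapper/demo_vol

    # If the dry run looks sane, run the repairing pass
    fsck.ocfs2 -fy /dev/mapper/demo_vol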
2008 Feb 25
2
OCFS2 and Cloning
I am currently working on cloning our production OCFS2 volumes to our test environment on a regular basis. For the database (Oracle 10g R2 RAC) we put it into backup mode, then execute a SnapClone on our 3Par SAN. Then we use RemoteCopy and SnapClone to copy to our development 3Par SAN. To recover the OCFS2 volume I go through the following steps: stop the database, umount /export/<volume name>, log
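A minimal sketch of a clone-recovery sequence of this shape, assuming a tools version whose tunefs.ocfs2 supports UUID reset (-U); all paths are hypothetical:

    # Quiesce and detach the volume being refreshed
    umount /export/demo_vol

    # Give the presented SnapClone a fresh UUID so the cluster never sees
    # two volumes with the same identity
    tunefs.ocfs2 -U /dev/mapper/clone_vol

    # Mount the clone in place of the old volume
    mount -t ocfs2 /dev/mapper/clone_vol /export/demo_vol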
2011 Mar 11
3
What could cause slow down betwen OCFS2 1.2.9 and 1.4.4
We upgraded our production database cluster (6-node) from EL4 Update 5 to EL5 Update 5, including upgrading OCFS2 from 1.2.9 to 1.4.4. We are now noticing a slowdown of batch jobs in Oracle, while hot backup runs faster. One thing we saw is that the journal mode changed from writeback to ordered, since we don't specify a journal mode during mount. Oracle sees this as a slowdown based on higher I/O latency,
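The journal mode can be pinned explicitly rather than left to the release default; a minimal sketch with a hypothetical device and mount point:

    # Request writeback journaling instead of the ordered default
    mount -t ocfs2 -o data=writeback /dev/mapper/db_vol /u02

    # Or persist it in /etc/fstab:
    # /dev/mapper/db_vol  /u02  ocfs2  _netdev,data=writeback  0 0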
2008 Feb 17
2
Anyone have an idea how to find file i/o throughput?
We have a remote Oracle 10g R2 standby running on OCFS2. Initially, when we started the standby, read I/O was < 5 MB/sec on average. Since then it has grown to over 40 MB/sec (longer average; it peaks much higher). Here is a graph showing this: http://www.alameda.net/~ulf/dbphx01.png We also have a local standby running (on EXT3) which is not showing the same symptom. I am trying to find where all
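A minimal sketch of how one might chase throughput like this down to a device or process; the Oracle process name is hypothetical, and /proc/<pid>/io needs a kernel with task I/O accounting:

    # Per-device throughput and latency, refreshed every 5 seconds
    iostat -xk 5

    # Cumulative read/write bytes for one process, where the kernel supports it
    cat /proc/$(pgrep -f ora_mrp0 | head -1)/io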
2008 Oct 22
2
Another node is heartbeating in our slot! errors with LUN removal/addition
Greetings, Last night I manually unpresented and deleted a LUN (a SAN snapshot) that was presented to one node in a four node RAC environment running OCFS2 v1.4.1-1. The system then rebooted with the following error: Oct 21 16:45:34 ausracdb03 kernel: (27,1):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device dm-24 after 120000 milliseconds Oct 21 16:45:34 ausracdb03 kernel:
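For reference, the fencing window is governed by the O2CB heartbeat dead threshold, counted in 2-second iterations; a minimal sketch of inspecting and changing it:

    # Show cluster state, including the current heartbeat dead threshold
    /etc/init.d/o2cb status

    # Reconfigure interactively; e.g. a threshold of 61 iterations gives
    # roughly 120 seconds before a node fences itself
    /etc/init.d/o2cb configure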
2009 Jan 15
2
[PATCH] ocfs2: return f_fsid info in ocfs2_statfs()
Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is undefined (at least it should be filled with 0). Since in some conditions the f_fsid value might be used as an (f_fsid, ino) pair to uniquely identify a file, ocfs2 should return a defined, unique f_fsid value from ocfs2_statfs(). This patch uses uuid_hash as a unique ID to initialize the f_fsid value; the 32-bit width is enough for ocfs2
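The field in question is visible from userspace; a quick hedged check, with a hypothetical mount point:

    # Print statfs(2) fields for the filesystem; the "ID:" line is f_fsid
    stat -f /ocfs_1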
2010 May 30
4
OCFS2 performance - disk random access time problem
Hello. I plan to use OCFS2 + DRBD for an email server. Problem: I use "seeker" for testing: http://www.linuxinsight.com/how_fast_is_your_disk.html And I get this: Results: 65 seeks/second, 15.23 ms random access time. Then I do an rm of many files - it falls to 10 seeks/second and performance is terrible. What can I do to increase it? What's wrong? A lot of info is below. What we have: Debian
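A crude stand-in for seeker that needs only dd; a minimal sketch assuming a bash shell and a coreutils dd with O_DIRECT support, with a hypothetical device:

    # Time 100 random 4k direct reads; more wall time = worse seek latency
    time for i in $(seq 1 100); do
        dd if=/dev/drbd0 of=/dev/null bs=4k count=1 \
           skip=$((RANDOM * RANDOM % 10000000)) iflag=direct 2>/dev/null
    done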
2009 Jan 16
2
[PATCH] ocfs2: return f_fsid info in ocfs2_statfs(), v4
Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is undefined (the vfs layer fills in 0 as the default). Since in some conditions the f_fsid value might be used as an (f_fsid, ino) pair to uniquely identify a file, ocfs2 should return a defined, unique f_fsid value from ocfs2_statfs(). Because uuid_str is identical no matter whether the machine is big- or little-endian, it's also endian-consistent to use
2010 Aug 12
3
[PATCH 1/2] ocfs2: Fix metaecc error messages
Like the tools, the checksum validation function now prints the values in hex. Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com> --- fs/ocfs2/blockcheck.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ocfs2/blockcheck.c b/fs/ocfs2/blockcheck.c index ec6d123..c7ee03c 100644 --- a/fs/ocfs2/blockcheck.c +++ b/fs/ocfs2/blockcheck.c @@ -439,7 +439,7 @@ int
2008 Jul 21
5
OCFS processes active after a umount [SEC=UNOFFICIAL]
Hello, I have two OCFS file systems mounted at /ocfs_1 and /ocfs_2. I have unmounted both OCFS file systems and was then trying to offline and unload OCFS. The offline command failed with - # ./o2cb offline Stopping O2CB cluster ocfs2: Failed Unable to stop cluster as heartbeat region still active Looking at the processes on this box shows a number of OCFS processes are still active -
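Those leftover processes usually belong to a heartbeat region that is still referenced; a minimal sketch of checking and clearing one, with a hypothetical device:

    # List ocfs2 devices and their UUIDs/labels
    mounted.ocfs2 -d

    # Show the reference count on a region; nonzero keeps o2cb offline failing
    ocfs2_hb_ctl -I -d /dev/sdb1

    # Stop heartbeat on the stale region, then retry the offline
    ocfs2_hb_ctl -K -d /dev/sdb1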
2009 Jan 14
15
Backport patches to ocfs2 1.4 tree from mainline
Found 15 patches (out of 162) that appeared relevant to ocfs2 1.4. Please review. Sunil
2010 Oct 08
23
O2CB global heartbeat - hopefully final drop!
All, This is hopefully the final drop of the patches for adding global heartbeat to the o2cb stack. The diff from the previous set is here: http://oss.oracle.com/~smushran/global-hb-diff-2010-10-07 Implemented most of the suggestions provided by Joel and Wengang. The most important one was to activate the feature only at the end. Also got a mostly clean run with checkpatch.pl. Sunil
2009 Apr 17
26
OCFS2 1.4: Patches backported from mainline
Please review the list of patches being applied to the ocfs2 1.4 tree. All patches list the mainline commit hash. Thanks Sunil
2008 Sep 25
1
ocfs2 filesystem seems out of sync
Hi there, I recently installed an OCFS2 filesystem on our FC-SAN. Everything seemed to work fine and I could read & write the filesystem from both servers that are mounting it. After a while, though, writes coming from one node do not appear on the other node and vice versa. I am not sure what's causing this, and I am not very experienced at debugging filesystems. If anybody has any
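One classic cause of this symptom is a node writing without live cluster membership; a quick hedged sanity check to run on each server:

    # Is the O2CB stack actually online on this node?
    /etc/init.d/o2cb status

    # Full detect: which nodes does each ocfs2 volume think has it mounted?
    mounted.ocfs2 -f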
2007 Jan 23
1
ocfs2 kernel bug in Fedora Core 4 update kernel
OS: Fedora Core release 4 (Stentz) KERNEL: Linux rack1.ape 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006 i686 i686 i386 GNU/Linux CLUSTER: 11 Linux kernels, mixed environment FC4, FC5, FC6 SAN: FC Infortrend storage, QLogic 16-port FC switch, FC adapter LSI FC929X (21224,1):ocfs2_truncate_file:242 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
2008 Apr 02
10
[PATCH 0/62] Ocfs2 updates for 2.6.26-rc1
The following series of patches comprises the bulk of our outstanding changes for Ocfs2. Aside from the usual set of cleanups and fixes that were inappropriate for 2.6.25, there are a few highlights: The '/sys/o2cb' directory has been moved to '/sys/fs/o2cb'. The new location meshes better with modern sysfs layout. A symbolic link has been placed in the old location so as to
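A quick hedged check of the relocation described above, on a kernel that carries it:

    # The old path should remain as a compatibility symlink to the new one
    ls -ld /sys/fs/o2cb /sys/o2cb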
2011 Oct 18
12
Unable to stop cluster as heartbeat region still active
Hi, I have a 2-node ocfs2 cluster running UEK 2.6.32-100.0.19.el5, ocfs2console-1.6.3-2.el5, ocfs2-tools-1.6.3-2.el5. My problem is that every time I try to run /etc/init.d/o2cb stop, it fails with this error: Stopping O2CB cluster CLUSTER: Failed Unable to stop cluster as heartbeat region still active There is no active mount point. I tried to manually stop the heartbeat with
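A minimal sketch of hunting down the stale region by hand; the cluster name is hypothetical and the UUID placeholder must come from the listing:

    # Which heartbeat regions does configfs still hold?
    ls /sys/kernel/config/cluster/ocfs2cluster/heartbeat/

    # Stop the leftover region by UUID, then retry the stop
    ocfs2_hb_ctl -K -u <region uuid>
    /etc/init.d/o2cb stop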
2005 Feb 11
3
OCFS file system used as archived redo destination is corrupted
We started using an OCFS file system about 4 months ago as the shared archived redo destination for the 4-node RAC instances (HP DL380, MSA1000, RH AS 2.1). Last night we started seeing some weird behavior, and my guess is the inode directory in the file system is getting corrupted. I've always had a bad feeling about OCFS not being very robust at handling constant file creation and deletion
2007 Jul 07
2
Adding new nodes to OCFS2?
I looked around and found an older post which seems not applicable anymore. I have a cluster of 2 nodes right now, which has 3 OCFS2 file systems. All the file systems were formatted with 4 node slots. I added the two new nodes (by hand, by ocfs2console and o2cb_ctl), so my /etc/ocfs2/cluster.conf looks right: node: ip_port = 7777 ip_address = 192.168.201.1 number = 0
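For the record, the usual way to push a new node into a live cluster is o2cb_ctl on every existing member; a minimal sketch with a hypothetical node name and address:

    # -i also installs the node into the running cluster, not just cluster.conf
    o2cb_ctl -C -i -n rac5 -t node -a number=4 \
        -a ip_address=192.168.201.5 -a ip_port=7777 -a cluster=ocfs2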