similar to: OCFS2 problems when connectivity lost

Displaying 20 results from an estimated 100 matches similar to: "OCFS2 problems when connectivity lost"

2011 Apr 01
1
Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)
I am running a two-node web cluster on OCFS2 via DRBD Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working great, except during testing of hard-boot scenarios. Whenever I hard-boot one of the nodes, the other node is successfully fenced and marked "Outdated" * <resource minor="0" cs="WFConnection" ro1="Primary" ro2="Unknown"
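For context, resource-level fencing for a dual-primary DRBD 8.3 pair under Pacemaker is usually wired up in drbd.conf. A minimal sketch, assuming a resource named r0 and the stock crm-fence-peer handler paths:

```
resource r0 {
  protocol C;
  startup { become-primary-on both; }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
  }
  # resource-and-stonith freezes I/O until the fence handler returns,
  # which is what lets the surviving Primary keep writing safely
  disk { fencing resource-and-stonith; }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```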
2013 Nov 01
1
How to break out the unstop loop in the recovery thread? Thanks a lot.
Hi everyone, I have an OCFS2 issue. The OS is Ubuntu, running Linux kernel 3.2.50. There are three nodes in the OCFS2 cluster, and all of them use an HP 4330 iSCSI SAN as storage. When the storage restarted, two nodes were fenced and rebooted because they could no longer write their heartbeat to the storage. But the last one did not restart, and it still writes error messages into syslog as below:
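The fencing behaviour described above is governed by o2cb's heartbeat and network timeouts. A sketch of where those knobs live, assuming Ubuntu's /etc/default/o2cb (values are illustrative, not recommendations):

```
# /etc/default/o2cb
# A node is declared dead after (threshold - 1) * 2 seconds without
# heartbeat writes, so 31 here means roughly 60 seconds.
O2CB_HEARTBEAT_THRESHOLD=31
# Idle network timeout (ms) before a peer connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
```

Raising these only buys time across short storage outages; a node that still cannot write its heartbeat will eventually have to fence itself.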
2011 Dec 20
8
ocfs2 - Kernel panic on many write/read from both
Sorry, I didn't copy everything:

TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
5239722 26198604 246266859
TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
6074335 30371669 285493670
TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
5239722 26198604
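The wc output above is lines/words/bytes, so the first column approximates the number of entries in each orphan directory. A small sketch for watching orphan counts per node slot (the device path and two-slot layout are assumptions):

```
# Count orphan-dir entries for each node slot on an OCFS2 volume.
for slot in 0000 0001; do
  n=$(debugfs.ocfs2 -R "ls //orphan_dir:$slot" /dev/dm-0 | wc -l)
  echo "slot $slot: $n entries"
done
```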
2008 Sep 12
1
Regd: Ethernet Channel Bonding Clarification is Needed
Dear All, Please ignore my previous mail. I am using CentOS 4.4 Linux and the kernel version is 2.6.9-42.EL. I have configured Cluster Suite with 2 servers. Server 1: 192.168.13.110, hostname primary. Server 2: 192.168.13.179, hostname secondary. Floating IP: 192.168.13.83 (assumed by the currently active server). I have configured Ethernet Channel Bonding in
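For reference, channel bonding on CentOS 4 is typically declared in /etc/modprobe.conf plus the ifcfg files. A minimal active-backup sketch; interface names and addresses are illustrative:

```
# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=1 miimon=100   # active-backup, link check every 100 ms

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.13.110
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is analogous)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```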
2010 Jun 14
3
Diagnosing some OCFS2 error messages
Hello. I am experimenting with OCFS2 on Suse Linux Enterprise Server 11 Service Pack 1. I am performing various stress tests. My current exercise involves writing to files using a shared-writable mmap() from two nodes. (Each node mmaps and writes to different files; I am not trying to access the same file from multiple nodes.) Both nodes are logging messages like these: [94355.116255]
2007 Apr 04
1
Cluster Services
Hello, we are running CentOS 4.3 and the latest cluster suite packages from the csgfs yum repository for this release, and we need to delete a cluster member. According to the documentation, we need to restart all cluster-related services on all remaining nodes after the node has been removed. This is a four-node cluster, so removing one node obviously degrades the cluster to three
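On the CentOS 4 cluster suite, that restart is usually done with the init scripts in dependency order on each remaining node. A sketch, using the service names as shipped with RHCS 4:

```
# stop in reverse dependency order...
service rgmanager stop
service fenced stop
service cman stop
service ccsd stop

# ...then start again in forward order
service ccsd start
service cman start
service fenced start
service rgmanager start
```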
2007 Jan 15
1
RHCS on CentOS4 - 2 node cluster problem
Hello fellows, I have a problem with a 2-node RHCS cluster (CentOS 4) where node 1 failed and node 2 became active. That had already happened last year and, due to the holidays, the customer didn't notice it. The cluster is just a failover for Apache and has no shared storage. The customer has now seen the situation and tried to fix it by rebooting node 1, which then failed to come back up. As
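When a node fails to rejoin like this, a few standard RHCS commands narrow things down; a sketch of a first pass:

```
clustat                    # membership and service state as node 2 sees it
cman_tool status           # quorum, votes, and cluster generation
cman_tool nodes            # per-node join state
tail -f /var/log/messages  # watch ccsd/cman/fenced while node 1 boots
```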
2017 Sep 10
4
Corosync on a home network
I've been trying to build a model cluster using three virtual machines on my home server. Each VM boots off its own dedicated partition (CentOS 7.3). One partition is designated to be the common /home partition for the VMs, (on the real machine it will mount as /cluster). I'm intending to run GFS2 on the shared partition, so I need to configure DLM and corosync. That's where I'm
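For a three-VM setup like this on CentOS 7, corosync is normally configured via /etc/corosync/corosync.conf (or generated by pcs cluster setup). A minimal sketch; the cluster name and addresses are assumptions, and udpu (unicast) avoids multicast trouble on home network gear:

```
totem {
    version: 2
    cluster_name: homelab
    transport: udpu
}
nodelist {
    node {
        ring0_addr: 192.168.1.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.12
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.1.13
        nodeid: 3
    }
}
quorum {
    provider: corosync_votequorum
}
```

DLM then sits on top of this membership; GFS2 won't mount until corosync has quorum and dlm_controld is running.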
2009 Jun 05
2
Dovecot + DRBD/GFS mailstore
Hi guys, I'm looking at the possibility of running a pair of servers with Dovecot LDA/imap/pop3 using internal drives with DRBD and GFS (or other clustered FS) for the mail storage and ext3 for the root drive. I'm currently using maildrop for delivery and Dovecot imap/pop3 with the stores over NFS. I'm looking for better performance but still keeping the HA element I have now with
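On a shared cluster filesystem, the Dovecot settings that usually matter are the ones that avoid cross-node mmap and locking surprises. A hedged sketch of dovecot.conf fragments (paths and the exact option set depend on the Dovecot version):

```
# store mail on the clustered filesystem
mail_location = maildir:/var/mail/%d/%n

# avoid mmap coherency problems between nodes
mmap_disable = yes

# fcntl locks are generally safer than flock on cluster filesystems
lock_method = fcntl
```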
2014 Oct 29
2
CentOS 6.5 RHCS fence loops
Hi Guys, I'm using CentOS 6.5 as a guest on RHEV, with RHCS for a clustered web environment. The environment: web1.example.com, web2.example.com. When the cluster reaches quorum, web1 is rebooted by web2; when web2 comes back up, web2 is rebooted by web1. Does anybody know how to solve this "fence loop"? master_wins="1" is not working properly, and neither is qdisk. Below is the cluster.conf, I
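A common way to break such a loop in a two-node cluster without a quorum disk is two_node mode plus a join delay, so a booting node has time to rejoin before it is allowed to fence. A hedged cluster.conf fragment (values illustrative):

```
<cman two_node="1" expected_votes="1"/>
<!-- give a rebooting peer 60s to join before fencing is considered -->
<fence_daemon post_join_delay="60" post_fail_delay="0"/>
```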
2011 Nov 23
1
Corosync init-script broken on CentOS6
Hello all, I am trying to create a corosync/pacemaker cluster using CentOS 6.0. However, I'm having a great deal of difficulty doing so. Corosync has a valid configuration file and an authkey has been generated. When I run /etc/init.d/corosync I see that only corosync is started. From experience working with corosync/pacemaker before, I know that this is not enough to have a functioning
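That is expected on CentOS 6: the corosync init script starts only corosync itself. Pacemaker is launched either as a corosync plugin or from its own init script. A sketch of the plugin-declaration approach (file path as used in the Pacemaker docs of that era):

```
# /etc/corosync/service.d/pcmk
service {
    name: pacemaker
    ver: 1        # ver 1 = pacemaker runs as its own daemon
}
```

With ver: 1 you still have to start both: /etc/init.d/corosync start, then /etc/init.d/pacemaker start.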
2009 Jul 20
1
[PATCH] ocfs2: flush dentry lock drop when sync ocfs2 volume.
In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This is OK for most cases, but for umount it leads to at least 2 bugs. See http://oss.oracle.com/bugzilla/show_bug.cgi?id=1133 and http://oss.oracle.com/bugzilla/show_bug.cgi?id=1135. And it happens easily if we have opened a lot of inodes. For 1135, the reason is that during umount will call
2012 Jul 10
1
live migration
Hello, everybody. I use NFS to do live migration. After I input "virsh --connect=qemu:///system --quiet migrate --live vm12 qemu+tcp://pcmk-1/system" (vm12 is the VM name, pcmk-1 is the host name), it takes almost 10s for preparation. During the 10s, the VM is still running and can ping other VMs. But if I input mkdir pcmk-6 in the VM during the 10s, it says: mkdir: cannot create directory `pcmk-6': Read-only
2023 Apr 30
3
[PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
fstest generic cases 347 361 628 629 trigger the same issue: when jbd2 enters ABORT status, ocfs2 ignores it and keeps committing the journal. This commit gives ocfs2 the ability to handle the jbd2 ABORT case. Signed-off-by: Heming Zhao <heming.zhao at suse.com> ---
 fs/ocfs2/alloc.c      | 10 ++++++----
 fs/ocfs2/journal.c    |  5 +++++
 fs/ocfs2/localalloc.c |  3 +++
 3 files changed, 14
2010 Aug 20
0
[PATCH] ocfs2: Don't delete orphaned files if we are in the process of umount.
Generally, the orphan scan runs in ocfs2_wq and is used to replay the orphan dir. So for some low-end iSCSI devices, delete_inode may take a long time (on some devices, I have seen that deleting 500 files takes about 15 secs). This will eventually cause umount to livelock (umount has to flush ocfs2_wq, which will wait until the orphan scan finishes). So this patch just tries to finish the orphan scan
2001 Jul 04
1
remote forwarding in 2.9p2
Hi, It looks like remote forwarding with SSH v2 is not working on my Solaris machines (and from what I understand from the source, it may not work elsewhere either). When looking at channel_post_port_listener() in channels.c, I found that nextstate was defined as:

    nextstate = (c->host_port == 0) ? SSH_CHANNEL_DYNAMIC : SSH_CHANNEL_OPENING;

And later comes the call: if
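For anyone reproducing this, remote forwarding is the -R path; a usage sketch with example hosts and ports:

```
# forward port 8022 on the remote host back to local port 22,
# forcing protocol 2 (the case discussed above)
ssh -2 -R 8022:localhost:22 user@gateway.example.com
```

On gateway.example.com, connections to localhost:8022 should then be tunneled to port 22 on the originating machine; the report above concerns how the resulting channel's next state is chosen in channel_post_port_listener().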
2023 Apr 30
2
[PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync
fstest generic cases 266 272 281 trigger a hanging issue at umount. I use 266 to describe the root cause.
```
49 _dmerror_unmount
50 _dmerror_mount
51
52 echo "Compare files"
53 md5sum $testdir/file1 | _filter_scratch
54 md5sum $testdir/file2 | _filter_scratch
55
56 echo "CoW and unmount"
57 sync
58 _dmerror_load_error_table
59 urk=$($XFS_IO_PROG -f -c "pwrite
2010 Apr 29
2
Hardware error or ocfs2 error?
Hello, today I noticed the following on *only* one node:

----- cut here -----
Apr 29 11:01:18 node06 kernel: [2569440.616036] INFO: task ocfs2_wq:5214 blocked for more than 120 seconds.
Apr 29 11:01:18 node06 kernel: [2569440.616056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 11:01:18 node06 kernel: [2569440.616080] ocfs2_wq D
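To decide between a hardware and a filesystem problem, the usual first step is to capture the blocked-task stacks that this message refers to. A sketch:

```
# current hung-task threshold (120s in the log above)
cat /proc/sys/kernel/hung_task_timeout_secs

# dump stack traces of all D-state (uninterruptible) tasks to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 100
```

If the resulting stack shows ocfs2_wq stuck in the I/O path, the block device (or its multipath/iSCSI transport) is the more likely suspect.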
2023 May 09
1
[PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
On 5/9/23 12:40 AM, Heming Zhao wrote: > Sorry for the late reply, I have been a little busy recently. > > On Fri, May 05, 2023 at 11:42:51AM +0800, Joseph Qi wrote: >> >> >> On 5/5/23 12:20 AM, Heming Zhao wrote: >>> On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 5/4/23 4:02 PM, Heming Zhao wrote:
2008 Feb 06
10
Trouble Ticket System
Hi, Last year, I tried to install and evaluate the following OSS web-based trouble ticketing systems, so that I can track the history of our IT-related issues. Those that I tried are PHP Ticket, DanPHPSupport, Epix Power Support, ruQueue, Ticket Express, OTRS, PMOS Help Desk and eTicket. Of those, PMOS and eTicket are my top picks. I had problems installing OTRS the last time, so I am