similar to: OCFS2 problems when connectivity lost

Displaying 20 results from an estimated 100 matches similar to: "OCFS2 problems when connectivity lost"

2011 Apr 01
1
Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)
I am running a two-node web cluster on OCFS2 via DRBD Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working great, except during testing of hard-boot scenarios. Whenever I hard-boot one of the nodes, the other node is successfully fenced and marked "Outdated" * <resource minor="0" cs="WFConnection" ro1="Primary" ro2="Unknown"
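For context, resource-level fencing for a dual-primary DRBD 8.3 pair under Pacemaker is usually wired up in drbd.conf. A minimal sketch, assuming a resource named r0 and the stock crm-fence-peer handler paths:

```
resource r0 {
  protocol C;
  startup { become-primary-on both; }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
  }
  # resource-and-stonith freezes I/O until the fence handler returns,
  # which is what lets the surviving Primary keep writing safely
  disk { fencing resource-and-stonith; }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```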
2013 Nov 01
1
How to break out the unstop loop in the recovery thread? Thanks a lot.
Hi everyone, I have an OCFS2 issue. The OS is Ubuntu, running Linux kernel 3.2.50. There are three nodes in the OCFS2 cluster, and all of them use an HP 4330 iSCSI SAN as storage. When the storage restarted, two nodes were fenced and rebooted because they could no longer write their heartbeat to the storage. But the last one did not restart, and it still writes error messages into syslog as below:
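The fencing behaviour described above is governed by o2cb's heartbeat and network timeouts. A sketch of where those knobs live, assuming Ubuntu's /etc/default/o2cb (values are illustrative, not recommendations):

```
# /etc/default/o2cb
# A node is declared dead after (threshold - 1) * 2 seconds without
# heartbeat writes, so 31 here means roughly 60 seconds.
O2CB_HEARTBEAT_THRESHOLD=31
# Idle network timeout (ms) before a peer connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
```

Raising these only buys time across short storage outages; a node that still cannot write its heartbeat will eventually have to fence itself.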
2011 Dec 20
8
ocfs2 - Kernel panic on many write/read from both
Sorry, I didn't copy everything:

TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
5239722 26198604 246266859
TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
6074335 30371669 285493670
TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
5239722 26198604
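The wc output above is lines/words/bytes, so the first column approximates the number of entries in each orphan directory. A small sketch for watching orphan counts per node slot (the device path and two-slot layout are assumptions):

```
# Count orphan-dir entries for each node slot on an OCFS2 volume.
for slot in 0000 0001; do
  n=$(debugfs.ocfs2 -R "ls //orphan_dir:$slot" /dev/dm-0 | wc -l)
  echo "slot $slot: $n entries"
done
```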
2008 Sep 12
1
Regd: Ethernet Channel Bonding Clarification is Needed
Dear All, Please ignore my previous mail. I am using CentOS 4.4 Linux and the kernel version is 2.6.9-42.EL. I have configured Cluster Suite with 2 servers. Server 1: 192.168.13.110, hostname primary. Server 2: 192.168.13.179, hostname secondary. Floating IP: 192.168.13.83 (assumed by the currently active server). I have configured Ethernet Channel Bonding in
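For reference, channel bonding on CentOS 4 is typically declared in /etc/modprobe.conf plus the ifcfg files. A minimal active-backup sketch; interface names and addresses are illustrative:

```
# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=1 miimon=100   # active-backup, link check every 100 ms

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.13.110
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is analogous)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```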
2010 Jun 14
3
Diagnosing some OCFS2 error messages
Hello. I am experimenting with OCFS2 on Suse Linux Enterprise Server 11 Service Pack 1. I am performing various stress tests. My current exercise involves writing to files using a shared-writable mmap() from two nodes. (Each node mmaps and writes to different files; I am not trying to access the same file from multiple nodes.) Both nodes are logging messages like these: [94355.116255]
2007 Apr 04
1
Cluster Services
Hello, we are running CentOS 4.3 and the latest cluster suite packages from the csgfs yum repository for this release, and we need to delete a cluster member. According to the documentation, we need to restart all cluster-related services on all remaining nodes after the node has been removed. This is a four-node cluster, so removing one node obviously degrades the cluster to three
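On the CentOS 4 cluster suite, that restart is usually done with the init scripts in dependency order on each remaining node. A sketch, using the service names as shipped with RHCS 4:

```
# stop in reverse dependency order...
service rgmanager stop
service fenced stop
service cman stop
service ccsd stop

# ...then start again in forward order
service ccsd start
service cman start
service fenced start
service rgmanager start
```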
2007 Jan 15
1
RHCS on CentOS4 - 2 node cluster problem
Hello fellows, I have a problem with a 2-node RHCS cluster (CentOS 4) where node 1 failed and node 2 became active. That had already happened last year and, due to the holidays, the customer didn't notice it. The cluster is just a failover for Apache and has no shared storage. The customer has now seen the situation and tried to fix it by rebooting node 1, which then failed to come back up. As
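When a node fails to rejoin like this, a few standard RHCS commands narrow things down; a sketch of a first pass:

```
clustat                    # membership and service state as node 2 sees it
cman_tool status           # quorum, votes, and cluster generation
cman_tool nodes            # per-node join state
tail -f /var/log/messages  # watch ccsd/cman/fenced while node 1 boots
```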
2017 Sep 10
4
Corosync on a home network
I've been trying to build a model cluster using three virtual machines on my home server. Each VM boots off its own dedicated partition (CentOS 7.3). One partition is designated to be the common /home partition for the VMs, (on the real machine it will mount as /cluster). I'm intending to run GFS2 on the shared partition, so I need to configure DLM and corosync. That's where I'm
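For a three-VM setup like this on CentOS 7, corosync is normally configured via /etc/corosync/corosync.conf (or generated by pcs cluster setup). A minimal sketch; the cluster name and addresses are assumptions, and udpu (unicast) avoids multicast trouble on home network gear:

```
totem {
    version: 2
    cluster_name: homelab
    transport: udpu
}
nodelist {
    node {
        ring0_addr: 192.168.1.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.12
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.1.13
        nodeid: 3
    }
}
quorum {
    provider: corosync_votequorum
}
```

DLM then sits on top of this membership; GFS2 won't mount until corosync has quorum and dlm_controld is running.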
2009 Jun 05
2
Dovecot + DRBD/GFS mailstore
Hi guys, I'm looking at the possibility of running a pair of servers with Dovecot LDA/imap/pop3 using internal drives with DRBD and GFS (or other clustered FS) for the mail storage and ext3 for the root drive. I'm currently using maildrop for delivery and Dovecot imap/pop3 with the stores over NFS. I'm looking for better performance but still keeping the HA element I have now with
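On a shared cluster filesystem, the Dovecot settings that usually matter are the ones that avoid cross-node mmap and locking surprises. A hedged sketch of dovecot.conf fragments (paths and the exact option set depend on the Dovecot version):

```
# store mail on the clustered filesystem
mail_location = maildir:/var/mail/%d/%n

# avoid mmap coherency problems between nodes
mmap_disable = yes

# fcntl locks are generally safer than flock on cluster filesystems
lock_method = fcntl
```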
2014 Oct 29
2
CentOS 6.5 RHCS fence loops
Hi Guys, I'm using CentOS 6.5 as a guest on RHEV, with RHCS for a clustered web environment. The environment: web1.example.com, web2.example.com. When the cluster reaches quorum, web1 is rebooted by web2; when web2 comes back up, web2 is rebooted by web1. Does anybody know how to solve this "fence loop"? master_wins="1" is not working properly, and neither is qdisk. Below is the cluster.conf, I
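A common way to break such a loop in a two-node cluster without a quorum disk is two_node mode plus a join delay, so a booting node has time to rejoin before it is allowed to fence. A hedged cluster.conf fragment (values illustrative):

```
<cman two_node="1" expected_votes="1"/>
<!-- give a rebooting peer 60s to join before fencing is considered -->
<fence_daemon post_join_delay="60" post_fail_delay="0"/>
```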
2011 Nov 23
1
Corosync init-script broken on CentOS6
Hello all, I am trying to create a corosync/pacemaker cluster using CentOS 6.0. However, I'm having a great deal of difficulty doing so. Corosync has a valid configuration file and an authkey has been generated. When I run /etc/init.d/corosync I see that only corosync is started. From experience working with corosync/pacemaker before, I know that this is not enough to have a functioning
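That is expected on CentOS 6: the corosync init script starts only corosync itself. Pacemaker is launched either as a corosync plugin or from its own init script. A sketch of the plugin-declaration approach (file path as used in the Pacemaker docs of that era):

```
# /etc/corosync/service.d/pcmk
service {
    name: pacemaker
    ver: 1        # ver 1 = pacemaker runs as its own daemon
}
```

With ver: 1 you still have to start both: /etc/init.d/corosync start, then /etc/init.d/pacemaker start.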
2009 Jul 20
1
[PATCH] ocfs2: flush dentry lock drop when sync ocfs2 volume.
In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This is OK for most cases, but for umount it leads to at least 2 bugs. See http://oss.oracle.com/bugzilla/show_bug.cgi?id=1133 and http://oss.oracle.com/bugzilla/show_bug.cgi?id=1135. And it happens easily if we have opened a lot of inodes. For 1135, the reason is that during umount will call
2012 Jul 10
1
live migration
Hello, everybody. I use NFS to do live migration. After I input "virsh --connect=qemu:///system --quiet migrate --live vm12 qemu+tcp://pcmk-1/system" (vm12 is the VM name, pcmk-1 is the host name), it takes almost 10s for preparation. During the 10s, the VM is still running and can ping other VMs. But if I input mkdir pcmk-6 in the VM during the 10s, it says: mkdir: cannot create directory `pcmk-6': Read-only
2023 Apr 30
3
[PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
fstest generic cases 347 361 628 629 trigger the same issue: when jbd2 enters ABORT status, ocfs2 ignores it and keeps committing the journal. This commit gives ocfs2 the ability to handle the jbd2 ABORT case. Signed-off-by: Heming Zhao <heming.zhao at suse.com> ---
 fs/ocfs2/alloc.c      | 10 ++++++----
 fs/ocfs2/journal.c    |  5 +++++
 fs/ocfs2/localalloc.c |  3 +++
 3 files changed, 14
2010 Aug 20
0
[PATCH] ocfs2: Don't delete orphaned files if we are in the process of umount.
Generally, the orphan scan runs in ocfs2_wq and is used to replay the orphan dir. So for some low-end iSCSI devices, delete_inode may take a long time (on some devices, I have seen that deleting 500 files takes about 15 secs). This will eventually cause umount to livelock (umount has to flush ocfs2_wq, which will wait until the orphan scan finishes). So this patch just tries to finish the orphan scan
2001 Jul 04
1
remote forwarding in 2.9p2
Hi, It looks like remote forwarding with SSH v2 is not working on my Solaris machines (and from what I understand from the source, it may not work elsewhere either). When looking at channel_post_port_listener() in channels.c, I found that nextstate was defined as:

    nextstate = (c->host_port == 0) ? SSH_CHANNEL_DYNAMIC : SSH_CHANNEL_OPENING;

And later comes the call: if
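For anyone reproducing this, remote forwarding is the -R path; a usage sketch with example hosts and ports:

```
# forward port 8022 on the remote host back to local port 22,
# forcing protocol 2 (the case discussed above)
ssh -2 -R 8022:localhost:22 user@gateway.example.com
```

On gateway.example.com, connections to localhost:8022 should then be tunneled to port 22 on the originating machine; the report above concerns how the resulting channel's next state is chosen in channel_post_port_listener().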
2023 Apr 30
2
[PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync
fstest generic cases 266 272 281 trigger a hanging issue at umount. I use 266 to describe the root cause.
```
49 _dmerror_unmount
50 _dmerror_mount
51
52 echo "Compare files"
53 md5sum $testdir/file1 | _filter_scratch
54 md5sum $testdir/file2 | _filter_scratch
55
56 echo "CoW and unmount"
57 sync
58 _dmerror_load_error_table
59 urk=$($XFS_IO_PROG -f -c "pwrite
2010 Apr 29
2
Hardware error or ocfs2 error?
Hello, today I noticed the following on *only* one node:

----- cut here -----
Apr 29 11:01:18 node06 kernel: [2569440.616036] INFO: task ocfs2_wq:5214 blocked for more than 120 seconds.
Apr 29 11:01:18 node06 kernel: [2569440.616056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 11:01:18 node06 kernel: [2569440.616080] ocfs2_wq D
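To decide between a hardware and a filesystem problem, the usual first step is to capture the blocked-task stacks that this message refers to. A sketch:

```
# current hung-task threshold (120s in the log above)
cat /proc/sys/kernel/hung_task_timeout_secs

# dump stack traces of all D-state (uninterruptible) tasks to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 100
```

If the resulting stack shows ocfs2_wq stuck in the I/O path, the block device (or its multipath/iSCSI transport) is the more likely suspect.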
2023 May 09
1
[PATCH 2/2] ocfs2: add error handling path when jbd2 enter ABORT status
On 5/9/23 12:40 AM, Heming Zhao wrote: > Sorry for the late reply, I have been a little busy recently. > > On Fri, May 05, 2023 at 11:42:51AM +0800, Joseph Qi wrote: >> >> >> On 5/5/23 12:20 AM, Heming Zhao wrote: >>> On Thu, May 04, 2023 at 05:41:29PM +0800, Joseph Qi wrote: >>>> >>>> >>>> On 5/4/23 4:02 PM, Heming Zhao wrote:
2008 Feb 06
10
Trouble Ticket System
Hi, Last year, I tried to install and evaluate the following OSS web-based trouble ticketing systems, so that I can track the history of our IT-related issues. Those that I tried are PHP Ticket, DanPHPSupport, Epix Power Support, ruQueue, Ticket Express, OTRS, PMOS Help Desk and eTicket. Of those, PMOS and eTicket are my top picks. I had problems installing OTRS the last time, so I am