search for: dlm_wait_for_lock_mastery

Displaying 15 results from an estimated 15 matches for "dlm_wait_for_lock_mastery".

2007 Nov 29
1
Troubles with two node
..._lock_resource:915 ERROR: status = -107 Nov 28 15:29:46 web-ha2 kernel: (23443,0):dlm_do_master_request:1331 ERROR: link to 0 went down! ERROR: status = -107 [...] Nov 22 18:14:50 web-ha2 kernel: (17634,0):dlm_restart_lock_mastery:1215 ERROR: node down! 0 Nov 22 18:14:50 web-ha2 kernel: (17634,0):dlm_wait_for_lock_mastery:1036 ERROR: status = -11 Nov 22 18:14:51 web-ha2 kernel: (17619,1):dlm_restart_lock_mastery:1215 ERROR: node down! 0 Nov 22 18:14:51 web-ha2 kernel: (17619,1):dlm_wait_for_lock_mastery:1036 ERROR: status = -11 Nov 22 18:14:51 web-ha2 kernel: (17798,1):dlm_restart_lock_mastery:1215 ERROR: node down!...
2023 Jun 16
1
[BUG] ocfs2/dlm: possible data races in dlm_drop_lockres_ref_done() and dlm_get_lock_resource()
...ged during the lockres lifecycle. So this won't cause any real problem since now it holds a reference. > > dlm_get_lock_resource() --> Line 701 in dlmmaster.c > if (res->owner != dlm->node_num) --> Line 1023 in dlmmaster.c (Access > res->owner) Do you mean in dlm_wait_for_lock_mastery()? Even if owner changes suddenly, it will recheck, so I think it is also fine. Thanks, Joseph > > The variables res->lockname.name and res->owner are accessed respectively > without holding the lock res->spinlock, and thus data races can occur. > > I am not quite sure w...
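The reply above argues that the unlocked read is benign because the owner is re-checked under res->spinlock before it is acted on. Below is a minimal userspace sketch of that "recheck under the lock" pattern; the types and helpers (lock_resource, owner_is_local_checked) are simplified stand-ins for illustration, not the actual dlmmaster.c code.

#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

struct lock_resource {
    pthread_mutex_t spinlock;   /* stands in for res->spinlock */
    int owner;                  /* stands in for res->owner    */
};

static struct lock_resource res = { PTHREAD_MUTEX_INITIALIZER, 0 };

static bool owner_is_local_unlocked(int node_num)
{
    /* Unlocked peek: may observe a momentarily stale owner. */
    return res.owner == node_num;
}

static bool owner_is_local_checked(int node_num)
{
    bool local;

    /* The value is re-evaluated under the lock before being acted on,
     * which is why the reply considers the unlocked read harmless. */
    pthread_mutex_lock(&res.spinlock);
    local = (res.owner == node_num);
    pthread_mutex_unlock(&res.spinlock);
    return local;
}

int main(void)
{
    int node_num = 0;

    /* The unlocked read is only a hint; the decision is confirmed
     * under the lock. */
    if (owner_is_local_unlocked(node_num) && owner_is_local_checked(node_num))
        printf("this node masters the resource\n");
    return 0;
}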
2007 Mar 08
4
ocfs2 cluster becomes unresponsive
...st master $RECOVERY lock now Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176 Recovering node 2 from slot 1 on device (253,1) Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214 ERROR: node down! 2 Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11 Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2 Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11 Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:121...
2007 Jul 25
4
Problem installing on RH3 U8
Hi, I don't seem to be able to get OCFS running on RH3 U8 32-bit. [root@libra-devb-db1 root]# uname -a Linux devb-db1.mydomain 2.4.21-47.ELsmp #1 SMP Wed Jul 5 20:38:41 EDT 2006 i686 athlon i386 GNU/Linux [root@devb-db1 root]# cat /etc/redhat-release Red Hat Enterprise Linux AS release 3 (Taroon Update 8) [root@devb-db1 root]# rpm -ivh ocfs-2.4.21-EL-smp-1.0.14-1.i686.rpm Preparing...
2014 Sep 11
1
May be deadlock for wrong locking order, patch request reviewed, thanks
...is held and the node did not release it, which causes the cluster to hang. root@cvknode-21:~# ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D PID STAT COMMAND WIDE-WCHAN-COLUMN 7489 D jbd2/sdh-621 jbd2_journal_commit_transaction 16218 D ls iterate_dir 16533 D mkdir dlm_wait_for_lock_mastery 31195 D+ ls iterate_dir So I reviewed the code and found that the lock ordering may be wrong. In the function dlm_master_request_handler, the resource lock is held first and only then is &dlm->master_lock taken. But in the function dlm_get_lock_resource, the &dlm->master_lock...
2014 Sep 11
1
May be deadlock for wrong locking order, patch request reviewed, thanks
...is held and the node did not release it, which causes the cluster to hang. root@cvknode-21:~# ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D PID STAT COMMAND WIDE-WCHAN-COLUMN 7489 D jbd2/sdh-621 jbd2_journal_commit_transaction 16218 D ls iterate_dir 16533 D mkdir dlm_wait_for_lock_mastery 31195 D+ ls iterate_dir So I reviewed the code and found that the lock ordering may be wrong. In the function dlm_master_request_handler, the resource lock is held first and only then is &dlm->master_lock taken. But in the function dlm_get_lock_resource, the &dlm->master_lock...
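The post describes two paths taking res->spinlock and dlm->master_lock in opposite orders. Below is a minimal userspace model of that ABBA ordering, with pthread mutexes standing in for the kernel spinlocks; the function and variable names are illustrative assumptions, not the ocfs2/dlm code itself.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t res_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;

/* Path 1 (like the handler described): resource lock first, then master lock. */
static void handler_path(void)
{
    pthread_mutex_lock(&res_lock);
    pthread_mutex_lock(&master_lock);
    puts("handler path: res_lock -> master_lock");
    pthread_mutex_unlock(&master_lock);
    pthread_mutex_unlock(&res_lock);
}

/* Path 2 (like the lookup described): master lock first, then resource lock.
 * If one thread sits in handler_path() holding res_lock while another sits
 * here holding master_lock, each waits forever for the other's lock. */
static void lookup_path(void)
{
    pthread_mutex_lock(&master_lock);
    pthread_mutex_lock(&res_lock);
    puts("lookup path: master_lock -> res_lock");
    pthread_mutex_unlock(&res_lock);
    pthread_mutex_unlock(&master_lock);
}

int main(void)
{
    /* Run sequentially here so the example terminates; on two concurrent
     * threads the opposite orderings above can deadlock. The usual remedy
     * is to take the locks in one agreed order in both paths, or to drop
     * the first lock before acquiring the second. */
    handler_path();
    lookup_path();
    return 0;
}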
2010 Jan 14
1
another fencing question
Hi, periodically one of the nodes in my two-node cluster is fenced; here are the logs: Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-rc.minint.it (num 0) at 1.1.1.6:7777 Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334 ERROR: link to 0 went down! Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR: status = -112 Jan 14 07:01:44
2010 Apr 05
1
Kernel Panic, Server not coming back up
...o2net: accepted connection from node qa-web2 (num 2) at 147.178.220.32:7777 ocfs2_dlm: Node 2 joins domain 6A03E81A818641A68FD8DC23854E12D3 ocfs2_dlm: Nodes in domain ("6A03E81A818641A68FD8DC23854E12D3"): 0 1 2 (12701,1):dlm_restart_lock_mastery:1216 node 2 up while restarting (12701,1):dlm_wait_for_lock_mastery:1040 ERROR: status = -11 Any suggestions? Is there any more data I can provide? Thanks for any help. Kevin
2023 Jun 13
1
[BUG] ocfs2/dlm: possible data races in dlm_drop_lockres_ref_done() and dlm_get_lock_resource()
Hello, Our static analysis tool finds some possible data races in the OCFS2 file system in Linux 6.4.0-rc6. In most calling contexts, variables such as res->lockname.name and res->owner are accessed while holding the lock res->spinlock. Here is an example: lockres_seq_start() --> Line 539 in dlmdebug.c spin_lock(&res->spinlock); --> Line 574 in dlmdebug.c (Lock
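The report's point is that most contexts read these fields under res->spinlock while a few do not, so a locked write can overlap an unlocked read. Below is a small userspace sketch of that flagged pattern, with simplified stand-in types in place of the real lockres structure; it is an illustration, not the kernel source.

#include <pthread.h>
#include <stdio.h>

struct lock_resource {
    pthread_mutex_t spinlock;   /* stands in for res->spinlock */
    int owner;                  /* stands in for res->owner    */
};

static struct lock_resource res = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Locked writer: the access pattern used in most calling contexts. */
static void *set_owner(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&res.spinlock);
    res.owner = 1;
    pthread_mutex_unlock(&res.spinlock);
    return NULL;
}

/* Unlocked reader: the access a race detector reports, because it can
 * overlap with the locked write above. */
static void *peek_owner(void *arg)
{
    (void)arg;
    printf("owner (unlocked read) = %d\n", res.owner);
    return NULL;
}

int main(void)
{
    pthread_t w, r;

    pthread_create(&w, NULL, set_owner, NULL);
    pthread_create(&r, NULL, peek_owner, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}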
2011 Dec 20
8
ocfs2 - Kernel panic on many write/read from both
Sorry, I didn't copy everything: TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc debugfs.ocfs2 1.6.4 5239722 26198604 246266859 TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc debugfs.ocfs2 1.6.4 6074335 30371669 285493670 TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc debugfs.ocfs2 1.6.4 5239722 26198604
2010 Dec 09
2
servers blocked on ocfs2
...5.650496:1291450475.650501) Dec 4 09:15:06 parmenides kernel: o2net: no longer connected to node heraclito (num 0) at 192.168.1.3:7777 Dec 4 09:15:06 parmenides kernel: (snmpd,12342,11):dlm_do_master_request:1334 ERROR: link to 0 went down! Dec 4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_wait_for_lock_mastery:1117 ERROR: status = -112 Dec 4 09:15:06 parmenides kernel: (smbd,25555,12):dlm_do_master_request:1334 ERROR: link to 0 went down! Dec 4 09:15:06 parmenides kernel: (python,12439,9):dlm_do_master_request:1334 ERROR: link to 0 went down! Dec 4 09:15:06 parmenides kernel: (python,12439,9):dlm_g...
2013 Apr 28
2
Is it one issue. Do you have some good ideas, thanks a lot.
...D-VM6 kernel: [ 4231.992497] (dlm_reco_thread,14227,3):dlm_get_lock_resource:917 ERROR: status = -107 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993204] (dlm_reco_thread,13736,2):dlm_restart_lock_mastery:1221 ERROR: node down! 2 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993214] (dlm_reco_thread,13736,2):dlm_wait_for_lock_mastery:1038 ERROR: status = -11 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993223] (dlm_reco_thread,13736,2):dlm_do_master_requery:1656 ERROR: Error -107 when sending message 514 (key 0xe00bcbbe) to node 3 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993232] (dlm_reco_thread,13736,2):dlm_pre_master_reco_lockres:2...
2013 Apr 28
2
Is it one issue. Do you have some good ideas, thanks a lot.
...D-VM6 kernel: [ 4231.992497] (dlm_reco_thread,14227,3):dlm_get_lock_resource:917 ERROR: status = -107 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993204] (dlm_reco_thread,13736,2):dlm_restart_lock_mastery:1221 ERROR: node down! 2 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993214] (dlm_reco_thread,13736,2):dlm_wait_for_lock_mastery:1038 ERROR: status = -11 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993223] (dlm_reco_thread,13736,2):dlm_do_master_requery:1656 ERROR: Error -107 when sending message 514 (key 0xe00bcbbe) to node 3 Apr 27 17:44:18 ZHJD-VM6 kernel: [ 4231.993232] (dlm_reco_thread,13736,2):dlm_pre_master_reco_lockres:2...
2007 Oct 08
2
OCFS2 and LVM
Does anybody know if there is a certified procedure to back up a RAC DB 10.2.0.3 based on OCFS2, via split-mirror or snapshot technology? Using Linux LVM and OCFS2, does anybody know if it is possible to dynamically extend an OCFS2 filesystem once the underlying LVM volume has been extended? Thanks in advance Riccardo Paganini
2009 May 12
2
add error check for ocfs2_read_locked_inode() call
After upgrading from 2.6.28.10 to 2.6.29.3 I've seen the following new errors in the kernel log: May 12 14:46:41 falcon-cl5 May 12 14:46:41 falcon-cl5 (6757,7):ocfs2_read_locked_inode:466 ERROR: status = -22 Only one node has the volumes mounted in the cluster: /dev/sde on /home/apache/users/D1 type ocfs2 (rw,_netdev,noatime,heartbeat=local) /dev/sdd on /home/apache/users/D2 type ocfs2