Displaying 20 results from an estimated 25 matches for "ocfs2_replay_journal".
2009 Sep 24
1
strange fencing behavior
...mation on journal
Sep 24 14:07:57 storage0 kernel: [650683.566388] kjournald starting.
Commit interval 5 seconds
Sep 24 14:07:57 storage0 kernel: [650683.566388] ocfs2: Mounting device
(8,18) on (node 0, slot 10) with ordered data mode.
Sep 24 14:07:57 storage0 kernel: [650683.566388]
(12231,1):ocfs2_replay_journal:1149 Recovering node 10 from slot 0 on
device (8,18)
Sep 24 14:08:00 storage0 kernel: [650687.138110] kjournald starting.
Commit interval 5 seconds
Sep 24 14:08:00 storage0 kernel: [650687.268898]
(12231,1):ocfs2_replay_journal:1149 Recovering node 2 from slot 1 on
device (8,18)
Sep 24 14:08:0...
2013 Nov 01
1
How to break out the unstop loop in the recovery thread? Thanks a lot.
...ng on to the storage.
But the last one does not restart, and it still write error message into syslog as below:
Oct 30 02:01:01 server177 kernel: [25786.227598] (ocfs2rec,14787,13):ocfs2_read_journal_inode:1463 ERROR: status = -5
Oct 30 02:01:01 server177 kernel: [25786.227615] (ocfs2rec,14787,13):ocfs2_replay_journal:1496 ERROR: status = -5
Oct 30 02:01:01 server177 kernel: [25786.227631] (ocfs2rec,14787,13):ocfs2_recover_node:1652 ERROR: status = -5
Oct 30 02:01:01 server177 kernel: [25786.227648] (ocfs2rec,14787,13):__ocfs2_recovery_thread:1358 ERROR: Error -5 recovering node 2 on device (8,32)!
Oct 30 02:01:...
2006 Apr 18
1
Self-fencing issues (RHEL4)
...:45 rac1/rac1 (2903,0):o2net_set_nn_state:411 no longer
connected to node rac2 (num 1) at 10.0.1.2:7777
Apr 18 15:56:45 rac1/rac1 (2897,1):dlm_send_proxy_ast_msg:448 ERROR:
status = -107
Apr 18 15:56:45 rac1/rac1 (2897,1):dlm_flush_asts:556 ERROR: status = -107
Apr 18 15:56:46 rac1/rac1 (19545,0):ocfs2_replay_journal:1172 Recovering
node 1 from slot 1 on device (8,41)
Apr 18 15:56:46 rac1/rac1 (19544,0):ocfs2_replay_journal:1172 Recovering
node 1 from slot 1 on device (8,37)
Apr 18 15:56:51 rac2/rac2 <0>Rebooting in 60
seconds..<5>(3,0):o2net_idle_timer:1310 connection to node rac1 (num 0)
at 10...
2009 May 12
2
add error check for ocfs2_read_locked_inode() call
After upgrading from 2.6.28.10 to 2.6.29.3 I've seen the following new errors
in the kernel log:
May 12 14:46:41 falcon-cl5
May 12 14:46:41 falcon-cl5 (6757,7):ocfs2_read_locked_inode:466 ERROR:
status = -22
Only one node in the cluster has the volumes mounted:
/dev/sde on /home/apache/users/D1 type ocfs2
(rw,_netdev,noatime,heartbeat=local)
/dev/sdd on /home/apache/users/D2 type ocfs2
2006 Jul 10
1
2 Node cluster crashing
...e_change:512 connection to node
rac2.globoforce.com num 1 at 198.87.235.246:7777 has been idle for 10
seconds,
shutting it down.
Jul 7 14:56:23 rac1 kernel: (10042,0):o2net_set_nn_state:414 no longer
connected to node rac2.globoforce.com at 198.87.235.246:7777
Jul 7 14:56:56 rac1 kernel: (14410,3):ocfs2_replay_journal:1123 Recovering
node 1 from slot 1 on device (8,65)
rac2:
Jul 7 14:56:24 rac2 kernel: (0,0):o2net_state_change:512 connection to node
rac1.globoforce.com num 0 at 198.87.235.244:7777 has been idle for 10
seconds,
shutting it down.
Jul 7 14:56:24 rac2 kernel: (10201,0):o2net_set_nn_state:414 no l...
2009 Feb 04
1
Strange dmesg messages
...ore lock mastery can begin
(6968,7):dlm_get_lock_resource:947 F59B45831EEA41F384BADE6C4B7A932B:
recovery map is not empty, but must master $RECOVERY lock now
(6968,7):dlm_do_recovery:524 (6968) Node 1 is the Recovery Master for
the Dead Node 0 for Domain F59B45831EEA41F384BADE6C4B7A932B
(12281,2):ocfs2_replay_journal:1004 Recovering node 0 from slot 0 on
device (8,33)
(fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0,
recovered transactions 66251376 to 66251415
(fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 3176 and
revoked 0/0 blocks
kjournald starting. Commit interval 5 sec...
2011 Apr 01
1
Node Recovery locks I/O in two-node OCFS2 cluster (DRBD 8.3.8 / Ubuntu 10.10)
...ly way I've been able to successfully regain I/O within the cluster is
to bring back up the other node. While monitoring the logs, it seems that it
is OCFS2 that's establishing the lock/unlock and not DRBD at all.
>
>
> Apr 1 12:07:19 ubu10a kernel: [ 1352.739777]
> (ocfs2rec,3643,0):ocfs2_replay_journal:1605 Recovering node 1124116672 from
> slot 1 on device (147,0)
> Apr 1 12:07:19 ubu10a kernel: [ 1352.900874]
> (ocfs2rec,3643,0):ocfs2_begin_quota_recovery:407 Beginning quota recovery in
> slot 1
> Apr 1 12:07:19 ubu10a kernel: [ 1352.902509]
> (ocfs2_wq,1213,0):ocfs2_finish_...
2007 Nov 29
1
Troubles with two node
...54FF88030591B1210C560:$RECOVERY: at least one node (0)
to recover before lock mastery can begin
Nov 22 18:14:54 web-ha2 kernel: (3550,0):dlm_get_lock_resource:876
86472C5C33A54FF88030591B1210C560: recovery map is not empty, but must
master $RECOVERY lock now
Nov 22 18:14:54 web-ha2 kernel: (17893,0):ocfs2_replay_journal:1184
Recovering node 0 from slot 0 on device (8,17)
Nov 22 18:14:55 web-ha2 kernel: (17803,1):dlm_restart_lock_mastery:1215
ERROR: node down! 0
Nov 22 18:14:55 web-ha2 kernel: (17803,1):dlm_wait_for_lock_mastery:1036
ERROR: status = -11
Nov 22 18:14:55 web-ha2 kernel: (17602,0):dlm_restart_lock_mas...
2009 Mar 04
2
[PATCH 1/1] Patch to recover orphans in offline slots during recovery and mount
During recovery, a node recovers orphans in its own slot and in the dead
node(s)' slots. But if the dead nodes were holding orphans in offline slots,
those orphans are left unrecovered.
If the dead node is the last one to die while holding orphans in other slots,
and is the first one to mount again, it recovers only its own slot, which
leaves orphans in the offline slots.
This patch queues complete_recovery
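The gap the patch description identifies can be stated as a schematic sketch. This is illustrative pseudologic only, not the kernel code; the function and parameter names are invented for the example:

```python
# Schematic sketch (NOT kernel code) of which slots get orphan recovery.
# Names here are illustrative, not taken from fs/ocfs2.
def slots_to_recover(my_slot, dead_slots, offline_slots, patched=False):
    """Return the set of slots whose orphans this node will recover."""
    slots = {my_slot} | set(dead_slots)   # pre-patch: own slot + dead nodes
    if patched:
        slots |= set(offline_slots)       # patch also queues offline slots
    return slots

# Last node to die remounts first: without the patch, orphans left in the
# offline slots (1 and 2 here) are never recovered.
print(sorted(slots_to_recover(0, [], [1, 2])))        # -> [0]
print(sorted(slots_to_recover(0, [], [1, 2], True)))  # -> [0, 1, 2]
```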
2006 Mar 14
1
problems with ocfs2
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20060314/b38f73eb/attachment.html
2006 Sep 21
0
ocfs2 reboot
...ow :
o2net_idle_timer:1309 here are some times that might help debug the
situation: (tmr 1158758358.807993 now 1158758368.805980 dr 1158758358.807964adv
1158758358.808000:1158758358.808001 func (23633ca3:504) 1158757938.878265:
1158757938.878271)
Sep 20 15:20:02 src-rac-duplicati1 kernel:
(10047,0):ocfs2_replay_journal:1174 Recovering node 1 from slot 0 on device
(104,1)
Sep 20 15:20:05 src-rac-duplicati1 kernel:
(2062,1):dlm_get_lock_resource:847
6AEF3479C4784E9895BDE697EFCAC035:$RECOVERY: at least one node (1) to recover
before lock mastery can begin
Sep 20 15:20:05 src-rac-duplicati1 kernel:
(2062,1):dlm_get_lo...
2011 Dec 20
8
ocfs2 - Kernel panic on many write/read from both
Sorry, I didn't copy everything:
TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
5239722 26198604 246266859
TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
6074335 30371669 285493670
TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
debugfs.ocfs2 1.6.4
5239722 26198604
2010 Jan 14
1
another fencing question
Hi,
periodically one of on my two nodes cluster is fenced here are the logs:
Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-
rc.minint.it (num 0) at 1.1.1.6:7777
Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334 ERROR:
link to 0 went down!
Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458 ERROR:
status = -112
Jan 14 07:01:44
2007 Mar 08
4
ocfs2 cluster becomes unresponsive
...958DB:$RECOVERY: at least one node (2) to recover before lock mastery can begin
Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:874 B6ECAF5A668A4573AF763908F26958DB: recovery map is not empty, but must master $RECOVERY lock now
Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176 Recovering node 2 from slot 1 on device (253,1)
Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214 ERROR: node down! 2
Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
Mar 8 07:23:41 groupwise-1-mht kernel: (929,1)...
2009 Apr 07
1
Backport to 1.4 of patch that recovers orphans from offline slots
The following patch is a backport of the patch that recovers orphans from
offline slots. It is being backported from mainline to 1.4.
mainline patch: 0001-Patch-to-recover-orphans-in-offline-slots-during-rec.patch
Thanks,
--Srini
2009 Mar 06
0
[PATCH 1/1] ocfs2: recover orphans in offline slots during recovery and mount
...bail:
mutex_lock(&osb->recovery_lock);
@@ -1314,6 +1414,7 @@ bail:
goto restart;
}
+ ocfs2_free_replay_slots(osb);
osb->recovery_thread_task = NULL;
mb(); /* sync with ocfs2_recovery_thread_running */
wake_up(&osb->recovery_event);
@@ -1465,6 +1566,9 @@ static int ocfs2_replay_journal(struct ocfs2_super *osb,
goto done;
}
+ /* we need to run complete recovery for offline orphan slots */
+ ocfs2_replay_map_set_state(osb, REPLAY_NEEDED);
+
mlog(ML_NOTICE, "Recovering node %d from slot %d on device (%u,%u)\n",
node_num, slot_num,
MAJOR(osb->sb-&g...
2009 Mar 06
1
[PATCH 1/1] Patch to recover orphans in offline slots during recovery and mount (revised)
...bail:
mutex_lock(&osb->recovery_lock);
@@ -1314,6 +1414,7 @@ bail:
goto restart;
}
+ ocfs2_free_replay_slots(osb);
osb->recovery_thread_task = NULL;
mb(); /* sync with ocfs2_recovery_thread_running */
wake_up(&osb->recovery_event);
@@ -1465,6 +1566,9 @@ static int ocfs2_replay_journal(struct ocfs2_super *osb,
goto done;
}
+ /* we need to run complete recovery for offline orphan slots */
+ ocfs2_replay_map_set_state(osb, REPLAY_NEEDED);
+
mlog(ML_NOTICE, "Recovering node %d from slot %d on device (%u,%u)\n",
node_num, slot_num,
MAJOR(osb->sb-&g...
2007 Oct 08
2
OCF2 and LVM
Does anybody know if there is a certified procedure to
back up a RAC DB 10.2.0.3 based on OCFS2
via split-mirror or snapshot technology?
Using Linux LVM and OCFS2, does anybody know whether it is
possible to dynamically extend an OCFS2 filesystem
once the underlying LVM volume has been extended?
Thanks in advance
Riccardo Paganini
2008 Oct 22
2
Another node is heartbeating in our slot! errors with LUN removal/addition
...1745 File
system was not unmounted cleanly, recovering volume.
Oct 22 03:16:30 ausracdb03 kernel: kjournald starting. Commit interval
5 seconds
Oct 22 03:16:30 ausracdb03 kernel: ocfs2: Mounting device (253,28) on
(node 2, slot 0) with ordered data mode.
Oct 22 03:16:30 ausracdb03 kernel: (9939,1):ocfs2_replay_journal:1076
Recovering node 0 from slot 3 on device (253,28)
Oct 22 03:16:32 ausracdb03 kernel: (9861,2):o2hb_do_disk_heartbeat:770
ERROR: Device "dm-28": another node is heartbeating in our slot!
Oct 22 03:16:34 ausracdb03 kernel: (9861,2):o2hb_do_disk_heartbeat:770
ERROR: Device "dm-28&qu...
2008 Sep 04
4
[PATCH 0/3] ocfs2: Switch over to JBD2.
ocfs2 currently uses the Journaled Block Device (JBD) for its
journaling. This is a very stable and well-tested codebase. However, JBD
is architecturally limited to 32-bit block numbers. This means an ocfs2
filesystem is limited to 2^32 blocks. With a 4K blocksize, that's 16TB.
People want larger volumes.
Fortunately, there is now JBD2. JBD2 adds 64-bit block number support
and some other
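The 16TB figure above follows directly from the block-number width; a quick arithmetic check (the helper name is just for illustration):

```python
# Check the size limit implied by the journal's block-number width:
# JBD uses 32-bit block numbers, JBD2 extends them to 64 bits.
def max_fs_bytes(block_bits: int, block_size: int) -> int:
    """Largest filesystem addressable with `block_bits`-bit block numbers."""
    return (2 ** block_bits) * block_size

TIB = 2 ** 40
# With a 4 KiB block size, 32-bit block numbers cap the filesystem at 16 TiB.
print(max_fs_bytes(32, 4096) // TIB)  # -> 16
```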