Hello Everyone,

For the first time I experienced a dlm lock:

[ 9721.831813] OCFS2 DLM 1.5.0
[ 9721.917032] ocfs2: Registered cluster interface o2cb
[ 9722.170848] OCFS2 DLMFS 1.5.0
[ 9722.179018] OCFS2 User DLM kernel interface loaded
[ 9755.743195] ocfs2_dlm: Nodes in domain ("3A791AB36DED41008E58CEF52EBEEFD3"): 1
[ 9755.852798] ocfs2: Mounting device (147,0) on (node 1, slot 0) with ordered data mode.
[ 9783.240424] block drbd0: Handshake successful: Agreed network protocol version 91
[ 9783.242922] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
[ 9783.243074] block drbd0: conn( WFConnection -> WFReportParams )
[ 9783.243205] block drbd0: Starting asender thread (from drbd0_receiver [4390])
[ 9783.271014] block drbd0: data-integrity-alg: <not-used>
[ 9783.271298] block drbd0: drbd_sync_handshake:
[ 9783.271318] block drbd0: self 964FFEDA732A512B:0ABD16D2597E52D9:54E3AEC293CEDC7E:120384BD0E3A5705 bits:3 flags:0
[ 9783.271342] block drbd0: peer B4C81B0FD76EFAC2:0ABD16D2597E52D9:54E3AEC293CEDC7F:120384BD0E3A5705 bits:0 flags:0
[ 9783.271364] block drbd0: uuid_compare()=100 by rule 90
[ 9783.271380] block drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node
[ 9783.271417] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
[ 9783.399967] block drbd0: peer( Secondary -> Primary )
[ 9783.515979] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Outdated -> Inconsistent )
[ 9783.522521] block drbd0: Began resync as SyncSource (will sync 12 KB [3 bits set]).
[ 9783.629758] block drbd0: Implicitly set pdsk Inconsistent!
[ 9783.799387] block drbd0: Resync done (total 1 sec; paused 0 sec; 12 K/sec)
[ 9783.799956] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
[ 9795.430801] o2net: accepted connection from node astdrbd2 (num 2) at 192.168.2.111:7777
[ 9800.231650] ocfs2_dlm: Node 2 joins domain 3A791AB36DED41008E58CEF52EBEEFD3
[ 9800.231668] ocfs2_dlm: Nodes in domain ("3A791AB36DED41008E58CEF52EBEEFD3"): 1 2
[ 9861.922744] OCFS2: ERROR (device drbd0): ocfs2_validate_inode_block: Invalid dinode #35348: OCFS2_VALID_FL not set
[ 9861.922767]
[ 9861.927278] File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted.
[ 9861.928231] (8009,0):ocfs2_read_locked_inode:496 ERROR: status = -22

Not sure where to start, but with your appreciated help I am sure we can get it resolved.

Thanks in Advance,

Nick.
This has nothing to do with the dlm. The error states that the fs encountered a bad inode on disk, which points to possible disk corruption. On encountering such an inode, the fs goes read-only and asks the user to run fsck.

On 11/09/2011 11:51 AM, Nick Khamis wrote:
> [ 9861.922744] OCFS2: ERROR (device drbd0): ocfs2_validate_inode_block: Invalid dinode #35348: OCFS2_VALID_FL not set
> [ 9861.922767]
> [ 9861.927278] File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted.
> [ 9861.928231] (8009,0):ocfs2_read_locked_inode:496 ERROR: status = -22
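For anyone wading through a long dmesg capture like the one in this thread, a minimal Python sketch along these lines could pull out the lines that matter. The function name `scan_kernel_log` and both regular expressions are my own illustration, written against the exact message wording shown in the log; other kernel versions may word these messages differently. Flagging the DRBD split-brain line is only an assumption about what is worth a second look on the storage side, not a diagnosis.

```python
import re

# Patterns are assumptions based on the message text quoted in this thread.
INODE_ERR = re.compile(
    r"OCFS2: ERROR \(device (?P<device>[^)]+)\): "
    r"ocfs2_validate_inode_block: Invalid dinode #(?P<inode>\d+)"
)
SPLIT_BRAIN = re.compile(r"block (?P<device>drbd\d+): Split-Brain detected")


def scan_kernel_log(lines):
    """Return (bad_inodes, split_brain_devices) found in kernel log lines.

    bad_inodes is a list of (device, inode_number) tuples;
    split_brain_devices is a list of DRBD device names.
    """
    bad_inodes = []
    split_brain = []
    for line in lines:
        m = INODE_ERR.search(line)
        if m:
            bad_inodes.append((m.group("device"), int(m.group("inode"))))
        m = SPLIT_BRAIN.search(line)
        if m:
            split_brain.append(m.group("device"))
    return bad_inodes, split_brain


# Two lines copied from the log in this thread.
sample = [
    "[ 9783.271380] block drbd0: Split-Brain detected, 1 primaries, "
    "automatically solved. Sync from this node",
    "[ 9861.922744] OCFS2: ERROR (device drbd0): ocfs2_validate_inode_block: "
    "Invalid dinode #35348: OCFS2_VALID_FL not set",
]

bad_inodes, split_brain = scan_kernel_log(sample)
print(bad_inodes, split_brain)
```

The inode number it reports is the one fsck.ocfs2 should later mention when it repairs the volume, which makes it easy to cross-check the two outputs.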
All Fixed! Just a few questions. Is there any documentation on how to diagnose an ocfs2 filesystem:

* How to transfer an image file for testing onto a different machine, as you did with "o2image.out"?
* Does "fsck.ocfs2 -fy /dev/loop0" pretty much fix all the common problems?
* What can I do with the files in lost+found?

Thanks Again,

Nick.

On Fri, Nov 11, 2011 at 8:02 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> So it detected one cluster that was doubly allocated. It fixed it.
> Details below. The other fixes could be because the o2image was
> taken on a live volume.
>
> As to how this could happen... I would look at the storage.
>
>
> # fsck.ocfs2 -fy /dev/loop0
> fsck.ocfs2 1.6.3
> Checking OCFS2 filesystem in /dev/loop0:
> Label: AsteriskServer
> UUID: 3A791AB36DED41008E58CEF52EBEEFD3
> Number of blocks: 592384
> Block size: 4096
> Number of clusters: 592384
> Cluster size: 4096
> Number of slots: 2
>
> /dev/loop0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
> Pass 0b: Checking inode allocation chains
> Pass 0c: Checking extent block allocation chains
> Pass 1: Checking inodes and blocks.
> Duplicate clusters detected. Pass 1b will be run
> Running additional passes to resolve clusters claimed by more than one
> inode...
> Pass 1b: Determining ownership of multiply-claimed clusters
> Pass 1c: Determining the names of inodes owning multiply-claimed clusters
> Pass 1d: Reconciling multiply-claimed clusters
> Cluster 161335 is claimed by the following inodes:
> /asterisk/extensions.conf
> /moh/macroform-cold_day.wav
> [DUP_CLUSTERS_CLONE] Inode "/asterisk/extensions.conf" may be cloned or
> deleted to break the claim it has on its clusters. Clone inode
> "/asterisk/extensions.conf" to break claims on clusters it shares with other
> inodes? y
> [DUP_CLUSTERS_CLONE] Inode "/moh/macroform-cold_day.wav" may be cloned or
> deleted to break the claim it has on its clusters. Clone inode
> "/moh/macroform-cold_day.wav" to break claims on clusters it shares with
> other inodes? y
> Pass 2: Checking directory entries.
> [DIRENT_INODE_FREE] Directory entry 'musiconhold.conf' refers to inode
> number 35348 which isn't allocated, clear the entry? y
> Pass 3: Checking directory connectivity.
> [LOSTFOUND_MISSING] /lost+found does not exist. Create it so that we can
> possibly fill it with orphaned inodes? y
> Pass 4a: checking for orphaned inodes
> Pass 4b: Checking inodes link counts.
> [INODE_COUNT] Inode 96783 has a link count of 1 on disk but directory entry
> references come to 2. Update the count on disk to match? y
> [INODE_NOT_CONNECTED] Inode 96784 isn't referenced by any directory entries.
> Move it to lost+found? y
> [INODE_NOT_CONNECTED] Inode 96785 isn't referenced by any directory entries.
> Move it to lost+found? y
> [INODE_NOT_CONNECTED] Inode 96794 isn't referenced by any directory entries.
> Move it to lost+found? y
> [INODE_NOT_CONNECTED] Inode 96796 isn't referenced by any directory entries.
> Move it to lost+found? y
> All passes succeeded.
> Slot 0's journal dirty flag removed
> Slot 1's journal dirty flag removed
>
>
> [root@ca-test92 ocfs2]# fsck.ocfs2 -fy /dev/loop0
> fsck.ocfs2 1.6.3
> Checking OCFS2 filesystem in /dev/loop0:
> Label: AsteriskServer
> UUID: 3A791AB36DED41008E58CEF52EBEEFD3
> Number of blocks: 592384
> Block size: 4096
> Number of clusters: 592384
> Cluster size: 4096
> Number of slots: 2
>
> /dev/loop0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
> Pass 0b: Checking inode allocation chains
> Pass 0c: Checking extent block allocation chains
> Pass 1: Checking inodes and blocks.
> Pass 2: Checking directory entries.
> Pass 3: Checking directory connectivity.
> Pass 4a: checking for orphaned inodes
> Pass 4b: Checking inodes link counts.
> All passes succeeded.
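The two fsck.ocfs2 runs quoted above differ only in whether any bracketed fix codes appear, so a saved transcript can be summarized mechanically. The following is a minimal Python sketch under that assumption; `count_fix_codes` and `is_clean` are hypothetical helper names, and the parsing relies only on the `[CODE]` prefix and the "All passes succeeded." line as they appear in this thread, not on any documented output format. The sample lines are abbreviated from the first run.

```python
import re
from collections import Counter

# A fix prompt starts with a bracketed upper-case code, e.g. [INODE_COUNT].
FIX_CODE = re.compile(r"^\[([A-Z_]+)\]")


def count_fix_codes(output):
    """Count bracketed fix codes in saved fsck.ocfs2 output."""
    counts = Counter()
    for line in output.splitlines():
        # Tolerate mail-style "> " quoting in front of each line.
        m = FIX_CODE.match(line.lstrip("> ").strip())
        if m:
            counts[m.group(1)] += 1
    return counts


def is_clean(output):
    """True if the run finished and reported no fix codes."""
    return "All passes succeeded." in output and not count_fix_codes(output)


# Lines abbreviated from the first run quoted in this thread.
first_run = """\
Pass 1d: Reconciling multiply-claimed clusters
[DUP_CLUSTERS_CLONE] Inode "/asterisk/extensions.conf" may be cloned or
[DUP_CLUSTERS_CLONE] Inode "/moh/macroform-cold_day.wav" may be cloned or
[DIRENT_INODE_FREE] Directory entry 'musiconhold.conf' refers to inode
[INODE_NOT_CONNECTED] Inode 96784 isn't referenced by any directory entries.
All passes succeeded.
"""

# Lines abbreviated from the second run, which reported nothing to fix.
second_run = """\
Pass 4b: Checking inodes link counts.
All passes succeeded.
"""

print(dict(count_fix_codes(first_run)))
print(is_clean(first_run), is_clean(second_run))
```

Running fsck.ocfs2 a second time and checking that the transcript is clean, as was done above, is a reasonable way to confirm that the first pass left nothing behind.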