Michael Ulbrich
2016-Mar-23 22:38 UTC
[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check
Hi ocfs2-users,

my first post to this list from yesterday probably didn't get through.

Anyway, I've made some progress in the meantime and can now ask more
specific questions ...

I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:

Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

The kernel module is:

modinfo ocfs2 -> version: 1.5.0

I'm using the stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.

As an alternative I cloned and built the latest ocfs2-tools from
markfasheh's ocfs2-tools on github, which should be version 1.8.4.

The filesystem runs on top of drbd, is roughly 40 % full and has suffered
from read-only remounts and hanging clients since the last reboot. These
may be DLM problems, but I suspect they stem from some corrupt disk
structures. Before that it all ran stably for months.

This situation made me want to run fsck.ocfs2 and now I wonder how to do
that. The filesystem is not mounted.

With the stock ocfs2-tools 1.6.4:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.6.4
Checking OCFS2 filesystem in /dev/drbd1:
Label: ocfs2_ASSET
UUID: 6A1A0189A3F94E32B6B9A526DF9060F3
Number of blocks: 5557283182
Block size: 2048
Number of clusters: 2778641591
Cluster size: 4096
Number of slots: 16

Checking fsck_drbd1.log, I find that it makes progress in

Pass 0a: Checking cluster allocation chains

until it reaches "chain 73" and goes into an infinite loop, filling the
logfile with breathtaking speed.

With the newly built ocfs2-tools 1.8.4 I get:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.8.4
Checking OCFS2 filesystem in /dev/drbd1:
Label: ocfs2_ASSET
UUID: 6A1A0189A3F94E32B6B9A526DF9060F3
Number of blocks: 5557283182
Block size: 2048
Number of clusters: 2778641591
Cluster size: 4096
Number of slots: 16

Again watching the verbose output in fsck_drbd1.log, I find that this
time it proceeds up to

Pass 0a: Checking cluster allocation chains
o2fsck_pass0:1360 | found inode alloc 13 at block 13

and stays there without any further progress. I terminated this
process after waiting for more than an hour.

Now I'm somewhat lost ... and would very much appreciate it if anybody on
this list would share their knowledge and give me a hint about what to do
next.

What could be done to get this filesystem checked and repaired? Am I
missing something important, or do I just have to wait a little bit
longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will
perform as expected?

I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away
from taking that risk without any clue as to whether that might solve my
problem ...

Thanks in advance ...
Michael Ulbrich
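
For anyone retrying the two runs described above, here is a minimal sketch of one way to keep such a check bounded, so that a looping pass cannot fill the disk with log output and a stalled pass is stopped after a fixed time. The helper name run_capped, the 100 MiB cap and the 2 h limit are illustrative assumptions and do not come from the post. The example call adds -n (answer "no" to all questions, device opened read-only), because this approach interrupts fsck.ocfs2 and should therefore only be used for a run that does not modify the filesystem.

```shell
# run_capped LOGFILE MAX_BYTES TIME_LIMIT CMD [ARGS...]
# Runs CMD under timeout(1) from GNU coreutils and keeps at most
# MAX_BYTES of its combined stdout/stderr in LOGFILE.
run_capped() {
    local log=$1 max_bytes=$2 limit=$3
    shift 3
    # timeout stops CMD once TIME_LIMIT has elapsed.
    # head -c exits after MAX_BYTES; CMD is then terminated by SIGPIPE
    # on its next write, so a looping pass cannot grow the log further.
    timeout "$limit" "$@" 2>&1 | head -c "$max_bytes" > "$log"
}

# Example call (assumed values, not executed here):
# run_capped fsck_drbd1.log $((100 * 1024 * 1024)) 2h \
#     fsck.ocfs2 -n -v -f /dev/drbd1
```

The first 100 MiB of the log should be enough to see where the output starts repeating. The exit status of run_capped is that of head, so it does not tell whether the time limit was hit.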
Hi Michael,

Could you please use debugfs to check the output?

# debugfs.ocfs2 -R 'stat //global_bitmap' <device>

Thanks,
Joseph
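
To attach the requested output to a reply, the command above can be wrapped so that stdout and stderr end up in one file. This is a minimal sketch: the function name save_global_bitmap, the output file name and the DEBUGFS_BIN override are hypothetical, and only the debugfs.ocfs2 invocation itself comes from the message above.

```shell
# save_global_bitmap DEVICE OUTFILE
# Stores the output of the 'stat //global_bitmap' request in OUTFILE.
save_global_bitmap() {
    local dev=$1 out=$2
    # DEBUGFS_BIN is a hypothetical override, used only so the wrapper
    # can be tried on a machine without ocfs2-tools installed.
    "${DEBUGFS_BIN:-debugfs.ocfs2}" -R 'stat //global_bitmap' "$dev" > "$out" 2>&1
}

# Example call (device name taken from the original post, not executed here):
# save_global_bitmap /dev/drbd1 global_bitmap.txt
```

As far as the debugfs.ocfs2 man page goes, the tool opens the device read-only unless -w is given, so this should be safe to run on the unmounted volume.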