thr3ads.net - Ocfs2 users - [Ocfs2-users] fsck.ocfs2 loops + hangs but does not check [Mar 2016]

Michael Ulbrich

2016-Mar-23 22:38 UTC

[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

Hi ocfs2-users,

my first post to this list from yesterday probably didn't get through.

Anyway, I've made some progress in the meantime and can now ask more
specific questions ...

I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:

Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

the kernel modules are:

modinfo ocfs2 -> version: 1.5.0

I'm using the stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.

As an alternative, I cloned and built the latest ocfs2-tools from
markfasheh's ocfs2-tools repository on GitHub, which should be version 1.8.4.

The filesystem runs on top of DRBD, is roughly 40 % full, and has suffered
from read-only remounts and hanging clients since the last reboot. These
may be DLM problems, but I suspect they stem from corrupt on-disk
structures. Before that, everything ran stably for months.

This situation made me want to run fsck.ocfs2 and now I wonder how to do
that. The filesystem is not mounted.

With the stock ocfs2-tools 1.6.4:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.6.4
Checking OCFS2 filesystem in /dev/drbd1:
  Label:              ocfs2_ASSET
  UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
  Number of blocks:   5557283182
  Block size:         2048
  Number of clusters: 2778641591
  Cluster size:       4096
  Number of slots:    16
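
As a quick sanity check on the header above, the block count times the block size should equal the cluster count times the cluster size. The following is a minimal sketch, assuming a shell with 64-bit arithmetic; `fs_bytes` is a helper name made up for this illustration, and the numbers are copied from the fsck.ocfs2 output.

```shell
#!/bin/sh
# Sanity-check the geometry reported by fsck.ocfs2:
# blocks * block size should equal clusters * cluster size.
# Assumes the shell does 64-bit integer arithmetic.

# Multiply a unit count ($1) by a unit size in bytes ($2).
fs_bytes() {
    echo $(( $1 * $2 ))
}

by_blocks=$(fs_bytes 5557283182 2048)    # "Number of blocks" * "Block size"
by_clusters=$(fs_bytes 2778641591 4096)  # "Number of clusters" * "Cluster size"

echo "bytes by blocks:   $by_blocks"
echo "bytes by clusters: $by_clusters"
echo "size in GiB:       $(( by_blocks / 1024 / 1024 / 1024 ))"
```

Both products come to 11381315956736 bytes, roughly 11.4 TB, which matches the "11 TB" figure given earlier. So the superblock geometry at least looks self-consistent.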

I'm checking fsck_drbd1.log and find that it is making progress in

Pass 0a: Checking cluster allocation chains

until it reaches "chain 73" and goes into an infinite loop filling the
logfile with breathtaking speed.
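
To confirm that the log really repeats the same messages, rather than reporting many distinct problems, one option is to count duplicate lines near the end of the file. This is only a sketch: `top_repeated_lines` is a hypothetical helper, and the default of 100000 trailing lines is an arbitrary choice to keep the sort cheap on a very large log.

```shell
# Show the most frequently repeated lines near the end of a log file.
#   $1 = log file
#   $2 = how many distinct lines to show (default 5)
#   $3 = how many trailing lines to inspect (default 100000)
top_repeated_lines() {
    tail -n "${3:-100000}" "$1" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}
```

Calling `top_repeated_lines fsck_drbd1.log` prints each line prefixed by its count. A handful of lines with very high counts would suggest that fsck is cycling over the same chain entries.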

With the newly built ocfs2-tools 1.8.4 I get:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.8.4
Checking OCFS2 filesystem in /dev/drbd1:
  Label:              ocfs2_ASSET
  UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
  Number of blocks:   5557283182
  Block size:         2048
  Number of clusters: 2778641591
  Cluster size:       4096
  Number of slots:    16

Again watching the verbose output in fsck_drbd1.log I find that this
time it proceeds up to

Pass 0a: Checking cluster allocation chains
o2fsck_pass0:1360 | found inode alloc 13 at block 13

and stays there without any further progress. I've terminated this
process after waiting for more than an hour.
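
One way to tell "hung" from "slow" before terminating the process is to check whether it is still reading from the device. The sketch below assumes a Linux kernel with per-process I/O accounting, so that /proc/<pid>/io exists and has a read_bytes line, and that it is run as root. Both function names are made up for this illustration.

```shell
# Extract the read_bytes counter from a /proc/<pid>/io style file ($1).
io_read_bytes() {
    awk '/^read_bytes:/ { print $2 }' "$1"
}

# Report whether a process fetched anything from storage during an interval.
#   $1 = PID of the running fsck.ocfs2
#   $2 = seconds to wait between the two samples (default 10)
is_reading() {
    before=$(io_read_bytes "/proc/$1/io")
    [ -n "$before" ] || return 2      # counter not available
    sleep "${2:-10}"
    after=$(io_read_bytes "/proc/$1/io")
    [ -n "$after" ] || return 2       # process may have exited
    if [ "$after" -gt "$before" ]; then
        echo "pid $1 read $(( after - before )) bytes"
    else
        echo "pid $1 read nothing in ${2:-10}s"
    fi
}
```

For example, `is_reading "$(pidof fsck.ocfs2)" 30`. A zero delta only suggests that the process is not fetching data from storage. It may still be working from cached data or spinning on the CPU, so treat the result as a hint rather than proof.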

Now I'm somewhat lost ... and would very much appreciate it if anybody on
this list could share their knowledge and give me a hint about what to do next.

What could be done to get this file system checked and repaired? Am I
missing something important or do I just have to wait a little bit
longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will
perform as expected?

I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away
from taking that risk without any clue of whether that might solve my
problem ...

Thanks in advance ... Michael Ulbrich

Joseph Qi

2016-Mar-24 00:30 UTC

[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

Hi Michael,
Could you please run debugfs.ocfs2 and share the output?
# debugfs.ocfs2 -R 'stat //global_bitmap' <device>

Thanks,
Joseph
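
To capture the output of the command above in a file that can be attached to a reply, a small wrapper along these lines might help. It is a sketch only: `dump_global_bitmap` and the `DEBUGFS_BIN` override are names introduced here for illustration, not part of ocfs2-tools. The debugfs.ocfs2 invocation itself is the one quoted in the message.

```shell
# Save the output of the requested debugfs.ocfs2 command to a file.
#   $1 = OCFS2 device (e.g. /dev/drbd1)
#   $2 = output file
# DEBUGFS_BIN is a hypothetical override added for testing; by default
# the real debugfs.ocfs2 binary is used.
dump_global_bitmap() {
    "${DEBUGFS_BIN:-debugfs.ocfs2}" -R 'stat //global_bitmap' "$1" > "$2" 2>&1
}
```

Usage would be something like `dump_global_bitmap /dev/drbd1 global_bitmap.txt`, with the resulting file posted to the list.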
