Michael Ulbrich
2016-Mar-24 13:47 UTC
[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check
Hi Joseph,

thanks for this information, although it does not sound too optimistic ...

So, if I understand you correctly: if we had a metadata backup from o2image
_before_ the crash, we could have looked up the missing info to remove the
loop from group chain 73, right?

But how could the loop issue be fixed and, at the same time, the damage to
the data be minimized? There is a recent file-level backup from which
damaged or missing files could be restored later.

151 4054438912 15872 2152 13720 10606 1984
152 4094595072 15872 10753 5119 5119 1984
153 4090944512 15872 1818 14054 9646 1984 <--
154 4083643392 15872 571 15301 4914 1984
155 4510758912 15872 4834 11038 6601 1984
156 4492506112 15872 6532 9340 5119 1984

Could you describe a "brute force" way to dd out and edit record #153 so as
to remove the loop and minimize the potential loss of data at the same time?
So that fsck would have a chance to complete and fix the remaining issues?

Thanks a lot for your help ... Michael

On 03/24/2016 02:10 PM, Joseph Qi wrote:
> Hi Michael,
> So I think the block of record #153 goes wrong, which points next to
> block 4083643392 of record #19.
> But the problem is we don't know the right info of the block of record
> #153, otherwise we can dd out, edit it and then dd in to fix it.
>
> Thanks,
> Joseph
>
> On 2016/3/24 18:38, Michael Ulbrich wrote:
>> Hi Joseph,
>>
>> ok, got it! Here's the loop in chain 73:
>>
>> Group Chain: 73 Parent Inode: 13 Generation: 1172963971
>> CRC32: 00000000 ECC: 0000
>> ## Block# Total Used Free Contig Size
>> 0 4280773632 15872 11487 4385 1774 1984
>> 1 2583263232 15872 5341 10531 5153 1984
>> 2 4543613952 15872 5329 10543 5119 1984
>> 3 4532662272 15872 10753 5119 5119 1984
>> 4 4539963392 15872 3223 12649 7530 1984
>> 5 4536312832 15872 5219 10653 5534 1984
>> 6 4529011712 15872 6047 9825 3359 1984
>> 7 4525361152 15872 4475 11397 5809 1984
>> 8 4521710592 15872 3182 12690 5844 1984
>> 9 4518060032 15872 5881 9991 5131 1984
>> 10 4236966912 15872 10753 5119 5119 1984
>> 11 4098245632 15872 10756 5116 3388 1984
>> 12 4514409472 15872 8826 7046 5119 1984
>> 13 3441144832 15872 15 15857 9680 1984
>> 14 4404892672 15872 7563 8309 5119 1984
>> 15 4233316352 15872 9398 6474 5114 1984
>> 16 4488855552 15872 6358 9514 5119 1984
>> 17 3901115392 15872 9932 5940 3757 1984
>> 18 4507108352 15872 6557 9315 6166 1984
>> 19 4083643392 15872 571 15301 4914 1984 <--
>> 20 4510758912 15872 4834 11038 6601 1984
>> 21 4492506112 15872 6532 9340 5119 1984
>> 22 4496156672 15872 10753 5119 5119 1984
>> 23 4503457792 15872 10718 5154 5119 1984
>> ...
>> 154 4083643392 15872 571 15301 4914 1984 <--
>> 155 4510758912 15872 4834 11038 6601 1984
>> 156 4492506112 15872 6532 9340 5119 1984
>> 157 4496156672 15872 10753 5119 5119 1984
>> 158 4503457792 15872 10718 5154 5119 1984
>> ...
>> 289 4083643392 15872 571 15301 4914 1984 <--
>> 290 4510758912 15872 4834 11038 6601 1984
>> 291 4492506112 15872 6532 9340 5119 1984
>> 292 4496156672 15872 10753 5119 5119 1984
>> 293 4503457792 15872 10718 5154 5119 1984
>>
>> etc.
>>
>> So the loop begins at record #154 and spans 135 records, right?
>>
>> Will back up the fs metadata as soon as I have some external storage at hand.
>>
>> Thanks a lot so far ... Michael
>>
>> On 03/24/2016 10:41 AM, Joseph Qi wrote:
>>> Hi Michael,
>>> It seems that a dead loop happens in chain 73. You have formatted using 2K
>>> block and 4K cluster, so each chain should have 1522 or 1521 records.
>>> But at first glance, I cannot figure out which block goes wrong, because >>> the output you pasted indicates all blocks are different. So I suggest >>> you investigate the all blocks which belong to chain 73 and try to find >>> out if there is a loop there. >>> BTW, have you backed up the metadata using o2image? >>> >>> Thanks, >>> Joseph >>> >>> On 2016/3/24 16:40, Michael Ulbrich wrote: >>>> Hi Joseph, >>>> >>>> thanks a lot for your help. It is very much appreciated! >>>> >>>> I ran debugsfs.ocfs2 from ocfs2-tools 1.6.4 on the mounted file system: >>>> >>>> root at s1a:~# debugfs.ocfs2 -R 'stat //global_bitmap' /dev/drbd1 > >>>> debugfs_drbd1.log 2>&1 >>>> >>>> Inode: 13 Mode: 0644 Generation: 1172963971 (0x45ea0283) >>>> FS Generation: 1172963971 (0x45ea0283) >>>> CRC32: 00000000 ECC: 0000 >>>> Type: Regular Attr: 0x0 Flags: Valid System Allocbitmap Chain >>>> Dynamic Features: (0x0) >>>> User: 0 (root) Group: 0 (root) Size: 11381315956736 >>>> Links: 1 Clusters: 2778641591 >>>> ctime: 0x54010183 -- Sat Aug 30 00:41:07 2014 >>>> atime: 0x54010183 -- Sat Aug 30 00:41:07 2014 >>>> mtime: 0x54010183 -- Sat Aug 30 00:41:07 2014 >>>> dtime: 0x0 -- Thu Jan 1 01:00:00 1970 >>>> ctime_nsec: 0x00000000 -- 0 >>>> atime_nsec: 0x00000000 -- 0 >>>> mtime_nsec: 0x00000000 -- 0 >>>> Refcount Block: 0 >>>> Last Extblk: 0 Orphan Slot: 0 >>>> Sub Alloc Slot: Global Sub Alloc Bit: 7 >>>> Bitmap Total: 2778641591 Used: 1083108631 Free: 1695532960 >>>> Clusters per Group: 15872 Bits per Cluster: 1 >>>> Count: 115 Next Free Rec: 115 >>>> ## Total Used Free Block# >>>> 0 24173056 9429318 14743738 4533995520 >>>> 1 24173056 9421663 14751393 4548629504 >>>> 2 24173056 9432421 14740635 4588817408 >>>> 3 24173056 9427533 14745523 4548692992 >>>> 4 24173056 9433978 14739078 4508568576 >>>> 5 24173056 9436974 14736082 4636369920 >>>> 6 24173056 9428411 14744645 4563390464 >>>> 7 24173056 9426950 14746106 4479459328 >>>> 8 24173056 9428099 14744957 4548851712 >>>> 9 24173056 9431794 14741262 4585389056 >>>> ... >>>> 105 24157184 9414241 14742943 4690652160 >>>> 106 24157184 9419715 14737469 4467999744 >>>> 107 24157184 9411479 14745705 4431525888 >>>> 108 24157184 9413235 14743949 4559327232 >>>> 109 24157184 9417948 14739236 4500950016 >>>> 110 24157184 9411013 14746171 4566691840 >>>> 111 24157184 9421252 14735932 4522916864 >>>> 112 24157184 9416726 14740458 4537550848 >>>> 113 24157184 9415358 14741826 4676303872 >>>> 114 24157184 9420448 14736736 4526662656 >>>> >>>> Group Chain: 0 Parent Inode: 13 Generation: 1172963971 >>>> CRC32: 00000000 ECC: 0000 >>>> ## Block# Total Used Free Contig Size >>>> 0 4533995520 15872 6339 9533 3987 1984 >>>> 1 4530344960 15872 10755 5117 5117 1984 >>>> 2 2997109760 15872 10753 5119 5119 1984 >>>> 3 4526694400 15872 10753 5119 5119 1984 >>>> 4 3022663680 15872 10753 5119 5119 1984 >>>> 5 4512092160 15872 9043 6829 2742 1984 >>>> 6 4523043840 15872 4948 10924 9612 1984 >>>> 7 4519393280 15872 6150 9722 5595 1984 >>>> 8 4515742720 15872 4323 11549 6603 1984 >>>> 9 3771028480 15872 10753 5119 5119 1984 >>>> ... 
>>>> 1513 5523297280 15872 1 15871 15871 1984 >>>> 1514 5526947840 15872 1 15871 15871 1984 >>>> 1515 5530598400 15872 1 15871 15871 1984 >>>> 1516 5534248960 15872 1 15871 15871 1984 >>>> 1517 5537899520 15872 1 15871 15871 1984 >>>> 1518 5541550080 15872 1 15871 15871 1984 >>>> 1519 5545200640 15872 1 15871 15871 1984 >>>> 1520 5548851200 15872 1 15871 15871 1984 >>>> 1521 5552501760 15872 1 15871 15871 1984 >>>> 1522 5556152320 15872 1 15871 15871 1984 >>>> >>>> Group Chain: 1 Parent Inode: 13 Generation: 1172963971 >>>> CRC32: 00000000 ECC: 0000 >>>> ## Block# Total Used Free Contig Size >>>> 0 4548629504 15872 10755 5117 2496 1984 >>>> 1 2993490944 15872 59 15813 14451 1984 >>>> 2 2489713664 15872 10758 5114 3726 1984 >>>> 3 3117609984 15872 3958 11914 6165 1984 >>>> 4 2544472064 15872 10753 5119 5119 1984 >>>> 5 3040948224 15872 10753 5119 5119 1984 >>>> 6 2971587584 15872 10753 5119 5119 1984 >>>> 7 4493871104 15872 8664 7208 3705 1984 >>>> 8 4544978944 15872 8711 7161 2919 1984 >>>> 9 4417209344 15872 3253 12619 6447 1984 >>>> ... >>>> 1513 5523329024 15872 1 15871 15871 1984 >>>> 1514 5526979584 15872 1 15871 15871 1984 >>>> 1515 5530630144 15872 1 15871 15871 1984 >>>> 1516 5534280704 15872 1 15871 15871 1984 >>>> 1517 5537931264 15872 1 15871 15871 1984 >>>> 1518 5541581824 15872 1 15871 15871 1984 >>>> 1519 5545232384 15872 1 15871 15871 1984 >>>> 1520 5548882944 15872 1 15871 15871 1984 >>>> 1521 5552533504 15872 1 15871 15871 1984 >>>> 1522 5556184064 15872 1 15871 15871 1984 >>>> >>>> ... all following group chains are similarly structured up to #73 which >>>> looks as follows: >>>> >>>> Group Chain: 73 Parent Inode: 13 Generation: 1172963971 >>>> CRC32: 00000000 ECC: 0000 >>>> ## Block# Total Used Free Contig Size >>>> 0 2583263232 15872 5341 10531 5153 1984 >>>> 1 4543613952 15872 5329 10543 5119 1984 >>>> 2 4532662272 15872 10753 5119 5119 1984 >>>> 3 4539963392 15872 3223 12649 7530 1984 >>>> 4 4536312832 15872 5219 10653 5534 1984 >>>> 5 4529011712 15872 6047 9825 3359 1984 >>>> 6 4525361152 15872 4475 11397 5809 1984 >>>> 7 4521710592 15872 3182 12690 5844 1984 >>>> 8 4518060032 15872 5881 9991 5131 1984 >>>> 9 4236966912 15872 10753 5119 5119 1984 >>>> ... >>>> 2059651 4299026432 15872 4334 11538 4816 1984 >>>> 2059652 4087293952 15872 7003 8869 2166 1984 >>>> 2059653 4295375872 15872 6626 9246 5119 1984 >>>> 2059654 4288074752 15872 509 15363 9662 1984 >>>> 2059655 4291725312 15872 6151 9721 5119 1984 >>>> 2059656 4284424192 15872 10052 5820 5119 1984 >>>> 2059657 4277123072 15872 7383 8489 5120 1984 >>>> 2059658 4273472512 15872 14 15858 5655 1984 >>>> 2059659 4269821952 15872 2637 13235 7060 1984 >>>> 2059660 4266171392 15872 10758 5114 3674 1984 >>>> ... >>>> >>>> Assuming this would go on forever I stopped debugfs.ocfs2. >>>> >>>> With debugs.ocfs2 from ocfs2-tools 1.8.4 I get an identical result. >>>> >>>> Please let me know if I can provide any further information and help to >>>> fix this issue. >>>> >>>> Thanks again + Best regards ... Michael >>>> >>>> On 03/24/2016 01:30 AM, Joseph Qi wrote: >>>>> Hi Michael, >>>>> Could you please use debugfs to check the output? >>>>> # debugfs.ocfs2 -R 'stat //global_bitmap' <device> >>>>> >>>>> Thanks, >>>>> Joseph >>>>> >>>>> On 2016/3/24 6:38, Michael Ulbrich wrote: >>>>>> Hi ocfs2-users, >>>>>> >>>>>> my first post to this list from yesterday probably didn't get through. >>>>>> >>>>>> Anyway, I've made some progress in the meantime and may now ask more >>>>>> specific questions ... 
>>>>>> >>>>>> I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy: >>>>>> >>>>>> Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux >>>>>> >>>>>> the kernel modules are: >>>>>> >>>>>> modinfo ocfs2 -> version: 1.5.0 >>>>>> >>>>>> using stock ocfs2-tools 1.6.4-1+deb7u1 from the distri. >>>>>> >>>>>> As an alternative I cloned and built the latest ocfs2-tools from >>>>>> markfasheh's ocfs2-tools on github which should be version 1.8.4. >>>>>> >>>>>> The filesystem runs on top of drbd, is used to roughly 40 % and suffers >>>>>> from read-only remounts and hanging clients since the last reboot. This >>>>>> may be DLM problems but I suspect they stem from some corrupt disk >>>>>> structures. Before that it all ran stable for months. >>>>>> >>>>>> This situation made me want to run fsck.ocfs2 and now I wonder how to do >>>>>> that. The filesystem is not mounted. >>>>>> >>>>>> With the stock ocfs-tools 1.6.4: >>>>>> >>>>>> root at s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1 >>>>>> fsck.ocfs2 1.6.4 >>>>>> Checking OCFS2 filesystem in /dev/drbd1: >>>>>> Label: ocfs2_ASSET >>>>>> UUID: 6A1A0189A3F94E32B6B9A526DF9060F3 >>>>>> Number of blocks: 5557283182 >>>>>> Block size: 2048 >>>>>> Number of clusters: 2778641591 >>>>>> Cluster size: 4096 >>>>>> Number of slots: 16 >>>>>> >>>>>> I'm checking fsck_drbd1.log and find that it is making progress in >>>>>> >>>>>> Pass 0a: Checking cluster allocation chains >>>>>> >>>>>> until it reaches "chain 73" and goes into an infinite loop filling the >>>>>> logfile with breathtaking speed. >>>>>> >>>>>> With the newly built ocfs-tools 1.8.4 I get: >>>>>> >>>>>> root at s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1 >>>>>> fsck.ocfs2 1.8.4 >>>>>> Checking OCFS2 filesystem in /dev/drbd1: >>>>>> Label: ocfs2_ASSET >>>>>> UUID: 6A1A0189A3F94E32B6B9A526DF9060F3 >>>>>> Number of blocks: 5557283182 >>>>>> Block size: 2048 >>>>>> Number of clusters: 2778641591 >>>>>> Cluster size: 4096 >>>>>> Number of slots: 16 >>>>>> >>>>>> Again watching the verbose output in fsck_drbd1.log I find that this >>>>>> time it proceeds up to >>>>>> >>>>>> Pass 0a: Checking cluster allocation chains >>>>>> o2fsck_pass0:1360 | found inode alloc 13 at block 13 >>>>>> >>>>>> and stays there without any further progress. I've terminated this >>>>>> process after waiting for more than an hour. >>>>>> >>>>>> Now - I'm lost somehow ... and would very much appreciate if anybody on >>>>>> this list would share his knowledge and give me a hint what to do next. >>>>>> >>>>>> What could be done to get this file system checked and repaired? Am I >>>>>> missing something important or do I just have to wait a little bit >>>>>> longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will >>>>>> perform as expected? >>>>>> >>>>>> I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away >>>>>> from taking that risk without any clue of whether that might solve my >>>>>> problem ... >>>>>> >>>>>> Thanks in advance ... 
Michael Ulbrich
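Michael's "brute force" question above would, in practice, start with capturing
the descriptors involved. A minimal sketch, assuming the 2K filesystem block
size and the /dev/drbd1 device from the fsck output, the 'group' command of
debugfs.ocfs2, and the o2image syntax of these ocfs2-tools releases; everything
except the o2image backup is read-only, and the filesystem should be unmounted
on all nodes first:

  # metadata backup first, as discussed in the thread (writes only the image file)
  o2image /dev/drbd1 /path/to/external/drbd1.o2image

  # pretty-print the suspect descriptor of record #153 and the one it points to
  debugfs.ocfs2 -R 'group 4090944512' /dev/drbd1
  debugfs.ocfs2 -R 'group 4083643392' /dev/drbd1

  # raw 2K copies for offline inspection and possible editing later
  dd if=/dev/drbd1 of=gd_153.bin bs=2048 count=1 skip=4090944512
  dd if=/dev/drbd1 of=gd_19.bin bs=2048 count=1 skip=4083643392

The skip values work out because debugfs.ocfs2 prints group descriptor locations
as filesystem block numbers, and the filesystem block size here is 2048 bytes.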
Hi Michael,

On 2016/3/24 21:47, Michael Ulbrich wrote:
> Hi Joseph,
>
> thanks for this information although this does not sound too optimistic ...
>
> So, if I understand you correctly, if we had a metadata backup from
> o2image _before_ the crash we could have looked up the missing info to
> remove the loop from group chain 73, right?

If we have a metadata backup, we can use o2image to restore it, but this may
lose some data.

> But how could the loop issue be fixed and at the same time the damage to
> the data be minimized? There is a recent file level backup from which
> damaged or missing files could be restored later.
>
> 151 4054438912 15872 2152 13720 10606 1984
> 152 4094595072 15872 10753 5119 5119 1984
> 153 4090944512 15872 1818 14054 9646 1984 <--
> 154 4083643392 15872 571 15301 4914 1984
> 155 4510758912 15872 4834 11038 6601 1984
> 156 4492506112 15872 6532 9340 5119 1984
>
> Could you describe a "brute force" way how to dd out and edit record
> #153 to remove the loop and minimize potential loss of data at the same
> time? So that fsck would have a chance to complete and fix the remaining
> issues?

This is dangerous until we know exactly what info the block should store.
My idea is to find out the actual block of record #154 and let block
4090944512 of record #153 point to it. This is a bit complicated and should
only be done with a deep understanding of the disk layout.

I have gone through the fsck.ocfs2 patches and found that the following may
help:

commit efca4b0f2241 (Break a chain loop in group desc)

But as you said, you have already upgraded to version 1.8.4, so I'm sorry,
currently I don't have a better idea.

Thanks,
Joseph

> Thanks a lot for your help ... Michael
>
> On 03/24/2016 02:10 PM, Joseph Qi wrote:
>> Hi Michael,
>> So I think the block of record #153 goes wrong, which points next to
>> block 4083643392 of record #19.
>> But the problem is we don't know the right info of the block of record
>> #153, otherwise we can dd out, edit it and then dd in to fix it.
>>
>> Thanks,
>> Joseph
>>
>> On 2016/3/24 18:38, Michael Ulbrich wrote:
>>> Hi Joseph,
>>>
>>> ok, got it! Here's the loop in chain 73:
>>>
>>> Group Chain: 73 Parent Inode: 13 Generation: 1172963971
>>> CRC32: 00000000 ECC: 0000
>>> ## Block# Total Used Free Contig Size
>>> 0 4280773632 15872 11487 4385 1774 1984
>>> 1 2583263232 15872 5341 10531 5153 1984
>>> 2 4543613952 15872 5329 10543 5119 1984
>>> 3 4532662272 15872 10753 5119 5119 1984
>>> 4 4539963392 15872 3223 12649 7530 1984
>>> 5 4536312832 15872 5219 10653 5534 1984
>>> 6 4529011712 15872 6047 9825 3359 1984
>>> 7 4525361152 15872 4475 11397 5809 1984
>>> 8 4521710592 15872 3182 12690 5844 1984
>>> 9 4518060032 15872 5881 9991 5131 1984
>>> 10 4236966912 15872 10753 5119 5119 1984
>>> 11 4098245632 15872 10756 5116 3388 1984
>>> 12 4514409472 15872 8826 7046 5119 1984
>>> 13 3441144832 15872 15 15857 9680 1984
>>> 14 4404892672 15872 7563 8309 5119 1984
>>> 15 4233316352 15872 9398 6474 5114 1984
>>> 16 4488855552 15872 6358 9514 5119 1984
>>> 17 3901115392 15872 9932 5940 3757 1984
>>> 18 4507108352 15872 6557 9315 6166 1984
>>> 19 4083643392 15872 571 15301 4914 1984 <--
>>> 20 4510758912 15872 4834 11038 6601 1984
>>> 21 4492506112 15872 6532 9340 5119 1984
>>> 22 4496156672 15872 10753 5119 5119 1984
>>> 23 4503457792 15872 10718 5154 5119 1984
>>> ...
>>> 154 4083643392 15872 571 15301 4914 1984 <-- >>> 155 4510758912 15872 4834 11038 6601 1984 >>> 156 4492506112 15872 6532 9340 5119 1984 >>> 157 4496156672 15872 10753 5119 5119 1984 >>> 158 4503457792 15872 10718 5154 5119 1984 >>> ... >>> 289 4083643392 15872 571 15301 4914 1984 <-- >>> 290 4510758912 15872 4834 11038 6601 1984 >>> 291 4492506112 15872 6532 9340 5119 1984 >>> 292 4496156672 15872 10753 5119 5119 1984 >>> 293 4503457792 15872 10718 5154 5119 1984 >>> >>> etc. >>> >>> So the loop begins at record #154 and spans 135 records, right? >>> >>> Will backup fs metadata as soon as I have some external storage at hand. >>> >>> Thanks a lot so far ... Michael >>> >>> On 03/24/2016 10:41 AM, Joseph Qi wrote: >>>> Hi Michael, >>>> It seems that dead loop happens in chain 73. You have formatted using 2K >>>> block and 4K cluster, so each chain should have 1522 or 1521 records. >>>> But at first glance, I cannot figure out which block goes wrong, because >>>> the output you pasted indicates all blocks are different. So I suggest >>>> you investigate the all blocks which belong to chain 73 and try to find >>>> out if there is a loop there. >>>> BTW, have you backed up the metadata using o2image? >>>> >>>> Thanks, >>>> Joseph >>>> >>>> On 2016/3/24 16:40, Michael Ulbrich wrote: >>>>> Hi Joseph, >>>>> >>>>> thanks a lot for your help. It is very much appreciated! >>>>> >>>>> I ran debugsfs.ocfs2 from ocfs2-tools 1.6.4 on the mounted file system: >>>>> >>>>> root at s1a:~# debugfs.ocfs2 -R 'stat //global_bitmap' /dev/drbd1 > >>>>> debugfs_drbd1.log 2>&1 >>>>> >>>>> Inode: 13 Mode: 0644 Generation: 1172963971 (0x45ea0283) >>>>> FS Generation: 1172963971 (0x45ea0283) >>>>> CRC32: 00000000 ECC: 0000 >>>>> Type: Regular Attr: 0x0 Flags: Valid System Allocbitmap Chain >>>>> Dynamic Features: (0x0) >>>>> User: 0 (root) Group: 0 (root) Size: 11381315956736 >>>>> Links: 1 Clusters: 2778641591 >>>>> ctime: 0x54010183 -- Sat Aug 30 00:41:07 2014 >>>>> atime: 0x54010183 -- Sat Aug 30 00:41:07 2014 >>>>> mtime: 0x54010183 -- Sat Aug 30 00:41:07 2014 >>>>> dtime: 0x0 -- Thu Jan 1 01:00:00 1970 >>>>> ctime_nsec: 0x00000000 -- 0 >>>>> atime_nsec: 0x00000000 -- 0 >>>>> mtime_nsec: 0x00000000 -- 0 >>>>> Refcount Block: 0 >>>>> Last Extblk: 0 Orphan Slot: 0 >>>>> Sub Alloc Slot: Global Sub Alloc Bit: 7 >>>>> Bitmap Total: 2778641591 Used: 1083108631 Free: 1695532960 >>>>> Clusters per Group: 15872 Bits per Cluster: 1 >>>>> Count: 115 Next Free Rec: 115 >>>>> ## Total Used Free Block# >>>>> 0 24173056 9429318 14743738 4533995520 >>>>> 1 24173056 9421663 14751393 4548629504 >>>>> 2 24173056 9432421 14740635 4588817408 >>>>> 3 24173056 9427533 14745523 4548692992 >>>>> 4 24173056 9433978 14739078 4508568576 >>>>> 5 24173056 9436974 14736082 4636369920 >>>>> 6 24173056 9428411 14744645 4563390464 >>>>> 7 24173056 9426950 14746106 4479459328 >>>>> 8 24173056 9428099 14744957 4548851712 >>>>> 9 24173056 9431794 14741262 4585389056 >>>>> ... 
>>>>> 105 24157184 9414241 14742943 4690652160 >>>>> 106 24157184 9419715 14737469 4467999744 >>>>> 107 24157184 9411479 14745705 4431525888 >>>>> 108 24157184 9413235 14743949 4559327232 >>>>> 109 24157184 9417948 14739236 4500950016 >>>>> 110 24157184 9411013 14746171 4566691840 >>>>> 111 24157184 9421252 14735932 4522916864 >>>>> 112 24157184 9416726 14740458 4537550848 >>>>> 113 24157184 9415358 14741826 4676303872 >>>>> 114 24157184 9420448 14736736 4526662656 >>>>> >>>>> Group Chain: 0 Parent Inode: 13 Generation: 1172963971 >>>>> CRC32: 00000000 ECC: 0000 >>>>> ## Block# Total Used Free Contig Size >>>>> 0 4533995520 15872 6339 9533 3987 1984 >>>>> 1 4530344960 15872 10755 5117 5117 1984 >>>>> 2 2997109760 15872 10753 5119 5119 1984 >>>>> 3 4526694400 15872 10753 5119 5119 1984 >>>>> 4 3022663680 15872 10753 5119 5119 1984 >>>>> 5 4512092160 15872 9043 6829 2742 1984 >>>>> 6 4523043840 15872 4948 10924 9612 1984 >>>>> 7 4519393280 15872 6150 9722 5595 1984 >>>>> 8 4515742720 15872 4323 11549 6603 1984 >>>>> 9 3771028480 15872 10753 5119 5119 1984 >>>>> ... >>>>> 1513 5523297280 15872 1 15871 15871 1984 >>>>> 1514 5526947840 15872 1 15871 15871 1984 >>>>> 1515 5530598400 15872 1 15871 15871 1984 >>>>> 1516 5534248960 15872 1 15871 15871 1984 >>>>> 1517 5537899520 15872 1 15871 15871 1984 >>>>> 1518 5541550080 15872 1 15871 15871 1984 >>>>> 1519 5545200640 15872 1 15871 15871 1984 >>>>> 1520 5548851200 15872 1 15871 15871 1984 >>>>> 1521 5552501760 15872 1 15871 15871 1984 >>>>> 1522 5556152320 15872 1 15871 15871 1984 >>>>> >>>>> Group Chain: 1 Parent Inode: 13 Generation: 1172963971 >>>>> CRC32: 00000000 ECC: 0000 >>>>> ## Block# Total Used Free Contig Size >>>>> 0 4548629504 15872 10755 5117 2496 1984 >>>>> 1 2993490944 15872 59 15813 14451 1984 >>>>> 2 2489713664 15872 10758 5114 3726 1984 >>>>> 3 3117609984 15872 3958 11914 6165 1984 >>>>> 4 2544472064 15872 10753 5119 5119 1984 >>>>> 5 3040948224 15872 10753 5119 5119 1984 >>>>> 6 2971587584 15872 10753 5119 5119 1984 >>>>> 7 4493871104 15872 8664 7208 3705 1984 >>>>> 8 4544978944 15872 8711 7161 2919 1984 >>>>> 9 4417209344 15872 3253 12619 6447 1984 >>>>> ... >>>>> 1513 5523329024 15872 1 15871 15871 1984 >>>>> 1514 5526979584 15872 1 15871 15871 1984 >>>>> 1515 5530630144 15872 1 15871 15871 1984 >>>>> 1516 5534280704 15872 1 15871 15871 1984 >>>>> 1517 5537931264 15872 1 15871 15871 1984 >>>>> 1518 5541581824 15872 1 15871 15871 1984 >>>>> 1519 5545232384 15872 1 15871 15871 1984 >>>>> 1520 5548882944 15872 1 15871 15871 1984 >>>>> 1521 5552533504 15872 1 15871 15871 1984 >>>>> 1522 5556184064 15872 1 15871 15871 1984 >>>>> >>>>> ... all following group chains are similarly structured up to #73 which >>>>> looks as follows: >>>>> >>>>> Group Chain: 73 Parent Inode: 13 Generation: 1172963971 >>>>> CRC32: 00000000 ECC: 0000 >>>>> ## Block# Total Used Free Contig Size >>>>> 0 2583263232 15872 5341 10531 5153 1984 >>>>> 1 4543613952 15872 5329 10543 5119 1984 >>>>> 2 4532662272 15872 10753 5119 5119 1984 >>>>> 3 4539963392 15872 3223 12649 7530 1984 >>>>> 4 4536312832 15872 5219 10653 5534 1984 >>>>> 5 4529011712 15872 6047 9825 3359 1984 >>>>> 6 4525361152 15872 4475 11397 5809 1984 >>>>> 7 4521710592 15872 3182 12690 5844 1984 >>>>> 8 4518060032 15872 5881 9991 5131 1984 >>>>> 9 4236966912 15872 10753 5119 5119 1984 >>>>> ... 
>>>>> 2059651 4299026432 15872 4334 11538 4816 1984 >>>>> 2059652 4087293952 15872 7003 8869 2166 1984 >>>>> 2059653 4295375872 15872 6626 9246 5119 1984 >>>>> 2059654 4288074752 15872 509 15363 9662 1984 >>>>> 2059655 4291725312 15872 6151 9721 5119 1984 >>>>> 2059656 4284424192 15872 10052 5820 5119 1984 >>>>> 2059657 4277123072 15872 7383 8489 5120 1984 >>>>> 2059658 4273472512 15872 14 15858 5655 1984 >>>>> 2059659 4269821952 15872 2637 13235 7060 1984 >>>>> 2059660 4266171392 15872 10758 5114 3674 1984 >>>>> ... >>>>> >>>>> Assuming this would go on forever I stopped debugfs.ocfs2. >>>>> >>>>> With debugs.ocfs2 from ocfs2-tools 1.8.4 I get an identical result. >>>>> >>>>> Please let me know if I can provide any further information and help to >>>>> fix this issue. >>>>> >>>>> Thanks again + Best regards ... Michael >>>>> >>>>> On 03/24/2016 01:30 AM, Joseph Qi wrote: >>>>>> Hi Michael, >>>>>> Could you please use debugfs to check the output? >>>>>> # debugfs.ocfs2 -R 'stat //global_bitmap' <device> >>>>>> >>>>>> Thanks, >>>>>> Joseph >>>>>> >>>>>> On 2016/3/24 6:38, Michael Ulbrich wrote: >>>>>>> Hi ocfs2-users, >>>>>>> >>>>>>> my first post to this list from yesterday probably didn't get through. >>>>>>> >>>>>>> Anyway, I've made some progress in the meantime and may now ask more >>>>>>> specific questions ... >>>>>>> >>>>>>> I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy: >>>>>>> >>>>>>> Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux >>>>>>> >>>>>>> the kernel modules are: >>>>>>> >>>>>>> modinfo ocfs2 -> version: 1.5.0 >>>>>>> >>>>>>> using stock ocfs2-tools 1.6.4-1+deb7u1 from the distri. >>>>>>> >>>>>>> As an alternative I cloned and built the latest ocfs2-tools from >>>>>>> markfasheh's ocfs2-tools on github which should be version 1.8.4. >>>>>>> >>>>>>> The filesystem runs on top of drbd, is used to roughly 40 % and suffers >>>>>>> from read-only remounts and hanging clients since the last reboot. This >>>>>>> may be DLM problems but I suspect they stem from some corrupt disk >>>>>>> structures. Before that it all ran stable for months. >>>>>>> >>>>>>> This situation made me want to run fsck.ocfs2 and now I wonder how to do >>>>>>> that. The filesystem is not mounted. >>>>>>> >>>>>>> With the stock ocfs-tools 1.6.4: >>>>>>> >>>>>>> root at s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1 >>>>>>> fsck.ocfs2 1.6.4 >>>>>>> Checking OCFS2 filesystem in /dev/drbd1: >>>>>>> Label: ocfs2_ASSET >>>>>>> UUID: 6A1A0189A3F94E32B6B9A526DF9060F3 >>>>>>> Number of blocks: 5557283182 >>>>>>> Block size: 2048 >>>>>>> Number of clusters: 2778641591 >>>>>>> Cluster size: 4096 >>>>>>> Number of slots: 16 >>>>>>> >>>>>>> I'm checking fsck_drbd1.log and find that it is making progress in >>>>>>> >>>>>>> Pass 0a: Checking cluster allocation chains >>>>>>> >>>>>>> until it reaches "chain 73" and goes into an infinite loop filling the >>>>>>> logfile with breathtaking speed. 
>>>>>>> >>>>>>> With the newly built ocfs-tools 1.8.4 I get: >>>>>>> >>>>>>> root at s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1 >>>>>>> fsck.ocfs2 1.8.4 >>>>>>> Checking OCFS2 filesystem in /dev/drbd1: >>>>>>> Label: ocfs2_ASSET >>>>>>> UUID: 6A1A0189A3F94E32B6B9A526DF9060F3 >>>>>>> Number of blocks: 5557283182 >>>>>>> Block size: 2048 >>>>>>> Number of clusters: 2778641591 >>>>>>> Cluster size: 4096 >>>>>>> Number of slots: 16 >>>>>>> >>>>>>> Again watching the verbose output in fsck_drbd1.log I find that this >>>>>>> time it proceeds up to >>>>>>> >>>>>>> Pass 0a: Checking cluster allocation chains >>>>>>> o2fsck_pass0:1360 | found inode alloc 13 at block 13 >>>>>>> >>>>>>> and stays there without any further progress. I've terminated this >>>>>>> process after waiting for more than an hour. >>>>>>> >>>>>>> Now - I'm lost somehow ... and would very much appreciate if anybody on >>>>>>> this list would share his knowledge and give me a hint what to do next. >>>>>>> >>>>>>> What could be done to get this file system checked and repaired? Am I >>>>>>> missing something important or do I just have to wait a little bit >>>>>>> longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will >>>>>>> perform as expected? >>>>>>> >>>>>>> I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away >>>>>>> from taking that risk without any clue of whether that might solve my >>>>>>> problem ... >>>>>>> >>>>>>> Thanks in advance ... Michael Ulbrich >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Ocfs2-users mailing list >>>>>>> Ocfs2-users at oss.oracle.com >>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Ocfs2-users mailing list >>>>>> Ocfs2-users at oss.oracle.com >>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Ocfs2-users mailing list >>>>> Ocfs2-users at oss.oracle.com >>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users >>>>> >>>>> . >>>>> >>>> >>>> >>> >>> . >>> >> >> > > . >
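For the edit step Joseph outlines (find the group that should really follow
record #153 and repoint block 4090944512 at it), a sketch of the "dd out, edit,
dd in" cycle on the gd_153.bin copy dumped above. It assumes the on-disk struct
ocfs2_group_desc layout from ocfs2_fs.h, where bg_next_group is the
little-endian 64-bit field at byte offset 24 (0x18) of the descriptor, and it
assumes the zeroed CRC32/ECC values in the stat output mean metaecc is off, so
no block checksum needs recomputing. The NEXT value below is only a
hypothetical placeholder; the correct block number would have to be determined
first, e.g. from a pre-crash o2image. Do this only on an unmounted filesystem
and only with a verified backup:

  # inspect the current next-group pointer in the dumped copy (8 bytes at 0x18)
  hexdump -C -s 0x18 -n 8 gd_153.bin

  # patch the dumped copy: write NEXT as a little-endian u64 at offset 24
  NEXT=123456789   # hypothetical placeholder -- substitute the verified block number
  python3 -c 'import struct,sys; sys.stdout.buffer.write(struct.pack("<Q", int(sys.argv[1])))' "$NEXT" \
      | dd of=gd_153.bin bs=1 seek=24 count=8 conv=notrunc

  # verify, then write the single 2K block back where it came from
  hexdump -C -s 0x18 -n 8 gd_153.bin
  dd if=gd_153.bin of=/dev/drbd1 bs=2048 count=1 seek=4090944512 conv=notrunc

The fsck.ocfs2 change Joseph found, commit efca4b0f2241 ("Break a chain loop in
group desc"), is meant to let fsck break such a cycle on its own; whether the
1.8.4 build used here already contains it, and why pass 0a still stalls, would
have to be checked against the ocfs2-tools git history.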