Craig Prescott
2010-Dec-01 22:16 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
Hi,

We are trying to fsck an OST that was not unmounted cleanly. But fsck
is dying with this error after making some corrections:

[root at XXXXXX tmp]# fsck -f -y /dev/arc1-lv2/OST0003
...
High 16 bits of extent/index block set
CLEARED.
Inode 306602015 has an invalid extent node (blk 512, lblk 641536)
Clear? yes

Warning... fsck.ext4 for device /dev/arc1-lv2/OST0003 exited with signal 11.

It is repeatable.

So we are stuck. We need to fsck our OST, but fsck is dying. Can
anyone give us some advice on how to proceed?

Thanks,
Craig Prescott
UF HPC Center
Craig Prescott
2010-Dec-01 22:20 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
I forgot to add - our affected OSS is running Lustre 1.8.4, and
e2fsprogs-1.41.10.sun2-0redhat. `uname -r` gives

2.6.18-194.3.1.0.1.el5_lustre.1.8.4

Thanks,
Craig Prescott
UF HPC Center
Colin Faber
2010-Dec-01 22:38 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
Hi,

Try upgrading to the latest e2fsprogs package, 1.41.12.2.

-cf

On 12/01/2010 03:20 PM, Craig Prescott wrote:
> I forgot to add - our affected OSS is running Lustre 1.8.4, and
> e2fsprogs-1.41.10.sun2-0redhat.
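For reference, confirming which e2fsprogs the node is actually running
before and after the upgrade might look like the following (an RPM-based
system is assumed here, matching the redhat-suffixed package name in the
thread):

# query the installed package version
rpm -q e2fsprogs

# ask the binary itself; e2fsck -V prints its version and exits
e2fsck -V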
Andreas Dilger
2010-Dec-01 23:02 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
On 2010-12-01, at 15:16, Craig Prescott wrote:
> We are trying to fsck an OST that was not unmounted cleanly. But fsck
> is dying with this error after making some corrections:
> [...]
> Warning... fsck.ext4 for device /dev/arc1-lv2/OST0003 exited with signal 11.
>
> It is repeatable.
>
> So we are stuck. We need to fsck our OST, but fsck is dying. Can
> anyone give us some advice on how to proceed?

Hmm, IIRC signal 11 normally means memory errors, though it could be a
software bug in e2fsck. Do you have enough RAM to run e2fsck on this
node? Have you tried running it under gdb to see if it can catch the
sig11 and print a stack trace?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
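A minimal sketch of the gdb approach Andreas suggests, using the device
path from earlier in the thread (gdb stops on SIGSEGV by default, so no
extra signal handling should be needed):

# start e2fsck under the debugger with its usual arguments
gdb --args e2fsck -f -y /dev/arc1-lv2/OST0003
(gdb) run
# ... e2fsck runs until the signal 11 is raised ...
(gdb) bt    # print the stack trace at the point of the crash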
Craig Prescott
2010-Dec-01 23:11 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
Andreas Dilger wrote:
> Do you have enough RAM to run e2fsck on this node? Have you tried
> running it under gdb to see if it can catch the sig11 and print a
> stack trace?

Yup, plenty of RAM - we've got 32GB in this node.

We've already started up fsck again using Colin's suggestion of
e2fsprogs-1.41.12.2. So far so good. But if we need to fire it up
under gdb, I guess that's what we'll do.

Thanks,
Craig Prescott
UF HPC Center
Larry
2010-Dec-02 04:14 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
Older versions of e2fsprogs do have bugs like this - the newer the
better, I think.

On Thu, Dec 2, 2010 at 7:11 AM, Craig Prescott <prescott at hpc.ufl.edu> wrote:
> We've already started up fsck again using Colin's suggestion of
> e2fsprogs-1.41.12.2. So far so good. But if we need to fire it up
> under gdb, I guess that's what we'll do.
Craig Prescott
2010-Dec-02 16:24 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
Thanks very much for this suggestion.

The fsck using e2fsprogs-1.41.12.2 is not dying. It quickly passed the
point where the original fscks barfed (using e2fsprogs-1.41.10.sun2).

But the fsck using the new e2fsprogs seems to be going extremely slowly -
it ran all night, and is still running. This is very abnormal, as fscks
on the OSTs in this filesystem usually take on the order of 30 minutes.
I'd like to understand better what fsck is doing at this time.

fsck seems to be spending a lot of time in Pass 1D, cloning
multiply-claimed blocks. But there has been no output from fsck for many
hours now.

1) fsck.ext4 is using 100% of a 2.2GHz core. The progress of the fsck
seems to be CPU bound for a long time (many hours). We're not used to
seeing this.

2) Using iostat, I can see the I/O rates are very low (10s of KB/s read
and write).

3) Using strace, I can see a pattern of read()/lseek()/write()/lseek()
being repeated over and over. I guess this should not be surprising if
fsck is really cloning multiply-claimed blocks.

4) Using pstack, I can see fsck.ext4 is in ext2fs_block_iterate2() - it
looks like there is a lot of time being spent in ext2fs_new_block().

I'd like to understand what fsck is doing that takes so much CPU. The
OST was pretty full (~90%). Is it computationally expensive to clone
multiply-claimed blocks on a filesystem this full?

I'm also wondering if I should let this continue or not.

I appended a bit of the strace output. From the offset arg to the
lseek() calls, it looks like data is being copied from one side of the
spindles to the other(?).

Thanks,
Craig Prescott
UF HPC Center

Sample strace output:

...
read(3, "\313R\354\222\205%\16\227\221,\226\35\317\22\331,0\312\262\330\252\314wI\2\345^\305\222d\273$"..., 4096) = 4096
lseek(3, 36574076928, SEEK_SET) = 36574076928
write(3, "\35z\354 \252\370\24\317\323\236VL]NF;\335\303\16w&\n\312\236F\0\3664RK\366\304"..., 4096) = 4096
lseek(3, 7424726908928, SEEK_SET) = 7424726908928
...

Colin Faber wrote:
> Try upgrading to the latest e2fsprogs package, 1.41.12.2.
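The inspection steps Craig describes can be reproduced against a running
fsck roughly as follows (the PID 12345 is hypothetical; substitute the
real one reported by pidof):

# find the PID of the running fsck.ext4
pidof fsck.ext4

# attach to it and watch the system call pattern (Ctrl-C to detach)
strace -p 12345

# dump its current userspace call stack
pstack 12345

# watch per-device I/O rates, refreshed every 5 seconds
iostat -x 5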
Andreas Dilger
2010-Dec-02 17:52 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
On 2010-12-02, at 09:24, Craig Prescott wrote:
> fsck seems to be spending a lot of time in Pass 1D, cloning
> multiply-claimed blocks. But there has been no output from fsck for
> many hours now.

Pass 1b-1d have O(n^2) complexity, and require a second pass through all
of the metadata, so if there are a large number of duplicate blocks it
can take a long time.

> 1) fsck.ext4 is using 100% of a 2.2GHz core. The progress of the fsck
> seems to be CPU bound for a long time (many hours). We're not used to
> seeing this.

If there are a limited number of files, you can restart e2fsck with the
option "-E shared=delete", which will cause the inodes with shared
blocks to be deleted. It will of course cause that data to be lost, but
it will allow e2fsck to complete much more quickly.

> 4) Using pstack, I can see fsck.ext4 is in ext2fs_block_iterate2() - it
> looks like there is a lot of time being spent in ext2fs_new_block().

This is a major contributor to the slowdown - the code in libext2fs for
allocating blocks is quite slow, and does not necessarily make very good
allocations.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
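Restarting the check the way Andreas describes would presumably look
like this (device path from the thread; the shared= extended option is
part of the Lustre-patched e2fsprogs discussed above, so check e2fsck(8)
on the actual system first):

# WARNING: deletes every inode that claims multiply-owned blocks,
# losing that data, in exchange for skipping the expensive cloning
e2fsck -f -y -E shared=delete /dev/arc1-lv2/OST0003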
Craig Prescott
2010-Dec-03 15:06 UTC
[Lustre-discuss] fsck.ext4 for device ... exited with signal 11.
Andreas Dilger wrote:
> Pass 1b-1d have O(n^2) complexity, and require a second pass through
> all of the metadata, so if there are a large number of duplicate
> blocks it can take a long time.
>
> If there are a limited number of files, you can restart e2fsck with
> the option "-E shared=delete", which will cause the inodes with shared
> blocks to be deleted. It will of course cause that data to be lost,
> but it will allow e2fsck to complete much more quickly.

Well, we restarted fsck with the "-E shared=delete" option.

It has been running for about 16 hours at 100% CPU, almost all of it in
Pass 1D (where it still is), and has deleted 58 files. For all I know,
these are the only 58 files fsck has even considered, so we are thinking
about giving up on this and reformatting the OST.

Is there any way to estimate the wallclock time required by Pass 1D
(ballpark)? Our ~8TB OST had approximately 30k 2GB files on it.

Thanks,
Craig Prescott
UF HPC Center