Dear All,

We have an emergency with our Lustre filesystem. We installed lustre-1.6.6
with Linux kernel 2.6.22.19 on all of the MGS, MDT, and OST servers and on
the clients, and they have been running very well. Today, however, we hit a
disk array hardware problem (one of the hard disks in the RAID 6 disk array
crashed), and soon afterwards the Lustre filesystem crashed as well. After
we replaced the bad disk with a new one, the disk array appears to have
rebuilt the RAID 6 data correctly, and the file servers can see the
partitions on that array. However, the OST partition on that array can no
longer be mounted:

root@wd2:~# mount -t ldiskfs /dev/sdb1 /mnt/mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

The dmesg output shows:

[ 3314.530762] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Block bitmap for group 11152 not in group (block 3407085568)!
[ 3314.531701] LDISKFS-fs: group descriptors corrupted!

If I run:

./tunefs.lustre --writeconf /dev/sdb1

I get:

Reading CONFIGS/mountdata

   Read previous values:
Target:     cwork2-OST0000
Index:      0
Lustre FS:  cwork2
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.10.50@tcp

   Permanent disk data:
Target:     cwork2-OST0000
Index:      0
Lustre FS:  cwork2
Mount type: ldiskfs
Flags:      0x102
            (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.10.50@tcp

tunefs.lustre: Unable to mount /dev/sdb1: Invalid argument
tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)

The command "dumpe2fs /dev/sdb1" gives:

Filesystem volume name:   cwork2-OST0000
Last mounted on:          <not available>
Filesystem UUID:          4f4323df-73a5-4e93-9a2d-2c2b9a6c3c60
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extents sparse_super large_file
Filesystem flags:         signed directory hash
Default mount options:    (none)
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              101335040
Block count:              405336007
Reserved block count:     20266800
Free blocks:              164142148
Free inodes:              119852810
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      927
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Oct 16 15:29:21 2008
.....
  Block bitmap at 2820539742 (+2415232350), Inode bitmap at 2820539691 (+2415232299)
  Inode table at 2820539857-2820540368 (+2415232465)
  40232 free blocks, 20387 free inodes, 0 directories
dumpe2fs: /dev/sdb1: error reading bitmaps: Can't read an block bitmap

It seems that the backend ext3 filesystem is still there, but has errors.
Could anyone suggest a way to recover the OST partitions? Can I use e2fsck
to fix the problems on the OST partitions? The MGS and MDT seem to be OK,
because they are not on that disk array.

Thanks very much for your kind help.

Best Regards,

T.H.Hsieh
Brian J. Murrell
2009-Mar-09 18:13 UTC
[Lustre-discuss] OST crash with group descriptors corrupted
On Mon, 2009-03-09 at 19:39 +0800, thhsieh wrote:
> Dear All,
>
> We have an emergency with our Lustre filesystem.
>
> Today, however, we hit a disk array hardware problem (one of the hard
> disks in the RAID 6 disk array crashed), and soon afterwards the
> Lustre filesystem crashed as well.
>
> The dmesg output shows:
>
> [ 3314.530762] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Block bitmap for group 11152 not in group (block 3407085568)!
> [ 3314.531701] LDISKFS-fs: group descriptors corrupted!

It looks like your disk error has resulted in on-disk corruption.
AFAIK, RAID is supposed to prevent this. No idea why it didn't in this
case. Maybe check with your RAID vendor.

> It seems that the backend ext3 filesystem is still there, but has
> errors.

Indeed.

> Could anyone suggest a way to recover the OST partitions? Can I use
> e2fsck to fix the problems on the OST partitions?

Yes, e2fsck should correct the problem(s). Be aware that there is a
possibility that the only way for e2fsck to correct the state of the
filesystem is to (re-)move data from the filesystem. To what extent
will depend completely on how much on-disk corruption has taken place.

You can get an idea of what e2fsck will do without actually doing
anything to the disk data by giving it the "-n" argument. You can
decide based on that "dry-run" e2fsck output whether the corrective
action it will take is acceptable to you.

b.
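As an illustration of that dry-run approach, the check on the OST device
from this thread might look roughly like the following (the device name
/dev/sdb1 is taken from the messages above; the log file names are only
suggestions):

    # report what e2fsck would change, without modifying the device
    e2fsck -fn /dev/sdb1 2>&1 | tee /root/ost0000-fsck-dryrun.log

    # if the proposed fixes look acceptable, run the actual repair
    e2fsck -fy /dev/sdb1 2>&1 | tee /root/ost0000-fsck.log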
Hello,

Thanks very much for your kind reply. We ran e2fsck (version 1.41.4) on
all of the OST partitions, and thousands of errors were reported and
fixed. However, we have now hit a serious error that I have no idea how
to fix: even after e2fsck has finished, one of the OST partitions still
has a problem. The command:

./tunefs.lustre --writeconf /dev/sdb1

shows:

checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     cwork2-OST0000
Index:      0
Lustre FS:  cwork2
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.10.50@tcp

   Permanent disk data:
Target:     cwork2-OST0000
Index:      0
Lustre FS:  cwork2
Mount type: ldiskfs
Flags:      0x102
            (OST writeconf )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.10.50@tcp

tunefs.lustre: Unable to mount /dev/sdb1: Invalid argument
tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)

and the kernel log shows the following errors:

[80083.964462] LDISKFS-fs: group descriptors corrupted!
[81423.119834] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Checksum for group 11165 failed (0!=20224)

We also tried e2fsck with the backup superblock at 32768, but after some
more corrections we ran into the same problem again. How can we fix this
kind of problem? In any case, we are trying to rescue as much of the
existing data as possible, and will reformat the whole filesystem after
that.

Is there any other information I should provide to make the situation
clearer? Please let me know. I am really thankful for your suggestions.

Best Regards,

T.H.Hsieh

On Mon, Mar 09, 2009 at 02:13:15PM -0400, Brian J. Murrell wrote:
> Yes, e2fsck should correct the problem(s). Be aware that there is a
> possibility that the only way for e2fsck to correct the state of the
> filesystem is to (re-)move data from the filesystem.
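For reference, the backup-superblock attempt described above would, with
the 4096-byte block size shown in the earlier dumpe2fs output, look
something like this (device name again taken from the messages above):

    # -b selects the first backup superblock, -B gives the block size in bytes
    e2fsck -fy -b 32768 -B 4096 /dev/sdb1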
Brian J. Murrell
2009-Mar-10 14:04 UTC
[Lustre-discuss] OST crash with group descriptors corrupted
On Tue, 2009-03-10 at 18:14 +0800, thhsieh wrote:
> Hello,

Hi.

> Thanks very much for your kind reply.

NP.

> We ran e2fsck (version 1.41.4) on all of the OST partitions.

You did that with or without "-n" in the command arguments?

> [80083.964462] LDISKFS-fs: group descriptors corrupted!
> [81423.119834] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Checksum for group 11165 failed (0!=20224)

Hrm. I don't know enough about the innards of ext3 to parse that.
Maybe (well, no maybes about it) Andreas will know if he is reading.

> We also tried e2fsck with the backup superblock at 32768, but after
> some more corrections we ran into the same problem again.

Again, with or without e2fsck's "-n" argument?

b.
Hello,

We ran it without the "-n" command argument, so the filesystem has been
modified.

Best Regards,

T.H.Hsieh

On Tue, Mar 10, 2009 at 10:04:24AM -0400, Brian J. Murrell wrote:
> > We ran e2fsck (version 1.41.4) on all of the OST partitions.
>
> You did that with or without "-n" in the command arguments?
Hello,

I am wondering whether it is possible to give up the problematic OST and
activate only the other OSTs, so that we can rescue at least part of the
data files.

We have six OSTs in total, and one of them has a problem that I currently
have no idea how to repair. If I activate only the remaining five OSTs,
can I get back (at most) 5/6 of the data files, or will I only get back
junk (if files are split into fragments that are distributed across all
of the OSTs)?

If activating only the five OSTs can get back some of the data files,
what is the procedure I should follow?

I am under time pressure to recover the system, so I am considering the
worst case....

Thanks so much for your kind replies.

Best Regards,

T.H.Hsieh
Andreas Dilger
2009-Mar-10 19:47 UTC
[Lustre-discuss] OST crash with group descriptors corrupted
On Mar 10, 2009 23:42 +0800, thhsieh wrote:
> I am wondering whether it is possible to give up the problematic OST
> and activate only the other OSTs, so that we can rescue at least part
> of the data files.

Yes, this is always possible. Just mount lustre as normal, and on all
clients + MDS run "lctl set_param osc.*OST{number}*.active=0" so that
they will return an EIO error instead of hanging and waiting for the
failed OST to return.

That said, I don't think this is necessarily a fatal problem.

> We have six OSTs in total, and one of them has a problem that I
> currently have no idea how to repair. If I activate only the remaining
> five OSTs, can I get back (at most) 5/6 of the data files, or will I
> only get back junk (if files are split into fragments that are
> distributed across all of the OSTs)?

Lustre by default places each file on a single OST, so you should be
able to get back 5/6 of your files.

> If activating only the five OSTs can get back some of the data files,
> what is the procedure I should follow?

Use "lfs find --obd {OST_UUID} /mount/point" to find files that are on
the failed OST. Hmm, there isn't a way to specify the opposite, however
"lfs find ! --obd {OST_UUID}", please file a bug for that, it is
relatively easy to implement (or you could take a crack at it in
lustre/utils/lfs.c::lfs_find()).

> I am under time pressure to recover the system, so I am considering
> the worst case....

I think you can possibly recover this OST.

> > > You did that with or without "-n" in the command arguments?
> > >
> > > > [80083.964462] LDISKFS-fs: group descriptors corrupted!
> > > > [81423.119834] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Checksum for group 11165 failed (0!=20224)

It looks like this is a simple bug in the ldiskfs code AND in the e2fsck
code. The feature that enables group checksums (uninit_bg) was disabled
in the superblock for some reason, but e2fsck didn't clear the checksum
from disk. Now, the kernel is returning "0" for the checksum (because
this feature is disabled) but there is an old checksum value on disk.

The easiest way to fix this (short of modifying the kernel and/or e2fsck)
is to re-enable the uninit_bg feature, and re-run e2fsck. Note that
running with uninit_bg is preferable in any case, as it improves
performance.

# tune2fs -O uninit_bg /dev/XXX
# e2fsck -fy /dev/XXX

This will report an error for all of the checksum values and correct
them, but then hopefully your filesystem can be mounted again. Please
file a separate bug on this, it needs to be fixed in our uninit_bg code
to ignore the checksum if the feature is disabled, and in e2fsck to zero
this value if the feature is disabled.

> > > > We ran e2fsck (version 1.41.4) on all of the OST partitions.

Note that e2fsck 1.41.4 is the upstream e2fsprogs, not the Lustre-patched
e2fsprogs-1.40.11.sun1. While the majority of Lustre (now ext4)
functionality is included in 1.41.4, it isn't all there. In this case I
don't know whether it matters or not.

Also note that the "uninit_bg" feature was called "uninit_groups" in the
1.40.11 release of the Lustre e2fsprogs (this was changed beyond our
control), so adjust the above steps accordingly.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
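Applied to the setup in this thread, the deactivation and search might look
roughly as follows (the OST name cwork2-OST0000 comes from the
tunefs.lustre output above; the client mount point /mnt/cwork2 and the
output file name are only placeholders):

    # on the MDS and on every client: stop waiting for the failed OST,
    # so I/O against it returns EIO instead of hanging
    lctl set_param osc.cwork2-OST0000*.active=0

    # on a client: list the files that have objects on the failed OST
    lfs find --obd cwork2-OST0000_UUID /mnt/cwork2 > /tmp/files-on-ost0000.txt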
Hello,

Thanks so much to Andreas, Megan, and Brian. Following Andreas's
suggestions, all of the OSTs are now recovered and functional.

Because of the large time pressure, and because unfortunately I had to be
out of the office with only limited network access, I could not get the
Lustre-patched e2fsprogs-1.40.11.sun1 to work, so I stayed with
e2fsprogs-1.41.4. After running

# tune2fs -O uninit_bg /dev/XXX
# e2fsck -fy /dev/XXX

I still could not run "tunefs.lustre --writeconf /dev/XXX"; the kernel
log complained about a missing journal. I therefore ran:

# tune2fs -j /dev/XXX

and this time everything works!!!!! :)

Now I have put the system back on-line for users to download their data.
To be safe, I guess I still need to run:

lfs find --obd {OST_UUID} /mount/point

or is there anything else I need to do to ensure the consistency of the
Lustre filesystem? Please give me your suggestions.

Thanks very much again.

T.H.Hsieh

On Tue, Mar 10, 2009 at 01:47:39PM -0600, Andreas Dilger wrote:
> The easiest way to fix this (short of modifying the kernel and/or
> e2fsck) is to re-enable the uninit_bg feature, and re-run e2fsck.
>
> # tune2fs -O uninit_bg /dev/XXX
> # e2fsck -fy /dev/XXX
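For anyone hitting the same "Checksum for group N failed" symptom, the
sequence that worked in this thread, collected from the messages above, was
roughly the following (the device name and mount point are placeholders;
the tune2fs -j step is only needed if the journal turns out to be missing
afterwards):

    # re-enable group checksums so the stale on-disk checksums are rewritten
    # (the feature is named uninit_groups in the Lustre-patched e2fsprogs 1.40.11)
    tune2fs -O uninit_bg /dev/sdb1

    # full forced check; expect one checksum error per group to be corrected
    e2fsck -fy /dev/sdb1

    # only if tunefs.lustre then complains about a missing journal
    tune2fs -j /dev/sdb1

    # regenerate the Lustre configuration logs and remount the OST
    tunefs.lustre --writeconf /dev/sdb1
    mount -t lustre /dev/sdb1 /mnt/ost0000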