thr3ads.net - Lustre discuss - [Lustre-discuss] fsck of OST problems - endless loop restarting pass 1 [Dec 2009]

Craig Prescott

2009-Dec-01 20:56 UTC

[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1

Hope someone can help us out with this one.

We are running Lustre 1.8.1.1.  One of our two OSS nodes (12 OSTs)
became unresponsive on Sunday night.  We issued an IPMI power cycle.

After the node was back up, we tried to fsck the OSTs
(e2fsprogs-1.41.6.sun1-0redhat.x86_64) with 'fsck -f -y'.  Eleven of the
twelve OSTs fsck'd normally.  The 12th OST showed heavy corruption, with
many inodes moved to /lost+found.  This fsck never finished, and we
killed it after ~14 hours.

All further fsck attempts seem to endlessly get kicked back to pass 1 
after many zero dtime corrections, and relocating many group block 
bitmaps, inode bitmaps, and inode tables.  It seems that many of these 
changes are never written out to the filesystem, as we encounter the 
same corrections on subsequent pass 1 restarts.  Actually, it looks like 
every *other* attempt to run pass 1 yields similar output, as if fsck is 
bouncing back and forth between two solutions.

We have tried e2fsprogs 1.41.6.sun1-0redhat and 1.41.9 from SourceForge.
Logs (enormous) of the fsck attempts are available here:

http://hpc.ufl.edu/logs/fsck.log.1.41.9.gz (2 full pass 1 fsck attempts)
http://hpc.ufl.edu/logs/fsck.log.1.41.6.gz (4 full pass 1 fsck attempts)

Can any part of this OST be salvaged?

Thanks,
Craig Prescott
UF HPC Center


From the initial fsck:

fsck.ext4: Group descriptors look bad... trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

Superblock has_journal flag is clear, but a journal inode is present.
Clear? yes

Pass 1: Checking inodes, blocks, and sizes
Journal inode is not in use, but contains data.  Clear? yes


Inodes that were part of a corrupted orphan linked list found.  Fix? yes

Inode 32784385 was part of the orphaned inode list.  FIXED.
Inode 32784385 has imagic flag set.  Clear? yes

...

File ??? (inode #114786307, mod time Fri Oct 10 14:03:48 2008)
   has 506488 multiply-claimed block(s), shared with 7 file(s):
         ??? (inode #114786319, mod time Fri Oct 10 14:03:48 2008)
         ... (inode #114786317, mod time Fri Oct 10 14:03:48 2008)
         ... (inode #114786315, mod time Fri Oct 10 14:03:48 2008)
         ??? (inode #114786313, mod time Fri Oct 10 14:03:48 2008)
         ... (inode #114786311, mod time Fri Oct 10 14:03:48 2008)
         ... (inode #114786309, mod time Fri Oct 10 14:03:48 2008)
         ??? (inode #114786305, mod time Fri Oct 10 14:03:48 2008)
Clone multiply-claimed blocks? yes

...

Andreas Dilger

2009-Dec-01 23:50 UTC

On 2009-12-01, at 13:56, Craig Prescott wrote:
> We are running Lustre 1.8.1.1.  One of our two OSS nodes (12 OSTs)
> became unresponsive on Sunday night.  We issued an IPMI power cycle.
>
> After the node was back up, we tried to fsck the OSTs
> (e2fsprogs-1.41.6.sun1-0redhat.x86_64) with 'fsck -f -y'.  Eleven of
> the twelve OSTs fsck'd normally.  The 12th OST showed heavy
> corruption, with many inodes moved to /lost+found.  This fsck never
> finished, and we killed it after ~14 hours.
>
> All further fsck attempts seem to endlessly get kicked back to pass 1
> after many zero dtime corrections, and relocating many group block
> bitmaps, inode bitmaps, and inode tables.  It seems that many of these
> changes are never written out to the filesystem, as we encounter the
> same corrections on subsequent pass 1 restarts.  Actually, it looks
> like every *other* attempt to run pass 1 yields similar output, as
> if fsck is bouncing back and forth between two solutions.
>
> We have tried e2fsprogs 1.41.6.sun1-0redhat and 1.41.9 from
> SourceForge.  Logs (enormous) of the fsck attempts are available here:
>
> http://hpc.ufl.edu/logs/fsck.log.1.41.9.gz (2 full pass 1 fsck attempts)
> http://hpc.ufl.edu/logs/fsck.log.1.41.6.gz (4 full pass 1 fsck attempts)
>
> Can any part of this OST be salvaged?

It's possible, though I'm not sure how much will be left, given the
volume of messages that I saw.

I would start by simply trying to mount the OST filesystem with  
ldiskfs directly (mount options "-o ro" to avoid any further  
corruption or errors, and possibly also "noload" to avoid recovering  
the journal), and seeing if you can copy out the data from the  
filesystem into a backup filesystem, and then just reformat the OST.

You should copy out the files with a tool that has xattr support, like  
rsync v3, or the RHEL tar using the --xattr option.

Failing that, you may be able to e2fsck using a backup superblock and  
group descriptor with the "-B 4096 -b {blocknr}", where:

blocknr = 32768 * {3,5,7}^n

I don't think the first backup group descriptor is valid (that would
be n=0 above, or 32768), so you could try (at random) 32768 * 3^2 =
294912.

If you can get it mounted at all you should copy the data out.  If you
have a very new kernel you may be able to mount the filesystem with
ext4 (so that you don't need to re-create the journal) to copy the
data out.

For the objects in the lost+found directory ll_recover_lost_found_objs  
will "rescue" all of these objects and put them back into the right  
directory structure for Lustre to find them again.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Craig Prescott

2009-Dec-02 02:01 UTC

Thanks for the reply, Andreas.

Andreas Dilger wrote:
> I would start by simply trying to mount the OST filesystem with ldiskfs
> directly (mount options "-o ro" to avoid any further corruption or
> errors, and possibly also "noload" to avoid recovering the journal), and
> seeing if you can copy out the data from the filesystem into a backup
> filesystem, and then just reformat the OST.

Unfortunately, this did not work:

[root@tebow2 ~]# mount -t ldiskfs -o ro /dev/F3P1L0/T2-F3P1L0 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/F3P1L0/T2-F3P1L0,
        missing codepage or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so

In dmesg I see this:

LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum for group 256 failed (18306!=0)
LDISKFS-fs: group descriptors corrupted!

Adding "noload" to the options list did not change anything.

> You should copy out the files with a tool that has xattr support, like
> rsync v3, or the RHEL tar using the --xattr option.
>
> Failing that, you may be able to e2fsck using a backup superblock and
> group descriptor with the "-B 4096 -b {blocknr}", where:
>
> blocknr = 32768 * {3,5,7}^n
>
> I don't think the first backup group descriptor is valid (that would be
> n=0 above, or 32768), so you could try (at random) 32768 * 3^2 = 294912.

I tried the fsck from the 1.41.6 Lustre package with the '-p' option,
using several values of n and all three bases {3,5,7}.  Nearly all
attempts look like this one - the same block is complained about
*almost* every time:

[root@tebow2 ~]# fsck -b 294912 -B 4096 -f -p /dev/F3P1L0/T2-F3P1L0
fsck 1.41.6.sun1 (30-May-2009)
crn-OST0011: Block bitmap for group 6016 is not in group.  (block 484237063)

FWIW, the same particular groups keep getting complained about: 6016
and 10112.

However, with n=1 and 7 as the multiplier, the fsck -p output was a bit
different (different block, zeroed some checksums for group descriptors).
I am trying an fsck with that superblock and "-y" now.

> If you can get it mounted at all you should copy the data out.  If you
> have a very new kernel you may be able to mount the filesystem with ext4
> (so that you don't need to re-create the journal) to copy the data out.
>
> For the objects in the lost+found directory ll_recover_lost_found_objs
> will "rescue" all of these objects and put them back into the right
> directory structure for Lustre to find them again.

Hopefully we can get it mounted and rescue the data.

We appreciate your help.

Thanks,
Craig Prescott
UF HPC Center

Andreas Dilger

2009-Dec-02 02:43 UTC

On 2009-12-01, at 19:01, Craig Prescott wrote:
> Unfortunately, this did not work:
>
> [root@tebow2 ~]# mount -t ldiskfs -o ro /dev/F3P1L0/T2-F3P1L0 /mnt
> mount: wrong fs type, bad option, bad superblock on /dev/F3P1L0/T2-F3P1L0,
>       missing codepage or other error
>       In some cases useful info is found in syslog - try
>       dmesg | tail  or so
>
> In dmesg I see this:
>
> LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum for group 256 failed (18306!=0)
> LDISKFS-fs: group descriptors corrupted!

You may want to disable the group descriptor checksums with:

debugfs -R "feature ^uninit_bg" {dev}

and then retry the mount and/or e2fsck.  This feature is making it  
more difficult to use the backup descriptors for some reason.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Craig Prescott

2009-Dec-02 18:51 UTC

Andreas Dilger wrote:
> You may want to disable the group descriptor checksums with:
> 
> debugfs -R "feature ^uninit_bg" {dev}
> 
> and then retry the mount and/or e2fsck.  This feature is making it more 
> difficult to use the backup descriptors for some reason.

The debugfs command didn't take - uninit_bg still showed up in
"filesystem features" if I ran 'stats' under debugfs interactively.

But 'tune2fs -O ^uninit_bg /dev/F3P1L0/T2-F3P1L0' did work.

Unfortunately, mounting the device as ldiskfs still didn't work; from
the syslog:

LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum for group 0 failed (0!=29388)
LDISKFS-fs: group descriptors corrupted!

Note that the group descriptor checksum inequality message in the syslog
is changed - (0!=29388) is what we get now, versus (18306!=0) when group
descriptor checksums were enabled.

I still haven't had any luck with fsck.

Do you have any other ideas?

Thanks,
Craig Prescott
UF HPC Center

Andreas Dilger

2009-Dec-02 22:27 UTC

On 2009-12-02, at 11:51, Craig Prescott wrote:
> The debugfs command didn't take - uninit_bg still showed up in
> "filesystem features" if I ran 'stats' under debugfs interactively.
>
> But 'tune2fs -O ^uninit_bg /dev/F3P1L0/T2-F3P1L0' did work.
>
> Unfortunately, mounting the device as ldiskfs still didn't work; from
> the syslog:
>
> LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum for group 0 failed (0!=29388)
> LDISKFS-fs: group descriptors corrupted!
>
> Note that the group descriptor checksum inequality message in the
> syslog is changed - (0!=29388) is what we get now, versus (18306!=0)
> when group descriptor checksums were enabled.
>
> I still haven't had any luck with fsck.
>
> Do you have any other ideas?

Hmm, the code shouldn't be checking the checksums if the uninit_bg
feature is not enabled.  I believe this was fixed in ext4 already.
In ldiskfs_group_desc_csum_verify(), change it to be:

int ldiskfs_group_desc_csum_verify(struct ldiskfs_sb_info *sbi,
                                   __u32 block_group,
                                   struct ldiskfs_group_desc *gdp)
{
        /* Fail only when the GDT_CSUM (uninit_bg) feature is enabled
         * AND the stored checksum does not match the computed one. */
        if ((sbi->s_es->s_feature_ro_compat &
             cpu_to_le32(LDISKFS_FEATURE_RO_COMPAT_GDT_CSUM)) &&
            (gdp->bg_checksum != ldiskfs_group_desc_csum(sbi, block_group, gdp)))
                return 0;
        return 1;
}

This should allow you to mount the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Craig Prescott

2009-Dec-03 00:16 UTC

Andreas Dilger wrote:
> Hmm, the code shouldn't be checking the checksums if the uninit_bg
> feature is not enabled.  I believe this was fixed in ext4 already:
>
> in ldiskfs_group_desc_csum_verify() change it to be:
>
> int ldiskfs_group_desc_csum_verify(struct ldiskfs_sb_info *sbi,
>                                    __u32 block_group,
>                                    struct ldiskfs_group_desc *gdp)
> {
>         if ((sbi->s_es->s_feature_ro_compat &
>              cpu_to_le32(LDISKFS_FEATURE_RO_COMPAT_GDT_CSUM)) &&
>             (gdp->bg_checksum != ldiskfs_group_desc_csum(sbi, block_group, gdp)))
>                 return 0;
>         return 1;
> }

Ok, thanks.  I'll try that.

Here's what the 1.8.1.1 ldiskfs_group_desc_csum_verify() looks like
(from lustre-ldiskfs-3.0.9/ldiskfs/super.c):

int ldiskfs_group_desc_csum_verify(struct ldiskfs_sb_info *sbi, __u32 block_group,
                                 struct ldiskfs_group_desc *gdp)
{
         return (gdp->bg_checksum ==
                         ldiskfs_group_desc_csum(sbi, block_group, gdp));
}

(this is following an 'rpmbuild -bc lustre-ldiskfs.spec' from
lustre-ldiskfs-3.0.9-2.6.18_128.7.1.el5_lustre.1.8.1.1.src.rpm).

The problematic OST is direct-attached to a running OSS with ldiskfs.ko
loaded (problematic OST is marked inactive).  I'll have to wait at least
until tomorrow for an opportunity to try deploying and reloading an
updated ldiskfs.ko.

Again, I really appreciate the help, and will let the list know how it goes.

Thanks,
Craig Prescott
UF HPC Center

Craig Prescott

2009-Dec-03 17:27 UTC

Craig Prescott wrote:
> Again, I really appreciate the help, and will let the list know how it
> goes.

Sadly, we didn't have any luck with this.  We had written off the OST in
our minds anyway, so to get any of the data back would have been a windfall.

Wouldn't mount as ldiskfs with the group descriptor checksum disabled:

Dec  3 10:58:05 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Block bitmap for group 10112 not in group (block 484237063)!
Dec  3 10:58:05 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!

Disabling that check and trying to mount yielded this one:

Dec  3 11:01:13 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode bitmap for group 10112 not in group (block 14342712)!
Dec  3 11:01:13 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!

Disabling that check yielded this one:

Dec  3 11:01:59 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode table for group 10112 not in group (block 3538357782)!
Dec  3 11:01:59 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!

All these messages were seen repeatedly in our fsck attempts.  If we had 
been able to get past this group, several thousand more would have followed.

Disabling the inode table present in group check:

Dec  3 11:02:35 tebow2 kernel: ldiskfs: No journal on filesystem on dm-7

At that point we tried to rewrite superblocks with mkfs.lustre and
--mkfsoptions="-S", which panic'd the OSS.  At that point, we gave up.

Though it didn't work out this time, we'll be in a better position to be
successful if this ever happens again.

Thanks,
Craig Prescott
UF HPC Center

恩强周

2009-Dec-04 03:19 UTC

Hi all,

I have also hit ldiskfs problems. Two of my OSTs report messages like this:

LDISKFS-fs: group 22879: 30128 blocks in bitmap, 29885 in gd
LDISKFS-fs: group 22810: 29150 blocks in bitmap, 29242 in gd
LDISKFS-fs: group 22846: 28278 blocks in bitmap, 28324 in gd
...

Does this mean ldiskfs will become corrupted at some later time?

Also, one OST reported messages like "Remounting ... read-only", so some
files could not be written at that time. We ran e2fsck to fix it, but the
message is being reported again now.
We have found that ldiskfs seems unstable since 1.6 (1.4 was better than
1.6). We are worried about problems like filesystem corruption. Can anyone
give some suggestions?


Andreas Dilger

2009-Dec-06 03:19 UTC

On 2009-12-03, at 20:19, 恩强周 wrote:
> Hi all,
> I have also hit ldiskfs problems. Two of my OSTs report messages like
> this:
> LDISKFS-fs: group 22879: 30128 blocks in bitmap, 29885 in gd
> LDISKFS-fs: group 22810: 29150 blocks in bitmap, 29242 in gd
> LDISKFS-fs: group 22846: 28278 blocks in bitmap, 28324 in gd

I believe this is a bug that was already fixed in newer Lustre releases.
You should run the Lustre "e2fsck -f" on the device, when it is
unmounted.

> Does this mean ldiskfs will become corrupted at some later time?
>
> Also, one OST reported messages like "Remounting ... read-only", so
> some files could not be written at that time. We ran e2fsck to fix
> it, but the message is being reported again now.
> We have found that ldiskfs seems unstable since 1.6 (1.4 was better
> than 1.6). We are worried about problems like filesystem corruption.
> Can anyone give some suggestions?

You should update to a newer version of Lustre.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
