Craig Prescott
2009-Dec-01 20:56 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Hope someone can help us out with this one.

We are running Lustre 1.8.1.1. One of our two OSS nodes (12 OSTs) became unresponsive on Sunday night. We issued an IPMI power cycle.

After the node was back up, we tried to fsck the OSTs (e2fsprogs-1.41.6.sun1-0redhat.x86_64) with 'fsck -f -y'. Eleven of the twelve OSTs fsck'd normally. The 12th OST showed heavy corruption, with many inodes moved to /lost+found. This fsck never finished, and we killed it after ~14 hours.

All further fsck attempts seem to endlessly get kicked back to pass 1 after many zero dtime corrections, and relocating many group block bitmaps, inode bitmaps, and inode tables. It seems that many of these changes are never written out to the filesystem, as we encounter the same corrections on subsequent pass 1 restarts. Actually, it looks like every *other* attempt to run pass 1 yields similar output, as if fsck is bouncing back and forth between two solutions.

We have tried e2fsprogs 1.41.6.sun1-0redhat and 1.41.9 from sourceforge. Logs (enormous) of the fsck attempts are available here:

    http://hpc.ufl.edu/logs/fsck.log.1.41.9.gz (2 full pass 1 fsck attempts)
    http://hpc.ufl.edu/logs/fsck.log.1.41.6.gz (4 full pass 1 fsck attempts)

Can any part of this OST be salvaged?

Thanks,
Craig Prescott
UF HPC Center

From the initial fsck:

    fsck.ext4: Group descriptors look bad... trying backup blocks...
    Superblock has an invalid journal (inode 8).
    Clear? yes

    *** ext3 journal has been deleted - filesystem is now ext2 only ***

    Superblock has_journal flag is clear, but a journal inode is present.
    Clear? yes

    Pass 1: Checking inodes, blocks, and sizes
    Journal inode is not in use, but contains data. Clear? yes

    Inodes that were part of a corrupted orphan linked list found. Fix? yes

    Inode 32784385 was part of the orphaned inode list. FIXED.
    Inode 32784385 has imagic flag set. Clear? yes

    ...

    File ??? (inode #114786307, mod time Fri Oct 10 14:03:48 2008)
      has 506488 multiply-claimed block(s), shared with 7 file(s):
        ??? (inode #114786319, mod time Fri Oct 10 14:03:48 2008)
        ... (inode #114786317, mod time Fri Oct 10 14:03:48 2008)
        ... (inode #114786315, mod time Fri Oct 10 14:03:48 2008)
        ??? (inode #114786313, mod time Fri Oct 10 14:03:48 2008)
        ... (inode #114786311, mod time Fri Oct 10 14:03:48 2008)
        ... (inode #114786309, mod time Fri Oct 10 14:03:48 2008)
        ??? (inode #114786305, mod time Fri Oct 10 14:03:48 2008)
    Clone multiply-claimed blocks? yes

    ...
Andreas Dilger
2009-Dec-01 23:50 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
On 2009-12-01, at 13:56, Craig Prescott wrote:
> Can any part of this OST be salvaged?

It's possible, though I'm not sure how much will be left, after the volume of messages that I saw.

I would start by simply trying to mount the OST filesystem with ldiskfs directly (mount options "-o ro" to avoid any further corruption or errors, and possibly also "noload" to avoid recovering the journal), and seeing if you can copy out the data from the filesystem into a backup filesystem, and then just reformat the OST. You should copy out the files with a tool that has xattr support, like rsync v3, or the RHEL tar using the --xattr option.

Failing that, you may be able to e2fsck using a backup superblock and group descriptor with the "-B 4096 -b {blocknr}", where:

    blocknr = 32768 * {3,5,7}^n

I don't think the first backup group descriptor is valid (that would be n=0 above, or 32768), so you could try (at random) 32768 * 3^2 = 294912.

If you can get it mounted at all you should copy the data out. If you have a very new kernel you may be able to mount the filesystem with ext4 (so that you don't need to re-create the journal) to copy the data out.

For the objects in the lost+found directory, ll_recover_lost_found_objs will "rescue" all of these objects and put them back into the right directory structure for Lustre to find them again.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
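The backup-location formula in the reply above can be enumerated mechanically. What follows is a minimal sketch in C, assuming 4096-byte blocks and 32768 blocks per group as in that reply. The function name `backup_blocks` and the `total_blocks` limit are hypothetical and are not part of e2fsprogs; whether any given backup is actually usable still has to be tested with e2fsck itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value used in the reply above; assumes a 4096-byte block size. */
#define BLOCKS_PER_GROUP 32768ULL

/*
 * Fill out[] with candidate backup block numbers 32768 * b^n for
 * b in {3, 5, 7} and n >= 0 that lie below total_blocks (a hypothetical
 * device size in blocks). n = 0 gives 32768 for every base, so that
 * value is emitted only once. Entries are grouped by base, not sorted.
 * Returns the number of entries written, at most max_out.
 */
size_t backup_blocks(uint64_t total_blocks, uint64_t *out, size_t max_out)
{
    static const uint64_t bases[] = { 3, 5, 7 };
    size_t count = 0;

    assert(out != NULL);

    /* n = 0: the backup in group 1, common to all three bases. */
    if (BLOCKS_PER_GROUP < total_blocks && count < max_out)
        out[count++] = BLOCKS_PER_GROUP;

    for (size_t i = 0; i < sizeof(bases) / sizeof(bases[0]); i++) {
        /* n >= 1: the group number is a power of the base. */
        for (uint64_t group = bases[i];
             group * BLOCKS_PER_GROUP < total_blocks;
             group *= bases[i]) {
            if (count == max_out)
                return count;
            out[count++] = group * BLOCKS_PER_GROUP;
        }
    }
    return count;
}
```

Each value produced this way is a candidate for `-b {blocknr}` together with `-B 4096`; the 294912 suggested in the reply is the base-3, n=2 entry.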
Craig Prescott
2009-Dec-02 02:01 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Thanks for the reply, Andreas.

Andreas Dilger wrote:
> I would start by simply trying to mount the OST filesystem with ldiskfs
> directly (mount options "-o ro" to avoid any further corruption or
> errors, and possibly also "noload" to avoid recovering the journal), and
> seeing if you can copy out the data from the filesystem into a backup
> filesystem, and then just reformat the OST.

Unfortunately, this did not work:

    [root@tebow2 ~]# mount -t ldiskfs -o ro /dev/F3P1L0/T2-F3P1L0 /mnt
    mount: wrong fs type, bad option, bad superblock on /dev/F3P1L0/T2-F3P1L0,
           missing codepage or other error
           In some cases useful info is found in syslog - try
           dmesg | tail or so

In dmesg I see this:

    LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum for group 256 failed (18306!=0)
    LDISKFS-fs: group descriptors corrupted!

Adding "noload" to the options list did not change anything.

> Failing that, you may be able to e2fsck using a backup superblock and
> group descriptor with the "-B 4096 -b {blocknr}", where:
>
> blocknr = 32768 * {3,5,7}^n
>
> I don't think the first backup group descriptor is valid (that would be
> n=0 above, or 32768), so you could try (at random) 32768 * 3^2 = 294912.

I tried fsck from the 1.41.6 Lustre package with the '-p' option, with several values of n and all three multipliers {3,5,7}. Nearly all attempts look like this one - the same block is complained about *almost* every time:

    [root@tebow2 ~]# fsck -b 294912 -B 4096 -f -p /dev/F3P1L0/T2-F3P1L0
    fsck 1.41.6.sun1 (30-May-2009)
    crn-OST0011: Block bitmap for group 6016 is not in group. (block 484237063)

It seems that particular groups get complained about; FWIW, they are 6016 and 10112.

However, with n=1 and 7 as the multiplier, the fsck -p output was a bit different (different block, zeroed some checksums for group descriptors) - am trying an fsck with that superblock and "-y" now.

> If you can get it mounted at all you should copy the data out.

Hopefully we can get it mounted and rescue the data.

We appreciate your help.

Thanks,
Craig Prescott
UF HPC Center
Andreas Dilger
2009-Dec-02 02:43 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
On 2009-12-01, at 19:01, Craig Prescott wrote:
> In dmesg I see this:
>
> LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum
> for group 256 failed (18306!=0)
> LDISKFS-fs: group descriptors corrupted!

You may want to disable the group descriptor checksums with:

    debugfs -R "feature ^uninit_bg" {dev}

and then retry the mount and/or e2fsck. This feature is making it more difficult to use the backup descriptors for some reason.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Craig Prescott
2009-Dec-02 18:51 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Andreas Dilger wrote:
> You may want to disable the group descriptor checksums with:
>
> debugfs -R "feature ^uninit_bg" {dev}
>
> and then retry the mount and/or e2fsck. This feature is making it more
> difficult to use the backup descriptors for some reason.

The debugfs command didn't take - uninit_bg still showed up in "filesystem features" if I ran 'stats' under debugfs interactively.

But 'tune2fs -O ^uninit_bg /dev/F3P1L0/T2-F3P1L0' did work.

Unfortunately, mounting the device as ldiskfs still didn't work; from the syslog:

    LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum for group 0 failed (0!=29388)
    LDISKFS-fs: group descriptors corrupted!

Note that the group descriptor checksum inequality message in the syslog has changed - (0!=29388) is what we get now, versus (18306!=0) when group descriptor checksums were enabled.

I still haven't had any luck with fsck.

Do you have any other ideas?

Thanks,
Craig Prescott
UF HPC Center
Andreas Dilger
2009-Dec-02 22:27 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
On 2009-12-02, at 11:51, Craig Prescott wrote:
> Unfortunately, mounting the device as ldiskfs still didn't work; from
> the syslog:
>
> LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Checksum
> for group 0 failed (0!=29388)
> LDISKFS-fs: group descriptors corrupted!
>
> Do you have any other ideas?

Hmm, the code shouldn't be checking the checksums if the uninit_bg feature is not enabled. I believe this was fixed in ext4 already:

in ldiskfs_group_desc_csum_verify() change it to be:

    int ldiskfs_group_desc_csum_verify(struct ext4_sb_info *sbi,
                                       __u32 block_group,
                                       struct ext4_group_desc *gdp)
    {
            if ((sbi->s_es->s_feature_ro_compat &
                 cpu_to_le32(LDISKFS_FEATURE_RO_COMPAT_GDT_CSUM)) &&
                (gdp->bg_checksum != ldiskfs_group_desc_csum(sbi,
                                                 block_group, gdp)))
                    return 0;
            return 1;
    }

This should allow you to mount the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Craig Prescott
2009-Dec-03 00:16 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Andreas Dilger wrote:
> Hmm, the code shouldn't be checking the checksums if the uninit_bg
> feature is not enabled. I believe this was fixed in ext4 already:
>
> in ldiskfs_group_desc_csum_verify() change it to be:
> [...]

Ok, thanks. I'll try that.

Here's what the 1.8.1.1 ldiskfs_group_desc_csum_verify() looks like (from lustre-ldiskfs-3.0.9/ldiskfs/super.c):

    int ldiskfs_group_desc_csum_verify(struct ldiskfs_sb_info *sbi,
                                       __u32 block_group,
                                       struct ldiskfs_group_desc *gdp)
    {
            return (gdp->bg_checksum ==
                    ldiskfs_group_desc_csum(sbi, block_group, gdp));
    }

(this is following an 'rpmbuild -bc lustre-ldiskfs.spec' from lustre-ldiskfs-3.0.9-2.6.18_128.7.1.el5_lustre.1.8.1.1.src.rpm).

The problematic OST is direct-attached to a running OSS with ldiskfs.ko loaded (the problematic OST is marked inactive). I'll have to wait at least until tomorrow for an opportunity to try deploying and reloading an updated ldiskfs.ko.

Again, I really appreciate the help, and will let the list know how it goes.

Thanks,
Craig Prescott
UF HPC Center
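The difference between the two versions of the function discussed above can be tried outside the kernel. The sketch below is a user-space model, not ldiskfs code: the struct layouts, the `FEATURE_GDT_CSUM` constant, and `model_csum()` are hypothetical stand-ins, and byte-order conversion is left out. It assumes that the 1.8.1.1 version compares the stored and computed checksums for equality, and that the computed checksum is 0 when the feature is off, which would be consistent with the (0!=29388) message reported earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the ro_compat checksum feature bit. */
#define FEATURE_GDT_CSUM 0x0010u

/* Minimal models of the superblock and a group descriptor. */
struct model_sb   { uint32_t feature_ro_compat; };
struct model_desc { uint16_t bg_checksum; uint32_t block_bitmap; };

/* Toy checksum (not the real CRC); assumed to yield 0 with the feature off. */
uint16_t model_csum(const struct model_sb *sb, uint32_t group,
                    const struct model_desc *gd)
{
    assert(sb != 0 && gd != 0);
    if (!(sb->feature_ro_compat & FEATURE_GDT_CSUM))
        return 0;
    return (uint16_t)((group * 31u + gd->block_bitmap) & 0xffffu);
}

/* Model of the older check: always compares stored and computed values. */
int verify_unconditional(const struct model_sb *sb, uint32_t group,
                         const struct model_desc *gd)
{
    return gd->bg_checksum == model_csum(sb, group, gd);
}

/* Model of the proposed check: a mismatch only counts when the feature is on. */
int verify_gated(const struct model_sb *sb, uint32_t group,
                 const struct model_desc *gd)
{
    if ((sb->feature_ro_compat & FEATURE_GDT_CSUM) &&
        gd->bg_checksum != model_csum(sb, group, gd))
        return 0;
    return 1;
}
```

In this model, a descriptor that still carries a stale checksum (for example 29388) after the feature has been turned off fails `verify_unconditional()` but passes `verify_gated()`, which is the behaviour the proposed change is after.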
Craig Prescott
2009-Dec-03 17:27 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Craig Prescott wrote:
> Again, I really appreciate the help, and will let the list know how it
> goes.

Sadly, we didn't have any luck with this. We had written off the OST in our minds anyway, so to get any of the data back would have been a windfall.

Wouldn't mount as ldiskfs with the group descriptor checksum disabled:

    Dec 3 10:58:05 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Block bitmap for group 10112 not in group (block 484237063)!
    Dec 3 10:58:05 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!

Disabling that check and trying to mount yielded this one:

    Dec 3 11:01:13 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode bitmap for group 10112 not in group (block 14342712)!
    Dec 3 11:01:13 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!

Disabling that check yielded this one:

    Dec 3 11:01:59 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode table for group 10112 not in group (block 3538357782)!
    Dec 3 11:01:59 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!

All these messages were seen repeatedly in our fsck attempts. If we had been able to get past this group, several thousand more would have followed.

Disabling the inode table present in group check:

    Dec 3 11:02:35 tebow2 kernel: ldiskfs: No journal on filesystem on dm-7

At that point we tried to rewrite superblocks with mkfs.lustre and --mkfsoptions="-S", which panic'd the OSS. At that point, we gave up.

Though it didn't work out this time, we'll be in a better position to be successful if this ever happens again.

Thanks,
Craig Prescott
UF HPC Center
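The "not in group" messages above come down to a range check on block numbers, which can be worked through by hand. The following is a minimal sketch in C, assuming 4096-byte blocks with 32768 blocks per group, a first data block of 0, and no flex_bg-style relocation of metadata. The function names are hypothetical, and the shorter final group of a real filesystem is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_GROUP 32768ULL /* assumed: 4096-byte blocks */
#define FIRST_DATA_BLOCK 0ULL     /* assumed: block size larger than 1 KiB */

/* First block belonging to the given group. */
uint64_t group_first_block(uint64_t group)
{
    /* Guard against overflow in the multiplication below. */
    assert(group < UINT64_MAX / BLOCKS_PER_GROUP);
    return FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP;
}

/* Last block of the given group (ignores a short final group). */
uint64_t group_last_block(uint64_t group)
{
    return group_first_block(group) + BLOCKS_PER_GROUP - 1;
}

/* Returns 1 if block lies inside the group's block range, 0 otherwise. */
int block_in_group(uint64_t group, uint64_t block)
{
    return block >= group_first_block(group) &&
           block <= group_last_block(group);
}

/* Group whose block range contains the given block. */
uint64_t block_owner_group(uint64_t block)
{
    assert(block >= FIRST_DATA_BLOCK);
    return (block - FIRST_DATA_BLOCK) / BLOCKS_PER_GROUP;
}
```

Under these assumptions group 10112 spans blocks 331350016 through 331382783, so none of the three block numbers reported for it (484237063, 14342712, 3538357782) falls inside that range; 484237063, for instance, would belong to group 14777.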
恩强周
2009-Dec-04 03:19 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Hi all,

I have also hit ldiskfs problems. Two of my OSTs report messages like this:

    LDISKFS-fs: group 22879: 30128 blocks in bitmap, 29885 in gd
    LDISKFS-fs: group 22810: 29150 blocks in bitmap, 29242 in gd
    LDISKFS-fs: group 22846: 28278 blocks in bitmap, 28324 in gd
    ...

Does this mean ldiskfs will become corrupted at some later time?

Also, one OST reported messages like "Remounting ... read-only", so some files could not be written at that time. We ran e2fsck to fix it, but the messages are being reported again now.

We have found that ldiskfs seems unstable since 1.6 (1.4 was better than 1.6). We are worried about problems like filesystem corruption. Can anyone give some suggestions?
Andreas Dilger
2009-Dec-06 03:19 UTC
[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
On 2009-12-03, at 20:19, 恩强周 wrote:
> I have also hit ldiskfs problems. Two of my OSTs report messages like this:
> LDISKFS-fs: group 22879: 30128 blocks in bitmap, 29885 in gd
> LDISKFS-fs: group 22810: 29150 blocks in bitmap, 29242 in gd
> LDISKFS-fs: group 22846: 28278 blocks in bitmap, 28324 in gd

I believe this is a bug that was already fixed in newer Lustre releases. You should run the Lustre "e2fsck -f" on the device, when it is unmounted.

> Does this mean ldiskfs will become corrupted at some later time?
>
> Also, one OST reported messages like "Remounting ... read-only", so
> some files could not be written at that time. We ran e2fsck to fix
> it, but the messages are being reported again now.
> We have found that ldiskfs seems unstable since 1.6 (1.4 was better
> than 1.6). We are worried about problems like filesystem corruption.
> Can anyone give some suggestions?

You should update to a newer version of Lustre.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
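The "blocks in bitmap ... in gd" messages quoted above report two counts for the same group that disagree. As a rough illustration of that kind of cross-check, here is a minimal sketch in C. It assumes the numbers are free-block counts, that a clear bit in the block bitmap means a free block, and that bit i lives in byte i/8 at bit position i%8. The function names are hypothetical and this is not the ldiskfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count clear (assumed free) bits among the first nbits bits of a bitmap. */
uint32_t count_free_bits(const uint8_t *bitmap, uint32_t nbits)
{
    uint32_t free_bits = 0;

    assert(bitmap != NULL);
    for (uint32_t i = 0; i < nbits; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            free_bits++;
    }
    return free_bits;
}

/*
 * Compare the bitmap-derived free count with the count recorded in the
 * group descriptor (gd_free). Returns 1 when they agree, 0 otherwise.
 */
int free_count_matches(const uint8_t *bitmap, uint32_t nbits, uint32_t gd_free)
{
    return count_free_bits(bitmap, nbits) == gd_free;
}
```

A mismatch of this kind is what an offline "e2fsck -f" run, as recommended in the reply, is meant to detect and repair.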