Junxiao Bi
2015-Nov-03 07:12 UTC
[Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check
Hi Gang, This is not like a right patch. First, online file check only checks inode's block number, valid flag, fs generation value, and meta ecc. I never see a real corruption happened only on this field, if these fields are corrupted, that means something bad may happen on other place. So fix this field may not help and even cause corruption more hard. Second, the repair way is wrong. In ocfs2_filecheck_repair_inode_block(), if these fields in disk don't match the ones in memory, the ones in memory are used to update the disk fields. The question is how do you know these field in memory are right(they may be the real corrupted ones)? Thanks, Junxiao. On 10/28/2015 02:26 PM, Gang He wrote:> +static int ocfs2_filecheck_repair_inode_block(struct super_block *sb, > + struct buffer_head *bh) > +{ > + int rc; > + int changed = 0; > + struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data; > + > + rc = ocfs2_filecheck_validate_inode_block(sb, bh); > + /* Can't fix invalid inode block */ > + if (!rc || rc == -OCFS2_FILECHECK_ERR_INVALIDINO) > + return rc; > + > + trace_ocfs2_filecheck_repair_inode_block( > + (unsigned long long)bh->b_blocknr); > + > + if (ocfs2_is_hard_readonly(OCFS2_SB(sb)) || > + ocfs2_is_soft_readonly(OCFS2_SB(sb))) { > + mlog(ML_ERROR, > + "Filecheck: try to repair dinode #%llu on readonly filesystem\n", > + (unsigned long long)bh->b_blocknr); > + return -OCFS2_FILECHECK_ERR_READONLY; > + } > + > + if (le64_to_cpu(di->i_blkno) != bh->b_blocknr) { > + di->i_blkno = cpu_to_le64(bh->b_blocknr); > + changed = 1; > + mlog(ML_ERROR, > + "Filecheck: reset dinode #%llu: i_blkno to %llu\n", > + (unsigned long long)bh->b_blocknr, > + (unsigned long long)le64_to_cpu(di->i_blkno)); > + } > + > + if (!(di->i_flags & cpu_to_le32(OCFS2_VALID_FL))) { > + di->i_flags |= cpu_to_le32(OCFS2_VALID_FL); > + changed = 1; > + mlog(ML_ERROR, > + "Filecheck: reset dinode #%llu: OCFS2_VALID_FL is set\n", > + (unsigned long long)bh->b_blocknr); > + } > + > + if (le32_to_cpu(di->i_fs_generation) !> + OCFS2_SB(sb)->fs_generation) { > + di->i_fs_generation = cpu_to_le32(OCFS2_SB(sb)->fs_generation); > + changed = 1; > + mlog(ML_ERROR, > + "Filecheck: reset dinode #%llu: fs_generation to %u\n", > + (unsigned long long)bh->b_blocknr, > + le32_to_cpu(di->i_fs_generation)); > + } > + > + if (changed || > + ocfs2_validate_meta_ecc(sb, bh->b_data, &di->i_check)) { > + ocfs2_compute_meta_ecc(sb, bh->b_data, &di->i_check); > + mark_buffer_dirty(bh); > + mlog(ML_ERROR, > + "Filecheck: reset dinode #%llu: compute meta ecc\n", > + (unsigned long long)bh->b_blocknr); > + } > + > + return 0; > +}
Gang He
2015-Nov-03 08:15 UTC
[Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check
Hello Junxiao, See my comments inline.>>> > Hi Gang, > > This is not like a right patch. > First, online file check only checks inode's block number, valid flag, > fs generation value, and meta ecc. I never see a real corruption > happened only on this field, if these fields are corrupted, that means > something bad may happen on other place. So fix this field may not help > and even cause corruption more hard.This online file check/fix feature is used to check/fix some light file meta block corruption, instead of turning a file system off and using fsck.ocfs2. e.g. meta ecc error, we really need not to use fsck.ocfs2. of course, this feature does not replace fsck.ocfs2 and touch some complicated meta block problems, if there is some potential problem in some areas, we can discuss them one by one.> Second, the repair way is wrong. In > ocfs2_filecheck_repair_inode_block(), if these fields in disk don't > match the ones in memory, the ones in memory are used to update the disk > fields. The question is how do you know these field in memory are > right(they may be the real corrupted ones)?Here, if the inode block was corrupted, the file system is not able to load it into the memory. ocfs2_filecheck_repair_inode_block() will able to load it into the memory, since it try to fix these light-level problem before loading. if the fix is OK, the changed meta-block can pass the block-validate function and load into the memory as a inode object. Since the file system is under a cluster environment, we have to use some existing function and code path to keep these block operation under a cluster lock. Thanks Gang> > Thanks, > Junxiao. > On 10/28/2015 02:26 PM, Gang He wrote: >> +static int ocfs2_filecheck_repair_inode_block(struct super_block *sb, >> + struct buffer_head *bh) >> +{ >> + int rc; >> + int changed = 0; >> + struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data; >> + >> + rc = ocfs2_filecheck_validate_inode_block(sb, bh); >> + /* Can't fix invalid inode block */ >> + if (!rc || rc == -OCFS2_FILECHECK_ERR_INVALIDINO) >> + return rc; >> + >> + trace_ocfs2_filecheck_repair_inode_block( >> + (unsigned long long)bh->b_blocknr); >> + >> + if (ocfs2_is_hard_readonly(OCFS2_SB(sb)) || >> + ocfs2_is_soft_readonly(OCFS2_SB(sb))) { >> + mlog(ML_ERROR, >> + "Filecheck: try to repair dinode #%llu on readonly filesystem\n", >> + (unsigned long long)bh->b_blocknr); >> + return -OCFS2_FILECHECK_ERR_READONLY; >> + } >> + >> + if (le64_to_cpu(di->i_blkno) != bh->b_blocknr) { >> + di->i_blkno = cpu_to_le64(bh->b_blocknr); >> + changed = 1; >> + mlog(ML_ERROR, >> + "Filecheck: reset dinode #%llu: i_blkno to %llu\n", >> + (unsigned long long)bh->b_blocknr, >> + (unsigned long long)le64_to_cpu(di->i_blkno)); >> + } >> + >> + if (!(di->i_flags & cpu_to_le32(OCFS2_VALID_FL))) { >> + di->i_flags |= cpu_to_le32(OCFS2_VALID_FL); >> + changed = 1; >> + mlog(ML_ERROR, >> + "Filecheck: reset dinode #%llu: OCFS2_VALID_FL is set\n", >> + (unsigned long long)bh->b_blocknr); >> + } >> + >> + if (le32_to_cpu(di->i_fs_generation) !>> + OCFS2_SB(sb)->fs_generation) { >> + di->i_fs_generation = cpu_to_le32(OCFS2_SB(sb)->fs_generation); >> + changed = 1; >> + mlog(ML_ERROR, >> + "Filecheck: reset dinode #%llu: fs_generation to %u\n", >> + (unsigned long long)bh->b_blocknr, >> + le32_to_cpu(di->i_fs_generation)); >> + } >> + >> + if (changed || >> + ocfs2_validate_meta_ecc(sb, bh->b_data, &di->i_check)) { >> + ocfs2_compute_meta_ecc(sb, bh->b_data, &di->i_check); >> + mark_buffer_dirty(bh); >> + mlog(ML_ERROR, >> + "Filecheck: reset dinode #%llu: compute meta ecc\n", >> + (unsigned long long)bh->b_blocknr); >> + } >> + >> + return 0; >> +}
Mark Fasheh
2015-Nov-24 22:16 UTC
[Ocfs2-devel] [PATCH v2 4/4] ocfs2: check/fix inode block for online file check
Hi Junxiao, On Tue, Nov 03, 2015 at 03:12:35PM +0800, Junxiao Bi wrote:> Hi Gang, > > This is not like a right patch. > First, online file check only checks inode's block number, valid flag, > fs generation value, and meta ecc. I never see a real corruption > happened only on this field, if these fields are corrupted, that means > something bad may happen on other place. So fix this field may not help > and even cause corruption more hard.I agree that these are rather uncommon, we might even consider removing the VALID_FL fixup. I definitely don't think we're ready for anything more complicated than this though either. We kind of have to start somewhere too.> Second, the repair way is wrong. In > ocfs2_filecheck_repair_inode_block(), if these fields in disk don't > match the ones in memory, the ones in memory are used to update the disk > fields. The question is how do you know these field in memory are > right(they may be the real corrupted ones)?Your second point (and the last part of your 1st point) makes a good argument for why this shouldn't happen automatically. Some of these corruptions might require a human to look at the log and decide what to do. Especially as you point out, where we might not know where the source of the corruption is. And if the human can't figure it out, then it's probably time to unmount and fsck. Thanks, --Mark -- Mark Fasheh