Hi, If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2. By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype(). Please consider for review. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Index: e2fsprogs-1.39/e2fsck/pass2.c ==================================================================--- e2fsprogs-1.39.orig/e2fsck/pass2.c +++ e2fsprogs-1.39/e2fsck/pass2.c @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc return 1; } - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == ''.'' && + (dirent->name[1] == ''.'' || dirent->name[1] == ''\0''))) { should_be = EXT2_FT_DIR; } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, dirent->inode)) {
Kalpak Shah
2007-Feb-21 04:48 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
Hi, The other issue this brings up is maybe pass1 should be checking whether it is the inode mode that is corrupted (by trying to verify block[0] has "." and ".."in it) instead of truncating off those blocks. This would actually be a redundant check as pass2 would also make the same check. We can have some more checks like if i_blocks!=0 then the file may not be special file, etc. Does e2fsck need to have more such checks to avoid making decisions just based on the mode? Thanks, Kalpak. On Wed, 2007-02-21 at 14:45 +0530, Kalpak Shah wrote:> Hi, > > If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2. > > By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype(). > > Please consider for review. > > Signed-off-by: Andreas Dilger <adilger@clusterfs.com> > Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> > > > Index: e2fsprogs-1.39/e2fsck/pass2.c > ==================================================================> --- e2fsprogs-1.39.orig/e2fsck/pass2.c > +++ e2fsprogs-1.39/e2fsck/pass2.c > @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc > return 1; > } > > - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { > + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || > + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == ''.'' && > + (dirent->name[1] == ''.'' || dirent->name[1] == ''\0''))) { > should_be = EXT2_FT_DIR; > } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, > dirent->inode)) { > > > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Kalpak Shah
2007-Feb-21 08:25 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
Hi Peter, Thanks for pointing out the typo. In the kernel, struct ext3_dir_entry_2 has a 8-bit name_len field followed by 8-bit field for filetype. Whereas in e2fsck, struct ext2_dir_entry has a 16-bit name_len field, the upper 8 bits of which store the filetype. Hence e2fsck masks the upper 8 bits while using name_len. Here is the patch with the change. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Index: e2fsprogs-1.39/e2fsck/pass2.c ==================================================================--- e2fsprogs-1.39.orig/e2fsck/pass2.c +++ e2fsprogs-1.39/e2fsck/pass2.c @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc return 1; } - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || + ((dirent->name_len & 0xFF) <= 2 && dirent->name[0] == ''.'' && + (dirent->name[1] == ''.'' || dirent->name[1] == ''\0''))) { should_be = EXT2_FT_DIR; } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, dirent->inode)) { Thanks, Kalpak Shah. <kalpak@clusterfs.com> On Wed, 2007-02-21 at 09:49 -0500, Peter Staubach wrote:> Kalpak Shah wrote: > > Hi, > > > > If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2. > > > > By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype(). > > > > Please consider for review. > > > > Signed-off-by: Andreas Dilger <adilger@clusterfs.com> > > Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> > > > > > > Index: e2fsprogs-1.39/e2fsck/pass2.c > > ==================================================================> > --- e2fsprogs-1.39.orig/e2fsck/pass2.c > > +++ e2fsprogs-1.39/e2fsck/pass2.c > > @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc > > return 1; > > } > > > > - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { > > + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || > > + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == ''.'' && > > > > What is the "&& 0xFF" part for? At the very least, it should probably > be "& 0xFF". It doesn''t seem like this mask should be needed at all > though, I think. > > Thanx... > > ps > > > + (dirent->name[1] == ''.'' || dirent->name[1] == ''\0''))) { > > should_be = EXT2_FT_DIR; > > } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, > > dirent->inode)) { > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > >
Peter Staubach
2007-Feb-21 18:00 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
Kalpak Shah wrote:> Hi, > > If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2. > > By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype(). > > Please consider for review. > > Signed-off-by: Andreas Dilger <adilger@clusterfs.com> > Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> > > > Index: e2fsprogs-1.39/e2fsck/pass2.c > ==================================================================> --- e2fsprogs-1.39.orig/e2fsck/pass2.c > +++ e2fsprogs-1.39/e2fsck/pass2.c > @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc > return 1; > } > > - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) { > + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) || > + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == ''.'' && >What is the "&& 0xFF" part for? At the very least, it should probably be "& 0xFF". It doesn''t seem like this mask should be needed at all though, I think. Thanx... ps> + (dirent->name[1] == ''.'' || dirent->name[1] == ''\0''))) { > should_be = EXT2_FT_DIR; > } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map, > dirent->inode)) { > > > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Andreas Dilger
2007-Mar-31 01:16 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
On Mar 30, 2007 20:44 -0400, Theodore Tso wrote:> On Wed, Feb 21, 2007 at 02:45:59PM +0530, Kalpak Shah wrote: > > If the mode of a directory gets corrupted, check_filetype() makes > > wrong decisions for all its sub-directories. For example, using > > debugfs we can corrupt the mode of a directory to 0140755 (i.e. a > > socket). e2fsck will set the filetype of all its subdirectories as 6 > > (filetype for socket). All the subdirectories would be moved to > > lost+found, and in second run of e2fsck their filetype would be set > > back to 2. > > Um, I''m not seeing this. Using stock e2fsprogs, given the following > test image, I''m not seeing the behavior you describe.The log of the e2fsck (on a test filesystem with deliberately introduced corruption) is available at: https://bugzilla.lustre.org/show_bug.cgi?id=11645 I''ve also added a testcase (created by making a parent directory with several subdirs, then using debugfs to change the mode of the parent directory). https://bugzilla.lustre.org/attachment.cgi?id=9958 (also attached here). It doesn''t exhibit the filetype breakage in the upstream e2fsck because the test is run with the patch applied, but it does still show the "size is wrong on second e2fsck" problem you observed. The test case is created as if that problem was also fixed already (i.e. second e2fsck is clean). Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: e2fsprogs-tests-f_check_filetype.patch Type: application/octet-stream Size: 14690 bytes Desc: not available Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20070331/121563b9/e2fsprogs-tests-f_check_filetype.obj
Kalpak Shah
2007-Mar-31 12:38 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
On Sat, 2007-03-31 at 10:39 -0400, Theodore Tso wrote:> On Sat, Mar 31, 2007 at 08:35:09AM -0400, Theodore Tso wrote: > > And we can''t really check the case where a directory gets turned into > > a regular file without destroying e2fsck''s performance, since that > > would require reading the first block of every single file, and this > > also significantly increases the chance of false positives. So it''s a > > lot of complexity for what seems to have always been an artificial > > test case.Yes, you can''t check whether each regular file is a directory or not. The only case which is easy to check is whether the mode of a directory is corrupted since directories will always have a "." and ".." entry. Note that we will still lose all data of a regular file if its mode is changed to lets say, a socket. But there is hardly anything we can do about this since there is no special check that can be associated with a regular file. Since dir_index feature is going to be set by default in ext3/4, maybe we can check for EXT2_INDEX_FL in check_is_really_dir() and if this flag is set then check if such a file is a directory or not. So we would not be reading the first block of each file and yet would be able to correct most directory mode corruptions. Thanks, Kalpak.
Andreas Dilger
2007-Apr-03 11:37 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
On Mar 31, 2007 10:39 -0400, Theodore Tso wrote:> I''m going to let this one soak for a bit to make sure we don''t pick up > any fase positives or negatives in the hueristics. > > @@ -133,11 +133,10 @@ int e2fsck_pass1_check_device_inode(ext2 > + * If the index flag is set, then this is a bogus > + * device/fifo/socket > */ > - if ((ext2fs_inode_data_blocks(fs, inode) != 0) || > - (inode->i_flags & EXT2_INDEX_FL)) > + if (inode->i_flags & EXT2_INDEX_FL) > return 0;There were ancient versions of the kernel that left EXT2_INDEX_FL set on all files, instead of just directories... I''m not sure if those were in actual released kernels, or just in patches.> +static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx, > + char *buf) > +{ > + if (ext2fs_read_dir_block(ctx->fs, inode->i_block[0], buf)) > + return;Do we call check_blocks() on this inode shortly thereafter? If we do then the overhead of reading the first block is offset by not reading it again later. Otherwise, this could slow things down.> + dirent = (struct ext2_dir_entry *) buf; > + if (((dirent->name_len & 0xFF) != 1) || > + (dirent->name[0] != ''.'') || > + (dirent->inode != pctx->ino) || > + (dirent->rec_len < 12) || > + (dirent->rec_len % 4) || > + (dirent->rec_len >= ctx->fs->blocksize - 12)) > + return; > + > + dirent = (struct ext2_dir_entry *) (buf + dirent->rec_len); > + if (((dirent->name_len & 0xFF) != 2) || > + (dirent->name[0] != ''.'') || > + (dirent->name[1] != ''.'') || > + (dirent->rec_len < 12) || > + (dirent->rec_len % 4)) > + return; > + > + if (fix_problem(ctx, PR_1_TREAT_AS_DIRECTORY, pctx)) { > + inode->i_mode = (inode->i_mode & 07777) | LINUX_S_IFDIR; > + e2fsck_write_inode_full(ctx, pctx->ino, inode, > + EXT2_INODE_SIZE(ctx->fs->super), > + "check_is_really_dir"); > }The one worry I have here (though I don''t think it is necessarily IN the code you propose) is that someone could create a regular file which looks like a directory and somehow get it linked into the filesystem tree, giving them escalated access (e.g. device files owned by them, suid executables, links to otherwise unreadable files, etc). It would seem that this is only a danger if the mode on the file is corrupted, which shouldn''t really be doable by a regular user, but it is definitely something to consider. I take it that this code fixes the test image I previously posted? Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Theodore Tso
2007-Apr-04 00:32 UTC
[Lustre-discuss] Re: [PATCH] Correction to check_filetype()
On Tue, Apr 03, 2007 at 11:37:16AM -0600, Andreas Dilger wrote:> On Mar 31, 2007 10:39 -0400, Theodore Tso wrote: > > I''m going to let this one soak for a bit to make sure we don''t pick up > > any fase positives or negatives in the hueristics. > > > > @@ -133,11 +133,10 @@ int e2fsck_pass1_check_device_inode(ext2 > > + * If the index flag is set, then this is a bogus > > + * device/fifo/socket > > */ > > - if ((ext2fs_inode_data_blocks(fs, inode) != 0) || > > - (inode->i_flags & EXT2_INDEX_FL)) > > + if (inode->i_flags & EXT2_INDEX_FL) > > return 0; > > There were ancient versions of the kernel that left EXT2_INDEX_FL set > on all files, instead of just directories... I''m not sure if those > were in actual released kernels, or just in patches.Well, we''ve been running with this in e2fsprogs for quite some time (August 2002, e2fsprogs 1.28), and no one has complained, so I think we''re safe...> > +static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx, > > + char *buf) > > +{ > > + if (ext2fs_read_dir_block(ctx->fs, inode->i_block[0], buf)) > > + return; > > Do we call check_blocks() on this inode shortly thereafter? If we do then > the overhead of reading the first block is offset by not reading it again > later. Otherwise, this could slow things down.This is why we only do this on special devices; check_is_really_dir() doesn''t do anything on directory or regular files.> The one worry I have here (though I don''t think it is necessarily IN > the code you propose) is that someone could create a regular file which > looks like a directory and somehow get it linked into the filesystem > tree, giving them escalated access (e.g. device files owned by them, > suid executables, links to otherwise unreadable files, etc).I thought of that, but I didn''t worry about it because I''m not doing this check on regular files. Come to think of it I should add a check so we don''t do this for long symlinks, for similar reasons. It would only matter if the user can force filesystem check, which makes it a long shot, but we should avoid that issue as well.> I take it that this code fixes the test image I previously posted?Yup. Take a look at what I checked in into Mercurial. I added two separate test cases. One of them directly tests the filetype issue, and the other tests a mutated directory into a char device case. - Ted