thr3ads.net - Btrfs devel - [PATCH] Btrfs: Don''t allocate inode that is already in use [Oct 2013]

If this information is useful, please help other people find it:
Share via:

Stefan Behrens

2013-Oct-15 18:08 UTC

[PATCH] Btrfs: Don''t allocate inode that is already in use

Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.

The same inode number is assigned twice, the second time this
leads to an EEXIST in btrfs_insert_empty_items().

The issue can happen when a file is removed right after a subvolume
is created and then a new inode number is created before the
inodes in free_inode_pinned are processed.
unlink() calls btrfs_return_ino() which calls start_caching() in this
case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
searching for the highest inode (which already cannot find the
unlinked one anymore in btrfs_find_free_objectid()). So if this
unlinked inode''s number is equal to the highest_ino + 1 (or >= this
value
instead of > this value which was the off-by-one error), we mustn''t
add
the inode number to free_ino_pinned (caching_thread() does it right).
In this case we need to try directly to add the number to the inode_cache
which will fail in this case.

When this inode number is allocated while it is still in free_ino_pinned,
it is allocated and still added to the free inode cache when the
pinned inodes are processed, thus one of the following inode number
allocations will get an inode that is already in use and fail with EEXIST
in btrfs_insert_empty_items().

One example which was created with the reproducer below:
Create a snapshot, work in the newly created snapshot for the rest.
In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
btrfs_unpin_free_ino calls add_free_space [34284, 1].
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
EEXIST when the new inode is inserted.

One possible reproducer is this one:
 #!/bin/sh
 # preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount ${TEST_MNT} 2>/dev/null || true
mkfs.btrfs -f ${TEST_DEV}
mount ${TEST_DEV} ${TEST_MNT} -o \
 rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subv create ${TEST_MNT}/s1
for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed
''s|^.*/\([^/]*\)$|\1|''`
rm ${TEST_MNT}/s2/$FILENAME
touch ${TEST_MNT}/s2/$FILENAME
 # the following steps can be repeated to reproduce the issue again and again
[ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
rm ${TEST_MNT}/s3/$FILENAME
touch ${TEST_MNT}/s3/$FILENAME
ls -alFi ${TEST_MNT}/s?/$FILENAME
touch ${TEST_MNT}/s3/_1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_1
touch ${TEST_MNT}/s3/_2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_2
touch ${TEST_MNT}/s3/__1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__1
touch ${TEST_MNT}/s3/__2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__2
 # if the above is not enough, add the following loop:
for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
 #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
 # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
 # already in use that btrfs_find_ino_for_alloc() returns.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
---
 fs/btrfs/inode-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 014de49..ec08004 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -237,7 +237,7 @@ again:
 		start_caching(root);
 
 		if (objectid <= root->cache_progress ||
-		    objectid > root->highest_objectid)
+		    objectid >= root->highest_objectid)
 			__btrfs_add_free_space(ctl, objectid, 1);
 		else
 			__btrfs_add_free_space(pinned, objectid, 1);
-- 
1.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2013-Oct-15 18:54 UTC

head link

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

On Tue, October 15, 2013 at 20:08 (+0200), Stefan Behrens
wrote:> Due to an off-by-one error, it is possible to reproduce a bug
> when the inode cache is used.
> 
> The same inode number is assigned twice, the second time this
> leads to an EEXIST in btrfs_insert_empty_items().
> 
> The issue can happen when a file is removed right after a subvolume
> is created and then a new inode number is created before the
> inodes in free_inode_pinned are processed.
> unlink() calls btrfs_return_ino() which calls start_caching() in this
> case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
> searching for the highest inode (which already cannot find the
> unlinked one anymore in btrfs_find_free_objectid()). So if this
> unlinked inode''s number is equal to the highest_ino + 1 (or >=
this value
> instead of > this value which was the off-by-one error), we
mustn''t add
> the inode number to free_ino_pinned (caching_thread() does it right).
> In this case we need to try directly to add the number to the inode_cache
> which will fail in this case.
> 
> When this inode number is allocated while it is still in free_ino_pinned,
> it is allocated and still added to the free inode cache when the
> pinned inodes are processed, thus one of the following inode number
> allocations will get an inode that is already in use and fail with EEXIST
> in btrfs_insert_empty_items().
> 
> One example which was created with the reproducer below:
> Create a snapshot, work in the newly created snapshot for the rest.
> In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
> start_caching() calls add_free_space [34284, 18446744073709517077].
> In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
> mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
> btrfs_unpin_free_ino calls add_free_space [34284, 1].
> mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
> EEXIST when the new inode is inserted.
> 
> One possible reproducer is this one:
>  #!/bin/sh
>  # preparation
> TEST_DEV=/dev/sdc1
> TEST_MNT=/mnt
> umount ${TEST_MNT} 2>/dev/null || true
> mkfs.btrfs -f ${TEST_DEV}
> mount ${TEST_DEV} ${TEST_MNT} -o \
>  rw,relatime,compress=lzo,space_cache,inode_cache
> btrfs subv create ${TEST_MNT}/s1
> for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
> btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
> FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed
''s|^.*/\([^/]*\)$|\1|''`
> rm ${TEST_MNT}/s2/$FILENAME
> touch ${TEST_MNT}/s2/$FILENAME
>  # the following steps can be repeated to reproduce the issue again and
again
> [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
> btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
> rm ${TEST_MNT}/s3/$FILENAME
> touch ${TEST_MNT}/s3/$FILENAME
> ls -alFi ${TEST_MNT}/s?/$FILENAME
> touch ${TEST_MNT}/s3/_1 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/_1
> touch ${TEST_MNT}/s3/_2 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/_2
> touch ${TEST_MNT}/s3/__1 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/__1
> touch ${TEST_MNT}/s3/__2 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/__2
>  # if the above is not enough, add the following loop:
> for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
>  #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED;
done
>  # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
>  # already in use that btrfs_find_ino_for_alloc() returns.
Probably a bit too obscure to turn this into  an xfstest? At least nobody
complained so far, and this reproducer takes me 1m57 to run, so nothing I want
in each xfstest cycle.

If we ever introduce a similar problem, this reproducer probably won''t
find it
(at least if it''s really dependent on the exact number of files and the
exact
inode number), unless we''re effectively reversing this patch. So no
real use for
a regression test in my opinion, I''m okay with just fixing it.
> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
> ---
>  fs/btrfs/inode-map.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 014de49..ec08004 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -237,7 +237,7 @@ again:
>  		start_caching(root);
>  
>  		if (objectid <= root->cache_progress ||
> -		    objectid > root->highest_objectid)
> +		    objectid >= root->highest_objectid)
>  			__btrfs_add_free_space(ctl, objectid, 1);
>  		else
>  			__btrfs_add_free_space(pinned, objectid, 1);
> 
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

... although this is not the most beautiful commit message I''ve ever
seen ;-)

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zach Brown

2013-Oct-15 20:41 UTC

head link

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

> Probably a bit too obscure to turn this into  an xfstest? At least nobody
> complained so far, and this reproducer takes me 1m57 to run, so nothing I
want
> in each xfstest cycle.
I disagree.  The entire point of regression tests is to trigger bugs
that the usual processes failed to find, like this one.

If you think that two minutes is too long for a test to run then mark it
as "stress" (is that the xfstests group for boring long running
tests?)
or take the time to make a tighter test.

Don''t just skip regression testing.  Please.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Schmidt

2013-Oct-16 12:19 UTC

head link

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

On Tue, October 15, 2013 at 22:41 (+0200), Zach Brown
wrote:>> Probably a bit too obscure to turn this into  an xfstest? At least
nobody
>> complained so far, and this reproducer takes me 1m57 to run, so nothing
I want
>> in each xfstest cycle.
> 
> I disagree.  The entire point of regression tests is to trigger bugs
> that the usual processes failed to find, like this one.
> 
> If you think that two minutes is too long for a test to run then mark it
> as "stress" (is that the xfstests group for boring long running
tests?)
> or take the time to make a tighter test.
> 
> Don''t just skip regression testing.  Please.
You are mixing up my points. The first argument you''re quoting is not
against
regression testing in this case, and it deserves the "stress" answer,
I agree.

You don''t quote my second argument, which is not "just skip
regression testing".
I''ll try again in other words: A regression test only makes sense if it
can
prevent us from making the same mistake again. As far as I see, the reproducer
script is so specific, that the only thing it can prevent is an exact revert of
Stefan''s patch. If you argue that we should have a test for just this,
fair
enough, then we could use exactly Stefan''s script. I don''t
think that gains us
anything. We''re not normally reverting bugfix patches deliberately,
especially
not for very short patches with very long descriptions.

I''d very much like to see a more generic test to avoid similar
regressions, if
that can be created. I don''t have a good plan how to trigger such a
situation
(i.e. know which inodes are on the free_inode_pinned list) in a more general
way.

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zach Brown

2013-Oct-16 16:46 UTC

head link

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

> > Don''t just skip regression testing.  Please.
> 
> You are mixing up my points. The first argument you''re quoting is
not against
> regression testing in this case, and it deserves the "stress"
answer, I agree.
Great, then we''re done.

(Yes, I''m very much ignoring your other argument that''s based
on your
predictions of what bugs will exist in the future and whether or not
tests will find them.)

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Tim Landscheidt

2013-Oct-16 17:26 UTC

head link

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

Jan Schmidt <list.btrfs@jan-o-sch.net> wrote:
> [...]
> You don''t quote my second argument, which is not "just skip
regression testing".
> I''ll try again in other words: A regression test only makes sense
if it can
> prevent us from making the same mistake again. As far as I see, the
reproducer
> script is so specific, that the only thing it can prevent is an exact
revert of
> Stefan''s patch. If you argue that we should have a test for just
this, fair
> enough, then we could use exactly Stefan''s script. I
don''t think that gains us
> anything. We''re not normally reverting bugfix patches
deliberately, especially
> not for very short patches with very long descriptions.
> [...]
The presence of the bug in the current code indicates that
someone in the past made an error, and that suggests that
this error can be repeated (by someone else) for example in
a rewrite months or years in the future.  The purpose of a
regression test is to spare anyone who touches the code to
go through all the commits to see if they''re unintentionally
reverting a bug fix, but give them a nice "FAIL"/"PASS"
traffic light.

Tim

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Btrfs devel - Oct 2013 - [PATCH] Btrfs: Don't allocate inode that is already in use

[PATCH] Btrfs: Don''t allocate inode that is already in use

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use

Re: [PATCH] Btrfs: Don''t allocate inode that is already in use