Patch 1 is a NULL pointer fix. Patch 2-4 are cleanups. Patch 5 is an enhancement for the file clone ioctl. Liu Bo (5): Btrfs: fix use-after-free bug in umount Btrfs: update new flags for tracepoint Btrfs: kill replicate code in replay_one_buffer Btrfs: remove unused code in btrfs_del_root Btrfs: allow file data clone within a file fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/ioctl.c | 26 +++++++++++++++++++------- fs/btrfs/root-tree.c | 4 ---- fs/btrfs/tree-log.c | 9 ++------- include/trace/events/btrfs.h | 35 +++++++++++++++++++++++------------ 5 files changed, 46 insertions(+), 32 deletions(-) -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Commit be283b2e674a09457d4563729015adb637ce7cc1 ( Btrfs: use helper to cleanup tree roots) introduced the following bug, BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] [...] Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730) [...] Call Trace: [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs] [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs] [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs] [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs] [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs] [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [<ffffffff81068d3d>] kthread+0x8d/0x95 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] We''ve free''ed commit_root before actually getting to free block groups where caching thread needs valid extent_root->commit_root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- fs/btrfs/disk-io.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e7b3cb5..a5c8f28 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3512,10 +3512,10 @@ int close_ctree(struct btrfs_root *root) percpu_counter_sum(&fs_info->delalloc_bytes)); } - free_root_pointers(fs_info, 1); - btrfs_free_block_groups(fs_info); + free_root_pointers(fs_info, 1); + del_fs_roots(fs_info); iput(fs_info->btree_inode); -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Adding new flags to keep tracepoints consistent with btrfs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- include/trace/events/btrfs.h | 35 +++++++++++++++++++++++------------ 1 files changed, 23 insertions(+), 12 deletions(-) diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index ea546a4..2902657 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -40,22 +40,25 @@ struct extent_buffer; { BTRFS_ROOT_TREE_DIR_OBJECTID, "ROOT_TREE_DIR" }, \ { BTRFS_CSUM_TREE_OBJECTID, "CSUM_TREE" }, \ { BTRFS_TREE_LOG_OBJECTID, "TREE_LOG" }, \ + { BTRFS_QUOTA_TREE_OBJECTID, "QUOTA_TREE" }, \ { BTRFS_TREE_RELOC_OBJECTID, "TREE_RELOC" }, \ { BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" }) #define show_root_type(obj) \ obj, ((obj >= BTRFS_DATA_RELOC_TREE_OBJECTID) || \ (obj >= BTRFS_ROOT_TREE_OBJECTID && \ - obj <= BTRFS_CSUM_TREE_OBJECTID)) ? __show_root_type(obj) : "-" + obj <= BTRFS_QUOTA_TREE_OBJECTID)) ? __show_root_type(obj) : "-" #define BTRFS_GROUP_FLAGS \ - { BTRFS_BLOCK_GROUP_DATA, "DATA"}, \ - { BTRFS_BLOCK_GROUP_SYSTEM, "SYSTEM"}, \ - { BTRFS_BLOCK_GROUP_METADATA, "METADATA"}, \ - { BTRFS_BLOCK_GROUP_RAID0, "RAID0"}, \ - { BTRFS_BLOCK_GROUP_RAID1, "RAID1"}, \ - { BTRFS_BLOCK_GROUP_DUP, "DUP"}, \ - { BTRFS_BLOCK_GROUP_RAID10, "RAID10"} + { BTRFS_BLOCK_GROUP_DATA, "DATA"}, \ + { BTRFS_BLOCK_GROUP_SYSTEM, "SYSTEM"}, \ + { BTRFS_BLOCK_GROUP_METADATA, "METADATA"}, \ + { BTRFS_BLOCK_GROUP_RAID0, "RAID0"}, \ + { BTRFS_BLOCK_GROUP_RAID1, "RAID1"}, \ + { BTRFS_BLOCK_GROUP_DUP, "DUP"}, \ + { BTRFS_BLOCK_GROUP_RAID10, "RAID10"}, \ + { BTRFS_BLOCK_GROUP_RAID5, "RAID5"}, \ + { BTRFS_BLOCK_GROUP_RAID6, "RAID6"} #define BTRFS_UUID_SIZE 16 @@ -154,7 +157,9 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict, { EXTENT_FLAG_PINNED, "PINNED" }, \ { EXTENT_FLAG_COMPRESSED, "COMPRESSED" }, \ { EXTENT_FLAG_VACANCY, "VACANCY" }, \ - { EXTENT_FLAG_PREALLOC, "PREALLOC" }) + { EXTENT_FLAG_PREALLOC, "PREALLOC" }, \ + { EXTENT_FLAG_LOGGING, "LOGGING" }, \ + { EXTENT_FLAG_FILLING, "FILLING" }) TRACE_EVENT(btrfs_get_extent, @@ -201,13 +206,17 @@ TRACE_EVENT(btrfs_get_extent, ); #define show_ordered_flags(flags) \ - __print_symbolic(flags, \ + __print_symbolic(flags, \ { BTRFS_ORDERED_IO_DONE, "IO_DONE" }, \ { BTRFS_ORDERED_COMPLETE, "COMPLETE" }, \ { BTRFS_ORDERED_NOCOW, "NOCOW" }, \ { BTRFS_ORDERED_COMPRESSED, "COMPRESSED" }, \ { BTRFS_ORDERED_PREALLOC, "PREALLOC" }, \ - { BTRFS_ORDERED_DIRECT, "DIRECT" }) + { BTRFS_ORDERED_DIRECT, "DIRECT" }, \ + { BTRFS_ORDERED_IOERR, "IOERR" }, \ + { BTRFS_ORDERED_UPDATED_ISIZE, "UPDATED_ISIZE" }, \ + { BTRFS_ORDERED_LOGGED_CSUM, "LOGGED_CSUM" }) + DECLARE_EVENT_CLASS(btrfs__ordered_extent, @@ -555,7 +564,9 @@ TRACE_EVENT(btrfs_delayed_ref_head, { BTRFS_BLOCK_GROUP_RAID0, "RAID0" }, \ { BTRFS_BLOCK_GROUP_RAID1, "RAID1" }, \ { BTRFS_BLOCK_GROUP_DUP, "DUP" }, \ - { BTRFS_BLOCK_GROUP_RAID10, "RAID10"}) + { BTRFS_BLOCK_GROUP_RAID10, "RAID10"}, \ + { BTRFS_BLOCK_GROUP_RAID5, "RAID5" }, \ + { BTRFS_BLOCK_GROUP_RAID6, "RAID6" }) DECLARE_EVENT_CLASS(btrfs__chunk, -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
EXTREF is treated same as REF, so we can make the code tidy. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- fs/btrfs/tree-log.c | 9 ++------- 1 files changed, 2 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index c276ac9..3f30053 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2016,13 +2016,8 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb, eb, i, &key); if (ret) break; - } else if (key.type == BTRFS_INODE_REF_KEY) { - ret = add_inode_ref(wc->trans, root, log, path, - eb, i, &key); - if (ret && ret != -ENOENT) - break; - ret = 0; - } else if (key.type == BTRFS_INODE_EXTREF_KEY) { + } else if (key.type == BTRFS_INODE_REF_KEY || + key.type == BTRFS_INODE_EXTREF_KEY) { ret = add_inode_ref(wc->trans, root, log, path, eb, i, &key); if (ret && ret != -ENOENT) -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2013-May-26 13:50 UTC
[PATCH 4/5 RESEND] Btrfs: remove unused code in btrfs_del_root
''leaf'' and ''ri'' is not used somehow. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- fs/btrfs/root-tree.c | 4 ---- 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 5bf1ed5..4a5abda 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -368,8 +368,6 @@ int btrfs_del_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, { struct btrfs_path *path; int ret; - struct btrfs_root_item *ri; - struct extent_buffer *leaf; path = btrfs_alloc_path(); if (!path) @@ -379,8 +377,6 @@ int btrfs_del_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, goto out; BUG_ON(ret != 0); - leaf = path->nodes[0]; - ri = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_root_item); ret = btrfs_del_item(trans, root, path); out: -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2013-May-26 13:50 UTC
[PATCH 5/5 RESEND] Btrfs: allow file data clone within a file
We did not allow file data clone within the same file because of deadlock issues. However, we now use nested lock to avoid deadlock between the parent directory and the child file. So it''s safe to do file clone within the same file when the two ranges are not overlapped. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- fs/btrfs/ioctl.c | 26 +++++++++++++++++++------- 1 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0f81d67..112153a 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2480,6 +2480,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, int ret; u64 len = olen; u64 bs = root->fs_info->sb->s_blocksize; + int same_inode = 0; /* * TODO: @@ -2516,7 +2517,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, ret = -EINVAL; if (src == inode) - goto out_fput; + same_inode = 1; /* the src must be open for reading */ if (!(src_file.file->f_mode & FMODE_READ)) @@ -2547,12 +2548,16 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, } path->reada = 2; - if (inode < src) { - mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT); - mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD); + if (!same_inode) { + if (inode < src) { + mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT); + mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD); + } else { + mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT); + mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + } } else { - mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT); - mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD); + mutex_lock(&src->i_mutex); } /* determine range to clone */ @@ -2570,6 +2575,12 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd, !IS_ALIGNED(destoff, bs)) goto out_unlock; + /* verify if ranges are overlapped within the same file */ + if (same_inode) { + if (destoff + len > off && destoff < off + len) + goto out_unlock; + } + if (destoff > inode->i_size) { ret = btrfs_cont_expand(inode, inode->i_size, destoff); if (ret) @@ -2846,7 +2857,8 @@ out: unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1); out_unlock: mutex_unlock(&src->i_mutex); - mutex_unlock(&inode->i_mutex); + if (!same_inode) + mutex_unlock(&inode->i_mutex); vfree(buf); btrfs_free_path(path); out_fput: -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2013-May-27 12:07 UTC
Re: [PATCH 1/5] Btrfs: fix use-after-free bug during umount
On Sun, May 26, 2013 at 09:50:27PM +0800, Liu Bo wrote:> Commit be283b2e674a09457d4563729015adb637ce7cc1 > ( Btrfs: use helper to cleanup tree roots) introduced the following bug,Well, it did not introduce the bug, but made it visible.> We''ve free''ed commit_root before actually getting to free block groups where > caching thread needs valid extent_root->commit_root. > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > --- > fs/btrfs/disk-io.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index e7b3cb5..a5c8f28 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3512,10 +3512,10 @@ int close_ctree(struct btrfs_root *root) > percpu_counter_sum(&fs_info->delalloc_bytes)); > } > > - free_root_pointers(fs_info, 1); > - > btrfs_free_block_groups(fs_info); > > + free_root_pointers(fs_info, 1); > + > del_fs_roots(fs_info); > > iput(fs_info->btree_inode);This makes it just harder to hit, but the worker threads that get stopped after iput may still access the freed roots, like mentioned here http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg24239.html david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 27, 2013 at 02:07:00PM +0200, David Sterba wrote:> On Sun, May 26, 2013 at 09:50:27PM +0800, Liu Bo wrote: > > Commit be283b2e674a09457d4563729015adb637ce7cc1 > > ( Btrfs: use helper to cleanup tree roots) introduced the following bug, > > Well, it did not introduce the bug, but made it visible. > > > We''ve free''ed commit_root before actually getting to free block groups where > > caching thread needs valid extent_root->commit_root. > > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > > --- > > fs/btrfs/disk-io.c | 4 ++-- > > 1 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index e7b3cb5..a5c8f28 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -3512,10 +3512,10 @@ int close_ctree(struct btrfs_root *root) > > percpu_counter_sum(&fs_info->delalloc_bytes)); > > } > > > > - free_root_pointers(fs_info, 1); > > - > > btrfs_free_block_groups(fs_info); > > > > + free_root_pointers(fs_info, 1); > > + > > del_fs_roots(fs_info); > > > > iput(fs_info->btree_inode); > > This makes it just harder to hit, but the worker threads that get > stopped after iput may still access the freed roots, like mentioned here > > http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg24239.html > > davidHi David, The original code just free_extent_buffer(commit_root), so the bug proves that there can still be a reference on commit_root during the period between freeing tree roots and stopping workers. I think we should add a parameter for free_root_pointers() to tell it not set root->commit_root NULL. If you agree on this, I can make a patch :) (I did spend a lot of time to reproduce this with xfstests as you showed, but I failed somehow, the magic is that I reproduced it while testing the dedup patch set, lol) thanks, liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html