Josef Bacik
2011-Jun-14 19:17 UTC
[PATCH] Btrfs: protect the pending_snapshots list with trans_lock
Currently there is nothing protecting the pending_snapshots list on the transaction. We only hold the directory mutex that we are snapshotting and a read lock on the subvol_sem, so we could race with somebody else creating a snapshot in a different directory and end up with list corruption. So protect this list with the trans_lock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> --- fs/btrfs/ioctl.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index b793d11..a3c4751 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -482,8 +482,10 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, ret = btrfs_snap_reserve_metadata(trans, pending_snapshot); BUG_ON(ret); + spin_lock(&root->fs_info->trans_lock); list_add(&pending_snapshot->list, &trans->transaction->pending_snapshots); + spin_unlock(&root->fs_info->trans_lock); if (async_transid) { *async_transid = trans->transid; ret = btrfs_commit_transaction_async(trans, -- 1.7.5.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Jun-15 09:53 UTC
Re: [PATCH] Btrfs: protect the pending_snapshots list with trans_lock
Hi, On Tue, Jun 14, 2011 at 03:17:47PM -0400, Josef Bacik wrote:> Currently there is nothing protecting the pending_snapshots list on the > transaction. We only hold the directory mutex that we are snapshotting and a > read lock on the subvol_sem, so we could race with somebody else creating a > snapshot in a different directory and end up with list corruption. So protect > this list with the trans_lock. Thanks,it''s correct that the pending_snapshots list is not protected, but I do not see a way how to corrupt the list. The callchain goes like this: <user called ioctl> btrfs_ioctl_snap_create_transid btrfs_mksubvol create_snapshot the interesting stuff happens inside create_snapshot: * alloc pending snapshot struct * start transaction (btrfs_start_transaction) * reserve metadata * list_add pending_snapshot to the transaction * commit the transaction (sync or async) * free pending snapshot struct the snapshots are really created from btrfs_commit_transaction, create_pending_snapshots(), which does not protect the trans->pending_snapshots list, just traverses it. There is a potential corruption too but ... ... there is the ''start transaction'' call, which tries very hard to start a new transaction, ie. this call returns a new transaction handle. To prove this point needs more space, I will skip it now. The important thing for our case is that new transaction waits for the current to unblock. Then, each snap create will have its own transaction and own pending_snapshots list, and will create just one snapshot per transaction, because any concurrently running ioctl snap create will block in the start_transaction call. Therefore I think that either 1) there is no locking needed around pending_snapshots (under assumption one snapshot per transaction), or 2) you need to protect every access to the pending_snapshot: - add in create_snapshot - splice in btrfs_destroy_pending_snapshots - list foreach in create_pending_snapshots - list_empty in btrfs_commit_transaction david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2011-Jun-15 11:41 UTC
Re: [PATCH] Btrfs: protect the pending_snapshots list with trans_lock
Excerpts from David Sterba''s message of 2011-06-15 05:53:29 -0400:> Hi, > > On Tue, Jun 14, 2011 at 03:17:47PM -0400, Josef Bacik wrote: > > Currently there is nothing protecting the pending_snapshots list on the > > transaction. We only hold the directory mutex that we are snapshotting and a > > read lock on the subvol_sem, so we could race with somebody else creating a > > snapshot in a different directory and end up with list corruption. So protect > > this list with the trans_lock. Thanks, > > it''s correct that the pending_snapshots list is not protected, but I do > not see a way how to corrupt the list. > > The callchain goes like this: > > <user called ioctl> > btrfs_ioctl_snap_create_transid > btrfs_mksubvol > create_snapshot > > the interesting stuff happens inside create_snapshot: > > * alloc pending snapshot struct > * start transaction (btrfs_start_transaction) > * reserve metadata > * list_add pending_snapshot to the transaction > * commit the transaction (sync or async) > * free pending snapshot struct > > the snapshots are really created from btrfs_commit_transaction, > create_pending_snapshots(), which does not protect the > trans->pending_snapshots list, just traverses it. There is a potential > corruption too but ... > > ... there is the ''start transaction'' call, which tries very hard to > start a new transaction, ie. this call returns a new transaction handle. > To prove this point needs more space, I will skip it now. The important > thing for our case is that new transaction waits for the current to > unblock. > > Then, each snap create will have its own transaction and own > pending_snapshots list, and will create just one snapshot per > transaction, because any concurrently running ioctl snap create will block > in the start_transaction call. > > Therefore I think that either > > 1) there is no locking needed around pending_snapshots (under assumption > one snapshot per transaction), orThere is definitely a window where two procs can be inside create_snapshot() at the same time in the same transaction. You''re correct that the window is small, but its there. We''re not racing with the bulk of the work done at transaction commit time though. Going down to your list:> > 2) you need to protect every access to the pending_snapshot: > - add in create_snapshotcreate_snapshot is racey.> - splice in btrfs_destroy_pending_snapshotsThis can only happen during unmount while there are no writers.> - list foreach in create_pending_snapshotsThis can only happen during commit while there are no writers> - list_empty in btrfs_commit_transactionThis is there to make sure that any delayed allocation from the source subvol is actually done in the snapshot. You''re right that we should have a better check here. The worst possible case in getting this wrong is that the snapshot misses some recent file writes in the original, but it will be completely consistent. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Jun-20 22:32 UTC
Re: [PATCH] Btrfs: protect the pending_snapshots list with trans_lock
On Wed, Jun 15, 2011 at 07:41:37AM -0400, Chris Mason wrote:> There is definitely a window where two procs can be inside > create_snapshot() at the same time in the same transaction.I trust you on that. I was trying to follow the callgraph from btrfs_strat_transaction called from create_snapshot to see whether and how two calls may access the list and came to conclusion that there must be a fresh transaction after the call. There are lots of join_transaction, commit_transaction calls and I could have lost track. I''m leaving this to me as a homework, it was very educative anyway.> Going down to your list: > > > > > 2) you need to protect every access to the pending_snapshot: > > - add in create_snapshot > > create_snapshot is racey.agreed, the patch fixes it> > > - splice in btrfs_destroy_pending_snapshots > > This can only happen during unmount while there are no writers. > > > - list foreach in create_pending_snapshots > > This can only happen during commit while there are no writersthis counts as protection, for both> > - list_empty in btrfs_commit_transaction > > This is there to make sure that any delayed allocation from the source > subvol is actually done in the snapshot. You''re right that we should > have a better check here. The worst possible case in getting this wrong > is that the snapshot misses some recent file writes in the original, but > it will be completely consistent.I think this is ok and acceptable. Thanks, david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html