Miao Xie
2012-Nov-01 07:33 UTC
[PATCH 2/5] Btrfs: fix missing flush when committing a transaction
Consider the following case: Task1 Task2 start_transaction commit_transaction check pending snapshots list and the list is empty. add pending snapshot into list skip the delalloc flush end_transaction ... And then the problem that the snapshot is different with the source subvolume happen. This patch fixes the above problem by flush all pending stuffs when all the other tasks end the transaction. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> --- fs/btrfs/transaction.c | 74 ++++++++++++++++++++++++++++++----------------- 1 files changed, 47 insertions(+), 27 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 6d0d5a0..d9a9a70 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1401,6 +1401,48 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, kmem_cache_free(btrfs_trans_handle_cachep, trans); } +static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); + int snap_pending = 0; + int ret; + + if (!flush_on_commit) { + spin_lock(&root->fs_info->trans_lock); + if (!list_empty(&trans->transaction->pending_snapshots)) + snap_pending = 1; + spin_unlock(&root->fs_info->trans_lock); + } + + if (flush_on_commit || snap_pending) { + btrfs_start_delalloc_inodes(root, 1); + btrfs_wait_ordered_extents(root, 1); + } + + ret = btrfs_run_delayed_items(trans, root); + if (ret) + return ret; + + /* + * running the delayed items may have added new refs. account + * them now so that they hinder processing of more delayed refs + * as little as possible. + */ + btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info); + + /* + * rename don''t use btrfs_join_transaction, so, once we + * set the transaction to blocked above, we aren''t going + * to get any new ordered operations. We can safely run + * it here and no for sure that nothing new will be added + * to the list + */ + btrfs_run_ordered_operations(root, 1); + + return 0; +} + /* * btrfs_transaction state sequence: * in_commit = 0, blocked = 0 (initial) @@ -1418,7 +1460,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, int ret = -EIO; int should_grow = 0; unsigned long now = get_seconds(); - int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); btrfs_run_ordered_operations(root, 0); @@ -1491,39 +1532,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, should_grow = 1; do { - int snap_pending = 0; - joined = cur_trans->num_joined; - if (!list_empty(&trans->transaction->pending_snapshots)) - snap_pending = 1; WARN_ON(cur_trans != trans->transaction); - if (flush_on_commit || snap_pending) { - btrfs_start_delalloc_inodes(root, 1); - btrfs_wait_ordered_extents(root, 1); - } - - ret = btrfs_run_delayed_items(trans, root); + ret = btrfs_flush_all_pending_stuffs(trans, root); if (ret) goto cleanup_transaction; - /* - * running the delayed items may have added new refs. account - * them now so that they hinder processing of more delayed refs - * as little as possible. - */ - btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info); - - /* - * rename don''t use btrfs_join_transaction, so, once we - * set the transaction to blocked above, we aren''t going - * to get any new ordered operations. We can safely run - * it here and no for sure that nothing new will be added - * to the list - */ - btrfs_run_ordered_operations(root, 1); - prepare_to_wait(&cur_trans->writer_wait, &wait, TASK_UNINTERRUPTIBLE); @@ -1536,6 +1552,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, } while (atomic_read(&cur_trans->num_writers) > 1 || (should_grow && cur_trans->num_joined != joined)); + ret = btrfs_flush_all_pending_stuffs(trans, root); + if (ret) + goto cleanup_transaction; + /* * Ok now we need to make sure to block out any other joins while we * commit the transaction. We could have started a join before setting -- 1.7.6.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Nov-01 08:04 UTC
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
(sorry, forgot to cc linux-btrfs.) On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote:> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote: > > On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote: > >> Consider the following case: > >> Task1 Task2 > >> start_transaction > >> commit_transaction > >> check pending snapshots list and the > >> list is empty. > >> add pending snapshot into list > >> skip the delalloc flush > >> end_transaction > >> ... > >> > >> And then the problem that the snapshot is different with the source subvolume > >> happen. > >> > > > > This is weird, create_snapshot() will first add pending snapshot into > > list and then commit the transaction itself, regardless of if the > > snapshot is different with others or not. > > But the transaction may be committed by the other task, and the snapshot > creation task just wait until it ends. >It''s possible that a commit tranaction becomes a end transaction when it finds itself is already in commit. So if snapshot creation starts the transaction, it will increment the transaction''s num_writers, why does not the other task wait for its end_transacion? I doubt if this can really happen anyway... Can you elaborate the situation more? thanks, liubo> > > > How do you find this? > > Just by review the code. I think it can be triggered > > Thanks > Miao > > > > > thanks, > > liubo > > > >> This patch fixes the above problem by flush all pending stuffs when all the > >> other tasks end the transaction. > >> > >> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> > >> --- > >> fs/btrfs/transaction.c | 74 ++++++++++++++++++++++++++++++----------------- > >> 1 files changed, 47 insertions(+), 27 deletions(-) > >> > >> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > >> index 6d0d5a0..d9a9a70 100644 > >> --- a/fs/btrfs/transaction.c > >> +++ b/fs/btrfs/transaction.c > >> @@ -1401,6 +1401,48 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, > >> kmem_cache_free(btrfs_trans_handle_cachep, trans); > >> } > >> > >> +static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans, > >> + struct btrfs_root *root) > >> +{ > >> + int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); > >> + int snap_pending = 0; > >> + int ret; > >> + > >> + if (!flush_on_commit) { > >> + spin_lock(&root->fs_info->trans_lock); > >> + if (!list_empty(&trans->transaction->pending_snapshots)) > >> + snap_pending = 1; > >> + spin_unlock(&root->fs_info->trans_lock); > >> + } > >> + > >> + if (flush_on_commit || snap_pending) { > >> + btrfs_start_delalloc_inodes(root, 1); > >> + btrfs_wait_ordered_extents(root, 1); > >> + } > >> + > >> + ret = btrfs_run_delayed_items(trans, root); > >> + if (ret) > >> + return ret; > >> + > >> + /* > >> + * running the delayed items may have added new refs. account > >> + * them now so that they hinder processing of more delayed refs > >> + * as little as possible. > >> + */ > >> + btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info); > >> + > >> + /* > >> + * rename don''t use btrfs_join_transaction, so, once we > >> + * set the transaction to blocked above, we aren''t going > >> + * to get any new ordered operations. We can safely run > >> + * it here and no for sure that nothing new will be added > >> + * to the list > >> + */ > >> + btrfs_run_ordered_operations(root, 1); > >> + > >> + return 0; > >> +} > >> + > >> /* > >> * btrfs_transaction state sequence: > >> * in_commit = 0, blocked = 0 (initial) > >> @@ -1418,7 +1460,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, > >> int ret = -EIO; > >> int should_grow = 0; > >> unsigned long now = get_seconds(); > >> - int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); > >> > >> btrfs_run_ordered_operations(root, 0); > >> > >> @@ -1491,39 +1532,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, > >> should_grow = 1; > >> > >> do { > >> - int snap_pending = 0; > >> - > >> joined = cur_trans->num_joined; > >> - if (!list_empty(&trans->transaction->pending_snapshots)) > >> - snap_pending = 1; > >> > >> WARN_ON(cur_trans != trans->transaction); > >> > >> - if (flush_on_commit || snap_pending) { > >> - btrfs_start_delalloc_inodes(root, 1); > >> - btrfs_wait_ordered_extents(root, 1); > >> - } > >> - > >> - ret = btrfs_run_delayed_items(trans, root); > >> + ret = btrfs_flush_all_pending_stuffs(trans, root); > >> if (ret) > >> goto cleanup_transaction; > >> > >> - /* > >> - * running the delayed items may have added new refs. account > >> - * them now so that they hinder processing of more delayed refs > >> - * as little as possible. > >> - */ > >> - btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info); > >> - > >> - /* > >> - * rename don''t use btrfs_join_transaction, so, once we > >> - * set the transaction to blocked above, we aren''t going > >> - * to get any new ordered operations. We can safely run > >> - * it here and no for sure that nothing new will be added > >> - * to the list > >> - */ > >> - btrfs_run_ordered_operations(root, 1); > >> - > >> prepare_to_wait(&cur_trans->writer_wait, &wait, > >> TASK_UNINTERRUPTIBLE); > >> > >> @@ -1536,6 +1552,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, > >> } while (atomic_read(&cur_trans->num_writers) > 1 || > >> (should_grow && cur_trans->num_joined != joined)); > >> > >> + ret = btrfs_flush_all_pending_stuffs(trans, root); > >> + if (ret) > >> + goto cleanup_transaction; > >> + > >> /* > >> * Ok now we need to make sure to block out any other joins while we > >> * commit the transaction. We could have started a join before setting > >> -- > >> 1.7.6.5 > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2012-Nov-01 08:16 UTC
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
On thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote:> (sorry, forgot to cc linux-btrfs.) > On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote: >> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote: >>> On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote: >>>> Consider the following case: >>>> Task1 Task2 >>>> start_transaction >>>> commit_transaction >>>> check pending snapshots list and the >>>> list is empty. >>>> add pending snapshot into list >>>> skip the delalloc flush >>>> end_transaction >>>> ... >>>> >>>> And then the problem that the snapshot is different with the source subvolume >>>> happen. >>>> >>> >>> This is weird, create_snapshot() will first add pending snapshot into >>> list and then commit the transaction itself, regardless of if the >>> snapshot is different with others or not. >> >> But the transaction may be committed by the other task, and the snapshot >> creation task just wait until it ends. >> > > It''s possible that a commit tranaction becomes a end transaction when it > finds itself is already in commit. > > So if snapshot creation starts the transaction, it will increment the > transaction''s num_writers, why does not the other task wait for its > end_transacion? > > I doubt if this can really happen anyway... > > Can you elaborate the situation more?Task1 Task2 start_transaction start_transaction commit_transaction set in_commit to 1 check pending snapshots list and the list is empty. add pending snapshot into list skip the delalloc flush commit_transaction find in_commit is 1 end_transaction (num_writer--) wait_for_commit num_writer is 1 continue committing the transaction ... Thanks Miao> > thanks, > liubo > >>> >>> How do you find this? >> >> Just by review the code. I think it can be triggered >> >> Thanks >> Miao >> >>> >>> thanks, >>> liubo >>> >>>> This patch fixes the above problem by flush all pending stuffs when all the >>>> other tasks end the transaction. >>>> >>>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> >>>> --- >>>> fs/btrfs/transaction.c | 74 ++++++++++++++++++++++++++++++----------------- >>>> 1 files changed, 47 insertions(+), 27 deletions(-) >>>> >>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c >>>> index 6d0d5a0..d9a9a70 100644 >>>> --- a/fs/btrfs/transaction.c >>>> +++ b/fs/btrfs/transaction.c >>>> @@ -1401,6 +1401,48 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans, >>>> kmem_cache_free(btrfs_trans_handle_cachep, trans); >>>> } >>>> >>>> +static int btrfs_flush_all_pending_stuffs(struct btrfs_trans_handle *trans, >>>> + struct btrfs_root *root) >>>> +{ >>>> + int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); >>>> + int snap_pending = 0; >>>> + int ret; >>>> + >>>> + if (!flush_on_commit) { >>>> + spin_lock(&root->fs_info->trans_lock); >>>> + if (!list_empty(&trans->transaction->pending_snapshots)) >>>> + snap_pending = 1; >>>> + spin_unlock(&root->fs_info->trans_lock); >>>> + } >>>> + >>>> + if (flush_on_commit || snap_pending) { >>>> + btrfs_start_delalloc_inodes(root, 1); >>>> + btrfs_wait_ordered_extents(root, 1); >>>> + } >>>> + >>>> + ret = btrfs_run_delayed_items(trans, root); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + /* >>>> + * running the delayed items may have added new refs. account >>>> + * them now so that they hinder processing of more delayed refs >>>> + * as little as possible. >>>> + */ >>>> + btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info); >>>> + >>>> + /* >>>> + * rename don''t use btrfs_join_transaction, so, once we >>>> + * set the transaction to blocked above, we aren''t going >>>> + * to get any new ordered operations. We can safely run >>>> + * it here and no for sure that nothing new will be added >>>> + * to the list >>>> + */ >>>> + btrfs_run_ordered_operations(root, 1); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> /* >>>> * btrfs_transaction state sequence: >>>> * in_commit = 0, blocked = 0 (initial) >>>> @@ -1418,7 +1460,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, >>>> int ret = -EIO; >>>> int should_grow = 0; >>>> unsigned long now = get_seconds(); >>>> - int flush_on_commit = btrfs_test_opt(root, FLUSHONCOMMIT); >>>> >>>> btrfs_run_ordered_operations(root, 0); >>>> >>>> @@ -1491,39 +1532,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, >>>> should_grow = 1; >>>> >>>> do { >>>> - int snap_pending = 0; >>>> - >>>> joined = cur_trans->num_joined; >>>> - if (!list_empty(&trans->transaction->pending_snapshots)) >>>> - snap_pending = 1; >>>> >>>> WARN_ON(cur_trans != trans->transaction); >>>> >>>> - if (flush_on_commit || snap_pending) { >>>> - btrfs_start_delalloc_inodes(root, 1); >>>> - btrfs_wait_ordered_extents(root, 1); >>>> - } >>>> - >>>> - ret = btrfs_run_delayed_items(trans, root); >>>> + ret = btrfs_flush_all_pending_stuffs(trans, root); >>>> if (ret) >>>> goto cleanup_transaction; >>>> >>>> - /* >>>> - * running the delayed items may have added new refs. account >>>> - * them now so that they hinder processing of more delayed refs >>>> - * as little as possible. >>>> - */ >>>> - btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info); >>>> - >>>> - /* >>>> - * rename don''t use btrfs_join_transaction, so, once we >>>> - * set the transaction to blocked above, we aren''t going >>>> - * to get any new ordered operations. We can safely run >>>> - * it here and no for sure that nothing new will be added >>>> - * to the list >>>> - */ >>>> - btrfs_run_ordered_operations(root, 1); >>>> - >>>> prepare_to_wait(&cur_trans->writer_wait, &wait, >>>> TASK_UNINTERRUPTIBLE); >>>> >>>> @@ -1536,6 +1552,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, >>>> } while (atomic_read(&cur_trans->num_writers) > 1 || >>>> (should_grow && cur_trans->num_joined != joined)); >>>> >>>> + ret = btrfs_flush_all_pending_stuffs(trans, root); >>>> + if (ret) >>>> + goto cleanup_transaction; >>>> + >>>> /* >>>> * Ok now we need to make sure to block out any other joins while we >>>> * commit the transaction. We could have started a join before setting >>>> -- >>>> 1.7.6.5 >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Nov-01 09:00 UTC
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
On Thu, Nov 01, 2012 at 04:16:43PM +0800, Miao Xie wrote:> On thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote: > > (sorry, forgot to cc linux-btrfs.) > > On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote: > >> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote: > >>> On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote: > >>>> Consider the following case: > >>>> Task1 Task2 > >>>> start_transaction > >>>> commit_transaction > >>>> check pending snapshots list and the > >>>> list is empty. > >>>> add pending snapshot into list > >>>> skip the delalloc flush > >>>> end_transaction > >>>> ... > >>>> > >>>> And then the problem that the snapshot is different with the source subvolume > >>>> happen. > >>>> > >>> > >>> This is weird, create_snapshot() will first add pending snapshot into > >>> list and then commit the transaction itself, regardless of if the > >>> snapshot is different with others or not. > >> > >> But the transaction may be committed by the other task, and the snapshot > >> creation task just wait until it ends. > >> > > > > It''s possible that a commit tranaction becomes a end transaction when it > > finds itself is already in commit. > > > > So if snapshot creation starts the transaction, it will increment the > > transaction''s num_writers, why does not the other task wait for its > > end_transacion? > > > > I doubt if this can really happen anyway... > > > > Can you elaborate the situation more? > > Task1 Task2 > start_transaction > start_transaction > commit_transaction > set in_commit to 1 > check pending snapshots list and the list is empty. > add pending snapshot into list > skip the delalloc flush > commit_transaction > find in_commit is 1 > end_transaction (num_writer--) > wait_for_commit > num_writer is 1 > continue committing the transaction > ... >Make sense. Then I think we''d better put the flush part right after setting ''trans_no_join = 1'' since snapshot creation may also join an existing transaction. thanks, liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Miao Xie
2012-Nov-01 10:18 UTC
Re: [PATCH 2/5] Btrfs: fix missing flush when committing a transaction
On Thu, 1 Nov 2012 17:00:00 +0800, Liu Bo wrote:> On Thu, Nov 01, 2012 at 04:16:43PM +0800, Miao Xie wrote: >> On thu, 1 Nov 2012 16:04:27 +0800, Liu Bo wrote: >>> (sorry, forgot to cc linux-btrfs.) >>> On Thu, Nov 01, 2012 at 03:51:41PM +0800, Miao Xie wrote: >>>> On Thu, 1 Nov 2012 15:44:43 +0800, Liu Bo wrote: >>>>> On Thu, Nov 01, 2012 at 03:33:14PM +0800, Miao Xie wrote: >>>>>> Consider the following case: >>>>>> Task1 Task2 >>>>>> start_transaction >>>>>> commit_transaction >>>>>> check pending snapshots list and the >>>>>> list is empty. >>>>>> add pending snapshot into list >>>>>> skip the delalloc flush >>>>>> end_transaction >>>>>> ... >>>>>> >>>>>> And then the problem that the snapshot is different with the source subvolume >>>>>> happen. >>>>>> >>>>> >>>>> This is weird, create_snapshot() will first add pending snapshot into >>>>> list and then commit the transaction itself, regardless of if the >>>>> snapshot is different with others or not. >>>> >>>> But the transaction may be committed by the other task, and the snapshot >>>> creation task just wait until it ends. >>>> >>> >>> It''s possible that a commit tranaction becomes a end transaction when it >>> finds itself is already in commit. >>> >>> So if snapshot creation starts the transaction, it will increment the >>> transaction''s num_writers, why does not the other task wait for its >>> end_transacion? >>> >>> I doubt if this can really happen anyway... >>> >>> Can you elaborate the situation more? >> >> Task1 Task2 >> start_transaction >> start_transaction >> commit_transaction >> set in_commit to 1 >> check pending snapshots list and the list is empty. >> add pending snapshot into list >> skip the delalloc flush >> commit_transaction >> find in_commit is 1 >> end_transaction (num_writer--) >> wait_for_commit >> num_writer is 1 >> continue committing the transaction >> ... >> > > Make sense. > > Then I think we''d better put the flush part right after setting ''trans_no_join = 1''No, or the flusher will be blocked when it joins an transaction.> since snapshot creation may also join an existing transaction.It is impossible because btrfs_start_transaction is different from btrfs_join_transaction, it will be blocked when transaction->blocked is 1. Snapshot creation uses btrfs_start_transaction. Thanks Miao -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html