<liub.liubo@gmail.com>
2012-Aug-14  05:01 UTC
[PATCH RFC] Btrfs: fix deadlock between sys_sync and freeze
From: Liu Bo <bo.li.liu@oracle.com>
I found this while testing xfstests 068, the story is
    t1                                            t2
  sys_sync                                    thaw_super
    iterate_supers
      down_read(sb->s_umount)                   down_write(sb->s_umount)
--->wait for t1
      sync_fs (with wait mode)
        start_transaction
          sb_start_intwrite --------------------> wait for t2 to set
s_writers.frozen to SB_UNFROZEN
In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
start_transaction in sync_fs() with wait mode so that we won''t hit the
deadlock.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/super.c   |   15 +++++++++++++++
 include/linux/fs.h |    5 +++++
 2 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f2eb24c..1e04b41 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -847,6 +847,21 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 		return 0;
 	}
 
+	/*
+	 * sys_sync can cause an ABBA deadlock with freeze/thaw
+	 * o freeze_super()   grabs s_umount lock and set sb to SB_FREEZE_FS.
+	 * o thaw_super()     grabs s_umount lock and set sb to SB_UNFROZEN.
+	 * o iterate_supers() grabs s_umount lock, and sync fs, during which
+	 *                    we need to do sb_start_intwrite() in starting a
+	 *                    new transaction.
+	 * so iterate_supers() will wait for thaw_super() to reset sb''s
frozen
+	 * state, while thaw_super() will wait for iterate_supers() to drop the
+	 * s_umount lock.  This is an ABBA deadlock.
+	 */
+	if (!sb_start_intwrite_trylock(sb))
+		return 0;
+	sb_end_intwrite(sb);
+
 	btrfs_wait_ordered_extents(root, 0, 0);
 
 	trans = btrfs_start_transaction(root, 0);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..8a3efd0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1700,6 +1700,11 @@ static inline void sb_start_intwrite(struct super_block
*sb)
 	__sb_start_write(sb, SB_FREEZE_FS, true);
 }
 
+static inline int sb_start_intwrite_trylock(struct super_block *sb)
+{
+	return __sb_start_write(sb, SB_FREEZE_FS, false);
+}
+
 
 extern bool inode_owner_or_capable(const struct inode *inode);
 
-- 
1.7.7.6
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marco Stornelli
2012-Aug-14  12:59 UTC
Re: [PATCH RFC] Btrfs: fix deadlock between sys_sync and freeze
Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto:> From: Liu Bo <bo.li.liu@oracle.com> > > I found this while testing xfstests 068, the story is > > t1 t2 > sys_sync thaw_super > iterate_supers > down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 > sync_fs (with wait mode) > start_transaction > sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN > > In this patch, I add an helper sb_start_intwrite_trylock() and use it before we > start_transaction in sync_fs() with wait mode so that we won''t hit the deadlock. >IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. According to man page, no other operation should modify the fs after the freeze. So for me the modification is inside sync_filesystem (and sync_one_sb). Marco -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Aug-14  13:53 UTC
Re: [PATCH RFC] Btrfs: fix deadlock between sys_sync and freeze
On 08/14/2012 08:59 PM, Marco Stornelli wrote:> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: >> From: Liu Bo <bo.li.liu@oracle.com> >> >> I found this while testing xfstests 068, the story is >> >> t1 t2 >> sys_sync thaw_super >> iterate_supers >> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 >> sync_fs (with wait mode) >> start_transaction >> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN >> >> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we >> start_transaction in sync_fs() with wait mode so that we won''t hit the deadlock. >> > > IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. > According to man page, no other operation should modify the fs after the freeze. > So for me the modification is inside sync_filesystem (and sync_one_sb).Do you mean that we should add the trylock check in sync_filesystem? But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). thanks, liubo> > Marco > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marco Stornelli
2012-Aug-14  14:12 UTC
Re: [PATCH RFC] Btrfs: fix deadlock between sys_sync and freeze
Il 14/08/2012 15:53, Liu Bo ha scritto:> On 08/14/2012 08:59 PM, Marco Stornelli wrote: >> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: >>> From: Liu Bo <bo.li.liu@oracle.com> >>> >>> I found this while testing xfstests 068, the story is >>> >>> t1 t2 >>> sys_sync thaw_super >>> iterate_supers >>> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 >>> sync_fs (with wait mode) >>> start_transaction >>> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN >>> >>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we >>> start_transaction in sync_fs() with wait mode so that we won''t hit the deadlock. >>> >> >> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. >> According to man page, no other operation should modify the fs after the freeze. >> So for me the modification is inside sync_filesystem (and sync_one_sb). > > Do you mean that we should add the trylock check in sync_filesystem? > > But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). > > thanks, > liubo >I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation. Marco -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Aug-15  12:51 UTC
Re: [PATCH RFC] Btrfs: fix deadlock between sys_sync and freeze
(CCed Jan, the author of freeze code) On 08/14/2012 10:12 PM, Marco Stornelli wrote:> Il 14/08/2012 15:53, Liu Bo ha scritto: >> On 08/14/2012 08:59 PM, Marco Stornelli wrote: >>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: >>>> From: Liu Bo <bo.li.liu@oracle.com> >>>> >>>> I found this while testing xfstests 068, the story is >>>> >>>> t1 t2 >>>> sys_sync thaw_super >>>> iterate_supers >>>> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 >>>> sync_fs (with wait mode) >>>> start_transaction >>>> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN >>>> >>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we >>>> start_transaction in sync_fs() with wait mode so that we won''t hit the deadlock. >>>> >>> >>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. >>> According to man page, no other operation should modify the fs after the freeze. >>> So for me the modification is inside sync_filesystem (and sync_one_sb). >> >> Do you mean that we should add the trylock check in sync_filesystem? >> >> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). >> >> thanks, >> liubo >> > > I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation. >I''m ok with it. What do you think about it, Jan? Any comments? thanks, liubo> Marco-- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jan Kara
2012-Aug-15  13:12 UTC
Re: [PATCH RFC] Btrfs: fix deadlock between sys_sync and freeze
On Wed 15-08-12 20:51:17, Liu Bo wrote:> (CCed Jan, the author of freeze code) > On 08/14/2012 10:12 PM, Marco Stornelli wrote: > > Il 14/08/2012 15:53, Liu Bo ha scritto: > >> On 08/14/2012 08:59 PM, Marco Stornelli wrote: > >>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: > >>>> From: Liu Bo <bo.li.liu@oracle.com> > >>>> > >>>> I found this while testing xfstests 068, the story is > >>>> > >>>> t1 t2 > >>>> sys_sync thaw_super > >>>> iterate_supers > >>>> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 > >>>> sync_fs (with wait mode) > >>>> start_transaction > >>>> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN > >>>> > >>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we > >>>> start_transaction in sync_fs() with wait mode so that we won''t hit the deadlock. > >>>> > >>> > >>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. > >>> According to man page, no other operation should modify the fs after the freeze. > >>> So for me the modification is inside sync_filesystem (and sync_one_sb). > >> > >> Do you mean that we should add the trylock check in sync_filesystem? > >> > >> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). > >> > >> thanks, > >> liubo > >> > > > > I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation. > > > > I''m ok with it. > > What do you think about it, Jan? Any comments?Hum, so what I don''t exactly understand is why does btrfs start a transaction in ->sync_fs(). The idea of freeze code is that when filesystem is frozen, there is no data / metadata to write and thus we avoid deadlocks arising from trying to write anything with s_umount held. So checking whether filesystem is frozen and avoiding transaction start in that case in btrfs_sync_fs() will work but looks hacky. Rather the check should be for the thing which is the reason for transaction start-end pair in btrfs_sync_fs(). My naive guess would be we should check whether there is any transaction running... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html