Pawel Jakub Dawidek
2009-Feb-15  08:23 UTC
[zfs-code] 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage()
On Wed, Jan 28, 2009 at 10:06:05AM -0800, Mark.Maybee at Sun.COM wrote:> Author: Mark Maybee <Mark.Maybee at Sun.COM> > Repository: /hg/onnv/onnv-gate > Latest revision: 7e4ce9158df3e94022ea0f7bffe7df5a4e23b04f > Total changesets: 1 > Log message: > 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() > 6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface > 6702206 ZFS read/writer lock contention throttles sendfile() benchmark > 6780491 Zone on a ZFS filesystem has poor fork/exec performance > 6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); > > Files: > update: usr/src/uts/common/fs/zfs/arc.c > update: usr/src/uts/common/fs/zfs/sys/zfs_znode.h > update: usr/src/uts/common/fs/zfs/zfs_rlock.c > update: usr/src/uts/common/fs/zfs/zfs_vnops.c > update: usr/src/uts/common/fs/zfs/zfs_znode.cI think after this commit, the comment above update_pages() is no longer true: * On Write: If we find a memory mapped page, we write to *both* * the page and the dmu buffer. -- Pawel Jakub Dawidek http://www.wheel.pl pjd at FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20090215/f4ed3170/attachment.bin>
Jürgen Keil
2009-Feb-17  11:22 UTC
[zfs-code] 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage()
It seems there is a bug introduced by the putback for
author:  	Mark Maybee <Mark.Maybee at Sun.COM>
date: 	Wed Jan 28 11:04:37 2009 -0700 (2 weeks ago)
files: 	usr/src/uts/common/fs/zfs/arc.c
usr/src/uts/common/fs/zfs/sys/zfs_znode.h
usr/src/uts/common/fs/zfs/zfs_rlock.c usr/src/uts/common/fs/zfs/zfs_vnops.c
usr/src/uts/common/fs/zfs/zfs_znode.c
description:
6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage()
6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface
6702206 ZFS read/writer lock contention throttles sendfile() benchmark
6780491 Zone on a ZFS filesystem has poor fork/exec performance
6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig),
BP_IDENTITY(zio->io_bp)));
zfs_vnops, zfs_putpage()
  3695		/*
  3696		 * Align this request to the file block size in case we kluster.
  3697		 * XXX - this can result in pretty aggresive locking, which can
  3698		 * impact simultanious read/write access.  One option might be
  3699		 * to break up long requests (len == 0) into block-by-block
  3700		 * operations to get narrower locking.
  3701		 */
  3702		blksz = zp->z_blksz;
  3703		if (ISP2(blksz))
  3704			io_off = P2ALIGN_TYPED(off, blksz, u_offset_t);
  3705		else
  3706			io_off = 0;
  3707		if (len > 0 && ISP2(blksz))
  3708			io_len = P2ROUNDUP_TYPED(len + (io_off - off), blksz, size_t);
  3709		else
  3710			io_len = 0;
  3711
  3712		if (io_len == 0) {
  3713			/*
  3714			 * Search the entire vp list for pages >= io_off.
  3715			 */
  3716			rl = zfs_range_lock(zp, io_off, UINT64_MAX, RL_WRITER);
  3717			error = pvn_vplist_dirty(vp, io_off, zfs_putapage, flags, cr);
  3718			goto out;
  3719		}
  3720		rl = zfs_range_lock(zp, io_off, io_len, RL_WRITER);
Line 3708:
"len + (io_off - off)" looks wrong, this should be
"len + (off - io_off)".	 The P2ALIGN_TYPED() macro at line 3704
should round down "off", i.e. io_off <= off.
Test case:
/files2/media/osol-0906-106a-global-x86.iso is a file on a zfs filesystem
# mount -F hsfs /files2/media/osol-0906-106a-global-x86.iso /mnt
# time mkisofs -r -o /dev/null /mnt
<<< very slow >>>
1.22u 5.05s 59:37.46 0.1%
On snv_104 the same test completes in 21 seconds.
It has become 180x slower...
diff --git a/usr/src/uts/common/fs/zfs/zfs_vnops.c
b/usr/src/uts/common/fs/zfs/zfs_vnops.c
--- a/usr/src/uts/common/fs/zfs/zfs_vnops.c
+++ b/usr/src/uts/common/fs/zfs/zfs_vnops.c
@@ -3705,7 +3705,7 @@
	else
            	io_off = 0;
	if (len > 0 && ISP2(blksz))
-		io_len = P2ROUNDUP_TYPED(len + (io_off - off), blksz, size_t);
+		io_len = P2ROUNDUP_TYPED(len + (off - io_off), blksz, size_t);
	else
		io_len = 0;
-- 
This message posted from opensolaris.org
Mark Maybee
2009-Feb-17  21:58 UTC
[zfs-code] 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage()
Gack! Absolutely correct J?rgen. I have filed 6806627 to track this. -Mark J?rgen Keil wrote:> It seems there is a bug introduced by the putback for > > author: Mark Maybee <Mark.Maybee at Sun.COM> > date: Wed Jan 28 11:04:37 2009 -0700 (2 weeks ago) > files: usr/src/uts/common/fs/zfs/arc.c usr/src/uts/common/fs/zfs/sys/zfs_znode.h > usr/src/uts/common/fs/zfs/zfs_rlock.c usr/src/uts/common/fs/zfs/zfs_vnops.c > usr/src/uts/common/fs/zfs/zfs_znode.c > description: > 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() > 6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface > 6702206 ZFS read/writer lock contention throttles sendfile() benchmark > 6780491 Zone on a ZFS filesystem has poor fork/exec performance > 6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); > > zfs_vnops, zfs_putpage() > > > 3695 /* > 3696 * Align this request to the file block size in case we kluster. > 3697 * XXX - this can result in pretty aggresive locking, which can > 3698 * impact simultanious read/write access. One option might be > 3699 * to break up long requests (len == 0) into block-by-block > 3700 * operations to get narrower locking. > 3701 */ > 3702 blksz = zp->z_blksz; > 3703 if (ISP2(blksz)) > 3704 io_off = P2ALIGN_TYPED(off, blksz, u_offset_t); > 3705 else > 3706 io_off = 0; > 3707 if (len > 0 && ISP2(blksz)) > 3708 io_len = P2ROUNDUP_TYPED(len + (io_off - off), blksz, size_t); > 3709 else > 3710 io_len = 0; > 3711 > 3712 if (io_len == 0) { > 3713 /* > 3714 * Search the entire vp list for pages >= io_off. > 3715 */ > 3716 rl = zfs_range_lock(zp, io_off, UINT64_MAX, RL_WRITER); > 3717 error = pvn_vplist_dirty(vp, io_off, zfs_putapage, flags, cr); > 3718 goto out; > 3719 } > 3720 rl = zfs_range_lock(zp, io_off, io_len, RL_WRITER); > > > Line 3708: > "len + (io_off - off)" looks wrong, this should be > "len + (off - io_off)". The P2ALIGN_TYPED() macro at line 3704 > should round down "off", i.e. io_off <= off. > > > Test case: > > /files2/media/osol-0906-106a-global-x86.iso is a file on a zfs filesystem > > # mount -F hsfs /files2/media/osol-0906-106a-global-x86.iso /mnt > # time mkisofs -r -o /dev/null /mnt > <<< very slow >>> > 1.22u 5.05s 59:37.46 0.1% > > On snv_104 the same test completes in 21 seconds. > > It has become 180x slower... > > > diff --git a/usr/src/uts/common/fs/zfs/zfs_vnops.c b/usr/src/uts/common/fs/zfs/zfs_vnops.c > --- a/usr/src/uts/common/fs/zfs/zfs_vnops.c > +++ b/usr/src/uts/common/fs/zfs/zfs_vnops.c > @@ -3705,7 +3705,7 @@ > else > io_off = 0; > if (len > 0 && ISP2(blksz)) > - io_len = P2ROUNDUP_TYPED(len + (io_off - off), blksz, size_t); > + io_len = P2ROUNDUP_TYPED(len + (off - io_off), blksz, size_t); > else > io_len = 0;