On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston
wrote:> Hello,
>
> I'm trying to investigate and solve some postgres latency spikes that
> I'm seeing as a result of some behaviour in the syncer. This is with
> FreeBSD 8.2 (with some local modifications and backports, r231160 in
> particular). The system has an LSI 9261-8i RAID controller (backed by
> mfi(4)) and the database and WALs are on separate volumes, a RAID 6 and
> a RAID 1 respectively. It has about 96GB of RAM installed.
>
> What's happening is that the syncer tries to fsync a large database
file
> and goes to sleep in getpbuf() with the corresponding vnode lock held
> and the following stack:
>
> #3 0xffffffff805fceb5 in _sleep (ident=0xffffffff80ca8e20,
lock=0xffffffff80d6bc20, priority=-2134554464, wmesg=0xffffffff80a4fe43
"wswbuf0", timo=0) at /d2/usr/src/sys/kern/kern_synch.c:234
> #4 0xffffffff808780d5 in getpbuf (pfreecnt=0xffffffff80ca8e20) at
/d2/usr/src/sys/vm/vm_pager.c:339
> #5 0xffffffff80677a00 in cluster_wbuild (vp=0xffffff02ea3d7ce8,
size=16384, start_lbn=20869, len=2) at /d2/usr/src/sys/kern/vfs_cluster.c:801
> #6 0xffffffff808477ed in ffs_syncvnode (vp=0xffffff02ea3d7ce8,
waitfor=Variable "waitfor" is not available.) at
/d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
> #7 0xffffffff808488cf in ffs_fsync (ap=0xffffff9b0cd27b00) at
/d2/usr/src/sys/ufs/ffs/ffs_vnops.c:190
> #8 0xffffffff8096798a in VOP_FSYNC_APV (vop=0xffffffff80ca5300,
a=0xffffff9b0cd27b00) at vnode_if.c:1267
> #9 0xffffffff8068bade in sync_vnode (slp=0xffffff002ab8e758,
bo=0xffffff9b0cd27bc0, td=0xffffff002ac89460) at vnode_if.h:549
> #10 0xffffffff8068bdcd in sched_sync () at
/d2/usr/src/sys/kern/vfs_subr.c:1841
>
> (kgdb) frame 6
> #6 0xffffffff808477ed in ffs_syncvnode (vp=0xffffff02ea3d7ce8,
waitfor=Variable "waitfor" is not available.) at
/d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
> 306 vfs_bio_awrite(bp);
> (kgdb) vpath vp
> 0xffffff02ea3d7ce8: 18381
> 0xffffff02eab1dce8: 16384
> 0xffffff02eaaf0588: base
> 0xffffff01616d8b10: data
> 0xffffff01616d8ce8: pgsql
> 0xffffff002af9f588: <mount point>
> 0xffffff002af853b0: d3
> 0xffffff002abd6b10: /
> (kgdb)
>
> During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
> happening up to 8 or 10 times a second. When this happens, a bunch of
> postgres threads become blocked in vn_write() waiting for the vnode lock
> to become free. It looks like the write-clustering code is limited to
> using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
> to anything greater than 256.
Syncer is probably just a victim of profiling. Would postgres called
fsync(2), you then blame the fsync code for the pauses.
Just add a tunable to allow the user to manually-tune the nswbuf,
regardless of the buffer cache sizing. And yes, nswbuf default max
probably should be bumped to something like 1024, at least on 64bit
architectures which do not starve for kernel memory.
>
> Since the sleeps are happening in the write-clustering code, I tried
> disabling write clustering on the mountpoint (with the noclusterw mount
> option) and found that this replaces my problem with another one: the
> syncer periodically generates large bursts of writes that create a
> backlog in the mfi(4) bioq. Then postgres' reads take a few seconds to
> return, causing more or less the same end result.
Well, this is exactly what clustering code is for, merge a lot of
short i/o requests into the bigger i/o ?
>
> Does anyone have any suggestions on what I can do to help reduce the
> impact of the syncer on my systems? I can't just move to a newer
version
> of FreeBSD, but I'm willing to backport changes if anyone can point me
> to something that might help.
As a side note, syncer code for UFS was redone in HEAD and 9 to only
iterate over the active vnodes. This is mostly irrelevant to your
problems which happen when syncer does perform the i/o, but still it
saves a lot of CPU.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL:
<http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20130717/2a9ebb5c/attachment.sig>