Xin Li
2015-Mar-20 23:53 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 03/20/15 14:02, Mike Tancsa wrote:> OK, I think I found where the RELENG_10 performance loss happened. > It seems > https://lists.freebsd.org/pipermail/svn-src-stable-10/2015-March/004778.html > > is the issue. > > Testing with a kernel from r279796 I get 76-77Mb of throughput. > With r279848 it drops to about 60MbHrm, looking at 'diffgraph.svg', it suggest that writing through devfs have incurred a tax on writev(2), which is basically this codepath, if my reading is correct, in sys/fs/devfs/devfs.c: 1676 static int 1677 devfs_write_f(struct file *fp, struct uio *uio, struct ucred *cred, 1678 int flags, struct thread *td) 1679 { ... 1703 if (uio->uio_resid != resid || (error == 0 && resid !0)) { 1704 vfs_timestamp(&dev->si_ctime); 1705 dev->si_mtime = dev->si_ctime; 1706 } Further looking at the code, in devfs vfs_timestamp() is also called when reading from a device node. When the setting was 0, the code would return time_second directly instead of attempting to read the timestamp. For the purpose of devfs, does it make sense to bump timestamps like normal filesystems for each read/write operation? Looks like Mac OS X will bump timestamps for each operation but Debian don't. Cheers,> ---Mike > > > > On 3/20/2015 1:36 PM, Mike Tancsa wrote: >> OK, just to refocus, >> >> I had been tracking down what I thought was a regression between >> RELENG9 and RELENG10, but looks more like an issue that cropped >> up somewhere between the beginning of March and now. For >> RELENG9, I was actually using a kernel from sources back on Jan >> 29th by accident. If I bring RELENG9 upto today, I get a similar >> performance loss. >> >> Again, I am testing a simple VPN router setup >> >> server1 --- apu --- server2 >> >> where server1 connections to the apu via an OpenVPN tunnel and >> server1 sends packets via netblast across the tunnel to server2. >> >> I get the following throughput using netblast through the tunnel >> on 10 >> >> Using # netblast 1.1.2.2 500 1200 15 (server1 to server2) on >> 10.x >> >> Kernel Mb/s rev r277684 76.7563 r279978 59.3233 >> >> All good at r278533, r278534, r279467 >> >> But at r279978 its quite a bit slower. So somewhere between >> r279467 and r279978. I will keep trying to narrow it down... >> >> >> ---Mike >> >> >> >> >> >> On 3/19/2015 8:26 PM, Mike Tancsa wrote: >>> On 3/18/2015 5:14 PM, John-Mark Gurney wrote: >>>> # dtrace -x stackframes=100 -n 'profile-997 /arg0/ { >>>> @[stack()] = count(); } tick-60s { exit(0); }' -o >>>> out.kern_stacks >>>> >>>> Also, another thing you can do is to compare the two using >>>> differential flame graphs: >>>> http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html >>>> >>>> >>>> >>>> >>>> >>>>Which will highlight where the performances differ...>>> >>> OK, some more data points. It seems a performance regression >>> happened in RELENG_10 somewhere between r277684 (late January >>> 2015) and now. Using r277684 on RELENG_10, I can get about >>> 75Mb/s of throughput on OpenVPN. Still not as good as the >>> 83-85Mb on RELENG_9, but much better than the 61Mb using >>> RELENG_10 from the start of this week, >>> >>> For the differential graph, see >>> >>> http://tancsa.com/diffgraph.svg >>> >>> and >>> >>> http://tancsa.com/10-r277684.svg >>> http://tancsa.com/10-r277684-kern.svg >>> >>> ---Mike >>> >>> >>> >>> >> >> > >- -- Xin LI <delphij at delphij.net> https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1.2 (FreeBSD) iQIcBAEBCgAGBQJVDLMDAAoJEJW2GBstM+nsmjgQAKGNdw70A2lfrVn8ngLympDM HeJKVCRtjaZselJpVnkyy8igBU6TX0Ae66R4IPNBpi9aLcGb/xscDgYKNh2dZc4H LjfyiXW62JwggXt+PCUjrLF3sbrlCI8ySTNLnILTs/CRFwdJXALFHWXDEtuMg+AT Mto3HxKlRZlEbWHaKbWmR4uysks6KNYJXSCnw/VtSFvlVUSwpyNz+soygrajZPKC PPOpjDqx2rvVxQwFaHF+ysTzKhvQSuJg+C5M9iKu4wgHd5QNuJ8IliAocVvPig9u gushcc2MxGPyzG24Telc3Vc5T+pbCexiP3hRtFY0/okADG4O47tuoJOY09gp28Tq 5X3eXCXkBGfCtriv6aPf//jAQ6duDrGTDmYcI+FGIDp3/FG71adNTqZ6fcqFAL/+ JHO39qpytOruVVjFyjEvQBslYEKWUZJBGZ82q6kuE+811TZxdMGVLh3PICxiJw0l SJcTR52UEYNp1kX+KhFXpz6cvV0Za5bJh/qHnC/1uzHvLYE7MDTinwDPb7LyK01G fg+Kls6YCAv6rh4mDzWJCg8aTaWmbtfn2+ETwx74if6Z8LjTB1VdmZ4y336YRY2f mFhV6eYAvvziIIMUvTQjAwyIcCtc65dvseOpgDdnWPFq/N+K4Jfg9Ast6mSHBYF2 FjvUzNg8OYvyfq5/TdPB =djPp -----END PGP SIGNATURE-----
Konstantin Belousov
2015-Mar-21 00:15 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
On Fri, Mar 20, 2015 at 04:53:42PM -0700, Xin Li wrote:> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > On 03/20/15 14:02, Mike Tancsa wrote: > > OK, I think I found where the RELENG_10 performance loss happened. > > It seems > > https://lists.freebsd.org/pipermail/svn-src-stable-10/2015-March/004778.html > > > > is the issue. > > > > Testing with a kernel from r279796 I get 76-77Mb of throughput. > > With r279848 it drops to about 60Mb > > Hrm, looking at 'diffgraph.svg', it suggest that writing through devfs > have incurred a tax on writev(2), which is basically this codepath, if > my reading is correct, in sys/fs/devfs/devfs.c: > > 1676 static int > 1677 devfs_write_f(struct file *fp, struct uio *uio, struct ucred > *cred, > 1678 int flags, struct thread *td) > 1679 { > ... > 1703 if (uio->uio_resid != resid || (error == 0 && resid !> 0)) { > 1704 vfs_timestamp(&dev->si_ctime); > 1705 dev->si_mtime = dev->si_ctime; > 1706 } > > Further looking at the code, in devfs vfs_timestamp() is also called > when reading from a device node. When the setting was 0, the code > would return time_second directly instead of attempting to read the > timestamp. > > For the purpose of devfs, does it make sense to bump timestamps like > normal filesystems for each read/write operation? Looks like Mac OS X > will bump timestamps for each operation but Debian don't.First question is, what timecounter hardware is used. I would accept some slowdown from hardware like HPET, but it is indeed surprising if caused by TSC.