Mike Tancsa
2015-Mar-20 21:02 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
OK, I think I found where the RELENG_10 performance loss happened. It seems https://lists.freebsd.org/pipermail/svn-src-stable-10/2015-March/004778.html is the issue. Testing with a kernel from r279796 I get 76-77Mb of throughput. With r279848 it drops to about 60Mb ---Mike On 3/20/2015 1:36 PM, Mike Tancsa wrote:> OK, just to refocus, > > I had been tracking down what I thought was a regression between RELENG9 > and RELENG10, but looks more like an issue that cropped up somewhere > between the beginning of March and now. For RELENG9, I was actually > using a kernel from sources back on Jan 29th by accident. If I bring > RELENG9 upto today, I get a similar performance loss. > > Again, I am testing a simple VPN router setup > > server1 --- apu --- server2 > > where server1 connections to the apu via an OpenVPN tunnel and server1 > sends packets via netblast across the tunnel to server2. > > I get the following throughput using netblast through the tunnel on 10 > > Using > # netblast 1.1.2.2 500 1200 15 (server1 to server2) on 10.x > > Kernel Mb/s > rev > r277684 76.7563 > r279978 59.3233 > > All good at r278533, r278534, r279467 > > But at r279978 its quite a bit slower. So somewhere between r279467 and > r279978. I will keep trying to narrow it down... > > > ---Mike > > > > > > On 3/19/2015 8:26 PM, Mike Tancsa wrote: >> On 3/18/2015 5:14 PM, John-Mark Gurney wrote: >>> # dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] >>> count(); } tick-60s { exit(0); }' -o out.kern_stacks >>> >>> Also, another thing you can do is to compare the two using differential >>> flame graphs: >>> http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html >>> >>> >>> >>> Which will highlight where the performances differ... >> >> OK, some more data points. It seems a performance regression happened >> in RELENG_10 somewhere between r277684 (late January 2015) and now. >> Using r277684 on RELENG_10, I can get about 75Mb/s of throughput on >> OpenVPN. Still not as good as the 83-85Mb on RELENG_9, but much better >> than the 61Mb using RELENG_10 from the start of this week, >> >> For the differential graph, see >> >> http://tancsa.com/diffgraph.svg >> >> and >> >> http://tancsa.com/10-r277684.svg >> http://tancsa.com/10-r277684-kern.svg >> >> ---Mike >> >> >> >> > >-- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike at sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Miroslav Lachman
2015-Mar-20 23:06 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
Mike Tancsa wrote on 03/20/2015 22:02:> OK, I think I found where the RELENG_10 performance loss happened. It seems > https://lists.freebsd.org/pipermail/svn-src-stable-10/2015-March/004778.html > > is the issue. > > Testing with a kernel from r279796 I get 76-77Mb of throughput. With > r279848 it drops to about 60MbI am surprised by this huge performance drop. If it is really caused by this change (VFS timestamp precision from seconds to microseconds), wasn't this change tested before commit? Miroslav Lachman
Xin Li
2015-Mar-20 23:53 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 03/20/15 14:02, Mike Tancsa wrote:> OK, I think I found where the RELENG_10 performance loss happened. > It seems > https://lists.freebsd.org/pipermail/svn-src-stable-10/2015-March/004778.html > > is the issue. > > Testing with a kernel from r279796 I get 76-77Mb of throughput. > With r279848 it drops to about 60MbHrm, looking at 'diffgraph.svg', it suggest that writing through devfs have incurred a tax on writev(2), which is basically this codepath, if my reading is correct, in sys/fs/devfs/devfs.c: 1676 static int 1677 devfs_write_f(struct file *fp, struct uio *uio, struct ucred *cred, 1678 int flags, struct thread *td) 1679 { ... 1703 if (uio->uio_resid != resid || (error == 0 && resid !0)) { 1704 vfs_timestamp(&dev->si_ctime); 1705 dev->si_mtime = dev->si_ctime; 1706 } Further looking at the code, in devfs vfs_timestamp() is also called when reading from a device node. When the setting was 0, the code would return time_second directly instead of attempting to read the timestamp. For the purpose of devfs, does it make sense to bump timestamps like normal filesystems for each read/write operation? Looks like Mac OS X will bump timestamps for each operation but Debian don't. Cheers,> ---Mike > > > > On 3/20/2015 1:36 PM, Mike Tancsa wrote: >> OK, just to refocus, >> >> I had been tracking down what I thought was a regression between >> RELENG9 and RELENG10, but looks more like an issue that cropped >> up somewhere between the beginning of March and now. For >> RELENG9, I was actually using a kernel from sources back on Jan >> 29th by accident. If I bring RELENG9 upto today, I get a similar >> performance loss. >> >> Again, I am testing a simple VPN router setup >> >> server1 --- apu --- server2 >> >> where server1 connections to the apu via an OpenVPN tunnel and >> server1 sends packets via netblast across the tunnel to server2. >> >> I get the following throughput using netblast through the tunnel >> on 10 >> >> Using # netblast 1.1.2.2 500 1200 15 (server1 to server2) on >> 10.x >> >> Kernel Mb/s rev r277684 76.7563 r279978 59.3233 >> >> All good at r278533, r278534, r279467 >> >> But at r279978 its quite a bit slower. So somewhere between >> r279467 and r279978. I will keep trying to narrow it down... >> >> >> ---Mike >> >> >> >> >> >> On 3/19/2015 8:26 PM, Mike Tancsa wrote: >>> On 3/18/2015 5:14 PM, John-Mark Gurney wrote: >>>> # dtrace -x stackframes=100 -n 'profile-997 /arg0/ { >>>> @[stack()] = count(); } tick-60s { exit(0); }' -o >>>> out.kern_stacks >>>> >>>> Also, another thing you can do is to compare the two using >>>> differential flame graphs: >>>> http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html >>>> >>>> >>>> >>>> >>>> >>>>Which will highlight where the performances differ...>>> >>> OK, some more data points. It seems a performance regression >>> happened in RELENG_10 somewhere between r277684 (late January >>> 2015) and now. Using r277684 on RELENG_10, I can get about >>> 75Mb/s of throughput on OpenVPN. Still not as good as the >>> 83-85Mb on RELENG_9, but much better than the 61Mb using >>> RELENG_10 from the start of this week, >>> >>> For the differential graph, see >>> >>> http://tancsa.com/diffgraph.svg >>> >>> and >>> >>> http://tancsa.com/10-r277684.svg >>> http://tancsa.com/10-r277684-kern.svg >>> >>> ---Mike >>> >>> >>> >>> >> >> > >- -- Xin LI <delphij at delphij.net> https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.1.2 (FreeBSD) iQIcBAEBCgAGBQJVDLMDAAoJEJW2GBstM+nsmjgQAKGNdw70A2lfrVn8ngLympDM HeJKVCRtjaZselJpVnkyy8igBU6TX0Ae66R4IPNBpi9aLcGb/xscDgYKNh2dZc4H LjfyiXW62JwggXt+PCUjrLF3sbrlCI8ySTNLnILTs/CRFwdJXALFHWXDEtuMg+AT Mto3HxKlRZlEbWHaKbWmR4uysks6KNYJXSCnw/VtSFvlVUSwpyNz+soygrajZPKC PPOpjDqx2rvVxQwFaHF+ysTzKhvQSuJg+C5M9iKu4wgHd5QNuJ8IliAocVvPig9u gushcc2MxGPyzG24Telc3Vc5T+pbCexiP3hRtFY0/okADG4O47tuoJOY09gp28Tq 5X3eXCXkBGfCtriv6aPf//jAQ6duDrGTDmYcI+FGIDp3/FG71adNTqZ6fcqFAL/+ JHO39qpytOruVVjFyjEvQBslYEKWUZJBGZ82q6kuE+811TZxdMGVLh3PICxiJw0l SJcTR52UEYNp1kX+KhFXpz6cvV0Za5bJh/qHnC/1uzHvLYE7MDTinwDPb7LyK01G fg+Kls6YCAv6rh4mDzWJCg8aTaWmbtfn2+ETwx74if6Z8LjTB1VdmZ4y336YRY2f mFhV6eYAvvziIIMUvTQjAwyIcCtc65dvseOpgDdnWPFq/N+K4Jfg9Ast6mSHBYF2 FjvUzNg8OYvyfq5/TdPB =djPp -----END PGP SIGNATURE-----