John Baldwin
2015-Mar-21 15:52 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
On 3/20/15 8:46 PM, Mike Tancsa wrote:
> On 3/20/2015 8:15 PM, Konstantin Belousov wrote:
>>>
>>> For the purpose of devfs, does it make sense to bump timestamps like
>>> normal filesystems for each read/write operation? Looks like Mac OS X
>>> will bump timestamps for each operation but Debian don't.
>>
>> First question is, what timecounter hardware is used. I would accept
>> some slowdown from hardware like HPET, but it is indeed surprising
>> if caused by TSC.
>>
>>
>
> David Wolfskill suggested trying the problem commit with
>
> vfs.timestamp_precision=0
>
> and it does indeed restore performance to what it was. The raw dtrace
> files are available and FlameGraphs can all be found at
>
> http://tancsa.com/time/

Do you know why you are using the HPET instead of TSC for timestamping?
Using the TSC can make a non-trivial performance difference since userland
can calculate timestamps without using system calls when it is used.
(That is not related to this case, but switching to the TSC in general is
preferable.)

There are a few generations of Intel CPUs where you can't mix deeper sleep
states with the TSC as timecounter, but those CPUs are getting to be a bit
older at this point.

--
John Baldwin

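One rough way to see the userland cost mentioned above is to time a tight loop of clock_gettime() calls under each timecounter. The following is a minimal Python sketch, not something posted in the thread: the helper name ns_per_call and the iteration count are made up for illustration, and the figure includes interpreter overhead, so only the relative difference between two runs on the same machine is meaningful.

```python
import time


def ns_per_call(iterations=200_000):
    """Return the average wall-clock cost, in nanoseconds, of one
    time.clock_gettime(CLOCK_REALTIME) call (hypothetical helper)."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        time.clock_gettime(time.CLOCK_REALTIME)
    elapsed = time.perf_counter_ns() - start
    # Includes Python loop overhead; compare runs, not absolute values.
    return elapsed / iterations


print(f"{ns_per_call():.0f} ns per clock_gettime() call")
```

Running it once with each timecounter selected should show whether timestamp reads are noticeably more expensive with one of them on a given box.
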
Adrian Chadd
2015-Mar-21 16:31 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
On 21 March 2015 at 08:52, John Baldwin <jhb at freebsd.org> wrote:
> On 3/20/15 8:46 PM, Mike Tancsa wrote:
>> On 3/20/2015 8:15 PM, Konstantin Belousov wrote:
>>>>
>>>> For the purpose of devfs, does it make sense to bump timestamps like
>>>> normal filesystems for each read/write operation? Looks like Mac OS X
>>>> will bump timestamps for each operation but Debian don't.
>>>
>>> First question is, what timecounter hardware is used. I would accept
>>> some slowdown from hardware like HPET, but it is indeed surprising
>>> if caused by TSC.
>>>
>>>
>>
>> David Wolfskill suggested trying the problem commit with
>>
>> vfs.timestamp_precision=0
>>
>> and it does indeed restore performance to what it was. The raw dtrace
>> files are available and FlameGraphs can all be found at
>>
>> http://tancsa.com/time/
>
> Do you know why you are using the HPET instead of TSC for timestamping?
> Using the TSC can make a non-trivial performance difference since userland
> can calculate timestamps without using system calls when it is used.
> (That is not related to this case, but switching to the TSC in general is
> preferable.)
>
> There are a few generations of Intel CPUs where you can't mix deeper sleep
> states with the TSC as timecounter, but those CPUs are getting to be a bit
> older at this point.

What about various VMs?

-adrian

Mike Tancsa
2015-Mar-21 18:13 UTC
RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
On 3/21/2015 11:52 AM, John Baldwin wrote:
>> http://tancsa.com/time/
>
> Do you know why you are using the HPET instead of TSC for timestamping?

Hi,
I am not consciously making any time keep decisions.

kern.eventtimer.choice: HPET(550) HPET1(450) LAPIC(400) i8254(100) RTC(0)
kern.timecounter.choice: TSC(800) HPET(950) ACPI-fast(900) i8254(0) dummy(-1000000)

(The full hardware info is at the above url)

> Using the TSC can make a non-trivial performance difference since userland
> can calculate timestamps without using system calls when it is used.
> (That is not related to this case, but switching to the TSC in general is
> preferable.)
>
> There are a few generations of Intel CPUs where you can't mix deeper sleep
> states with the TSC as timecounter, but those CPUs are getting to be a bit
> older at this point.
>

This one is an AMD CPU:

AMD G-T40E Processor (1000.02-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x500f20 Family=0x14 Model=0x2 Stepping=0
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
Features2=0x802209<SSE3,MON,SSSE3,CX16,POPCNT>
AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
AMD Features2=0x35ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,IBS,SKINIT,WDT>
SVM: NP,NRIP,NAsids=8
TSC: P-state invariant, performance statistics
real memory = 2115297280 (2017 MB)
avail memory = 2018639872 (1925 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <CORE COREBOOT>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
ioapic0 <Version 2.1> irqs 0-23 on motherboard
random: <Software, Yarrow> initialized
module_register_init: MOD_LOAD (vesa, 0xffffffff80d9ddf0, 0) error 19
kbd0 at kbdmux0
acpi0: <CORE COREBOOT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
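
The kern.timecounter.choice line in this message lists each timecounter with a quality value in parentheses, and HPET (950) ranks above TSC (800) on this box. That would be consistent with HPET being selected without any deliberate configuration. A minimal Python sketch of reading that line, assuming the kernel prefers the highest-quality counter by default; the function names are hypothetical helpers, not part of any FreeBSD tool.

```python
import re


def parse_timecounter_choice(value):
    """Split a 'NAME(quality) NAME(quality) ...' string into
    a list of (name, quality) pairs."""
    pairs = re.findall(r"(\S+?)\((-?\d+)\)", value)
    return [(name, int(quality)) for name, quality in pairs]


def preferred(counters):
    """Return the name with the highest quality value.
    Assumption: highest quality wins unless overridden by hand."""
    return max(counters, key=lambda item: item[1])[0]


# String copied from the sysctl output quoted in this message.
choice = "TSC(800) HPET(950) ACPI-fast(900) i8254(0) dummy(-1000000)"
counters = parse_timecounter_choice(choice)
print(preferred(counters))  # prints "HPET"
```

On a live system the string would come from `sysctl -n kern.timecounter.choice`, and the counter actually in use is reported by `kern.timecounter.hardware`.
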