Hi all,

are there any obvious changes between 6.0-BETA3 and 6.0-RELEASE / 6.0-STABLE that I should be aware of, and that could cause a quite noticeable decline in performance (and a change in performance patterns) for java/tomcat?

On a BETA-3 system I'm seeing, with the particular application we're running, about 28 transactions/second over a 10-minute interval. With -RELEASE and -STABLE I'm lucky to reach 24, and it'll usually wobble around 20.

Another oddity: the BETA-3 system performs well from the very start of a load test, while the -RELEASE and -STABLE systems need a good 20 seconds to reach their "max". They start out very low (3-10 transactions/second for the first 10 seconds or so).

This is on HP DL385 servers with dual 2.4 GHz Opteron CPUs, running FreeBSD-amd64 from 15k RPM drives in cached RAID. Hardware and software configuration (apart from the base system), network configuration and latencies, database access, etc. are 100% identical on all systems.

Any ideas?

Thanks,
/Eirik
On 11/26/05, Eirik Øverby <ltning@anduin.net> wrote:
EØ> [Cross-posting after lack of response on -stable]

The first step would be to do some performance debugging.

- What do top/vmstat/systat say about what the OS and apps are doing? Is the CPU pegged at 100%? What's the load seen by the disks? Is the RAID in good health?
- Any unusual messages in /var/log/messages? Any errors shown by the network interfaces (I'm assuming the application is using the network)?
- A brief description of the workload presented by the app would help.

--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
On Nov 28, 2005, at 14:45 , Joseph Koshy wrote:
> On 11/26/05, Eirik Øverby <ltning@anduin.net> wrote:
> EØ> [Cross-posting after lack of response on -stable]
>
> The first step would be to do some performance debugging.

Yep.

> - What do top/vmstat/systat say about what the OS and
>   apps are doing? Is the CPU pegged at 100%? What's
>   the load seen by the disks? Is the RAID in good health?

The vmstat output from idle periods is below; I think it is rather interesting.

To your other questions: CPU usage is comparable on both systems. It is not pegged at 100%; the load seems to stabilize around 0.5. Disk load is minimal on the application servers and somewhat higher on the database servers, but those are not relevant here (they are not the bottleneck, and they perform equally). The RAIDs are in good health on both systems.

From the "fast" system (6.0-BETA3, ~idle):

[root@app_host01] ~# vmstat -w 5
 procs    memory                page             disks        faults               cpu
 r b w     avm    fre  flt  re  pi  po   fr    sr  da0  pa0    in     sy     cs  us  sy   id
 1 0 0 2439220  38048   14   0   0   0   14     0    0    0   170    141    437   0   0  100
 0 0 0 2439220  38028    2   0   0   0    3     0    2    0   192     94    475   0   0  100
 0 0 0 2439220  37916    1   0   0   0    6     0    1    0   291    925    926   5   0   94
 0 0 0 2439220  37916    0   0   0   0    0     0    0    0   185     91    458   0   0  100
 0 0 0 2439220  37820    1   0   0   0    6     0    3    0   289   1163   1124   6   0   94
 0 0 0 2439220  37820    0   0   0   0    0     0    0    0   183     91    454   0   0  100

From the "slow" system (6.0-STABLE, ~idle):

[root@app_host02] ~# vmstat -w 5
 procs    memory                page             disks        faults               cpu
 r b w     avm    fre  flt  re  pi  po   fr    sr  da0  pa0    in     sy     cs  us  sy   id
 0 0 1 2468180  51660   15   0   0   0   18     4    0    0  1048   3200   5130   0   0  100
 0 0 0 2468180  51660    1   0   0   0    0     0    0    0  1004   3068   5063   0   0  100
 0 0 0 2468180  51660    0   0   0   0    0     0    0    0  1003   3094   5057   0   0  100
 0 0 0 2468180  51660    0   0   0   0    0     0    1    0  1005   3068   5065   0   0  100
 0 0 0 2468180  51656    1   0   0   0    0     0    0    0  1002   3090   5054   0   1   99
 0 0 0 2468180  51656    0   0   0   0    0     0    0    0  1002   3064   5053   0   0  100

*Loads* more context switches than on the BETA-3 system.
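To put a number on "loads more", the idle samples above can be averaged per column. A minimal Python sketch, assuming the `vmstat -w 5` column layout shown in the header; it drops the first line of each run, since vmstat's first line is an average since boot rather than a 5-second sample. The helper name `fault_averages` is made up for illustration.

```python
# Average the "faults" columns (in, sy, cs) of the idle vmstat samples above.
# Column order, as in the `vmstat -w 5` header on these hosts:
#   r b w avm fre flt re pi po fr sr da0 pa0 in sy cs us sy id
# "sy" appears twice (system calls, then system CPU %), so use indices.
IN, SYSCALLS, CS = 13, 14, 15

FAST_IDLE = """
1 0 0 2439220 38048 14 0 0 0 14 0 0 0 170 141 437 0 0 100
0 0 0 2439220 38028 2 0 0 0 3 0 2 0 192 94 475 0 0 100
0 0 0 2439220 37916 1 0 0 0 6 0 1 0 291 925 926 5 0 94
0 0 0 2439220 37916 0 0 0 0 0 0 0 0 185 91 458 0 0 100
0 0 0 2439220 37820 1 0 0 0 6 0 3 0 289 1163 1124 6 0 94
0 0 0 2439220 37820 0 0 0 0 0 0 0 0 183 91 454 0 0 100
"""

SLOW_IDLE = """
0 0 1 2468180 51660 15 0 0 0 18 4 0 0 1048 3200 5130 0 0 100
0 0 0 2468180 51660 1 0 0 0 0 0 0 0 1004 3068 5063 0 0 100
0 0 0 2468180 51660 0 0 0 0 0 0 0 0 1003 3094 5057 0 0 100
0 0 0 2468180 51660 0 0 0 0 0 0 1 0 1005 3068 5065 0 0 100
0 0 0 2468180 51656 1 0 0 0 0 0 0 0 1002 3090 5054 0 1 99
0 0 0 2468180 51656 0 0 0 0 0 0 0 0 1002 3064 5053 0 0 100
"""


def fault_averages(text, skip_first=True):
    """Mean interrupts/s, system calls/s and context switches/s.

    vmstat's first line is an average since boot, so by default it is
    dropped and only the interval samples are averaged.
    """
    rows = [[int(x) for x in line.split()] for line in text.strip().splitlines()]
    if skip_first:
        rows = rows[1:]
    n = len(rows)
    return {
        "in": sum(r[IN] for r in rows) / n,
        "sy": sum(r[SYSCALLS] for r in rows) / n,
        "cs": sum(r[CS] for r in rows) / n,
    }


fast = fault_averages(FAST_IDLE)
slow = fault_averages(SLOW_IDLE)
for col in ("in", "sy", "cs"):
    ratio = slow[col] / fast[col]
    print(f"{col}: fast {fast[col]:7.1f}  slow {slow[col]:7.1f}  ({ratio:.1f}x)")
```

On these samples the slow box shows roughly 4.4x the interrupts, 6.5x the system calls and 7.4x the context switches of the fast box while idle.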
I have not yet tried this under load; I have to wait for the testing window for that. But perhaps this helps? What should I look for next?

> - Any unusual messages in /var/log/messages? Any errors
>   shown by the network interfaces (I'm assuming the
>   application is using the network).

No errors that I can find.

> - A brief description of the workload presented by
>   the app would help.

This is a web application (a payment gateway) that receives an HTTP POST, does some processing, asks an external service for a piece of information, and then returns the gathered information to the client. The call to the external service can be eliminated, but doing so does not change the performance profile.

How the application works internally is impossible for me to say; it's third-party. After asking the vendor, I can say that it is "moderately" threaded, whatever that means. My interpretation is that the heaviest threading happens in Tomcat itself, with up to 150 concurrent connection threads running.

Thanks,
/Eirik

>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:
> EØ> *loads* more context switches than on the BETA-3 system.
> EØ> I have not yet tried this during load
>
> - Which scheduler have you configured (BSD or ULE)?

Running GENERIC/SMP kernels, with the BSD scheduler. Speaking of which, is there a way to extract the kernel configuration from a running kernel or kernel binary?

> - What do the interrupt statistics show? Any interrupt
>   storms? Please check the mailing lists for a prior
>   discussion on interrupt storms on some motherboards.

Slow system:

interrupt                  total      rate
irq1: atkbd0                   4         0
irq14: ata0                   46         0
irq24: ciss0              337166         1
irq28: bge0              8038794        35
cpu0: timer            446869052      1999
cpu1: timer            446861051      1999
Total                  902106113      4037

Fast system:

interrupt                  total      rate
irq1: atkbd0                   6         0
irq14: ata0                   46         0
irq24: ciss0             7465831         1
irq28: bge0             20764380         2
lapic0: timer        14827978729      2000
lapic1: timer        14827970729      2000
Total                29684179721      4003

No significant differences, I'd say. Anything else I can do to dig deeper?

> - Could you post the dmesg output from the systems (I
>   presume there aren't any significant differences).

The dmesg from the slow system follows. I do not have a dmesg for the fast system, and I cannot reboot it right now. However, I have compared them before, and they are 100% identical. The machines seem to be very close in serial numbers, probably from the same production run.

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.0-STABLE #0: Sat Nov 26 01:52:00 CET 2005
    root@build.unicore.no:/usr/obj/amd64/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 250 (2405.47-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f51  Stepping = 1
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,<b25>,LM,3DNow+,3DNow>
real memory  = 1073717248 (1023 MB)
avail memory = 1024946176 (977 MB)
ACPI APIC Table: <HP 00000083>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
ioapic3 <Version 1.1> irqs 32-35 on motherboard
ioapic4 <Version 1.1> irqs 36-39 on motherboard
acpi0: <HP A05> on motherboard
acpi0: Power Button (fixed)
pci_link0: <ACPI PCI Link LNKA> irq 5 on acpi0
pci_link1: <ACPI PCI Link LNKB> irq 7 on acpi0
pci_link2: <ACPI PCI Link LNKC> irq 0 on acpi0
pci_link3: <ACPI PCI Link LNKD> irq 3 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ohci0: <OHCI (generic) USB controller> mem 0xf7df0000-0xf7df0fff irq 19 at device 0.0 on pci1
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xf7de0000-0xf7de0fff irq 19 at device 0.1 on pci1
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pci1: <base peripheral> at device 2.0 (no driver attached)
pci1: <base peripheral> at device 2.2 (no driver attached)
pci1: <display, VGA> at device 3.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2000-0x200f at device 4.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <bridge> at device 4.3 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ciss0: <HP Smart Array 6i> port 0x5000-0x50ff mem 0xf7ef0000-0xf7ef1fff,0xf7e80000-0xf7ebffff irq 24 at device 4.0 on pci2
ciss0: [GIANT-LOCKED]
pci0: <base peripheral, interrupt controller> at device 7.1 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> at device 8.0 on pci0
pci3: <ACPI PCI bus> on pcib3
bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2100> mem 0xf7ff0000-0xf7ffffff irq 28 at device 6.0 on pci3
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:13:21:b3:c1:f8
bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2100> mem 0xf7fe0000-0xf7feffff irq 29 at device 6.1 on pci3
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:13:21:b3:c1:f7
pci0: <base peripheral, interrupt controller> at device 8.1 (no driver attached)
pcib4: <ACPI Host-PCI bridge> on acpi0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 9.0 on pci4
pci5: <ACPI PCI bus> on pcib5
pci4: <base peripheral, interrupt controller> at device 9.1 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> at device 10.0 on pci4
pci6: <ACPI PCI bus> on pcib6
pci4: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model Generic PS/2 mouse, device ID 0
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xcc000-0xcd7ff,0xee000-0xeffff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <COMPAQ CD-ROM SN-124/N104> at ata0-master PIO4
SMP: AP CPU #1 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
da0: 135.168MB/s transfers
da0: 34727MB (71122560 512 byte sectors: 255H 32S/T 8716C)
Trying to mount root from ufs:/dev/da0s1a
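One caveat when comparing the two `vmstat -i` tables above: as far as I know, the "rate" column is the total divided by the time since boot, so it averages over the whole uptime, idle periods included. A rough Python sketch, assuming each per-CPU timer fires about 2000 times per second (as the rate column suggests), estimates each box's uptime from its timer count and recomputes the bge0 average. The function names `since_boot_rates` and `interval_rate` are hypothetical.

```python
# Since-boot interrupt totals copied from the `vmstat -i` output above.
SLOW = {"timer": 446869052, "bge0": 8038794, "ciss0": 337166}      # cpu0: timer
FAST = {"timer": 14827978729, "bge0": 20764380, "ciss0": 7465831}  # lapic0: timer

# Assumption: one per-CPU timer ticks ~2000 times per second, as the
# "rate" column shows, so timer_total / TIMER_HZ approximates uptime.
TIMER_HZ = 2000


def since_boot_rates(totals, timer_hz=TIMER_HZ):
    """Return (approximate uptime in seconds, {device: average interrupts/s})."""
    uptime = totals["timer"] / timer_hz
    rates = {dev: n / uptime for dev, n in totals.items() if dev != "timer"}
    return uptime, rates


def interval_rate(before, after, seconds):
    """Interrupts per second between two snapshots of the same counter."""
    return (after - before) / seconds


for name, totals in (("slow", SLOW), ("fast", FAST)):
    uptime, rates = since_boot_rates(totals)
    print(f"{name}: up ~{uptime / 86400:.1f} days, "
          f"bge0 ~{rates['bge0']:.1f}/s averaged since boot")
```

With these totals the slow box had been up about 2.6 days and the fast one about 86 days, so the fast box's bge0 figure is spread over a much longer, presumably mostly idle, period. Taking two `vmstat -i` snapshots a known number of seconds apart during the load test and passing the bge0 totals to `interval_rate` would give the rate under load instead.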
Follow-up: I've now run vmstat under load, which confirms the findings from the idle runs.

Slow system (one idle sample before and one after the load run included):

 procs    memory                page             disks        faults               cpu
 r b w     avm    fre  flt  re  pi  po   fr    sr  da0  pa0    in     sy     cs  us  sy   id
 3 0 0 2468572  45476   14   0   0   0   18     4    0    0  1049   3201   5132   0   0  100
 0 0 1 2468572  42388    1   0   0   0  154     0    5    0  6852  19813  19970  22   8   70
 1 0 0 2468572  39332    1   0   0   0  155     0   11    0  6823  19661  19886  23   7   71
 2 0 0 2468432  36336    1   0   0   0  160     0    6    0  7031  20356  20534  19   7   74
 0 0 0 2468432  33228    1   0   0   0  156     0    5    0  6685  19420  19613  20   7   73
 2 0 0 2468432  29928    1   0   0   0  164     0    5    0  7105  20483  20673  21   7   71
 1 0 0 2468432  53568    1   0   0   0  153  1308    5    0  6688  19278  19537  21   8   72
 1 0 1 2468432  50580    2   0   0   0  150     0    6    0  6408  18430  18693  24   7   69
 0 0 0 2468432  47748    2   0   0   0  143     0    6    0  6323  18098  18328  26   7   67
 0 0 0 2468432  45056    1   0   0   0  136     0    5    0  5607  17122  17062  16   7   77
 0 0 0 2468432  45040    0   0   0   0    0     0    0    0  1093   3172   5164   0   0  100

Fast system:

 procs    memory                page             disks        faults               cpu
 r b w     avm    fre  flt  re  pi  po   fr    sr  da0  pa0    in     sy     cs  us  sy   id
 0 0 0 2439276  39708    1   0   0   0    6     0    1    0   281   1029    992   6   1   93
 0 0 0 2439276  39380    7   0   0   0   16     0    1    0   665   1341   1714   2   1   98
 0 0 0 2439276  36472    5   0   0   0  145     0    6    0  5569  12409  14821  21   7   72
 0 0 0 2439276  33512    1   0   0   0  149     0    5    0  5862  12597  15532  15   6   79
 0 0 0 2439276  30600    1   0   0   0  146     0    4    0  5682  12655  15102  19   7   74
 2 0 0 2439276  54144    1   0   0   5  152  1310   10    0  6006  12908  15964  17   6   77
 0 0 0 2439276  51176    2   0   0   0  151     0    7    0  5348  11899  14190  22   6   72
 2 0 0 2439276  48104   98   0   0   0  248     0    5    0  5924  12889  15757  15   7   78
 1 0 0 2439276  45172    1   0   0   0  147     0    5    0  5882  12660  15624  16   7   77
 2 0 0 2439276  42276    1   0   0   0  145     0    5    0  5558  12477  14864  21   6   73
 0 0 0 2439276  39300    1   0   0   0  149     0    5    0  5842  12660  15556  14   7   79
 0 0 0 2439276  36348    1   0   0   0  150     0    8    0  5659  12562  15042  21   5   74
 0 0 0 2439276  33404    1   0   0   0  150     0    7    0  5868  12642  15536  14   6   80
 0 0 0 2439276  30588    1   0   0   0  142     0    6    0  5449  11961  14487  19   7   74
 0 0 0 2439276  30588    0   0   0   0    0     0    0    0   227    246    565   0   0  100

I'm tempted to upgrade the fast system to 6-STABLE (same revision as the slow one). Even the slow system performs "adequately", but the upgrade might help me isolate any potential hardware differences.

/Eirik

On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:
> EØ> *loads* more context switches than on the BETA-3 system.
> EØ> I have not yet tried this during load
>
> - Which scheduler have you configured (BSD or ULE)?
> - What do the interrupt statistics show? Any interrupt
>   storms? Please check the mailing lists for a prior
>   discussion on interrupt storms on some motherboards.
> - Could you post the dmesg output from the systems (I
>   presume there aren't any significant differences).
>
> Please CC -stable too.
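The load-time vmstat samples can be compared the same way. A minimal Python sketch, assuming the same column layout; it keeps only samples where the CPU idle column is below 90%, which drops the quiet rows before and after each test run. The 90% cut-off is an arbitrary choice, and `loaded_averages` is a hypothetical helper name.

```python
# Compare system calls and context switches per second under load,
# using the vmstat samples above. Columns (vmstat -w 5 on these hosts):
#   r b w avm fre flt re pi po fr sr da0 pa0 in sy cs us sy id
SYSCALLS, CS, IDLE = 14, 15, 18
BUSY_IDLE_BELOW = 90  # arbitrary: samples under 90% idle count as "under load"

SLOW_LOAD = """
3 0 0 2468572 45476 14 0 0 0 18 4 0 0 1049 3201 5132 0 0 100
0 0 1 2468572 42388 1 0 0 0 154 0 5 0 6852 19813 19970 22 8 70
1 0 0 2468572 39332 1 0 0 0 155 0 11 0 6823 19661 19886 23 7 71
2 0 0 2468432 36336 1 0 0 0 160 0 6 0 7031 20356 20534 19 7 74
0 0 0 2468432 33228 1 0 0 0 156 0 5 0 6685 19420 19613 20 7 73
2 0 0 2468432 29928 1 0 0 0 164 0 5 0 7105 20483 20673 21 7 71
1 0 0 2468432 53568 1 0 0 0 153 1308 5 0 6688 19278 19537 21 8 72
1 0 1 2468432 50580 2 0 0 0 150 0 6 0 6408 18430 18693 24 7 69
0 0 0 2468432 47748 2 0 0 0 143 0 6 0 6323 18098 18328 26 7 67
0 0 0 2468432 45056 1 0 0 0 136 0 5 0 5607 17122 17062 16 7 77
0 0 0 2468432 45040 0 0 0 0 0 0 0 0 1093 3172 5164 0 0 100
"""

FAST_LOAD = """
0 0 0 2439276 39708 1 0 0 0 6 0 1 0 281 1029 992 6 1 93
0 0 0 2439276 39380 7 0 0 0 16 0 1 0 665 1341 1714 2 1 98
0 0 0 2439276 36472 5 0 0 0 145 0 6 0 5569 12409 14821 21 7 72
0 0 0 2439276 33512 1 0 0 0 149 0 5 0 5862 12597 15532 15 6 79
0 0 0 2439276 30600 1 0 0 0 146 0 4 0 5682 12655 15102 19 7 74
2 0 0 2439276 54144 1 0 0 5 152 1310 10 0 6006 12908 15964 17 6 77
0 0 0 2439276 51176 2 0 0 0 151 0 7 0 5348 11899 14190 22 6 72
2 0 0 2439276 48104 98 0 0 0 248 0 5 0 5924 12889 15757 15 7 78
1 0 0 2439276 45172 1 0 0 0 147 0 5 0 5882 12660 15624 16 7 77
2 0 0 2439276 42276 1 0 0 0 145 0 5 0 5558 12477 14864 21 6 73
0 0 0 2439276 39300 1 0 0 0 149 0 5 0 5842 12660 15556 14 7 79
0 0 0 2439276 36348 1 0 0 0 150 0 8 0 5659 12562 15042 21 5 74
0 0 0 2439276 33404 1 0 0 0 150 0 7 0 5868 12642 15536 14 6 80
0 0 0 2439276 30588 1 0 0 0 142 0 6 0 5449 11961 14487 19 7 74
0 0 0 2439276 30588 0 0 0 0 0 0 0 0 227 246 565 0 0 100
"""


def loaded_averages(text, idle_below=BUSY_IDLE_BELOW):
    """Return (busy sample count, mean syscalls/s, mean context switches/s)."""
    rows = [[int(x) for x in line.split()] for line in text.strip().splitlines()]
    busy = [r for r in rows if r[IDLE] < idle_below]
    n = len(busy)
    return n, sum(r[SYSCALLS] for r in busy) / n, sum(r[CS] for r in busy) / n


for name, text in (("slow", SLOW_LOAD), ("fast", FAST_LOAD)):
    n, sy, cs = loaded_averages(text)
    print(f"{name}: {n} busy samples, {sy:.0f} syscalls/s, {cs:.0f} cs/s")
```

On these samples the slow box averages about 19,200 system calls/s and 19,400 context switches/s under load, against about 12,500 and 15,200 on the fast box: roughly 1.5x and 1.3x, a much smaller relative gap than in the idle samples quoted earlier in the thread.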
Hi,

I think I have found the culprit. There must be some sort of difference between the machines after all (BIOS revision?): on one machine the interrupt rate for the bge card stays very low (2, to be exact) during maximum load, while on the other it goes beyond 1000 and keeps rising. This might also explain why performance slowly degrades over time on that machine and response times vary wildly, while the "fast" machine responds nicely within 1-2 seconds regardless of load and test duration.

I will have to investigate this more closely. Is there a way to force the NIC into polling mode? (I'm assuming that is the difference; an IRQ rate of 2 is too low for a heavily loaded server if the NIC is interrupt-driven.) Anything else I could look at?

Also, the interrupt rates for the CPUs stay at exactly 2000 on the fast system, but fluctuate somewhat on the other.

/Eirik
On Mon, 28 Nov 2005, Kris Kennaway wrote:
> On Mon, Nov 28, 2005 at 10:53:00PM +0100, Eirik Øverby wrote:
>> Firmware versions are equal. BIOS settings are equal.
>> However, a diff of the dmesgs show (apart from MAC address differences):
>>
>> 30c30
>> < Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
>> ---
>> > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>>
>> What on earth is that all about? The "slow" box has the ACPI-fast
>> timecounter...
>
> Could be ACPI bugs on your system:

Yes, but the other system is 100% identical: hardware, BIOS config, BIOS and bootblock revision, controller BIOSes, etc. It all matches. Should I complain to HP?

/Eirik

>
>>> BIOS update.
>
> Kris
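On the "ACPI-safe" versus "ACPI-fast" question: as I understand FreeBSD's acpi_timer driver, it tests the ACPI PM timer at boot, and if reads look reliable it registers "ACPI-fast", which reads the counter once per query; otherwise it registers "ACPI-safe", which keeps reading until three consecutive values are in order and returns the middle one. The Python sketch below only models that retry loop against a fake counter; it is an illustration, not the driver code, and `fast_read`, `safe_read` and `FakeTimer` are hypothetical names.

```python
from typing import Callable


def fast_read(read: Callable[[], int]) -> int:
    """Model of an "ACPI-fast"-style read: trust a single hardware read."""
    return read()


def safe_read(read: Callable[[], int]) -> int:
    """Model of an "ACPI-safe"-style read.

    Keep reading until three consecutive samples are in non-decreasing
    order, then return the middle one. A mis-latched or wrapped value
    breaks the ordering and simply forces another read.
    """
    u2 = read()
    u3 = read()
    while True:
        u1, u2, u3 = u2, u3, read()
        if u1 <= u2 <= u3:
            return u2


class FakeTimer:
    """Hypothetical stand-in for the hardware counter: replays fixed values."""

    def __init__(self, values):
        self._values = iter(values)
        self.reads = 0

    def read(self) -> int:
        self.reads += 1
        return next(self._values)


# A glitchy sample (999) makes the safe read discard it and read again.
t = FakeTimer([100, 999, 102, 103, 104])
print(safe_read(t.read), "after", t.reads, "reads")
```

The relevant point for this thread is cost: even on well-behaved hardware the safe variant performs at least three timer reads per time query versus one, which matters mostly for applications that ask for the time very often.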
On Dec 1, 2005, at 04:12 , Michael Vince wrote:
> Some apps that make frequent queries of the system time, for
> example MySQL, are well known in FreeBSD to be slower than on Linux
> because it's more expensive to call compared to Linux; maybe Tomcat
> is also another such app. This can also be double the case depending
> on your jsp and servlet code.

True, but on equal hardware it should perform equally.

> If you are on good hardware, are using 6 and keep your system's time
> updated via ntp, you might want to try changing
> kern.timecounter.hardware from ACPI-fast to TSC(-100) and doing a
> benchmark; this has already proven to increase performance of MySQL
> by a significant amount.

I will try this, though it will not solve my original problem (and the subject is somewhat misleading now, as this seems to be independent of kernel revisions).

> Also some new experimental low-precision time code has been added
> to the current source tree to see how much performance increase can be
> gained. Weirdly enough, some people have argued against it for, I
> guess, a wide range of reasons: they just have crap hardware
> and don't care about performance, don't like the extra maintenance
> of code, or, just like Red Hat fanatics, like having an easy way to
> bad-mouth FreeBSD performance. I think most people would agree, though,
> that it has to be done, or they have to choose to believe FreeBSD isn't
> about performance among other goals.

I will not join this discussion ;)

> With 6 you can also use the new thr threading library; try pointing your
> libmap.conf at libthr for testing, for example:
>
> [/usr/local/jdk1.4.2/]
> libpthread.so.2    libthr.so.2
> libpthread.so      libthr.so
>
> I've been doing some 'ab' testing of libthr with Apache2 compiled for
> the worker MPM and have seen some really interesting differences in server
> load: loads of about 40 with pthread and around 5 with thr under certain
> ab tests, with the exact same test.

Too bad this causes jdk1.5.0-amd64 to crash...
Application startup times were significantly reduced, but only on the occasions when it actually managed to start. Java dumps core by the 2nd or 3rd transaction at the latest. :(

And since the current load testing is done without Apache in between, this is moot anyway.

/Eirik

>
> Mike
>
>
> Eirik Øverby wrote:
>
>> Update: The diff below was made after making sure both systems
>> are running the exact same kernel. Behavior is the same. Building
>> new kernels (6-STABLE) now to get out of the BETA stage.
>>
>> /Eirik
>>
>> On Nov 28, 2005, at 22:53 , Eirik Øverby wrote:
>>
>>> Firmware versions are equal. BIOS settings are equal.
>>> However, a diff of the dmesgs show (apart from MAC address
>>> differences):
>>>
>>> 30c30
>>> < Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
>>> ---
>>> > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>>>
>>> What on earth is that all about? The "slow" box has the
>>> ACPI-fast timecounter...
>>>
>>> /Eirik
>>>
>>> On Nov 28, 2005, at 22:14 , Kris Kennaway wrote:
>>>
>>>> On Mon, Nov 28, 2005 at 09:54:30PM +0100, Eirik Øverby wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think I have found the culprit. [...]
>>>>>
>>>>> Anything else I could look at?
>>>>
>>>>
>>>> BIOS update.
>>>>
>>>> Kris
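Whether time queries are actually expensive on a given box can be measured before and after switching kern.timecounter.hardware, as Mike suggests. A rough Python sketch: it times a large number of `time.clock_gettime(time.CLOCK_REALTIME)` calls and reports the average cost per call. Python's own call overhead is included, so the absolute numbers are only indicative; the useful signal is the relative change on the same machine between timecounter settings. The function name `ns_per_time_call` is made up for illustration.

```python
import time


def ns_per_time_call(calls=1_000_000, clock=time.CLOCK_REALTIME):
    """Average wall-clock cost, in nanoseconds, of one clock_gettime() call.

    Includes Python's per-call overhead, so compare results between
    timecounter settings on the same machine rather than across systems.
    """
    read = time.clock_gettime  # bind once to keep attribute lookup out of the loop
    start = time.perf_counter_ns()
    for _ in range(calls):
        read(clock)
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls


if __name__ == "__main__":
    print(f"{ns_per_time_call():.0f} ns per clock_gettime() call")
```

For example: run it once with the current timecounter, switch with `sysctl kern.timecounter.hardware=TSC`, and run it again on the same machine.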