(I realize my last similar topic turned out to be a false alarm, so I hope I am not "crying wolf" again.)

I'm still trying to understand some anomalous measurements in tmem, particularly instances where the maximum cycle count greatly exceeds the average. It seems that tmem's compression code is a magnet for interruptions. This inspired me to create a more controlled magnet, an "interrupt honeypot": at every tmem call, I run a loop which does nothing but repeatedly read the TSC and check the difference between successive reads. On both of my test machines, the difference is normally well under 100 cycles. Infrequently, I get a "large" measurement which, since Xen is non-preemptive, indicates a lengthy interrupt (or possibly that the TSC is getting moved forward). My code uses per_cpu variables to ensure that there aren't any measurement/recording races (which were the issue with my previous 10M "problem").

The result: on my quad-core-by-two-thread machine, I frequently get "large" measurements over 250000 cycles, with the max just over 1 million (and actually just over 2^20). They average about one every 1-2 seconds, but the measurement methodology makes it impossible to determine the true frequency or spacing. The vast majority of the "large" samples are reported on cpu#0, but a handful are reported on other cpus. This might also be methodology-related, as the load is running on 4 vcpus. On the same machine, when I boot with nosmp, I see no large measurements; and when I run the load with 1 vcpu, I see a lower frequency (about one every ten seconds), but again this could be due to the measurement methodology. On my dual-core (no SMT) test machine, I see only a couple of large measurements, 120438 cycles on cpu#0 and 120528 on cpu#1, with the same load limited to 2 vcpus.

Is a million cycles in an interrupt handler bad? Any idea what might be consuming this? The evidence might imply that more cpus means a longer interrupt, which bodes poorly for larger machines. I tried disabling the timer rendezvous code (not positive I was successful), but still got large measurements, and eventually the machine froze up (but not before I observed the stime skew climbing quickly to the millisecond-plus range).

Is there a way to cap the number of physical cpus seen by Xen (other than nosmp to cap at one)?

Dan
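For reference, the honeypot loop described above is essentially of this shape. This is a minimal sketch only: the per-cpu variable name, the recording detail, and the use of rdtscll() as the TSC-read helper are illustrative, not the actual tmem instrumentation.

    /* Sketch of the "interrupt honeypot"; names are illustrative. */
    static DEFINE_PER_CPU(uint64_t, honeypot_max);

    static void interrupt_honeypot(unsigned long iterations)
    {
        uint64_t prev, now;

        rdtscll(prev);
        while ( iterations-- )
        {
            rdtscll(now);
            /* Xen is non-preemptive, so a large gap between successive
             * reads means we were interrupted (or the TSC moved). */
            if ( now - prev > this_cpu(honeypot_max) )
                this_cpu(honeypot_max) = now - prev;
            prev = now;
        }
    }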
On 12/04/2009 21:16, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Is a million cycles in an interrupt handler bad? Any idea what
> might be consuming this? The evidence might imply more cpus
> means longer interrupt, which bodes poorly for larger machines.
> I tried disabling the timer rendezvous code (not positive I
> was successful), but still got large measurements, and
> eventually the machine froze up (but not before I observed
> the stime skew climbing quickly to the millisecond-plus
> range).

You can instrument irq_enter() and irq_exit() to read TSC and find out the
distribution of irq handling times for interruptions that Xen knows about.
Otherwise who knows, it could even be system management mode on that
particular box.

 -- Keir
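One way to do that, for illustration (a sketch only: the per-cpu variable is illustrative, rdtscll() stands for whatever TSC-read helper the tree provides, and the surrounding do_IRQ() code is elided):

    /* Illustrative: record the worst-case time spent between
     * irq_enter() and irq_exit() on each cpu. */
    static DEFINE_PER_CPU(uint64_t, irq_max_cycles);

        uint64_t t0, t1;

        irq_enter();
        rdtscll(t0);
        /* ... existing interrupt dispatch ... */
        rdtscll(t1);
        if ( t1 - t0 > this_cpu(irq_max_cycles) )
        {
            this_cpu(irq_max_cycles) = t1 - t0;
            printk("cpu%u: %"PRIu64" cycles between irq_enter/irq_exit\n",
                   smp_processor_id(), t1 - t0);
        }
        irq_exit();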
>> Is there a way to cap the number of physical cpus seen by Xen
>> (other than nosmp to cap at one)?

There is a compile-time option (max_phys_cpus) where you can specify the number of LCPUs.

Aravindh Puthiyaparambil
Virtualization Engineer
Virtual Systems Development
Unisys, Tredyffrin PA
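If I recall the build correctly, this is picked up as a make variable when building the hypervisor, so something like the following (the value 4 is just an example):

    make -C xen max_phys_cpus=4

If I remember right, it simply adds -DMAX_PHYS_CPUS=<n> to the compile flags, which sizes the hypervisor's cpu arrays.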
Thanks Aravindh!

Hmmm... I'm suspicious of this. When compiled with max_phys_cpus=4, xm info shows nr_cpus=4, nr_nodes=1, cores_per_socket=1 and threads_per_core=1. Without this compile option, cores_per_socket=4 and threads_per_core=2... so I would expect only one of these to change. Hopefully the compile option is simply changing the topology data reported by xm info; it appears the guest is still using 4 physical processors.
> You can instrument irq_enter() and irq_exit() to read TSC

Rather than do this generically and ensure I get all the macros correct (e.g. per_cpu, nesting), I manually instrumented three likely-suspect irq_enter/exit pairs, two in do_IRQ() and one in smp_call_function(). ALL of them show an issue, with max readings in the 300K-1M range... with smp_call_function showing the lowest max and the second pair in do_IRQ (the non-guest one) showing readings over 1M (and the guest one at about 800K).

Interestingly, I get no readings at all over 60K when I recompile with max_phys_cpus=4 (and with nosmp) on my quad-core-by-two-thread machine. This is versus several readings over 60K nearly every second when max_phys_cpus=8.

> Otherwise who knows, it could even be system management mode

I suppose measuring irq_enter/exit pairs still doesn't rule this out. But the "large" interrupts don't seem to happen (at least not nearly as frequently) with fewer physical processors enabled, so system management mode seems unlikely.

Anyway, still a probable problem, still mostly a mystery as to what is actually happening. And, to repeat, this has nothing to do with tmem... I'm just observing it using tmem as a convenient measurement tool.
> Hmmm... I'm suspicious of this. When compiled with max_phys_cpus=4,
> xm info shows nr_cpus=4, nr_nodes=1, cores_per_socket=1 and
> threads_per_core=1.

I have only tried increasing max_phys_cpus from 32 to 64, 96, etc. I would think that decreasing it to 4 means that only the first 4 LCPUs get booted; the rest should be ignored.
use "maxcpus=" in your grub.conf. Thanks Kevin>From: Dan Magenheimer >Sent: 2009年4月14日 4:24 > >Thanks Aravindh! > >Hmmm... I'm suspicious of this. When compiled with >max_phys_cpus=4, xm info shows nr_cpus=4, nr_nodes=1, >cores_per_socket=1 and threads_per_core=1. Without >this compile option, cores_per_socket=4 and >threads_per_core=2... so I would expect only one >of these to change. Hopefully the compile option >is simply changing the data reported by xm info. >It appears the guest still is using 4 physical processors. > >> -----Original Message----- >> From: Puthiyaparambil, Aravindh >> [mailto:aravindh.puthiyaparambil@unisys.com] >> Sent: Monday, April 13, 2009 9:31 AM >> To: Dan Magenheimer; Xen-Devel (E-mail) >> Subject: RE: [Xen-devel] million cycle interrupt >> >> >> >>Is there a way to cap the number of physical cpus seen by Xen >> >>(other than nosmp to cap at one)? >> >> There is a compile time option (max_phys_cpus) where you can >> specify the number of LCPUs. >> >> Aravindh Puthiyaparambil >> Virtualization Engineer >> Virtual Systems Development >> Unisys, Tredyffrin PA >> > >_______________________________________________ >Xen-devel mailing list >Xen-devel@lists.xensource.com >http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Dan Magenheimer
>Sent: 14 April 2009 5:15
>
>Rather than do this generically and ensure I get all the macros
>correct (e.g. per_cpu, nesting), I manually instrumented three
>likely-suspect irq_enter/exit pairs, two in do_IRQ() and one
>in smp_call_function(). ALL of them show an issue, with max
>readings in the 300K-1M range... with smp_call_function showing
>the lowest max and the second pair in do_IRQ (the non-guest one)
>showing readings over 1M (and the guest one at about 800K).

Since you already reach this step around the call to the actual action's handler, why not take one more step and measure every handler (serial, apic, vtd, ...)? You can first simply print which handlers are registered or invoked on your platform. If only one handler shows abnormally high latency, the problem is probably in one specific spot; otherwise you can suspect some common code shared by all handlers, or ... as Keir said, it could be SMM. :-)

Thanks,
Kevin
> why not take one more step and measure every handler
> (serial, apic, vtd, ...)?

Great idea! I added a max_cycles field to irq_desc_t and check/update it at every interrupt in do_IRQ(), then print max_cycles in "xm debug-key i", including a "max max".

I'm not entirely sure how to interpret the output from dump_irqs(), but the only two IRQs with type==PCI-MSI have a "large" max (450044 cycles and 730972 cycles). The third is Vec240, an IO-APIC-edge interrupt that maxes at 1047500 cycles. No other interrupt has a max exceeding 10000.

Here's the relevant output. How do I map this to something meaningful?

(XEN) Vec 49 IRQ -1: type=PCI-MSI status=00000010 max_cycles=450044 in-flight=0 domain-list=0:254(----),
(XEN) Vec208 IRQ -1: type=PCI-MSI status=00000010 max_cycles=730972 in-flight=0 domain-list=0:255(----),
(XEN) Vec240 IRQ 0: type=IO-APIC-edge status=00000000 max_cycles=1047500 mapped, unbound
(XEN) max_max_cycles = 1047500
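For reference, the instrumentation described above is roughly of this shape (a sketch only: the do_IRQ() and dump_irqs() bodies are elided, and rdtscll() stands for the TSC-read helper):

    /* Added to the irq descriptor (irq_desc_t): */
        uint64_t max_cycles;   /* worst-case cycles seen in do_IRQ() */

    /* In do_IRQ(), around the existing dispatch: */
        uint64_t t0, t1;

        rdtscll(t0);
        /* ... existing guest/non-guest dispatch ... */
        rdtscll(t1);
        if ( t1 - t0 > desc->max_cycles )
            desc->max_cycles = t1 - t0;

    /* In dump_irqs() (the 'i' debug key), print desc->max_cycles on each
     * vector's line and track a running max_max_cycles across vectors. */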
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: 14 April 2009 10:34
>
>Here's the relevant output. How do I map this to
>something meaningful?

One immediate trick is to print the handler address for the problematic vector, and then search for its name in your dumped Xen symbol file.

Thanks,
Kevin

>(XEN) Vec 49 IRQ -1: type=PCI-MSI status=00000010 max_cycles=450044 in-flight=0 domain-list=0:254(----),
>
>(XEN) Vec208 IRQ -1: type=PCI-MSI status=00000010 max_cycles=730972 in-flight=0 domain-list=0:255(----),

The above two look like they are owned by dom0.

>(XEN) Vec240 IRQ 0: type=IO-APIC-edge status=00000000 max_cycles=1047500 mapped, unbound

This looks like the PIT, but I'm not quite sure.
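For what it's worth, the "print the handler address" trick could look like this (illustrative only; and note that, if I read the irq code right, vectors with IRQ_GUEST set keep guest-binding state rather than an irqaction in ->action, so the handler address is only meaningful for Xen-owned vectors):

    /* In dump_irqs(), next to the existing per-vector printout: */
    if ( desc->action != NULL )
        printk("    action at %p, handler at %p\n",
               desc->action, desc->action->handler);

Then map the printed address to a name in the symbol file, e.g.:

    nm xen/xen-syms | sort
    (take the nearest symbol at or below the printed address)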
Some more interesting data that echoes my earlier measurements:

Booting with maxcpus=4, the three "large" interrupts are all at about 20000 cycles. Booting with maxcpus=6, the three are at 146896, 183260, and 240959 cycles. So it definitely appears to get worse as maxcpus gets larger.

I also wonder if it might be related to SMT?

> One immediate trick is to print the handler address for the problematic vector

I'll give that a try, but probably not until tomorrow.
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: 14 April 2009 10:59
>
>Booting with maxcpus=4, the three "large" interrupts
>are all at about 20000 cycles. Booting with maxcpus=6,
>the three are at 146896, 183260, and 240959 cycles.
>
>So it definitely appears to get worse as maxcpus
>gets larger.

Lock contention?

>I also wonder if it might be related to SMT?

If it's really caused by SMT, it's more likely to be software-caused resource contention due to the doubled logical processors. At the hardware level, although the cache is shared, ISRs in Xen are normally short and I don't think such a high factor could come from there...

Thanks,
Kevin
The IO-APIC-edge interrupt action is timer_interrupt.

The MSI interrupt action is showing up strange (2010001)... I'll need to look into this more.

> Lock contention?

Possible, though the consistent max numbers make it seem unlikely.

> If it's really caused by SMT, it's more likely to be software-caused
> resource contention due to the doubled logical processors.

You mentioned PIT earlier... I wonder if the PIT can be read in parallel from different cores, but must be serialized if read from different threads? A million cycles is probably about right for a platform timer read, true?
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: 14 April 2009 12:24
>
>The IO-APIC-edge interrupt action is timer_interrupt.

This could be driven either by the PIT or by the HPET (in legacy replacement mode), and it fires when the PIT is used as the platform timer or when the PIT/HPET is used to broadcast the wakeup event to cpus in deep C-states. The logic in timer_interrupt is very simple, so maybe you could again go one step further and trace inside that function, e.g. the block protected by pit_lock. :-)

>The MSI interrupt action is showing up strange
>(2010001)... I'll need to look into this more.

That's expected, as the action is allocated by xmalloc for devices owned by a guest. I guess you find the two MSI instances in the __do_IRQ_guest path? If yes, the logic that handles pending EOIs may be suspect; you could use "ioapic_ack=old" to verify.

From the information so far, it looks like the weird high cost comes from an individual handler implementation; otherwise you would be able to observe it in every handler (such as the serial one).

>You mentioned PIT earlier... I wonder if the PIT can be read in
>parallel from different cores, but must be serialized if read from
>different threads? A million cycles is probably about right for a
>platform timer read, true?

It looks too large to me.

Thanks,
Kevin
On 14/04/2009 05:23, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> If it's really caused by SMT, it's more likely to be software-caused
>> resource contention due to the doubled logical processors.
>
> You mentioned PIT earlier... I wonder if the PIT can be read in
> parallel from different cores, but must be serialized if read from
> different threads? A million cycles is probably about right for a
> platform timer read, true?

You could validate that quite easily by adding your own timer read wrapped
in TSC reads. Actually a PIT read, even though it requires multiple accesses
over the ISA bus, should take less than 10us.

 -- Keir
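A sketch of that check, for illustration: read_platform_timer_count() is a stand-in name for whatever PIT-read routine is handy, and rdtscll() for the TSC-read helper.

    uint64_t t0, t1;
    uint16_t count;

    rdtscll(t0);
    count = read_platform_timer_count();  /* e.g. latch + two inb()s from port 0x40 */
    rdtscll(t1);
    printk("platform timer read took %"PRIu64" cycles (count=%u)\n",
           t1 - t0, count);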
> If yes, the logic that handles pending EOIs may be suspect;
> you could use "ioapic_ack=old" to verify.

FYI, ioapic_ack=old doesn't seem to help.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Kevin
> >> >> >>
> >> >> >> >Interestingly, I get no readings at all over 60K when I
> >> >> >> >recompile with max_phys_cpus=4 (and with nosmp) on my
> >> >> >> >quad-core-by-two-thread machine. This is versus several
> >> >> >> >readings over 60K nearly every second when max_phys_cpus=8.
> >> >> >> >
> >> >> >> >> Otherwise who knows, it could even be system management mode
> >> >> >> >
> >> >> >> >I suppose measuring irq_enter/exit pairs still doesn't rule
> >> >> >> >this out. But the "large" interrupts don't seem to happen
> >> >> >> >(at least not nearly as frequently) with fewer physical
> >> >> >> >processors enabled, so sys mgmt mode seems unlikely.
> >> >> >> >
> >> >> >> >Anyway, still a probable problem, and still mostly a mystery
> >> >> >> >as to what is actually happening. And, to repeat, this has
> >> >> >> >nothing to do with tmem... I'm just observing it using
> >> >> >> >tmem as a convenient measurement tool.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
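For reference, a minimal sketch of the per-IRQ instrumentation discussed in the message above -- bracketing a handler call with TSC reads and keeping a running max that can later be dumped from a debug key -- might look like the following. This is illustrative C only: the irq_stats structure and time_handler() wrapper are invented names standing in for the max_cycles field Dan describes adding to irq_desc_t and the check he placed in do_IRQ(); it is not actual Xen source.

    #include <stdint.h>

    /* Stand-in for the new max_cycles bookkeeping added to irq_desc_t. */
    struct irq_stats {
        uint64_t max_cycles;
        uint64_t total_cycles;
        uint64_t count;
    };

    /* Read the time-stamp counter (x86). */
    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Wrap an interrupt handler invocation and update the per-vector
     * stats; the max (and "max max") can then be printed from a
     * debug-key handler such as 'xm debug-key i'. */
    static void time_handler(struct irq_stats *st,
                             void (*handler)(void *data), void *data)
    {
        uint64_t t0 = read_tsc();

        handler(data);

        uint64_t dt = read_tsc() - t0;
        st->total_cycles += dt;
        st->count++;
        if ( dt > st->max_cycles )
            st->max_cycles = dt;
    }

Keeping the total and count alongside the max is what later lets the thread distinguish a rare outlier (large max, small average) from a systematic cost (large average).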
> You could validate that quite easily by adding your own timer read wrapped
> in TSC reads. Actually a PIT read, even though it requires multiple accesses
> over the ISA bus, should take less than 10us.

I'll take a look at that next.

Taking the average as well as the max, the timer_interrupt handler is
AVERAGING over 300K cycles (with 8 processors). This interrupt fires at
100Hz, correct? If so, that means a full 1% of one processor's cycles
(300K cycles x 100 ticks/sec is 30M cycles/sec, about 1% of a ~3GHz
processor) are spent processing timer_interrupts!

The MSI interrupts have a large max, but the average is relatively small
(under 5000 cycles). It would still be nice to know what is causing the
max value, especially since it only occurs for me with more than 4
processors.

But first, Keir, would you take this patch so that this kind of issue can
be more easily diagnosed in the future? If you're worried about the extra
two global variable reads, you could wrap an "#ifndef NDEBUG" around it
(with #else #define opt_measure_irqs 0), but it might be nice to have this
diagnostic capability in released code.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
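A rough sketch of the suggestion quoted at the top of that message -- wrapping a raw PIT read in TSC reads to see how long one platform-timer access really costs -- is below. The port-I/O helpers are spelled out so the snippet is self-contained; this is ring-0-style illustrative code rather than a drop-in Xen patch (in Xen the equivalent read would presumably sit under pit_lock), and the "well under 10us" expectation is Keir's figure, not a measurement.

    #include <stdint.h>

    static inline void outb(uint8_t val, uint16_t port)
    {
        __asm__ __volatile__ ( "outb %0, %1" : : "a" (val), "Nd" (port) );
    }

    static inline uint8_t inb(uint16_t port)
    {
        uint8_t val;
        __asm__ __volatile__ ( "inb %1, %0" : "=a" (val) : "Nd" (port) );
        return val;
    }

    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Time a single PIT counter-0 read (latch command plus two data-port
     * reads over the ISA bus) in TSC cycles. */
    uint64_t time_one_pit_read(void)
    {
        uint64_t t0 = read_tsc();

        outb(0x00, 0x43);                  /* latch counter 0 */
        uint16_t count = inb(0x40);        /* low byte */
        count |= (uint16_t)inb(0x40) << 8; /* high byte */
        (void)count;

        return read_tsc() - t0;  /* expect well under 10us worth of cycles */
    }

If this number stays small while timer_interrupt's average stays large, the PIT access itself is exonerated and the cost must be elsewhere in the handler.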
> I'll take a look at that next.

It appears that the call to smp_timer_broadcast_ipi() in
timer_interrupt() is the cycle hog. And it definitely appears to be a
scalability problem!

maxcpus=4: avg=1600,   max=15000 (cycles, rounded)
maxcpus=5: avg=2000,   max=24000
maxcpus=6: avg=83000,  max=244000
maxcpus=7: avg=198000, max=780000
maxcpus=8: avg=310000, max=1027000

The load is a 4-vcpu PV EL5u2 32-bit domain continually compiling
linux-2.6.28 with -j80. I killed the load after only a few minutes, so
the max might get worse. On the other hand, just booting dom0 seems to
put the max in about the same range.

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
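As a quick sanity check on how those averages scale, the ratios below can be computed from Dan's numbers with a throwaway program (the data is his; the program is just arithmetic). Going from 4 to 8 CPUs, the average cost grows by roughly 200x while the CPU count only doubles, with the big jump between maxcpus=5 and maxcpus=6 -- which suggests contention, or the broadcast path suddenly having real work to do, rather than a fixed per-IPI cost; Kevin's reply below raises the deep C-state angle.

    #include <stdio.h>

    int main(void)
    {
        /* Averages reported above for smp_timer_broadcast_ipi(), by maxcpus. */
        int    cpus[] = { 4, 5, 6, 7, 8 };
        double avg[]  = { 1600, 2000, 83000, 198000, 310000 };

        for ( int i = 1; i < 5; i++ )
            printf("maxcpus %d -> %d: cpus x%.2f, avg cycles x%.1f\n",
                   cpus[i - 1], cpus[i],
                   (double)cpus[i] / cpus[i - 1], avg[i] / avg[i - 1]);

        return 0;
    }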
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: April 15, 2009 1:11
>
>> I'll take a look at that next.
>
>It appears that the call to smp_timer_broadcast_ipi()
>in timer_interrupt() is the cycle hog. And it
>definitely appears to be a scalability problem!

So I guess the processors on your platform support at least some deep
C-states (>=C3), or else that broadcast path should be a nop, right? If
you disable deep C-states in the BIOS, I guess it then scales well.

>maxcpus=4: avg=1600, max=15000 (cycles, rounded)
>maxcpus=5: avg=2000, max=24000
>maxcpus=6: avg=83000, max=244000
>maxcpus=7: avg=198000, max=780000
>maxcpus=8: avg=310000, max=1027000

This looks like something related to IPI overhead. When more processors
are available, more IPIs are required and they contend for the bus. But
I'm not sure whether the magnitude of the change comes entirely from that
part. Which type of platform are you using? We'll also take a try here to
first reproduce it and then find the exact reason.

>The load is a 4-vcpu PV EL5u2 32-bit domain continually
>compiling linux-2.6.28 with -j80. I killed the load
>after only a few minutes, so the max might get worse.
>On the other hand, just booting dom0 seems to put
>the max in about the same range.

And is the average then very low with dom0 only?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
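One way to test the deep C-state theory above without touching the BIOS would be to count, at each tick, how many CPUs the broadcast actually has to wake. The sketch below is hedged accordingly: broadcast_targets() is an entirely hypothetical helper standing in for however the real code computes the set of CPUs needing a wakeup IPI, and account_broadcast() is just bookkeeping meant to sit next to the smp_timer_broadcast_ipi() call.

    #include <stdint.h>

    /* Hypothetical: returns how many CPUs currently need a timer wakeup
     * IPI (i.e. are sitting in a deep C-state with the local timer
     * stopped). */
    extern unsigned int broadcast_targets(void);

    static uint64_t ticks, total_targets, max_targets;

    /* Call once per timer_interrupt(), next to smp_timer_broadcast_ipi(). */
    void account_broadcast(void)
    {
        unsigned int n = broadcast_targets();

        ticks++;
        total_targets += n;
        if ( n > max_targets )
            max_targets = n;

        /* Dump ticks/total_targets/max_targets from a debug key: if the
         * totals stay at zero, the cost is not coming from waking CPUs
         * out of deep C-states. */
    }

If the counts track the number of idle CPUs, that would line up with the observation that the cost climbs sharply once maxcpus exceeds the 4 vcpus the load actually keeps busy.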