(I realize my last similar topic turned out to be a false alarm, so I hope I am not "crying wolf" again.)

I'm still trying to understand some anomalous measurements in tmem, particularly instances where the maximum cycle count greatly exceeds the average. It seems that tmem's compression code is a magnet for interruptions. This inspired me to create a more controlled magnet, an "interrupt honeypot": at every tmem call, I run a loop which does nothing but repeatedly read the TSC and check the difference between successive reads. On both of my test machines, the difference is normally well under 100 cycles. Infrequently, I get a "large" measurement which, since Xen is non-preemptive, indicates a lengthy interrupt (or possibly that the TSC is getting moved forward). My code uses per_cpu variables to ensure that there aren't any measurement/recording races (which were the issue with my previous 10M "problem").

The result: on my quad-core-by-two-thread machine, I frequently get "large" measurements over 250000 cycles, with the max just over 1 million (and actually just over 2^20). They average about one every 1-2 seconds, but the measurement methodology makes it impossible to determine the true frequency or spacing. The vast majority of the "large" samples are reported on cpu#0, but a handful are reported on other cpus. This might also be methodology-related, as the load is running on 4 vcpus. On the same machine, when I boot with nosmp, I see no large measurements; and when I run the load with 1 vcpu, I see a lower frequency (about one every ten seconds), but again this could be due to the measurement methodology. On my dual-core (no SMT) test machine, I see only a couple of large measurements, 120438 cycles on cpu#0 and 120528 on cpu#1, with the same load limited to 2 vcpus.

Is a million cycles in an interrupt handler bad? Any idea what might be consuming this? The evidence might imply that more cpus means a longer interrupt, which bodes poorly for larger machines. I tried disabling the timer rendezvous code (not positive I was successful), but still got large measurements, and eventually the machine froze up (but not before I observed the stime skew climbing quickly to the millisecond-plus range).

Is there a way to cap the number of physical cpus seen by Xen (other than nosmp to cap at one)?

Dan
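For reference, the honeypot loop described above is essentially of this shape. This is a minimal sketch only: the per-cpu variable name, the recording detail, and the use of rdtscll() as the TSC-read helper are illustrative, not the actual tmem instrumentation.

    /* Sketch of the "interrupt honeypot"; names are illustrative. */
    static DEFINE_PER_CPU(uint64_t, honeypot_max);

    static void interrupt_honeypot(unsigned long iterations)
    {
        uint64_t prev, now;

        rdtscll(prev);
        while ( iterations-- )
        {
            rdtscll(now);
            /* Xen is non-preemptive, so a large gap between successive
             * reads means we were interrupted (or the TSC moved). */
            if ( now - prev > this_cpu(honeypot_max) )
                this_cpu(honeypot_max) = now - prev;
            prev = now;
        }
    }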
On 12/04/2009 21:16, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Is a million cycles in an interrupt handler bad? Any idea what
> might be consuming this? The evidence might imply more cpus
> means longer interrupt, which bodes poorly for larger machines.
> I tried disabling the timer rendezvous code (not positive I
> was successful), but still got large measurements, and
> eventually the machine froze up (but not before I observed
> the stime skew climbing quickly to the millisecond-plus
> range).

You can instrument irq_enter() and irq_exit() to read TSC and find out the
distribution of irq handling times for interruptions that Xen knows about.
Otherwise who knows, it could even be system management mode on that
particular box.

 -- Keir
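One way to do that, for illustration (a sketch only: the per-cpu variable is illustrative, rdtscll() stands for whatever TSC-read helper the tree provides, and the surrounding do_IRQ() code is elided):

    /* Illustrative: record the worst-case time spent between
     * irq_enter() and irq_exit() on each cpu. */
    static DEFINE_PER_CPU(uint64_t, irq_max_cycles);

        uint64_t t0, t1;

        irq_enter();
        rdtscll(t0);
        /* ... existing interrupt dispatch ... */
        rdtscll(t1);
        if ( t1 - t0 > this_cpu(irq_max_cycles) )
        {
            this_cpu(irq_max_cycles) = t1 - t0;
            printk("cpu%u: %"PRIu64" cycles between irq_enter/irq_exit\n",
                   smp_processor_id(), t1 - t0);
        }
        irq_exit();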
>> Is there a way to cap the number of physical cpus seen by Xen
>> (other than nosmp to cap at one)?

There is a compile-time option (max_phys_cpus) where you can specify the number of LCPUs.

Aravindh Puthiyaparambil
Virtualization Engineer
Virtual Systems Development
Unisys, Tredyffrin PA
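If I recall the build correctly, this is picked up as a make variable when building the hypervisor, so something like the following (the value 4 is just an example):

    make -C xen max_phys_cpus=4

If I remember right, it simply adds -DMAX_PHYS_CPUS=<n> to the compile flags, which sizes the hypervisor's cpu arrays.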
Thanks Aravindh!

Hmmm... I'm suspicious of this. When compiled with max_phys_cpus=4, xm info shows nr_cpus=4, nr_nodes=1, cores_per_socket=1 and threads_per_core=1. Without this compile option, cores_per_socket=4 and threads_per_core=2... so I would expect only one of these to change. Hopefully the compile option is simply changing the topology data reported by xm info; it appears the guest is still using 4 physical processors.
> You can instrument irq_enter() and irq_exit() to read TSC

Rather than do this generically and ensure I get all the macros correct (e.g. per_cpu, nesting), I manually instrumented three likely-suspect irq_enter/exit pairs, two in do_IRQ() and one in smp_call_function(). ALL of them show an issue, with max readings in the 300K-1M range... with smp_call_function showing the lowest max and the second pair in do_IRQ (the non-guest one) showing readings over 1M (and the guest one at about 800K).

Interestingly, I get no readings at all over 60K when I recompile with max_phys_cpus=4 (and with nosmp) on my quad-core-by-two-thread machine. This is versus several readings over 60K nearly every second when max_phys_cpus=8.

> Otherwise who knows, it could even be system management mode

I suppose measuring irq_enter/exit pairs still doesn't rule this out. But the "large" interrupts don't seem to happen (at least not nearly as frequently) with fewer physical processors enabled, so system management mode seems unlikely.

Anyway, still a probable problem, still mostly a mystery as to what is actually happening. And, to repeat, this has nothing to do with tmem... I'm just observing it using tmem as a convenient measurement tool.
> Hmmm... I'm suspicious of this. When compiled with max_phys_cpus=4,
> xm info shows nr_cpus=4, nr_nodes=1, cores_per_socket=1 and
> threads_per_core=1.

I have only tried increasing max_phys_cpus from 32 to 64, 96, etc. I would think that decreasing it to 4 means that only the first 4 LCPUs get booted; the rest should be ignored.
use "maxcpus=" in your grub.conf. Thanks Kevin>From: Dan Magenheimer >Sent: 2009年4月14日 4:24 > >Thanks Aravindh! > >Hmmm... I'm suspicious of this. When compiled with >max_phys_cpus=4, xm info shows nr_cpus=4, nr_nodes=1, >cores_per_socket=1 and threads_per_core=1. Without >this compile option, cores_per_socket=4 and >threads_per_core=2... so I would expect only one >of these to change. Hopefully the compile option >is simply changing the data reported by xm info. >It appears the guest still is using 4 physical processors. > >> -----Original Message----- >> From: Puthiyaparambil, Aravindh >> [mailto:aravindh.puthiyaparambil@unisys.com] >> Sent: Monday, April 13, 2009 9:31 AM >> To: Dan Magenheimer; Xen-Devel (E-mail) >> Subject: RE: [Xen-devel] million cycle interrupt >> >> >> >>Is there a way to cap the number of physical cpus seen by Xen >> >>(other than nosmp to cap at one)? >> >> There is a compile time option (max_phys_cpus) where you can >> specify the number of LCPUs. >> >> Aravindh Puthiyaparambil >> Virtualization Engineer >> Virtual Systems Development >> Unisys, Tredyffrin PA >> > >_______________________________________________ >Xen-devel mailing list >Xen-devel@lists.xensource.com >http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Dan Magenheimer
>Sent: 14 April 2009 5:15
>
>Rather than do this generically and ensure I get all the macros
>correct (e.g. per_cpu, nesting), I manually instrumented three
>likely-suspect irq_enter/exit pairs, two in do_IRQ() and one
>in smp_call_function(). ALL of them show an issue, with max
>readings in the 300K-1M range... with smp_call_function showing
>the lowest max and the second pair in do_IRQ (the non-guest one)
>showing readings over 1M (and the guest one at about 800K).

Since you already reach this step around the call to the actual action's handler, why not take one more step and measure every handler (serial, apic, vtd, ...)? You can first simply print which handlers are registered or invoked on your platform. If only one handler shows abnormally high latency, the problem is probably in one specific spot; otherwise you can suspect some common code shared by all handlers, or ... as Keir said, it could be SMM. :-)

Thanks,
Kevin
> why not take one more step and measure every handler
> (serial, apic, vtd, ...)?

Great idea! I added a max_cycles field to irq_desc_t and check/update it at every interrupt in do_IRQ(), then print max_cycles in "xm debug-key i", including a "max max".

I'm not entirely sure how to interpret the output from dump_irqs(), but the only two IRQs with type==PCI-MSI have a "large" max (450044 cycles and 730972 cycles). The third is Vec240, an IO-APIC-edge interrupt that maxes at 1047500 cycles. No other interrupt has a max exceeding 10000.

Here's the relevant output. How do I map this to something meaningful?

(XEN) Vec 49 IRQ -1: type=PCI-MSI status=00000010 max_cycles=450044 in-flight=0 domain-list=0:254(----),
(XEN) Vec208 IRQ -1: type=PCI-MSI status=00000010 max_cycles=730972 in-flight=0 domain-list=0:255(----),
(XEN) Vec240 IRQ 0: type=IO-APIC-edge status=00000000 max_cycles=1047500 mapped, unbound
(XEN) max_max_cycles = 1047500
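For reference, the instrumentation described above is roughly of this shape (a sketch only: the do_IRQ() and dump_irqs() bodies are elided, and rdtscll() stands for the TSC-read helper):

    /* Added to the irq descriptor (irq_desc_t): */
        uint64_t max_cycles;   /* worst-case cycles seen in do_IRQ() */

    /* In do_IRQ(), around the existing dispatch: */
        uint64_t t0, t1;

        rdtscll(t0);
        /* ... existing guest/non-guest dispatch ... */
        rdtscll(t1);
        if ( t1 - t0 > desc->max_cycles )
            desc->max_cycles = t1 - t0;

    /* In dump_irqs() (the 'i' debug key), print desc->max_cycles on each
     * vector's line and track a running max_max_cycles across vectors. */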
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: 14 April 2009 10:34
>
>Here's the relevant output. How do I map this to
>something meaningful?

One immediate trick is to print the handler address for the problematic vector, and then search for its name in your dumped Xen symbol file.

Thanks,
Kevin

>(XEN) Vec 49 IRQ -1: type=PCI-MSI status=00000010 max_cycles=450044 in-flight=0 domain-list=0:254(----),
>
>(XEN) Vec208 IRQ -1: type=PCI-MSI status=00000010 max_cycles=730972 in-flight=0 domain-list=0:255(----),

The above two look like they are owned by dom0.

>(XEN) Vec240 IRQ 0: type=IO-APIC-edge status=00000000 max_cycles=1047500 mapped, unbound

This looks like the PIT, but I'm not quite sure.
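For what it's worth, the "print the handler address" trick could look like this (illustrative only; and note that, if I read the irq code right, vectors with IRQ_GUEST set keep guest-binding state rather than an irqaction in ->action, so the handler address is only meaningful for Xen-owned vectors):

    /* In dump_irqs(), next to the existing per-vector printout: */
    if ( desc->action != NULL )
        printk("    action at %p, handler at %p\n",
               desc->action, desc->action->handler);

Then map the printed address to a name in the symbol file, e.g.:

    nm xen/xen-syms | sort
    (take the nearest symbol at or below the printed address)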
Some more interesting data that echoes my earlier measurements:

Booting with maxcpus=4, the three "large" interrupts are all at about 20000 cycles. Booting with maxcpus=6, the three are at 146896, 183260, and 240959 cycles. So it definitely appears to get worse as maxcpus gets larger.

I also wonder if it might be related to SMT?

> One immediate trick is to print the handler address for the problematic vector

I'll give that a try, but probably not until tomorrow.
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: 14 April 2009 10:59
>
>Booting with maxcpus=4, the three "large" interrupts
>are all at about 20000 cycles. Booting with maxcpus=6,
>the three are at 146896, 183260, and 240959 cycles.
>
>So it definitely appears to get worse as maxcpus
>gets larger.

Lock contention?

>I also wonder if it might be related to SMT?

If it's really caused by SMT, it's more likely to be software-caused resource contention due to the doubled logical processors. At the hardware level, although the cache is shared, ISRs in Xen are normally short and I don't think such a high factor could come from there...

Thanks,
Kevin
The IO-APIC-edge interrupt action is timer_interrupt.

The MSI interrupt action is showing up strange (2010001)... I'll need to look into this more.

> Lock contention?

Possible, though the consistent max numbers make it seem unlikely.

> If it's really caused by SMT, it's more likely to be software-caused
> resource contention due to the doubled logical processors.

You mentioned PIT earlier... I wonder if the PIT can be read in parallel from different cores, but must be serialized if read from different threads? A million cycles is probably about right for a platform timer read, true?
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: 14 April 2009 12:24
>
>The IO-APIC-edge interrupt action is timer_interrupt.

This could be driven either by the PIT or by the HPET (in legacy replacement mode), and it fires when the PIT is used as the platform timer or when the PIT/HPET is used to broadcast the wakeup event to cpus in deep C-states. The logic in timer_interrupt is very simple, so maybe you could again go one step further and trace inside that function, e.g. the block protected by pit_lock. :-)

>The MSI interrupt action is showing up strange
>(2010001)... I'll need to look into this more.

That's expected, as the action is allocated by xmalloc for devices owned by a guest. I guess you find the two MSI instances in the __do_IRQ_guest path? If yes, the logic that handles pending EOIs may be suspect; you could use "ioapic_ack=old" to verify.

From the information so far, it looks like the weird high cost comes from an individual handler implementation; otherwise you would be able to observe it in every handler (such as the serial one).

>You mentioned PIT earlier... I wonder if the PIT can be read in
>parallel from different cores, but must be serialized if read from
>different threads? A million cycles is probably about right for a
>platform timer read, true?

It looks too large to me.

Thanks,
Kevin
On 14/04/2009 05:23, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> If it's really caused by SMT, it's more likely to be software-caused
>> resource contention due to the doubled logical processors.
>
> You mentioned PIT earlier... I wonder if the PIT can be read in
> parallel from different cores, but must be serialized if read from
> different threads? A million cycles is probably about right for a
> platform timer read, true?

You could validate that quite easily by adding your own timer read wrapped
in TSC reads. Actually a PIT read, even though it requires multiple accesses
over the ISA bus, should take less than 10us.

 -- Keir
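A sketch of that check, for illustration: read_platform_timer_count() is a stand-in name for whatever PIT-read routine is handy, and rdtscll() for the TSC-read helper.

    uint64_t t0, t1;
    uint16_t count;

    rdtscll(t0);
    count = read_platform_timer_count();  /* e.g. latch + two inb()s from port 0x40 */
    rdtscll(t1);
    printk("platform timer read took %"PRIu64" cycles (count=%u)\n",
           t1 - t0, count);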
> If yes, the logic that handles pending EOIs may be suspect;
> you could use "ioapic_ack=old" to verify.

FYI, ioapic_ack=old doesn't seem to help.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Kevin
> >> >> >>
> >> >> >> >Interestingly, I get no readings at all over 60K when I
> >> >> >> >recompile with max_phys_cpus=4 (and with nosmp) on my
> >> >> >> >quad-core-by-two-thread machine. This is versus several
> >> >> >> >readings over 60K nearly every second when max_phys_cpus=8.
> >> >> >> >
> >> >> >> >> Otherwise who knows, it could even be system management mode
> >> >> >> >
> >> >> >> >I suppose measuring irq_enter/exit pairs still doesn't rule
> >> >> >> >this out. But the "large" interrupts don't seem to happen
> >> >> >> >(at least not nearly as frequently) with fewer physical
> >> >> >> >processors enabled, so sys mgmt mode seems unlikely.
> >> >> >> >
> >> >> >> >Anyway, still a probable problem, and still mostly a mystery
> >> >> >> >as to what is actually happening. And, to repeat, this has
> >> >> >> >nothing to do with tmem... I'm just observing it using
> >> >> >> >tmem as a convenient measurement tool.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
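For reference, a minimal sketch of the per-IRQ instrumentation discussed in the message above -- bracketing a handler call with TSC reads and keeping a running max that can later be dumped from a debug key -- might look like the following. This is illustrative C only: the irq_stats structure and time_handler() wrapper are invented names standing in for the max_cycles field Dan describes adding to irq_desc_t and the check he placed in do_IRQ(); it is not actual Xen source.

    #include <stdint.h>

    /* Stand-in for the new max_cycles bookkeeping added to irq_desc_t. */
    struct irq_stats {
        uint64_t max_cycles;
        uint64_t total_cycles;
        uint64_t count;
    };

    /* Read the time-stamp counter (x86). */
    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Wrap an interrupt handler invocation and update the per-vector
     * stats; the max (and "max max") can then be printed from a
     * debug-key handler such as 'xm debug-key i'. */
    static void time_handler(struct irq_stats *st,
                             void (*handler)(void *data), void *data)
    {
        uint64_t t0 = read_tsc();

        handler(data);

        uint64_t dt = read_tsc() - t0;
        st->total_cycles += dt;
        st->count++;
        if ( dt > st->max_cycles )
            st->max_cycles = dt;
    }

Keeping the total and count alongside the max is what later lets the thread distinguish a rare outlier (large max, small average) from a systematic cost (large average).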
> You could validate that quite easily by adding your own timer read wrapped
> in TSC reads. Actually a PIT read, even though it requires multiple accesses
> over the ISA bus, should take less than 10us.

I'll take a look at that next.

Taking the average as well as the max, the timer_interrupt handler is
AVERAGING over 300K cycles (with 8 processors). This interrupt fires at
100Hz, correct? If so, that means a full 1% of one processor's cycles
(300K cycles x 100 ticks/sec is 30M cycles/sec, about 1% of a ~3GHz
processor) are spent processing timer_interrupts!

The MSI interrupts have a large max, but the average is relatively small
(under 5000 cycles). It would still be nice to know what is causing the
max value, especially since it only occurs for me with more than 4
processors.

But first, Keir, would you take this patch so that this kind of issue can
be more easily diagnosed in the future? If you're worried about the extra
two global variable reads, you could wrap an "#ifndef NDEBUG" around it
(with #else #define opt_measure_irqs 0), but it might be nice to have this
diagnostic capability in released code.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
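A rough sketch of the suggestion quoted at the top of that message -- wrapping a raw PIT read in TSC reads to see how long one platform-timer access really costs -- is below. The port-I/O helpers are spelled out so the snippet is self-contained; this is ring-0-style illustrative code rather than a drop-in Xen patch (in Xen the equivalent read would presumably sit under pit_lock), and the "well under 10us" expectation is Keir's figure, not a measurement.

    #include <stdint.h>

    static inline void outb(uint8_t val, uint16_t port)
    {
        __asm__ __volatile__ ( "outb %0, %1" : : "a" (val), "Nd" (port) );
    }

    static inline uint8_t inb(uint16_t port)
    {
        uint8_t val;
        __asm__ __volatile__ ( "inb %1, %0" : "=a" (val) : "Nd" (port) );
        return val;
    }

    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Time a single PIT counter-0 read (latch command plus two data-port
     * reads over the ISA bus) in TSC cycles. */
    uint64_t time_one_pit_read(void)
    {
        uint64_t t0 = read_tsc();

        outb(0x00, 0x43);                  /* latch counter 0 */
        uint16_t count = inb(0x40);        /* low byte */
        count |= (uint16_t)inb(0x40) << 8; /* high byte */
        (void)count;

        return read_tsc() - t0;  /* expect well under 10us worth of cycles */
    }

If this number stays small while timer_interrupt's average stays large, the PIT access itself is exonerated and the cost must be elsewhere in the handler.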
> I'll take a look at that next.

It appears that the call to smp_timer_broadcast_ipi() in
timer_interrupt() is the cycle hog. And it definitely appears to be a
scalability problem!

maxcpus=4: avg=1600,   max=15000 (cycles, rounded)
maxcpus=5: avg=2000,   max=24000
maxcpus=6: avg=83000,  max=244000
maxcpus=7: avg=198000, max=780000
maxcpus=8: avg=310000, max=1027000

The load is a 4-vcpu PV EL5u2 32-bit domain continually compiling
linux-2.6.28 with -j80. I killed the load after only a few minutes, so
the max might get worse. On the other hand, just booting dom0 seems to
put the max in about the same range.

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
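As a quick sanity check on how those averages scale, the ratios below can be computed from Dan's numbers with a throwaway program (the data is his; the program is just arithmetic). Going from 4 to 8 CPUs, the average cost grows by roughly 200x while the CPU count only doubles, with the big jump between maxcpus=5 and maxcpus=6 -- which suggests contention, or the broadcast path suddenly having real work to do, rather than a fixed per-IPI cost; Kevin's reply below raises the deep C-state angle.

    #include <stdio.h>

    int main(void)
    {
        /* Averages reported above for smp_timer_broadcast_ipi(), by maxcpus. */
        int    cpus[] = { 4, 5, 6, 7, 8 };
        double avg[]  = { 1600, 2000, 83000, 198000, 310000 };

        for ( int i = 1; i < 5; i++ )
            printf("maxcpus %d -> %d: cpus x%.2f, avg cycles x%.1f\n",
                   cpus[i - 1], cpus[i],
                   (double)cpus[i] / cpus[i - 1], avg[i] / avg[i - 1]);

        return 0;
    }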
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: April 15, 2009 1:11
>
>> I'll take a look at that next.
>
>It appears that the call to smp_timer_broadcast_ipi()
>in timer_interrupt() is the cycle hog. And it
>definitely appears to be a scalability problem!

So I guess the processors on your platform support at least some deep
C-states (>=C3), or else that broadcast path should be a nop, right? If
you disable deep C-states in the BIOS, I guess it then scales well.

>maxcpus=4: avg=1600, max=15000 (cycles, rounded)
>maxcpus=5: avg=2000, max=24000
>maxcpus=6: avg=83000, max=244000
>maxcpus=7: avg=198000, max=780000
>maxcpus=8: avg=310000, max=1027000

This looks like something related to IPI overhead. When more processors
are available, more IPIs are required and they contend for the bus. But
I'm not sure whether the magnitude of the change comes entirely from that
part. Which type of platform are you using? We'll also take a try here to
first reproduce it and then find the exact reason.

>The load is a 4-vcpu PV EL5u2 32-bit domain continually
>compiling linux-2.6.28 with -j80. I killed the load
>after only a few minutes, so the max might get worse.
>On the other hand, just booting dom0 seems to put
>the max in about the same range.

And is the average then very low with dom0 only?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
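One way to test the deep C-state theory above without touching the BIOS would be to count, at each tick, how many CPUs the broadcast actually has to wake. The sketch below is hedged accordingly: broadcast_targets() is an entirely hypothetical helper standing in for however the real code computes the set of CPUs needing a wakeup IPI, and account_broadcast() is just bookkeeping meant to sit next to the smp_timer_broadcast_ipi() call.

    #include <stdint.h>

    /* Hypothetical: returns how many CPUs currently need a timer wakeup
     * IPI (i.e. are sitting in a deep C-state with the local timer
     * stopped). */
    extern unsigned int broadcast_targets(void);

    static uint64_t ticks, total_targets, max_targets;

    /* Call once per timer_interrupt(), next to smp_timer_broadcast_ipi(). */
    void account_broadcast(void)
    {
        unsigned int n = broadcast_targets();

        ticks++;
        total_targets += n;
        if ( n > max_targets )
            max_targets = n;

        /* Dump ticks/total_targets/max_targets from a debug key: if the
         * totals stay at zero, the cost is not coming from waking CPUs
         * out of deep C-states. */
    }

If the counts track the number of idle CPUs, that would line up with the observation that the cost climbs sharply once maxcpus exceeds the 4 vcpus the load actually keeps busy.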