thr3ads.net - Xen devel - [Xen-devel] 10 million cycles disappearing [Apr 2009]

Dan Magenheimer

2009-Apr-07 23:54 UTC

[Xen-devel] 10 million cycles disappearing

I've been seeing a possible performance problem off and on
and I've spent some time tracking it but haven't made much
progress and have to give up for now, so I thought I'd at
least document what I know and see if it sounds familiar
to anyone.

The problem: Something in Xen seems to periodically take about
10M cycles.  I think it is an interrupt and I think it is
taking a lock related to memory allocation and holding
it for a LONG time (i.e. 10M cycles or close).

I am measuring inside a hypercall using TSC, taking a TSC
reading at entry to the hypercall code and at exit.  Xen
is not pre-emptive, so it can't be switching context or
something, right?  Nearly all of the readings are less
than 100K cycles, but some samples are "huge" and
usually at 9M-10M cycles.  Since I am recording the max
difference between the TSCs, the max "huge" grows over
a long period of time, but eventually converges close
to 10M (and this is a 3 GHz processor).  I can see
it grow using "watch".  And I've NEVER seen a reading
over 10M.
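
For scale, 10M cycles on a 3 GHz processor is roughly 3.3 ms. The max-delta bookkeeping described above can be sketched as follows. This is only an illustration in Python, not the actual Xen instrumentation: the names `cycles_to_ms`, `MaxDelta`, and `record`, and the sample values, are assumptions made up for the example.

```python
CPU_HZ = 3_000_000_000  # 3 GHz, as stated in the post


def cycles_to_ms(cycles: int, cpu_hz: int = CPU_HZ) -> float:
    """Convert a TSC cycle count to milliseconds at a fixed clock rate."""
    return cycles / cpu_hz * 1000.0


class MaxDelta:
    """Tracks the largest (exit - entry) difference seen so far."""

    def __init__(self) -> None:
        self.max_cycles = 0

    def record(self, tsc_entry: int, tsc_exit: int) -> None:
        delta = tsc_exit - tsc_entry
        if delta > self.max_cycles:
            self.max_cycles = delta


tracker = MaxDelta()
# Hypothetical samples: mostly under 100K cycles, with one outlier near 10M.
for entry, exit_ in [(0, 40_000), (100_000, 190_000), (500_000, 10_400_000)]:
    tracker.record(entry, exit_)

print(tracker.max_cycles, round(cycles_to_ms(tracker.max_cycles), 2))
```

Because only the maximum is kept, a single bad sample is enough to pin the reported value near 10M, whether that sample reflects real work or a measurement error.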

I am able to disable interrupts and still take
measurements.  Roughly half of the measurements
occur when doing a hypercall-subop that does no
memory allocation and roughly half occur when doing
a hypercall-subop that DOES do memory allocation.
With interrupts disabled, the subop that DOES do
memory allocation still asymptotically approaches
10M.  The one that does NOT do memory allocation
stays relatively small.

I'm currently measuring on Xen 3.3.1 but I think I've
seen similar results on xen-unstable.  A single 2-vcpu
domain is running (in addition to domain0).

Does any of that sound familiar?  Any smoking guns?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tian, Kevin

2009-Apr-08 00:33 UTC

RE: [Xen-devel] 10 million cycles disappearing

I remember that an earlier post (from Jan?) fixed a heavy
operation incurred by some special handling of writable page tables.
I don't recall the details, but your description of the subop with
memory allocation points me to that part...

Thanks,
Kevin

Dan Magenheimer

2009-Apr-08 01:37 UTC

RE: [Xen-devel] 10 million cycles disappearing

Thanks!  Do you happen to know what changeset?

I'll try re-rebasing to xen-unstable latest and see
if it's reproducible.

Dan

Dan Magenheimer

2009-Apr-09 00:46 UTC

RE: [Xen-devel] 10 million cycles disappearing

The problem still occurs on latest tip (c/s 19515). :-(

Tian, Kevin

2009-Apr-09 04:47 UTC

RE: [Xen-devel] 10 million cycles disappearing

In your later test with memory allocation, is it still the case that
only some samples approach 10M, or is the average close to 10M?
If the average cost is high, then xenoprofile could be useful here to
show you the hotspots.
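
The distinction being asked about (rare outliers versus a high average) can be checked directly from the raw samples. A minimal sketch in Python, where the sample values and the 1M-cycle outlier threshold are assumptions chosen only for illustration:

```python
from statistics import mean

# Hypothetical cycle counts: many cheap samples plus one rare outlier.
samples = [40_000] * 999 + [9_900_000]

avg = mean(samples)
worst = max(samples)
# Count samples above an arbitrary 1M-cycle threshold.
outliers = sum(1 for s in samples if s > 1_000_000)

print(f"mean={avg:.0f} max={worst} outliers={outliers}/{len(samples)}")
```

With data like this the maximum sits near 10M while the mean stays under 50K cycles, which is the case where a sampling profiler is unlikely to show a hotspot.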

Thanks,
Kevin

Dan Magenheimer

2009-Apr-09 18:19 UTC

RE: [Xen-devel] 10 million cycles disappearing

After looking into this a bit more, it appears to be
only an artifact and a race in my measurement code
(buried in macros where it was non-obvious).
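
The post does not say what the race was. As one purely hypothetical illustration of how timing code can report a large delta that never happened, consider two CPUs sharing a single "start" slot, where one CPU samples the clock early but stores it late. The Python sketch below replays that interleaving deterministically; all names and cycle values are assumptions, and nothing here is taken from the actual macros.

```python
def racy_interleaving() -> int:
    """Replay one bad interleaving of two CPUs sharing a single start slot."""
    shared = {"start": 0}

    # CPU1 samples the clock early but is delayed before storing it.
    cpu1_sample = 1_000
    # CPU0 enters much later and stores its own entry time.
    shared["start"] = 9_900_000
    # CPU1's delayed store overwrites CPU0's entry time with a stale value.
    shared["start"] = cpu1_sample
    # CPU0 exits 50K cycles after it entered and reads the shared slot.
    cpu0_exit = 9_950_000
    return cpu0_exit - shared["start"]


def local_start() -> int:
    """Same timeline, but CPU0 keeps its entry time in a local variable."""
    cpu0_entry = 9_900_000
    cpu0_exit = 9_950_000
    return cpu0_exit - cpu0_entry


print(racy_interleaving())  # apparent cost seen by the racy bookkeeping
print(local_start())        # actual cost of the operation
```

In this hypothetical, keeping the entry timestamp in a local (or per-CPU) variable removes the bogus reading.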

Thanks for the help and sorry for the noise.

P.S. The average cost of tmem ops, including page copy and
compression/decompression, is in the 20K-50K cycle range.

Tian, Kevin

2009-Apr-10 00:17 UTC

RE: [Xen-devel] 10 million cycles disappearing

>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: April 10, 2009 2:19
>
>After looking into this a bit more, it appears to be
>only an artifact and a race in my measurement code
>(buried in macros where it was non-obvious).

Good.

>Thanks for the help and sorry for the noise.
>
>P.S. Average of tmem ops including page copy and
>compression/decompression is in the 20K-50K cycle range.

How frequent are your tmem ops under normal and extreme
usage?

Thanks
Kevin

Dan Magenheimer

2009-Apr-10 13:20 UTC

RE: [Xen-devel] 10 million cycles disappearing

> >P.S. Average of tmem ops including page copy and
> >compression/decompression is in the 20K-50K cycle range.
>
> How frequent are your tmem ops under normal and extreme
> usage?

It's really hard to answer that.  There is no "normal"
usage as tmem gets used when there is memory pressure.
I stress-test it by continually running "make -j80" on
linux-2.6.28 (usually with memory=768M and maxmem=1792M)
and with that load there are a few thousand tmem ops
per second.

Note also that the 20K-50K cycle range is WITH compression.
Compression is optional and slows the tmem op by 5x-10x,
but increases effective memory capacity by 2x-4x.
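
Putting the figures from this thread together gives a rough sense of the CPU cost. The sketch below is back-of-the-envelope arithmetic in Python; the 3,000 ops/sec rate is an assumption standing in for "a few thousand", and the 3 GHz clock is the processor mentioned earlier in the thread.

```python
CPU_HZ = 3_000_000_000      # 3 GHz processor mentioned earlier in the thread
OPS_PER_SEC = 3_000         # assumption standing in for "a few thousand"


def cpu_fraction(cycles_per_op: int, ops_per_sec: int = OPS_PER_SEC,
                 cpu_hz: int = CPU_HZ) -> float:
    """Fraction of one core spent in tmem ops at the given rate and cost."""
    return cycles_per_op * ops_per_sec / cpu_hz


low = cpu_fraction(20_000)   # cheaper end of the range, with compression
high = cpu_fraction(50_000)  # costlier end of the range, with compression
print(f"{low:.1%} to {high:.1%} of one core")
```

Under these assumptions the stress load spends a few percent of one core in tmem ops with compression enabled.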

When a VM is under memory pressure, it is often doing
I/O so CPU utilization is low.  Tmem absorbs some of the
otherwise unutilized CPU cycles and underutilized memory
to reduce I/O.

Does that help?
