I've been seeing a possible performance problem off and on and I've spent some time tracking it but haven't made much progress and have to give up for now, so I thought I'd at least document what I know and see if it sounds familiar to anyone.

The problem: Something in Xen seems to periodically take about 10M cycles. I think it is an interrupt and I think it is taking a lock related to memory allocation and holding it for a LONG time (i.e. 10M cycles or close).

I am measuring inside a hypercall using TSC, taking a TSC reading at entry to the hypercall code and at exit. Xen is not pre-emptive, so it can't be switching context or something, right? Nearly all of the readings are less than 100K cycles, but some samples are "huge" and usually at 9M-10M cycles. Since I am recording the max difference between the TSCs, the max "huge" grows over a long period of time, but eventually converges close to 10M (and this is a 3 GHz processor). I can see it grow using "watch". And I've NEVER seen a reading over 10M.

I am able to disable interrupts and still take measurements. Roughly half of the measurements occur when doing a hypercall-subop that does no memory allocation and roughly half occur when doing a hypercall-subop that DOES do memory allocation. With interrupts disabled, the subop that DOES memory allocation still asymptotically approaches 10M. The one that does NOT do memory allocation stays relatively small.

I'm currently measuring on Xen 3.3.1 but I think I've seen similar results on xen-unstable. A single 2-vcpu domain is running (in addition to domain0).

Does any of that sound familiar? Any smoking guns?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
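For readers following along, here is a minimal sketch of the kind of measurement described above: read the TSC at entry and exit, and keep the count, sum and maximum of the differences. It is not the code used in the post. The names `cycle_stats`, `cycle_stats_record`, `cycles_to_ms` and `read_tsc` are hypothetical, and the sketch assumes a single writer per statistics structure (for example one structure per CPU).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-operation statistics; not a Xen data structure. */
struct cycle_stats {
    uint64_t count;  /* number of samples recorded */
    uint64_t sum;    /* total cycles, for computing an average */
    uint64_t max;    /* largest single sample seen so far */
};

#if defined(__x86_64__) || defined(__i386__)
/* Read the time stamp counter (GCC/Clang inline assembly, x86 only). */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Record one entry/exit pair. Assumes only one writer touches *s. */
static inline void cycle_stats_record(struct cycle_stats *s,
                                      uint64_t start, uint64_t end)
{
    uint64_t delta = end - start;  /* unsigned, so a counter wrap is harmless */

    s->count++;
    s->sum += delta;
    if (delta > s->max)
        s->max = delta;
}

/* Convert a cycle count to milliseconds for a given clock rate in Hz. */
static inline double cycles_to_ms(uint64_t cycles, double hz)
{
    assert(hz > 0.0);
    return (double)cycles * 1e3 / hz;
}
```

A caller would bracket the code under test with `t0 = read_tsc();` and `cycle_stats_record(&stats, t0, read_tsc());`. For scale, `cycles_to_ms(10000000, 3e9)` is about 3.3, so the 10M-cycle readings correspond to roughly 3.3 ms on the 3 GHz machine.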
I remember that an earlier post (from Jan?) fixed a heavy operation incurred by some special handling of writable page tables. I don't recall the details, but your description of the subop with memory allocation points me to that part...

Thanks,
Kevin

>From: Dan Magenheimer
>Sent: April 8, 2009 7:54
Thanks! Do you happen to know which changeset? I'll try rebasing again onto the latest xen-unstable and see if it's reproducible.

Dan

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Tuesday, April 07, 2009 6:33 PM
> To: Dan Magenheimer; Xen-Devel (E-mail); jbeulich@novell.com
> Subject: RE: [Xen-devel] 10 million cycles disappearing
The problem still occurs on latest tip (c/s 19515). :-(
In your later test with memory allocation, is it still the case that only some samples approach 10M, or is the average close to 10M? If the average cost is high, then xenoprofile could be useful here to show you the hotspot.

Thanks,
Kevin

>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: April 9, 2009 8:47
After looking into this a bit more, it appears to be only an artifact and a race in my measurement code (buried in macros where it was non-obvious).

Thanks for the help and sorry for the noise.

P.S. Average of tmem ops including page copy and compression/decompression is in the 20K-50K cycle range.

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Wednesday, April 08, 2009 10:47 PM
> To: Dan Magenheimer; Xen-Devel (E-mail); jbeulich@novell.com
> Subject: RE: [Xen-devel] 10 million cycles disappearing
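The thread does not say what the race in the measurement macros actually was. Purely as an illustration of how a shared "maximum cycles" counter can be kept consistent when several CPUs update it, here is a sketch using C11 atomics. All names (`max_cycles`, `update_max`, `record_sample`) are hypothetical, and this is one common pattern, not the fix that was applied.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared maximum that any CPU may update. */
static _Atomic uint64_t max_cycles;

/* Lock-free "store if greater": retry until the sample is no longer
 * larger than the stored maximum, or the compare-and-swap succeeds. */
static void update_max(_Atomic uint64_t *max, uint64_t sample)
{
    assert(max != NULL);
    uint64_t cur = atomic_load(max);
    while (sample > cur &&
           !atomic_compare_exchange_weak(max, &cur, sample)) {
        /* On failure, cur has been reloaded with the latest stored value. */
    }
}

/* The start time is passed by value, so it lives in the caller's own
 * local state and cannot be overwritten by another CPU between the
 * entry and exit readings. */
static void record_sample(_Atomic uint64_t *max, uint64_t start, uint64_t end)
{
    update_max(max, end - start);
}
```

The two design points are that the start timestamp is never stored in a variable shared between CPUs, and that the comparison and the store of the maximum happen as one atomic step.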
>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: April 10, 2009 2:19
>
>After looking into this a bit more, it appears to be
>only an artifact and a race in my measurement code
>(buried in macros where it was non-obvious).

Good.

>Thanks for the help and sorry for the noise.
>
>P.S. Average of tmem ops including page copy and
>compression/decompression is in the 20K-50K cycle range.

How frequent are your tmem ops under normal and extreme usage?

Thanks,
Kevin
> >P.S. Average of tmem ops including page copy and
> >compression/decompression is in the 20K-50K cycle range.
>
> How frequent are your tmem ops under normal and extreme
> usage?

It's really hard to answer that. There is no "normal" usage, as tmem gets used when there is memory pressure. I stress-test it by continually running "make -j80" on linux-2.6.28 (usually with memory=768M and maxmem=1792M), and with that load there are a few thousand tmem ops per second.

Note also that the 20K-50K cycle range is WITH compression. Compression is optional and slows the tmem op by 5x-10x, but increases the amount of data the same memory can hold by 2x-4x.

When a VM is under memory pressure, it is often doing I/O, so CPU utilization is low. Tmem absorbs some of the otherwise unutilized CPU cycles and underutilized memory to reduce I/O.

Does that help?
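To put those figures in proportion, here is a back-of-the-envelope sketch of the CPU share they imply. The function name `cpu_fraction` is hypothetical. The rate of 3000 ops per second is an assumed stand-in for "a few thousand", and the 3 GHz clock is the one mentioned in the first message of the thread.

```c
#include <assert.h>

/* Fraction of one core consumed: (ops/s * cycles/op) / (cycles/s). */
static double cpu_fraction(double ops_per_sec, double cycles_per_op, double hz)
{
    assert(hz > 0.0);
    return ops_per_sec * cycles_per_op / hz;
}
```

Under those assumptions, `cpu_fraction(3000, 20000, 3e9)` gives 0.02 and `cpu_fraction(3000, 50000, 3e9)` gives 0.05, i.e. roughly 2% to 5% of one core with compression enabled.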