Hi,

Aligned periodic vpts can improve HVM guest power consumption a lot,
especially when the guest uses a high HZ such as 1000HZ.
This patch aligns all periodic vpts except vlapic to the period boundary.
For vlapic, alignment is applied only when the new option
"align_periodic_vpt" is used.

Signed-off-by: Wei Gang <gang.wei@intel.com>

diff -r 4ac8bc60c000 xen/arch/x86/hvm/vpt.c
--- a/xen/arch/x86/hvm/vpt.c  Tue Feb 10 05:51:00 2009 +0000
+++ b/xen/arch/x86/hvm/vpt.c  Wed Feb 11 18:12:27 2009 +0800
@@ -354,6 +354,22 @@ void pt_migrate(struct vcpu *v)
     spin_unlock(&v->arch.hvm_vcpu.tm_lock);
 }
 
+/*
+ * option "align_periodic_vpt" will make vlapic's expires aligned with other
+ * vpts while possible.
+ *
+ * CAUTION:
+ * While vlapic timer ticking too close to the pit. We saw a userspace
+ * application getting the wrong answer because long CPU bound sequences
+ * appeared to run with zero CPU time. This only showed up with old Linux
+ * kernels (IIRC, it was with Red Hat 3 U8). So this option may cause a
+ * regression in this case.
+ */
+static int opt_align_periodic_vpt = 0;
+boolean_param("align_periodic_vpt", opt_align_periodic_vpt);
+
+extern s_time_t align_timer(s_time_t firsttick, uint64_t period);
+
 void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data)
@@ -389,8 +405,13 @@ void create_periodic_time(
      * LAPIC ticks for process accounting can see long sequences of process
      * ticks incorrectly accounted to interrupt processing.
      */
-    if ( !pt->one_shot && (pt->source == PTSRC_lapic) )
-        pt->scheduled += delta >> 1;
+    if ( !pt->one_shot )
+    {
+        pt->scheduled = align_timer(pt->scheduled, pt->period);
+        if ( !opt_align_periodic_vpt && (pt->source == PTSRC_lapic) )
+            pt->scheduled += delta >> 1;
+    }
+
     pt->cb = cb;
     pt->priv = data;
 
diff -r 4ac8bc60c000 xen/common/timer.c
--- a/xen/common/timer.c  Tue Feb 10 05:51:00 2009 +0000
+++ b/xen/common/timer.c  Wed Feb 11 18:12:27 2009 +0800
@@ -473,6 +473,14 @@ void process_pending_timers(void)
     timer_softirq_action();
 }
 
+/* calculate the aligned first tick time for the given periodic vpt */
+s_time_t align_timer(s_time_t firsttick, uint64_t period)
+{
+    if ( !period )
+        return firsttick;
+
+    return firsttick + period - (firsttick % period);
+}
 
 static void dump_timerq(unsigned char key)
 {

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
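
For reference, the rounding done by align_timer() in the patch can be tried
outside Xen. The snippet below is a minimal sketch, not part of the submitted
patch: it assumes s_time_t is a signed 64-bit nanosecond count (the typedef
here is only a stand-in) and copies the patch's arithmetic unchanged. Note
that a first tick which already sits on a period boundary is still moved one
full period later.

```c
#include <stdint.h>

/* Stand-in for Xen's s_time_t; assumed here to be signed 64-bit nanoseconds. */
typedef int64_t s_time_t;

/* Same arithmetic as align_timer() in the patch: round the first tick up to
 * the next multiple of the period. A zero period leaves the tick untouched. */
static s_time_t align_timer(s_time_t firsttick, uint64_t period)
{
    if ( !period )
        return firsttick;

    return firsttick + period - (firsttick % period);
}
```

With a 1 ms period (1000000 ns), a first tick at 1234567 ns moves to
2000000 ns, and a first tick at exactly 2000000 ns moves to 3000000 ns.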
On 11/02/2009 11:05, "Wei, Gang" <gang.wei@intel.com> wrote:

> + * CAUTION:
> + * While vlapic timer ticking too close to the pit. We saw a userspace
> + * application getting the wrong answer because long CPU bound sequences
> + * appeared to run with zero CPU time. This only showed up with old Linux
> + * kernels (IIRC, it was with Red Hat 3 U8). So this option may cause a
> + * regression in this case.
> + */
> +static int opt_align_periodic_vpt = 0;
> +boolean_param("align_periodic_vpt", opt_align_periodic_vpt);
> +

Presumably there are common cases where not aligning vlapic too has
significant power overheads? Personally I'm not sure I care too much about a
minor regression on RH3, if this patch is worthwhile at all I think it
should be always on and at most have a domain config option. I think a boot
option will never ever be used.

 -- Keir

On 11/02/2009 11:05, "Wei, Gang" <gang.wei@intel.com> wrote:

> Aligned periodic vpts can improve HVM guest power consumption a lot,
> especially when the guest uses a high HZ such as 1000HZ.
> This patch aligns all periodic vpts except vlapic to the period boundary.
> For vlapic, alignment is applied only when the new option
> "align_periodic_vpt" is used.
>
> Signed-off-by: Wei Gang <gang.wei@intel.com>

Also, Intel already contributed code to merge up timers. It's the
expiry-range patch in common/timer.c. This could be used by vpt.c to add a
per-domain acceptable range on vpt expiries. High-frequency timers would
then naturally fire together. Having a per-domain config option for this
would be something that would actually seem more generically useful (could
be used perhaps for other timers beyond vpt.c even).

This seems to me a more intuitive and gracefully selectable/de-selectable
alternative to this proposed patch, which really looks like a hardcoded
hack.

 -- Keir

>From: Keir Fraser
>Sent: Wednesday, February 11, 2009 7:23 PM
>
>On 11/02/2009 11:05, "Wei, Gang" <gang.wei@intel.com> wrote:
>
>> + * CAUTION:
>> + * While vlapic timer ticking too close to the pit. We saw a userspace
>> + * application getting the wrong answer because long CPU bound sequences
>> + * appeared to run with zero CPU time. This only showed up with old Linux
>> + * kernels (IIRC, it was with Red Hat 3 U8). So this option may cause a
>> + * regression in this case.
>> + */
>> +static int opt_align_periodic_vpt = 0;
>> +boolean_param("align_periodic_vpt", opt_align_periodic_vpt);
>> +
>
>Presumably there are common cases where not aligning vlapic too has
>significant power overheads?

Yes, it's necessary: average C-state residency is almost halved without
alignment, which draws more power.

>Personally I'm not sure I care too much about a
>minor regression on RH3, if this patch is worthwhile at all I think it
>should be always on and at most have a domain config option. I think a boot
>option will never ever be used.
>
> -- Keir

Thanks
Kevin

>From: Keir Fraser
>Sent: Wednesday, February 11, 2009 7:34 PM
>
>On 11/02/2009 11:05, "Wei, Gang" <gang.wei@intel.com> wrote:
>
>> Aligned periodic vpts can improve HVM guest power consumption a lot,
>> especially when the guest uses a high HZ such as 1000HZ.
>> This patch aligns all periodic vpts except vlapic to the period boundary.
>> For vlapic, alignment is applied only when the new option
>> "align_periodic_vpt" is used.
>>
>> Signed-off-by: Wei Gang <gang.wei@intel.com>
>
>Also, Intel already contributed code to merge up timers. It's the
>expiry-range patch in common/timer.c. This could be used by vpt.c to add a
>per-domain acceptable range on vpt expiries. High-frequency timers would
>then naturally fire together. Having a per-domain config option for this
>would be something that would actually seem more generically useful (could
>be used perhaps for other timers beyond vpt.c even).
>
>This seems to me a more intuitive and gracefully selectable/de-selectable
>alternative to this proposed patch, which really looks like a hardcoded
>hack.
>

Nice idea. But a quick thought raises one issue: the Xen timer does not
currently differentiate between single-shot and periodic timers, so such a
per-domain range option could also affect single-shot timers servicing the
same domain. Of course the current global slop option has the same effect,
but it would be better to mitigate the side effect on single-shot timers.
Is it feasible to add a new set_timer_range interface for explicit
invocation, e.g. by vpt, while still keeping the original global slop
option applying to all?

Thanks,
Kevin

>From: Tian, Kevin
>Sent: Wednesday, February 11, 2009 7:48 PM
>
>>From: Keir Fraser
>>Sent: Wednesday, February 11, 2009 7:34 PM
>>
>>On 11/02/2009 11:05, "Wei, Gang" <gang.wei@intel.com> wrote:
>>
>>> Aligned periodic vpts can improve HVM guest power consumption a lot,
>>> especially when the guest uses a high HZ such as 1000HZ.
>>> This patch aligns all periodic vpts except vlapic to the period boundary.
>>> For vlapic, alignment is applied only when the new option
>>> "align_periodic_vpt" is used.
>>>
>>> Signed-off-by: Wei Gang <gang.wei@intel.com>
>>
>>Also, Intel already contributed code to merge up timers. It's the
>>expiry-range patch in common/timer.c. This could be used by vpt.c to add a
>>per-domain acceptable range on vpt expiries. High-frequency timers would
>>then naturally fire together. Having a per-domain config option for this
>>would be something that would actually seem more generically useful (could
>>be used perhaps for other timers beyond vpt.c even).
>>
>>This seems to me a more intuitive and gracefully selectable/de-selectable
>>alternative to this proposed patch, which really looks like a hardcoded
>>hack.
>>
>
>Nice idea. But a quick thought raises one issue: the Xen timer does not
>currently differentiate between single-shot and periodic timers, so such a
>per-domain range option could also affect single-shot timers servicing the
>same domain. Of course the current global slop option has the same effect,
>but it would be better to mitigate the side effect on single-shot timers.
>Is it feasible to add a new set_timer_range interface for explicit
>invocation, e.g. by vpt, while still keeping the original global slop
>option applying to all?
>

Thinking about it more, I think that Jimmy's patch is simpler and more
accurate for the purpose. It's just a one-time adjustment for periodic
timers, and does no harm to single-shot timers. It can be enabled by
default, whereas a per-domain range has side effects unless more code is
added to differentiate timers, which is not worth it.

Of course a per-domain switch is still required to disable it for old
guests, as in your previous comment.

Thanks,
Kevin

On 11/02/2009 12:00, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Thinking about it more, I think that Jimmy's patch is simpler and more
> accurate for the purpose. It's just a one-time adjustment for periodic
> timers, and does no harm to single-shot timers. It can be enabled by
> default, whereas a per-domain range has side effects unless more code is
> added to differentiate timers, which is not worth it.
>
> Of course a per-domain switch is still required to disable it for old
> guests, as in your previous comment.

I'd actually be interested in knowing how just bumping Xen cmdline option
timer_slop= would influence power usage and guest timers. No new code
needed, a nice sliding dial (per host) for power usage versus timer
accuracy.

 -- Keir

>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Wednesday, February 11, 2009 9:06 PM
>
>On 11/02/2009 12:00, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Thinking about it more, I think that Jimmy's patch is simpler and more
>> accurate for the purpose. It's just a one-time adjustment for periodic
>> timers, and does no harm to single-shot timers. It can be enabled by
>> default, whereas a per-domain range has side effects unless more code is
>> added to differentiate timers, which is not worth it.
>>
>> Of course a per-domain switch is still required to disable it for old
>> guests, as in your previous comment.
>
>I'd actually be interested in knowing how just bumping Xen cmdline option
>timer_slop= would influence power usage and guest timers. No new code
>needed, a nice sliding dial (per host) for power usage versus timer
>accuracy.
>

We'll present some in-depth data at the upcoming summit. Basically, for a
single 2-vcpu HVM RHEL5u1 guest on a two-core mobile platform, a 1ms slop,
compared to the default 50us, could bring 7.5% more power saving by
reducing timer interrupts by a factor of 3 (RHEL5u1 defaults to 1000HZ,
meaning 3000 virtual interrupts per second for 1 vPIT and 2 vAPIC, and a
1ms slop drops that to roughly 1000). When running SPECpower, the power
efficiency score is also slightly improved. However, when we ran iperf to
check latency, the data became unstable.

So the range timer does affect latency, but in general it is a
power-efficient feature that fits requirements where power matters more.
It's especially useful under cpu over-commitment, where there are more
chances to align timers and reduce interrupts by a higher factor.

While the range timer affects all timers indiscriminately (the xen timer
itself is in essence one-shot), Jimmy's patch tends to reach a similar
effect for periodic timers (once the 1st shot is aligned, so are the later
ones), while leaving single-shot timers as they are, without touching the
global slop.

To me the above two are not identical; they reduce power at different
levels. :-)

Thanks,
Kevin

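
The interrupt counts quoted above (3000 per second dropping to roughly 1000)
can be reproduced with a toy model. This is a sketch under stated
assumptions, not Xen's timer code: count_wakeups() and NTIMERS are
hypothetical names, all timers share one period, and each wakeup is assumed
to service, once, every timer due within the slop window of the earliest
pending expiry.

```c
#include <stdint.h>

#define NTIMERS 3   /* hypothetical: e.g. 1 vPIT plus 2 per-vcpu LAPIC timers */

/*
 * Toy model (not Xen's implementation): count wakeups in [0, horizon_ns)
 * for NTIMERS periodic timers that share one period. Each wakeup happens at
 * the earliest pending expiry and also services every timer whose expiry
 * falls within slop_ns of that instant. period_ns must be positive.
 */
static unsigned int count_wakeups(const int64_t first[NTIMERS],
                                  int64_t period_ns, int64_t slop_ns,
                                  int64_t horizon_ns)
{
    int64_t next[NTIMERS];
    unsigned int wakeups = 0;
    int i;

    for ( i = 0; i < NTIMERS; i++ )
        next[i] = first[i];

    for ( ; ; )
    {
        /* The next wakeup is the earliest pending expiry. */
        int64_t now = next[0];

        for ( i = 1; i < NTIMERS; i++ )
            if ( next[i] < now )
                now = next[i];

        if ( now >= horizon_ns )
            break;

        wakeups++;

        /* Service each timer due within the slop window, re-arming it one
         * period after its own expiry so that its phase is preserved. */
        for ( i = 0; i < NTIMERS; i++ )
            if ( next[i] <= now + slop_ns )
                next[i] += period_ns;
    }

    return wakeups;
}
```

For three 1 ms timers first due at 0, 300 us and 600 us, the model gives 3000
wakeups in one second with a 50 us slop and 1000 with a 1 ms slop. If the
three first ticks are instead all placed on the 1 ms boundary, a slop of zero
already gives 999 wakeups in the first second, which is the effect the
alignment patch aims for without widening the slop for single-shot timers.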
> > + * CAUTION:
> > + * While vlapic timer ticking too close to the pit. We saw a userspace
> > + * application getting the wrong answer because long CPU bound sequences
> > + * appeared to run with zero CPU time. This only showed up with old Linux
> > + * kernels (IIRC, it was with Red Hat 3 U8). So this option may cause a
> > + * regression in this case.
> > + */
> > +static int opt_align_periodic_vpt = 0;
> > +boolean_param("align_periodic_vpt", opt_align_periodic_vpt);
> > +
>
> Presumably there are common cases where not aligning vlapic too has
> significant power overheads? Personally I'm not sure I care too much about a
> minor regression on RH3, if this patch is worthwhile at all I think it
> should be always on and at most have a domain config option. I think a boot
> option will never ever be used.

Given the wide variety of guests, and clocksource defaults/choices
in those guests, I'm leery about this change, especially turning it
on by default. The consequences of guest clocks losing time or gaining
time or appearing to go backwards are significant and the potential
problems go well beyond "a minor regression on RH3" and IMHO are
much more impactful for customers than saving a watt or two.

It is difficult in a simple test environment to reproduce the
problem unless you know what you are looking for.
Virtual Iron had a rather extensive set of test cases and I'd
like to see those run before this is turned on by default.

>From: Dan Magenheimer
>Sent: Wednesday, February 11, 2009 10:56 PM
>
>> > + * CAUTION:
>> > + * While vlapic timer ticking too close to the pit. We saw a userspace
>> > + * application getting the wrong answer because long CPU bound sequences
>> > + * appeared to run with zero CPU time. This only showed up with old Linux
>> > + * kernels (IIRC, it was with Red Hat 3 U8). So this option may cause a
>> > + * regression in this case.
>> > + */
>> > +static int opt_align_periodic_vpt = 0;
>> > +boolean_param("align_periodic_vpt", opt_align_periodic_vpt);
>> > +
>>
>> Presumably there are common cases where not aligning vlapic too has
>> significant power overheads? Personally I'm not sure I care too much about a
>> minor regression on RH3, if this patch is worthwhile at all I think it
>> should be always on and at most have a domain config option. I think a boot
>> option will never ever be used.
>
>Given the wide variety of guests, and clocksource defaults/choices
>in those guests, I'm leery about this change, especially turning it
>on by default. The consequences of guest clocks losing time or gaining
>time or appearing to go backwards are significant and the potential
>problems go well beyond "a minor regression on RH3" and IMHO are
>much more impactful for customers than saving a watt or two.

I'm not sure why you count this feature as the cause for guest time
inaccuracy. This patch only shifts the 1st expiration of a periodic timer,
and later expirations are all exactly as expected relative to the previous
one.

>
>It is difficult in a simple test environment to reproduce the
>problem unless you know what you are looking for.
>Virtual Iron had a rather extensive set of test cases and I'd
>like to see those run before this is turned on by default.
>

VI's issue is not about time inaccuracy or performance; it is
about accounting.

Thanks,
Kevin

> I'm not sure why you count this feature as the cause for guest time
> inaccuracy. This patch only shifts the 1st expiration of a periodic timer,
> and later expirations are all exactly as expected relative to the previous
> one.

OK, if that is correct, I have no problem with the patch.

> VI's issue is not about time inaccuracy or performance; it is
> about accounting.

Many of the problems (and Oracle has seen customers with similar
problems) are related to the timing of delivery of "ticks"
relative to each other, e.g. if consecutive ticks are sometimes
0.01s apart and sometimes 0.0099s apart and sometimes 0.0101s
apart, this causes different problems for different guests with
different default/chosen clocksources.

Dan

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Wednesday, February 11, 2009 8:25 AM
> To: Dan Magenheimer; Keir Fraser; Wei, Gang; xen-devel
> Subject: RE: [Xen-devel] Re: [PATCH] Align periodic vpts
>
> >From: Dan Magenheimer
> >Sent: Wednesday, February 11, 2009 10:56 PM
> >
> >> > + * CAUTION:
> >> > + * While vlapic timer ticking too close to the pit. We saw a userspace
> >> > + * application getting the wrong answer because long CPU bound sequences
> >> > + * appeared to run with zero CPU time. This only showed up with old Linux
> >> > + * kernels (IIRC, it was with Red Hat 3 U8). So this option may cause a
> >> > + * regression in this case.
> >> > + */
> >> > +static int opt_align_periodic_vpt = 0;
> >> > +boolean_param("align_periodic_vpt", opt_align_periodic_vpt);
> >> > +
> >>
> >> Presumably there are common cases where not aligning vlapic too has
> >> significant power overheads? Personally I'm not sure I care too much about a
> >> minor regression on RH3, if this patch is worthwhile at all I think it
> >> should be always on and at most have a domain config option. I think a boot
> >> option will never ever be used.
> >
> >Given the wide variety of guests, and clocksource defaults/choices
> >in those guests, I'm leery about this change, especially turning it
> >on by default. The consequences of guest clocks losing time or gaining
> >time or appearing to go backwards are significant and the potential
> >problems go well beyond "a minor regression on RH3" and IMHO are
> >much more impactful for customers than saving a watt or two.
>
> I'm not sure why you count this feature as the cause for guest time
> inaccuracy. This patch only shifts the 1st expiration of a periodic timer,
> and later expirations are all exactly as expected relative to the previous
> one.
>
> >
> >It is difficult in a simple test environment to reproduce the
> >problem unless you know what you are looking for.
> >Virtual Iron had a rather extensive set of test cases and I'd
> >like to see those run before this is turned on by default.
> >
>
> VI's issue is not about time inaccuracy or performance; it is
> about accounting.
>
> Thanks,
> Kevin

>From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
>Sent: Wednesday, February 11, 2009 11:36 PM
>
>> I'm not sure why you count this feature as the cause for guest time
>> inaccuracy. This patch only shifts the 1st expiration of a periodic timer,
>> and later expirations are all exactly as expected relative to the previous
>> one.
>
>OK, if that is correct, I have no problem with the patch.
>
>> VI's issue is not about time inaccuracy or performance; it is
>> about accounting.
>
>Many of the problems (and Oracle has seen customers with similar
>problems) are related to the timing of delivery of "ticks"
>relative to each other, e.g. if consecutive ticks are sometimes
>0.01s apart and sometimes 0.0099s apart and sometimes 0.0101s
>apart, this causes different problems for different guests with
>different default/chosen clocksources.
>

Yep, I agree. It's a common issue in virtualization. :-)

Thanks,
Kevin

On Wednesday, February 11, 2009 9:21 PM, Tian, Kevin wrote:
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Wednesday, February 11, 2009 9:06 PM
>>
>> I'd actually be interested in knowing how just bumping Xen cmdline option
>> timer_slop= would influence power usage and guest timers. No new code
>> needed, a nice sliding dial (per host) for power usage versus timer
>> accuracy.
...
> So the range timer does affect latency, but in general it is a
> power-efficient feature that fits requirements where power matters more.
> It's especially useful under cpu over-commitment, where there are more
> chances to align timers and reduce interrupts by a higher factor.
>
> While the range timer affects all timers indiscriminately (the xen timer
> itself is in essence one-shot), Jimmy's patch tends to reach a similar
> effect for periodic timers (once the 1st shot is aligned, so are the later
> ones), while leaving single-shot timers as they are, without touching the
> global slop.
>
> To me the above two are not identical; they reduce power at different
> levels. :-)

Just as Kevin explained, aligning periodic timers at the beginning could
bring a power gain with less impact on timer expiry accuracy, so it is
suitable to use until we have dug out most of the side effects of a
large-slop range timer. I will try to make the option per-domain and then
resend the patch. Meanwhile, I would also prefer to make this option
default on. Any further comments?

Jimmy

On Thursday, February 12, 2009 10:10 AM, Wei, Gang wrote:
> Just as Kevin explained, aligning periodic timers at the beginning could
> bring a power gain with less impact on timer expiry accuracy, so it is
> suitable to use until we have dug out most of the side effects of a
> large-slop range timer. I will try to make the option per-domain and then
> resend the patch. Meanwhile, I would also prefer to make this option
> default on. Any further comments?

Here is the updated patch, which adds a per-domain option "vpt_align". The
C3 average residency was doubled with vpt_align=1 for a RHEL5 hvm guest.

Jimmy