This patch is the updated version of the deferrable timer patch (http://lists.xensource.com/archives/html/xen-devel/2008-07/msg00835.html). The major changes are:

- Rebase onto the latest upstream: sync with the timer linked-list change (changeset 18381) and with the heap memory allocation change (changeset 18671).
- Per Keir's comment, more usage has been found: the range timer is now used in the HVM virtual periodic timer (xen/arch/x86/hvm/vpt.c). Since vpt already has logic to handle missed ticks, it is an ideal place to use a range timer.
- The original deferrable timer is renamed to range timer, to better describe the concept. This renaming is inspired by a similar proposal in the Linux kernel community (http://kerneltrap.org/mailarchive/linux-kernel/2008/9/1/3160404).

A preliminary measurement was done on a dual-core mobile platform (Montevina). When creating 4 RHEL5 HVM UP guests, the range timer patch reduces power consumption by ~5%. More measurement is ongoing. This post is intended to get more feedback to refine the patch.

Best Regards
Ke
On 28/10/08 06:57, "Yu, Ke" <ke.yu@intel.com> wrote:

> The major changes are:
> - Rebase onto the latest upstream: sync with the timer linked-list change
> (changeset 18381) and with the heap memory allocation change (changeset 18671).
> - Per Keir's comment, more usage has been found: the range timer is now used
> in the HVM virtual periodic timer (xen/arch/x86/hvm/vpt.c). Since vpt already
> has logic to handle missed ticks, it is an ideal place to use a range timer.
> - The original deferrable timer is renamed to range timer, to better describe
> the concept. This renaming is inspired by a similar proposal in the Linux
> kernel community (http://kerneltrap.org/mailarchive/linux-kernel/2008/9/1/3160404).
>
> A preliminary measurement was done on a dual-core mobile platform (Montevina).
> When creating 4 RHEL5 HVM UP guests, the range timer patch reduces power
> consumption by ~5%. More measurement is ongoing. This post is intended to get
> more feedback to refine the patch.

You'll have to explain how the implementation in timer.c works. I'm not sure I believe it really does.

-- Keir
On 28/10/08 06:57, "Yu, Ke" <ke.yu@intel.com> wrote:

> A preliminary measurement was done on a dual-core mobile platform (Montevina).
> When creating 4 RHEL5 HVM UP guests, the range timer patch reduces power
> consumption by ~5%. More measurement is ongoing. This post is intended to get
> more feedback to refine the patch.

Is this 5% of total system power consumption?

-- Keir
On 28/10/08 06:57, "Yu, Ke" <ke.yu@intel.com> wrote:

> - Per Keir's comment, more usage has been found: the range timer is now used
> in the HVM virtual periodic timer (xen/arch/x86/hvm/vpt.c). Since vpt already
> has logic to handle missed ticks, it is an ideal place to use a range timer.

The fact is that just about any timer in Xen could be much sloppier than it is. What about if we just make the slop configurable and change the default up to, say, 1ms? Then for lower power you could bump it some more, at the cost of things like scheduling getting a bit less accurate (and frankly does that matter? :-).

My fear with range timers is that they'll actually creep out to nearly all current users of the timer interface, and they'll all need to express some slightly suspect policy about how much inaccuracy they can cope with. The fact is, in just about all cases, that's not an obvious thing to be able to specify. In fact it's a tradeoff -- e.g., between power consumption and 'accuracy' -- which is perhaps better centrally managed, and that is why I think some centrally tweakable knob like the existing SLOP macro may actually be a better thing to work with.

-- Keir
2008/10/28 Keir Fraser <keir.fraser@eu.citrix.com>:

> You'll have to explain how the implementation in timer.c works. I'm not sure
> I believe it really does.
>
> -- Keir

Oh, fair enough. Let me explain how the range timer is implemented.

Basically, the timer softirq handler (timer_softirq_action) needs to do two tasks: execute the expired timers, and find the nearest unexpired timer deadline in order to reprogram the hardware timer (the local APIC timer on x86). Currently, each timer deadline is a single point in time, and these two tasks are accomplished with the help of the timer heap. The heap is ordered by deadline, so its first element always has the nearest deadline. The two tasks can therefore be done by repeatedly picking the first element: if it has expired, execute it; if it has not, the nearest unexpired deadline has been found.

Things become a little more complicated when the timer linked list is introduced. The linked list stores timers when the heap runs out of memory. The softirq handler therefore first has to allocate more heap memory and move the timers from the linked list back into the heap; the rest of the processing is the same as before. If timers remain on the linked list after that, the handler has to do one more thing: walk the list entry by entry to execute the expired timers and find the nearest deadline.

With the range timer introduced, the timer deadline becomes a range, say [Tstart, Tend]: the timer can be executed at any time within [Tstart, Tend].

First, let's see how the nearest deadline is calculated. The timer heap is still ordered by Tstart, i.e. it is organised as [Ts1, Te1], [Ts2, Te2], ..., [TsN, TeN], where Ts1 < Ts2 < ... < TsN. The nearest deadline range [Ts', Te'] is the intersection of [Ts1, Te1], [Ts2, Te2], ..., [TsK, TeK], where K is the largest integer for which the intersection is non-empty. For example, given four timers T1=[100, 103], T2=[101, 104], T3=[102, 104], T4=[104, 106], the nearest deadline range is [Ts', Te'] = [102, 103], i.e. the intersection of T1, T2 and T3 (adding T4 would make the intersection empty), and T1, T2 and T3 can all be executed anywhere in [102, 103].

The following code from range-core.patch does the calculation above:

+    start = 0;
+    end = STIME_MAX;
+    while ( (GET_HEAP_SIZE(heap) != 0) &&
+            ((t = heap[1])->expires <= end) )
+    {
+        remove_from_heap(heap, t);
+
+        start = t->expires;
+        if ( end > t->expires_end )
+            end = t->expires_end;
+
+        t->list_next = ts->ready_list;
+        ts->ready_list = t;
+        t->status = TIMER_STATUS_in_ready_list;
+    }
+    this_cpu(timer_deadline) = (start + end) / 2;

Note that T1, T2 and T3 are removed from the heap during the calculation, so a ready list is added to "struct timers" to hold these ready timers until the deadline is reached and they are executed.

If there are timers on the overflow linked list, we need to move them into the heap first. If there are ready timers, we also need to move them back into the heap first. Heap memory is reallocated if necessary. queue_ready_timer() does all of the above.

The other task is executing the expired timers. The only extra work is executing the timers on the ready list; the rest is the same as the original code path. That is all the required change -- not as complicated as it looks, right? :)

Best Regards
Ke
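To make the window computation above concrete, here is a minimal standalone sketch of the same intersection logic. It is purely illustrative: the type and function names are invented, and it is not the patch code itself.

#include <stdint.h>
#include <stdio.h>

typedef struct { int64_t start, end; } range_timer_t;

/* Given timers sorted by start time (as the heap guarantees for its top
 * element), return how many leading timers share a common firing window,
 * and the chosen firing point via *fire_at (the midpoint of the window,
 * as in range-core.patch). */
static int batch_window(const range_timer_t *t, int n, int64_t *fire_at)
{
    int64_t start = 0, end = INT64_MAX;
    int k = 0;

    while ( (k < n) && (t[k].start <= end) )
    {
        start = t[k].start;        /* starts are non-decreasing */
        if ( t[k].end < end )
            end = t[k].end;        /* shrink the common window */
        k++;
    }

    *fire_at = (start + end) / 2;
    return k;
}

int main(void)
{
    /* The four timers from the example above. */
    range_timer_t t[] = { {100, 103}, {101, 104}, {102, 104}, {104, 106} };
    int64_t when;
    int k = batch_window(t, 4, &when);

    printf("%d timers can fire together at %lld\n", k, (long long)when);
    /* Prints "3 timers can fire together at 102": the common window is
     * [102, 103] and T4 is left for a later pass. */
    return 0;
}

Aiming at the midpoint rather than the start of the window leaves slack on both sides, so a slightly late hardware interrupt still lands inside every batched timer's range.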
2008/10/28 Keir Fraser <keir.fraser@eu.citrix.com>:

> On 28/10/08 06:57, "Yu, Ke" <ke.yu@intel.com> wrote:
>
>> A preliminary measurement was done on a dual-core mobile platform (Montevina).
>> When creating 4 RHEL5 HVM UP guests, the range timer patch reduces power
>> consumption by ~5%. More measurement is ongoing. This post is intended to get
>> more feedback to refine the patch.
>
> Is this 5% of total system power consumption?
>
> -- Keir

Yes, it is total system power consumption, excluding the LCD monitor. The system power consumption is measured with a digital power meter.

Best Regards
Ke
On 28/10/08 14:40, "Yu Ke" <mr.yuke@gmail.com> wrote:

> The other task is executing the expired timers. The only extra work is
> executing the timers on the ready list; the rest is the same as the original
> code path. That is all the required change -- not as complicated as it looks,
> right? :)

Yeah, actually this implementation seems to work out quite nicely. I'll still say I'm not convinced it's a good interface for callers, though. :-)

-- Keir
On 28/10/08 14:43, "Yu Ke" <mr.yuke@gmail.com> wrote:

>> Is this 5% of total system power consumption?
>>
>> -- Keir
>
> Yes, it is total system power consumption, excluding the LCD monitor. The
> system power consumption is measured with a digital power meter.

That's not bad then. I wonder if there are simpler ways to get similar wins, however.

-- Keir
2008/10/28 Keir Fraser <keir.fraser@eu.citrix.com>:

> On 28/10/08 06:57, "Yu, Ke" <ke.yu@intel.com> wrote:
>
>> - Per Keir's comment, more usage has been found: the range timer is now used
>> in the HVM virtual periodic timer (xen/arch/x86/hvm/vpt.c). Since vpt already
>> has logic to handle missed ticks, it is an ideal place to use a range timer.
>
> The fact is that just about any timer in Xen could be much sloppier than it
> is. What about if we just make the slop configurable and change the default
> up to, say, 1ms? Then for lower power you could bump it some more, at the
> cost of things like scheduling getting a bit less accurate (and frankly does
> that matter? :-).

That is doable. My only concern is that the slop configuration is global and affects every timer, so it may be hard to choose an acceptable and reasonable value. The current 50us TIMER_SLOP may be the upper limit; I am not sure it is appropriate to set TIMER_SLOP as large as 1ms.

> My fear with range timers is that they'll actually creep out to nearly all
> current users of the timer interface, and they'll all need to express some
> slightly suspect policy about how much inaccuracy they can cope with. The
> fact is, in just about all cases, that's not an obvious thing to be able to
> specify. In fact it's a tradeoff -- e.g., between power consumption and
> 'accuracy' -- which is perhaps better centrally managed, and that is why I
> think some centrally tweakable knob like the existing SLOP macro may
> actually be a better thing to work with.
>
> -- Keir

The range timer implementation still keeps the original set_timer() API, as a special type of range timer whose expiry range is [a, a], so the semantics of set_timer() are exactly the same as before. Users who cannot tolerate any inaccuracy can simply keep using set_timer() as before. And IMHO, the power consumption vs. inaccuracy trade-off is not a central policy: each user knows its own tolerance best, so it may be better to let the caller decide.

Best Regards
Ke
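To illustrate the point about set_timer() keeping its exact semantics, a rough sketch of the layering follows. set_range_timer() is the interface from the posted patch series, not something in the current tree, and this is only an illustration of the idea rather than the patch's actual implementation:

/* Callers that can tolerate slack state their window explicitly. */
void set_range_timer(struct timer *timer, s_time_t expires, s_time_t expires_end);

/* The existing interface becomes a zero-width range, so its behaviour
 * is unchanged for callers that need exact delivery. */
static inline void set_timer(struct timer *timer, s_time_t expires)
{
    set_range_timer(timer, expires, expires);
}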
On 28/10/08 15:15, "Yu Ke" <mr.yuke@gmail.com> wrote:

> And IMHO, the power consumption vs. inaccuracy trade-off is not a central
> policy: each user knows its own tolerance best, so it may be better to let
> the caller decide.

Yes, I can see there is a fundamental difference in points of view here. I would point out that, in your second patch, it's not clear there's any particular reason for the constants chosen in:

  MIN(pt->period/8, MILLISECS(1))

How did you arrive at this formula? Why 1ms rather than 2ms, 10ms, or 500us? Why 8 rather than 16 or 4? Ultimately the entity that really knows what bounds are reasonable on the sloppiness of a guest timer device would be the guest itself: the kernel, or applications running on it, or users interacting with that guest software. Otherwise I think you're making a somewhat uninformed tradeoff between performance and power. And in that case, if the balance point doesn't need to be chosen all that accurately, then centralising and hiding the sloppiness seems a good idea to me. That also allows the potential for easier central tunability: do you want performance more than power, or vice versa?

-- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, October 28, 2008 11:37 PM
>
> On 28/10/08 15:15, "Yu Ke" <mr.yuke@gmail.com> wrote:
>
>> And IMHO, the power consumption vs. inaccuracy trade-off is not a central
>> policy: each user knows its own tolerance best, so it may be better to let
>> the caller decide.
>
> Yes, I can see there is a fundamental difference in points of view here. I
> would point out that, in your second patch, it's not clear there's any
> particular reason for the constants chosen in:
>
>   MIN(pt->period/8, MILLISECS(1))
>
> How did you arrive at this formula? Why 1ms rather than 2ms, 10ms, or 500us?
> Why 8 rather than 16 or 4? Ultimately the entity that really knows what
> bounds are reasonable on the sloppiness of a guest timer device would be the
> guest itself: the kernel, or applications running on it, or users interacting
> with that guest software. Otherwise I think you're making a somewhat
> uninformed tradeoff between performance and power. And in that case, if the
> balance point doesn't need to be chosen all that accurately, then
> centralising and hiding the sloppiness seems a good idea to me. That also
> allows the potential for easier central tunability: do you want performance
> more than power, or vice versa?

Yes, this is a valid concern. Simplicity is better if we are not sure of the gain from making things complex, and I agree that a central slop control is cleaner here. In the meantime, how about also adding a flag to disable the slop on a per-timer basis? Timers with stricter delivery requirements could then set this flag even when the global slop is enabled. Alternatively, this control could be exposed to the user through the domctl interface, as a per-domain configurable option.

Actually, range timers do not hurt performance as much as one might immediately imagine, especially for periodic timers. Normally only the first alignment may deviate from the assigned interval; once the timers are aligned, later behaviour is similar to before.

Thanks,
Kevin
On 29/10/08 02:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Yes, this is a valid concern. Simplicity is better if we are not sure of the
> gain from making things complex, and I agree that a central slop control is
> cleaner here. In the meantime, how about also adding a flag to disable the
> slop on a per-timer basis? Timers with stricter delivery requirements could
> then set this flag even when the global slop is enabled. Alternatively, this
> control could be exposed to the user through the domctl interface, as a
> per-domain configurable option.

I actually wonder whether we would get something similar to your 5% win by just increasing the SLOP parameter to a fixed 1ms. That would equal your worst-case slop in the second range-timer patch for vpt timers, and I don't really see why any timer in Xen wouldn't be able to deal with that.

One thing that would be interesting would be some micro-statistics, such as average C3 residency time, to go along with your macro-statistic of total power consumption. Seeing the relationship between the two would let us see how much of an effect the residency time has on power consumption, and give us something more precise to aim for in terms of desirable residency time. We could even hack the C3 code in Xen to deliberately set long timeouts, so we can measure that relationship more precisely and at extremes. If nothing else it might tell us what sort of slop is worthwhile and, if that's small enough, perhaps we can just live with it?

> Actually, range timers do not hurt performance as much as one might
> immediately imagine, especially for periodic timers. Normally only the first
> alignment may deviate from the assigned interval; once the timers are
> aligned, later behaviour is similar to before.

I like the range-timer implementation; I'm just not convinced about the interface. At the moment, I think I'd rather split the interface into usage types (a bit like you suggest above with your domctl idea) -- e.g., this is a guest timer, or this is a low-rate periodic timer -- and then hide the policy about what is good enough from the consumers of those interfaces. Perhaps the range-timer implementation will still be a useful mechanism behind such interfaces.

-- Keir
2008/10/29 Keir Fraser <keir.fraser@eu.citrix.com>:

> On 29/10/08 02:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Yes, this is a valid concern. Simplicity is better if we are not sure of the
>> gain from making things complex, and I agree that a central slop control is
>> cleaner here. In the meantime, how about also adding a flag to disable the
>> slop on a per-timer basis? Timers with stricter delivery requirements could
>> then set this flag even when the global slop is enabled. Alternatively, this
>> control could be exposed to the user through the domctl interface, as a
>> per-domain configurable option.
>
> I actually wonder whether we would get something similar to your 5% win by
> just increasing the SLOP parameter to a fixed 1ms. That would equal your
> worst-case slop in the second range-timer patch for vpt timers, and I don't
> really see why any timer in Xen wouldn't be able to deal with that.

Increasing the SLOP to 1ms should give a similar 5% gain; as your analysis says, it is the worst case of the range timer usage in vpt. I can redo the measurement to confirm.

> One thing that would be interesting would be some micro-statistics, such as
> average C3 residency time, to go along with your macro-statistic of total
> power consumption. Seeing the relationship between the two would let us see
> how much of an effect the residency time has on power consumption, and give
> us something more precise to aim for in terms of desirable residency time.
> We could even hack the C3 code in Xen to deliberately set long timeouts, so
> we can measure that relationship more precisely and at extremes. If nothing
> else it might tell us what sort of slop is worthwhile and, if that's small
> enough, perhaps we can just live with it?

That is an interesting thing to do. We can measure C3 residency with different TIMER_SLOP values, to see the relationship between C3 residency and TIMER_SLOP.

>> Actually, range timers do not hurt performance as much as one might
>> immediately imagine, especially for periodic timers. Normally only the first
>> alignment may deviate from the assigned interval; once the timers are
>> aligned, later behaviour is similar to before.
>
> I like the range-timer implementation; I'm just not convinced about the
> interface. At the moment, I think I'd rather split the interface into usage
> types (a bit like you suggest above with your domctl idea) -- e.g., this is a
> guest timer, or this is a low-rate periodic timer -- and then hide the policy
> about what is good enough from the consumers of those interfaces. Perhaps the
> range-timer implementation will still be a useful mechanism behind such
> interfaces.
>
> -- Keir

Yes, it is unfair to ask the caller to figure out a range as complex as "MIN(pt->period/8, MILLISECS(1))", and it would be convenient if there were an easier-to-use interface, although to be honest I do not yet have much idea of what that interface should look like.

Here are my two cents, just for your reference. I have tried the range timer in several places (sketched below):

- Platform time overflow timer: this timer can expire at any time ahead of the deadline, so the range is [deadline - period/n, deadline].
- cpufreq ondemand sampling timer: this timer is not strict about its deadline -- expiring a bit early or a bit late is both fine -- so the range can be [deadline - period/n, deadline + period/n].
- HVM virtual periodic timer: this timer may expire a bit after its deadline, because it has logic to handle missed ticks, so the range can be [deadline, deadline + period/n]. To be more conservative, it is safe to add an upper limit of 1ms. The 1ms comes from commodity OS tick frequencies, e.g. the Linux kernel uses 1000/250/100 HZ and Windows uses 64 HZ, so 1ms covers the upper limit of 1000 HZ.

Users in the various sub-systems have more knowledge of their own timer usage models; it would be good if they could provide their requirements. Once enough usage models are collected, we can probably unify them and provide an easy-to-use API.

Best Regards
Ke
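For concreteness, the three call sites listed above might look roughly like this with the set_range_timer() interface from the patch series. The timer variable names are only indicative, and "n" stands for whatever divisor each caller picks:

/* Platform time overflow handling: may fire any time up to the deadline. */
set_range_timer(&plt_overflow_timer, deadline - period / n, deadline);

/* cpufreq ondemand (dbs) sampling: a little early or late is equally fine. */
set_range_timer(&dbs_timer, deadline - period / n, deadline + period / n);

/* HVM virtual periodic timer: late only, capped at 1ms, because the
 * missed-tick logic compensates for late delivery but not for early. */
set_range_timer(&pt->timer, deadline,
                deadline + MIN(pt->period / 8, MILLISECS(1)));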
> Here are my two cents, just for your reference. I have tried the range timer
> in several places:
> <snip>
> - HVM virtual periodic timer: this timer may expire a bit after its deadline,
> because it has logic to handle missed ticks, so the range can be
> [deadline, deadline + period/n]. To be more conservative, it is safe to add
> an upper limit of 1ms. The 1ms comes from commodity OS tick frequencies, e.g.
> the Linux kernel uses 1000/250/100 HZ and Windows uses 64 HZ, so 1ms covers
> the upper limit of 1000 HZ.

A reminder that HVM HPET usage by some Linux kernels has a problem where delivery of a tick sooner than expected can cause the guest's apparent time to advance too quickly. Dave Winchell submitted a patch for this some time back that didn't get accepted (for various reasons), and I'm not sure whether this range timer implementation will make it better or worse. But I'm pretty sure that if ticks are delivered at:

  T+0d+3s, T+1d+2s, T+2d+1s

where d=1/Hz and s=slop, the problem will be observed.

Dave, can you comment?

Dan
Yu Ke wrote:

> 2008/10/29 Keir Fraser <keir.fraser@eu.citrix.com>:
>> On 29/10/08 02:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>
>>> Yes, this is a valid concern. Simplicity is better if we are not sure of
>>> the gain from making things complex, and I agree that a central slop
>>> control is cleaner here. In the meantime, how about also adding a flag to
>>> disable the slop on a per-timer basis? Timers with stricter delivery
>>> requirements could then set this flag even when the global slop is enabled.
>>> Alternatively, this control could be exposed to the user through the domctl
>>> interface, as a per-domain configurable option.
>>
>> I actually wonder whether we would get something similar to your 5% win by
>> just increasing the SLOP parameter to a fixed 1ms. That would equal your
>> worst-case slop in the second range-timer patch for vpt timers, and I don't
>> really see why any timer in Xen wouldn't be able to deal with that.
>
> Increasing the SLOP to 1ms should give a similar 5% gain; as your analysis
> says, it is the worst case of the range timer usage in vpt. I can redo the
> measurement to confirm.

I have finished the measurement. When TIMER_SLOP is increased to 1ms, there is a similar power consumption gain, this time 4% (14.0W vs 14.6W). In theory a 1ms TIMER_SLOP should gain more than the range timer; the difference may be due to noise in the test environment.

Best Regards
Ke
On 30/10/08 08:08, "Yu, Ke" <ke.yu@intel.com> wrote:

>> Increasing the SLOP to 1ms should give a similar 5% gain; as your analysis
>> says, it is the worst case of the range timer usage in vpt. I can redo the
>> measurement to confirm.
>
> I have finished the measurement. When TIMER_SLOP is increased to 1ms, there
> is a similar power consumption gain, this time 4% (14.0W vs 14.6W). In theory
> a 1ms TIMER_SLOP should gain more than the range timer; the difference may be
> due to noise in the test environment.

It may also be because your patch tends to delay the timer deadline somewhat, and so by the time it goes off you actually sometimes have one or two more timers you can deliver in the batch? OTOH it could be experimental noise as you suggest, and certainly this change gets us pretty close to your win. We'd have to do further experiments to see whether increasing TIMER_SLOP noticeably degrades system performance, but we'd need to do that with the range-timer approach too.

One thing: as Dan pointed out, some things don't want to get their timeout early, but generally callers are much more tolerant of getting timeouts a bit late. Possibly we should set the timer_deadline to nearest timeout + TIMER_SLOP, and then when executing timers be strict about not executing timers early (or at least no more than the existing 50us 'mini' slop)?

Or perhaps actually having range timers in timer.c is worthwhile for future extensions, and for now we can just set every timer to [deadline, deadline + configurable_global_slop]. Then the existing range-timer mechanism ought to find a sensible deadline to aim for, only delaying timer events when there is a benefit to doing so.

I'm thinking out loud. :-)

-- Keir
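A rough sketch of that first idea, against the pre-range-timer softirq handler, purely for illustration (execute_timer() stands in for the existing dispatch code, and this is not what was eventually committed):

/* Run timers strictly at or after their own expiry time... */
while ( (GET_HEAP_SIZE(heap) != 0) && ((t = heap[1])->expires <= now) )
    execute_timer(ts, t);

/* ...but program the hardware a full slop late, so that nearby timers
 * get batched into the same interrupt. */
deadline = GET_HEAP_SIZE(heap) ? heap[1]->expires : 0;
this_cpu(timer_deadline) = deadline;
reprogram_timer(deadline ? (deadline + timer_slop) : 0);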
Dan Magenheimer wrote:

>> Here are my two cents, just for your reference. I have tried the range timer
>> in several places:
>> <snip>
>> - HVM virtual periodic timer: this timer may expire a bit after its
>> deadline, because it has logic to handle missed ticks, so the range can be
>> [deadline, deadline + period/n]. To be more conservative, it is safe to add
>> an upper limit of 1ms. The 1ms comes from commodity OS tick frequencies,
>> e.g. the Linux kernel uses 1000/250/100 HZ and Windows uses 64 HZ, so 1ms
>> covers the upper limit of 1000 HZ.
>
> A reminder that HVM HPET usage by some Linux kernels has a problem where
> delivery of a tick sooner than expected can cause the guest's apparent time
> to advance too quickly. Dave Winchell submitted a patch for this some time
> back that didn't get accepted (for various reasons), and I'm not sure whether
> this range timer implementation will make it better or worse.
>
> But I'm pretty sure that if ticks are delivered at:
>
>   T+0d+3s, T+1d+2s, T+2d+1s
>
> where d=1/Hz and s=slop, the problem will be observed.
>
> Dave, can you comment?
>
> Dan

In range-timer-use.patch, the vpt timer range is [deadline, deadline + MIN(period/8, 1ms)], so a tick can only expire later than expected, and thus will not trigger the issue you mention. In any case, the range timer is just an API: the caller has the flexibility to decide how to use it. In the HPET case, since we know HPET does not allow early ticks, we can deliberately set the range to avoid that.

If we simply increase the TIMER_SLOP value, HPET ticks may expire sooner than expected.

Best Regards
Ke
On 30/10/08 08:31, "Yu, Ke" <ke.yu@intel.com> wrote:

> In range-timer-use.patch, the vpt timer range is [deadline, deadline +
> MIN(period/8, 1ms)], so a tick can only expire later than expected, and thus
> will not trigger the issue you mention. In any case, the range timer is just
> an API: the caller has the flexibility to decide how to use it. In the HPET
> case, since we know HPET does not allow early ticks, we can deliberately set
> the range to avoid that.
>
> If we simply increase the TIMER_SLOP value, HPET ticks may expire sooner than
> expected.

Yeah, this is an argument for hacking in a new kind of slop for deadline setting, or for building on range timers (but keeping the existing timer interface).

-- Keir
2008/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:

> On 30/10/08 08:08, "Yu, Ke" <ke.yu@intel.com> wrote:
>
>>> Increasing the SLOP to 1ms should give a similar 5% gain; as your analysis
>>> says, it is the worst case of the range timer usage in vpt. I can redo the
>>> measurement to confirm.
>>
>> I have finished the measurement. When TIMER_SLOP is increased to 1ms, there
>> is a similar power consumption gain, this time 4% (14.0W vs 14.6W). In
>> theory a 1ms TIMER_SLOP should gain more than the range timer; the
>> difference may be due to noise in the test environment.
>
> It may also be because your patch tends to delay the timer deadline somewhat,
> and so by the time it goes off you actually sometimes have one or two more
> timers you can deliver in the batch? OTOH it could be experimental noise as
> you suggest, and certainly this change gets us pretty close to your win. We'd
> have to do further experiments to see whether increasing TIMER_SLOP
> noticeably degrades system performance, but we'd need to do that with the
> range-timer approach too.

Oh yes, you remind me that the range timer also keeps the 50us TIMER_SLOP, which may help as well. And yes, I am running performance tests of the range timer to make sure it does not hurt performance; I can also test the performance of a 1ms TIMER_SLOP.

> One thing: as Dan pointed out, some things don't want to get their timeout
> early, but generally callers are much more tolerant of getting timeouts a bit
> late. Possibly we should set the timer_deadline to nearest timeout +
> TIMER_SLOP, and then when executing timers be strict about not executing
> timers early (or at least no more than the existing 50us 'mini' slop)?

Yes, that way avoids the issue Dan pointed out.

> Or perhaps actually having range timers in timer.c is worthwhile for future
> extensions, and for now we can just set every timer to [deadline,
> deadline + configurable_global_slop]. Then the existing range-timer mechanism
> ought to find a sensible deadline to aim for, only delaying timer events when
> there is a benefit to doing so.

I am fine with a configurable global slop, although some timers may not benefit much from it, e.g. the 50 HZ (20ms) cpufreq DBS timer, whose range could be at least [deadline, deadline + 5ms]. :)

Best Regards
Ke

> I'm thinking out loud. :-)
>
> -- Keir
Yu Ke wrote:

> 2008/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:
>> On 30/10/08 08:08, "Yu, Ke" <ke.yu@intel.com> wrote:
>>
>> Or perhaps actually having range timers in timer.c is worthwhile for future
>> extensions, and for now we can just set every timer to [deadline,
>> deadline + configurable_global_slop]. Then the existing range-timer
>> mechanism ought to find a sensible deadline to aim for, only delaying timer
>> events when there is a benefit to doing so.
>
> I am fine with a configurable global slop, although some timers may not
> benefit much from it, e.g. the 50 HZ (20ms) cpufreq DBS timer, whose range
> could be at least [deadline, deadline + 5ms]. :)

It looks like I misunderstood your point. Do you mean using the range timer implementation to implement the interface [deadline, deadline + configurable_global_slop]? If so, I would prefer that approach and would be happy to revise the patch. :)

Compared to the first approach, i.e. setting the timer_deadline to nearest timeout + TIMER_SLOP, the range timer implementation does have an advantage. In the first approach the timers are always delayed, even when they are already aligned, whereas the range timer only delays timers the first time; after that they expire without delay.

For example, consider two 250 HZ (4ms period) timers t1 and t2 which will expire at T and T+1ms. Suppose TIMER_SLOP is 1ms, and suppose the APIC timer deadline is set to T+1ms.

In the first approach, the APIC timer will fire at:

T+1ms: t1 and t2 are executed, their deadlines become T+1ms+4ms, and the APIC timer deadline is set to T+1ms+4ms+TIMER_SLOP (T+6ms)
T+6ms: t1 and t2 are executed, their deadlines become T+6ms+4ms, and the APIC timer deadline is set to T+6ms+4ms+TIMER_SLOP (T+11ms)
T+11ms: ...

In the range timer approach, the APIC timer will fire at:

T+1ms: t1 and t2 are executed, their deadline range becomes [T+5ms, T+6ms], and the APIC timer deadline is set to T+5ms
T+5ms: t1 and t2 are executed, their deadline range becomes [T+9ms, T+10ms], and the APIC timer deadline is set to T+9ms
T+9ms: ...

From the above scenario, t1 and t2 are always delayed by 1ms in the first approach but not in the range timer approach, so the range timer stays closer to the semantics of the original timer.

Best Regards
Ke
On 31/10/08 02:23, "Yu, Ke" <ke.yu@intel.com> wrote:

> It looks like I misunderstood your point. Do you mean using the range timer
> implementation to implement the interface [deadline,
> deadline + configurable_global_slop]? If so, I would prefer that approach and
> would be happy to revise the patch. :)

I mean that set_timer() would always queue a range timer with a window of size configurable_global_slop.

-- Keir
Keir Fraser wrote:

> On 31/10/08 02:23, "Yu, Ke" <ke.yu@intel.com> wrote:
>
>> It looks like I misunderstood your point. Do you mean using the range timer
>> implementation to implement the interface [deadline,
>> deadline + configurable_global_slop]? If so, I would prefer that approach
>> and would be happy to revise the patch. :)
>
> I mean that set_timer() would always queue a range timer with a window of
> size configurable_global_slop.
>
> -- Keir

I see. Then I will start to revise the range timer patch as follows:
- remove the TIMER_SLOP macro
- add a "global timer slop" Xen option, and make set_timer() use the global slop
- remove the set_range_timer() API, since it is not easy to use at the moment

Best Regards
Ke
On 31/10/08 07:50, "Yu, Ke" <ke.yu@intel.com> wrote:

> I see. Then I will start to revise the range timer patch as follows:
> - remove the TIMER_SLOP macro
> - add a "global timer slop" Xen option, and make set_timer() use the global
> slop
> - remove the set_range_timer() API, since it is not easy to use at the moment

Sounds reasonable.

-- Keir
Keir Fraser wrote:> On 31/10/08 07:50, "Yu, Ke" <ke.yu@intel.com> wrote: > >> I see. Then I will start to revise the range timer patch as follow: >> - remove the TIMER_SLOP macro >> - add "global timer slop" xen option, and make set_timer to use the >> global slop. >> - remove the set_range_timer API, since it is not easy to use >> currently > > Sounds reasonable. > > -- KeirThis is the updated range timer patch. I have done basic test on 50ns and 1ms timer slop, both looks fine. Best Regards Ke ========================== Add range timer support this patch convert the timer into range timer, which can expires at any time within the range. A configurable global timer slop is introduced to config the timer''s default range as [deadline, deadline + timer_slop]. Signed-off-by: Yu Ke <ke.yu@intel.com> Wei Gang <gang.wei@intel.com> diff -r bec755616e8e xen/arch/x86/hpet.c --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -13,8 +13,6 @@ #include <asm/fixmap.h> #include <asm/div64.h> #include <asm/hpet.h> - -#define STIME_MAX ((s_time_t)((uint64_t)~0ull>>1)) #define MAX_DELTA_NS MILLISECS(10*1000) #define MIN_DELTA_NS MICROSECS(20) diff -r bec755616e8e xen/common/timer.c --- a/xen/common/timer.c +++ b/xen/common/timer.c @@ -25,13 +25,15 @@ * We pull handlers off the timer list this far in future, * rather than reprogramming the time hardware. */ -#define TIMER_SLOP (50*1000) /* ns */ +static unsigned int timer_slop __read_mostly = 50000; /* 50 ns */ +integer_param("timer_slop", timer_slop); struct timers { spinlock_t lock; struct timer **heap; struct timer *list; struct timer *running; + struct timer *ready_list; /* ready timers for next fire */ } __cacheline_aligned; static DEFINE_PER_CPU(struct timers, timers); @@ -85,6 +87,33 @@ static void up_heap(struct timer **heap, t->heap_offset = pos; } +/* Grow heap memory. Return the allocated heap, NULL if failed */ +static struct timer** grow_heap(struct timers *ts) +{ + int old_limit, new_limit; + struct timer **heap, **newheap; + + heap = ts->heap; + + /* old_limit == (2^n)-1; new_limit == (2^(n+4))-1 */ + old_limit = GET_HEAP_LIMIT(heap); + new_limit = ((old_limit + 1) << 4) - 1; + + newheap = xmalloc_array(struct timer *, new_limit + 1); + + if ( newheap != NULL ) + { + spin_lock_irq(&ts->lock); + memcpy(newheap, heap, (old_limit + 1) * sizeof(*heap)); + SET_HEAP_LIMIT(newheap, new_limit); + ts->heap = newheap; + spin_unlock_irq(&ts->lock); + if ( old_limit != 0 ) + xfree(heap); + } + + return newheap; +} /* Delete @t from @heap. Return TRUE if new top of heap. */ static int remove_from_heap(struct timer **heap, struct timer *t) @@ -177,6 +206,9 @@ static int remove_entry(struct timers *t case TIMER_STATUS_in_list: rc = remove_from_list(&timers->list, t); break; + case TIMER_STATUS_in_ready_list: + rc = remove_from_list(&timers->ready_list, t); + break; default: rc = 0; BUG(); @@ -202,6 +234,20 @@ static int add_entry(struct timers *time /* Fall back to adding to the slower linked list. 
*/ t->status = TIMER_STATUS_in_list; return add_to_list(&timers->list, t); +} + +static void move_list_to_heap(struct timers *ts, struct timer **list) +{ + struct timer *t, *next; + + next = *list; + *list = NULL; + while ( (t = next) != NULL ) + { + next = t->list_next; + t->status = TIMER_STATUS_inactive; + add_entry(ts, t); + } } static inline void __add_timer(struct timer *timer) @@ -247,7 +293,7 @@ static inline void timer_unlock(struct t #define timer_unlock_irqrestore(t, flags) \ do { timer_unlock(t); local_irq_restore(flags); } while ( 0 ) - +/* Set timer that can expire in period [expires, expires + timer_slop] */ void set_timer(struct timer *timer, s_time_t expires) { unsigned long flags; @@ -257,14 +303,14 @@ void set_timer(struct timer *timer, s_ti if ( active_timer(timer) ) __stop_timer(timer); - timer->expires = expires; + timer->expires = expires; + timer->expires_end = expires + timer_slop; if ( likely(timer->status != TIMER_STATUS_killed) ) __add_timer(timer); timer_unlock_irqrestore(timer, flags); } - void stop_timer(struct timer *timer) { @@ -343,54 +389,92 @@ void kill_timer(struct timer *timer) cpu_relax(); } +static void queue_ready_timer(struct timers* ts) +{ + struct timer *t, **heap; + s_time_t start, end; + + if ( unlikely (ts->ready_list != NULL) ) + move_list_to_heap(ts, &ts->ready_list); + + while ( unlikely (ts->list != NULL) ) + { + spin_unlock_irq(&ts->lock); + if ( unlikely (grow_heap(ts) == NULL) ) + { + printk(XENLOG_ERR "timer heap grow failed\n"); + spin_lock_irq(&ts->lock); + break; + } + spin_lock_irq(&ts->lock); + + move_list_to_heap(ts, &ts->list); + } + + heap = ts->heap; + start = 0; + end = STIME_MAX; + + while ( (GET_HEAP_SIZE(heap) != 0) && + ((t = heap[1])->expires <= end) ) + { + remove_from_heap(heap, t); + + start = t->expires; + if ( end > t->expires_end) + end = t->expires_end; + + t->list_next = ts->ready_list; + ts->ready_list = t; + t->status = TIMER_STATUS_in_ready_list; + } + this_cpu(timer_deadline) = start; +} + static void timer_softirq_action(void) { - struct timer *t, **heap, *next; + struct timer *t, **heap; struct timers *ts; - s_time_t now, deadline; + s_time_t now; void (*fn)(void *); void *data; ts = &this_cpu(timers); - heap = ts->heap; /* If we are using overflow linked list, try to allocate a larger heap. */ if ( unlikely(ts->list != NULL) ) - { - /* old_limit == (2^n)-1; new_limit == (2^(n+4))-1 */ - int old_limit = GET_HEAP_LIMIT(heap); - int new_limit = ((old_limit + 1) << 4) - 1; - struct timer **newheap = xmalloc_array(struct timer *, new_limit + 1); - if ( newheap != NULL ) - { - spin_lock_irq(&ts->lock); - memcpy(newheap, heap, (old_limit + 1) * sizeof(*heap)); - SET_HEAP_LIMIT(newheap, new_limit); - ts->heap = newheap; - spin_unlock_irq(&ts->lock); - if ( old_limit != 0 ) - xfree(heap); - heap = newheap; - } - } + grow_heap(ts); + heap = ts->heap; spin_lock_irq(&ts->lock); /* Try to move timers from overflow linked list to more efficient heap. 
*/ - next = ts->list; - ts->list = NULL; - while ( unlikely((t = next) != NULL) ) - { - next = t->list_next; - t->status = TIMER_STATUS_inactive; - add_entry(ts, t); - } + move_list_to_heap (ts, &ts->list); now = NOW(); + /* Execute ready timer first */ + if ( this_cpu(timer_deadline) < now ) + { + while ( likely((t = ts->ready_list)!= NULL) ) + { + fn = t->function; + data = t->data; + + ts->ready_list = t->list_next; + t->status = TIMER_STATUS_inactive; + + ts->running = t; + + spin_unlock_irq(&ts->lock); + (*fn)(data); + spin_lock_irq(&ts->lock); + } + } + while ( (GET_HEAP_SIZE(heap) != 0) && - ((t = heap[1])->expires < (now + TIMER_SLOP)) ) + ((t = heap[1])->expires < now ) ) { remove_entry(ts, t); @@ -404,16 +488,10 @@ static void timer_softirq_action(void) spin_lock_irq(&ts->lock); } - deadline = GET_HEAP_SIZE(heap) ? heap[1]->expires : 0; - while ( unlikely((t = ts->list) != NULL) ) { - if ( t->expires >= (now + TIMER_SLOP) ) - { - if ( (deadline == 0) || (deadline > t->expires) ) - deadline = t->expires; + if ( t->expires >= now ) break; - } ts->list = t->list_next; t->status = TIMER_STATUS_inactive; @@ -430,8 +508,8 @@ static void timer_softirq_action(void) ts->running = NULL; - this_cpu(timer_deadline) = deadline; - if ( !reprogram_timer(deadline) ) + queue_ready_timer(ts); + if ( !reprogram_timer(this_cpu(timer_deadline)) ) raise_softirq(TIMER_SOFTIRQ); spin_unlock_irq(&ts->lock); diff -r bec755616e8e xen/include/xen/time.h --- a/xen/include/xen/time.h +++ b/xen/include/xen/time.h @@ -52,6 +52,7 @@ struct tm gmtime(unsigned long t); #define SECONDS(_s) ((s_time_t)((_s) * 1000000000ULL)) #define MILLISECS(_ms) ((s_time_t)((_ms) * 1000000ULL)) #define MICROSECS(_us) ((s_time_t)((_us) * 1000ULL)) +#define STIME_MAX ((s_time_t)((uint64_t)~0ull>>1)) extern void update_vcpu_system_time(struct vcpu *v); extern void update_domain_wallclock_time(struct domain *d); diff -r bec755616e8e xen/include/xen/timer.h --- a/xen/include/xen/timer.h +++ b/xen/include/xen/timer.h @@ -15,6 +15,7 @@ struct timer { struct timer { /* System time expiry value (nanoseconds since boot). */ s_time_t expires; + s_time_t expires_end; /* Position in active-timer data structure. */ union { @@ -36,6 +37,7 @@ struct timer { #define TIMER_STATUS_killed 1 /* Not in use; canot be activated. */ #define TIMER_STATUS_in_heap 2 /* In use; on timer heap. */ #define TIMER_STATUS_in_list 3 /* In use; on overflow linked list. */ +#define TIMER_STATUS_in_ready_list 4 /* In use; on ready linked list. */ uint8_t status; }; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
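With the patch above applied, the default behaviour is unchanged (a 50000ns, i.e. 50us, slop), and a wider window such as the 1ms used in the earlier measurements can be selected at boot, since the slop is exposed through integer_param("timer_slop", timer_slop) with the value in nanoseconds. As a hypothetical example of widening it to 1ms, the Xen command line would gain:

timer_slop=1000000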
On 31/10/08 08:37, "Yu, Ke" <ke.yu@intel.com> wrote:

>> Sounds reasonable.
>>
>> -- Keir
>
> This is the updated range timer patch. I have done basic test on 50ns and 1ms
> timer slop, both looks fine.

Thanks. I moved the implementation around quite a bit, and checked it in as c/s 18744.

-- Keir