The credit scheduler uses a 10ms timer for per-vcpu accounting and a 30ms timer for system-wide accounting. What puzzles me is whether the current accounting is too coarse. One vcpu may execute for only a small quantum, say 1ms, and then be scheduled out, yet its credit may be debited by 100 because the 10ms timer expired in between. Another vcpu may execute for 18ms and also have only 100 credits subtracted, if only one tick is hit. Will this result in unfair credit accounting in some patterns?

For example:

cpu0
    A: 75 (spin, under) -> current
    B: 75 (spin, under)
    C: 75 (spin, under)
    ----
    D: 75 (io, under)   -> blocked

'A' first executes for 8ms, and then D is woken up:

cpu0
    D: 75 (io, under)   -> current
    B: 75 (spin, under)
    C: 75 (spin, under)
    A: 75 (spin, under) -> credit is still 75

'D' executes for 1ms and then sleeps again. Now B runs:

cpu0
    B: 75 (spin, under) -> current
    C: 75 (spin, under)
    A: 75 (spin, under) -> credit is still 75
    ----
    D: 75 (io, under)   -> sleeping

'B' executes for 2ms, with csched_tick triggered in between. Then 'D' is woken up again:

cpu0
    D: 75 (io, under)   -> current
    C: 75 (spin, under)
    A: 75 (spin, under) -> credit is still 75
    B: -25 (spin, over) -> lower priority

The net effect is that, within the last accounting cycle (30ms), 'B' is put at a lower priority than the other spinning vcpus, even though it ran for only 2ms while 'A' ran for 8ms and was charged nothing. I am not sure whether this is an over-sensitive concern for real workloads, since the above is just one scenario I have imagined. Maybe in reality this transient unfairness evens out in the long run, on average.

Purely from a design point of view, how much overhead would fine-grained accounting add to the schedule phase? The accounting logic in csched_vcpu_acct seems simple enough. csched_cpu_pick could still be kept on the 10ms tick, or would relaxing it to 30ms also be OK?

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
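The scenario above can be replayed with a small simulation. This is only a sketch, not Xen code: the function names are hypothetical, and it assumes that a tick charges whichever vcpu is running at the instant the tick fires, at 100 credits per 10ms tick as described above. It compares that tick-sampled charge with a charge proportional to the time each vcpu actually ran.

```python
TICK_MS = 10            # per-vcpu accounting tick, as stated in the thread
CREDITS_PER_TICK = 100  # credits debited per tick, as stated in the thread


def tick_sampled_charges(timeline):
    """Charge CREDITS_PER_TICK to whichever vcpu is running when a tick fires.

    timeline is a list of (vcpu_name, run_ms) slices executed back to back,
    starting at time 0. Hypothetical helper, not a Xen function.
    """
    charges = {name: 0 for name, _ in timeline}
    now = 0
    next_tick = TICK_MS
    for name, run_ms in timeline:
        end = now + run_ms
        # Every tick that lands inside [now, end) is billed to this vcpu.
        while next_tick < end:
            charges[name] += CREDITS_PER_TICK
            next_tick += TICK_MS
        now = end
    return charges


def exact_charges(timeline):
    """Charge each vcpu in proportion to the time it actually ran."""
    charges = {name: 0 for name, _ in timeline}
    for name, run_ms in timeline:
        charges[name] += run_ms * CREDITS_PER_TICK // TICK_MS
    return charges


# The scenario from the mail: A runs 8ms, D runs 1ms, B runs 2ms.
timeline = [("A", 8), ("D", 1), ("B", 2)]
start_credit = 75
sampled = tick_sampled_charges(timeline)
exact = exact_charges(timeline)
for name in ("A", "D", "B"):
    print(name, start_credit - sampled[name], start_credit - exact[name])
```

Under these assumptions, tick sampling leaves A and D at 75 and drops B to -25, whereas proportional charging would leave A at -5, D at 65 and B at 55.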
On 19/6/08 06:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Then the net effect is in last accounting cycle (30ms), 'B' is put
> in a lower priority compared to other spin vcpus. Not sure whether
> this is an over-sensitive concern in real workload, since above
> is just one assumed scenario in my mind. Maybe in reality above
> transient unfairness will be fixed in a long run, from average P.O.V.
>
> Simply from design point of view, how much overhead may add
> to schedule phase if adding fine-grained accounting there? The
> accounting logic in csched_vcpu_acct seems simple enough.
> csched_cpu_pick may be still kept in this 10ms tick, or relax it
> to 30ms is also OK?

I'm not really sure that the credit scheduler needs to be tick-based. Why not account at nanosecond granularity and do away with the arbitrary tick granularity? Some degree of hysteresis or a minimum scheduling granularity could be used to avoid an unnecessarily high rate of context switches.

 -- Keir
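A rough illustration of what nanosecond-granularity accounting with a minimum scheduling granularity could look like follows. It is a sketch under stated assumptions, not the Xen implementation: the Vcpu class, switch_in, switch_out, may_preempt and MIN_SLICE_NS are all hypothetical names, the 1ms minimum slice is an arbitrary value, and the charge rate simply reuses the 100 credits per 10ms figure from earlier in the thread.

```python
NS_PER_MS = 1_000_000
CREDITS_PER_MS = 10           # 100 credits per 10ms, the rate quoted in the thread
MIN_SLICE_NS = 1 * NS_PER_MS  # hypothetical minimum scheduling granularity


class Vcpu:
    """Minimal stand-in for a vcpu's accounting state (hypothetical)."""

    def __init__(self, name, credit):
        self.name = name
        self.credit = float(credit)
        self.start_ns = None  # time this vcpu was last switched in


def switch_in(vcpu, now_ns):
    """Record when the vcpu starts running."""
    vcpu.start_ns = now_ns


def switch_out(vcpu, now_ns):
    """Charge the vcpu for exactly the time it ran; return that time in ns."""
    ran_ns = now_ns - vcpu.start_ns
    vcpu.credit -= ran_ns * CREDITS_PER_MS / NS_PER_MS
    vcpu.start_ns = None
    return ran_ns


def may_preempt(current, now_ns):
    """Allow preemption only after the running vcpu has had MIN_SLICE_NS."""
    return now_ns - current.start_ns >= MIN_SLICE_NS


# A runs for 8ms starting at t=0 and is charged for exactly that time.
a = Vcpu("A", 75)
switch_in(a, 0)
print(may_preempt(a, 500_000))   # False: A has run for only 0.5ms
print(switch_out(a, 8 * NS_PER_MS), a.credit)
```

Charging at switch-out makes the debit depend only on how long the vcpu ran, not on where the tick happened to fall, while the minimum-slice check is one simple way to bound the context-switch rate.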
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: June 19, 2008 15:27
>
>On 19/6/08 06:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Then the net effect is in last accounting cycle (30ms), 'B' is put
>> in a lower priority compared to other spin vcpus. Not sure whether
>> this is an over-sensitive concern in real workload, since above
>> is just one assumed scenario in my mind. Maybe in reality above
>> transient unfairness will be fixed in a long run, from average P.O.V.
>>
>> Simply from design point of view, how much overhead may add
>> to schedule phase if adding fine-grained accounting there? The
>> accounting logic in csched_vcpu_acct seems simple enough.
>> csched_cpu_pick may be still kept in this 10ms tick, or relax it
>> to 30ms is also OK?
>
>I'm not really sure that the credit scheduler needs to be tick-based.
>Why not account at nanosecond granularity and do away with the
>arbitrary tick granularity? Some degree of hysteresis or minimum
>scheduling granularity could be used to avoid an unnecessarily high
>rate of context switches.
>

I recall a post from Emmanuel mentioning that accounting was split from the context-switch path to reduce overhead. System-wide accounting may be worth keeping on a 30ms timer, but at least vcpu accounting can easily be carried out in the context-switch path, since it is light enough.

Or maybe the ticks can be removed entirely, as long as a roughly 30ms accounting interval is ensured by tweaking the scheduler timer. Is that what you suggested?

Or was there any early experiment showing problems when the accounting is not tick-based?

Thanks,
Kevin
On 19/6/08 08:54, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> I recalled some post from Emmanuel to mention that split accounting
> from context switch path, to reduce overhead. System wide accounting
> may be worthy with a 30ms timer, but at least vcpu accounting can
> be carried in context switch path easily which is light enough.
>
> Or maybe whole ticks can be removed as long as a roughly 30ms
> accounting interval is ensured by tweaking the scheduler timer. Is
> that what you suggested?
>
> Or was there any early experiment showing some badness if not doing
> tick based style?

I don't think the no-ticks alternative was ever implemented or measured.

We already mess with stop_timer/set_timer and calculate the current system time within the context-switch path. So we're already well set up to do the accounting on that path, imo.

 -- Keir
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: June 19, 2008 16:04
>
>On 19/6/08 08:54, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> I recalled some post from Emmanuel to mention that split accounting
>> from context switch path, to reduce overhead. System wide accounting
>> may be worthy with a 30ms timer, but at least vcpu accounting can
>> be carried in context switch path easily which is light enough.
>>
>> Or maybe whole ticks can be removed as long as a roughly 30ms
>> accounting interval is ensured by tweaking the scheduler timer. Is
>> that what you suggested?
>>
>> Or was there any early experiment showing some badness if not doing
>> tick based style?
>
>I don't think the no-ticks alternative was ever implemented or
>measured.
>
>We already mess with stop_timer/set_timer and calculating current
>system time within the context-switch path. So we're already well set
>up to do the accounting on that path, imo.
>

Agreed.

Thanks,
Kevin