Scheduling Groups

Scheduling groups provide the ability to combine domains into a group
association. One domain is designated as the group master; the other
domains are designated as group members. There may be only one master
domain for each group, and one or more member domains. A single domain
may be either a group master or a group member, never both at the same
time.

Scheduling groups are visible to Xen as a new sub-operation within the
domain control scheduling op. When Xen receives a scheduling op
containing the sched-group sub-op, it dispatches to the generic
scheduling layer, which then calls the active scheduler with the
sub-op information. The grouping/ungrouping, accounting, and
scheduling work is done in the active scheduler, whichever that may
be. By delegating the processing of the sched-group sub-op to the
specific scheduler (as opposed to the generic scheduling layer),
scheduling groups are opaque to the rest of Xen and each scheduler is
free to implement grouping as it sees fit. This also keeps the bulk of
the code changes within a specific scheduler implementation and limits
the changes to other areas of Xen.

There are several possible implementations of scheduling groups. For
example, groups could be used as the foundation for a two-level
scheduler, where the first level schedules groups and the second level
schedules domains within groups. A good use case is creating a group
in which the domains need to be scheduled in a specific order. For
example, a stub domain hosting a device model must run prior to the
HVM domain it is servicing. Scheduling groups might also be used to
support real-time guests: all real-time guests could be placed in one
group which is scheduled pre-emptively and at a finer granularity than
other domains.

This patchset is limited to groups sharing the master domain's
timeslices in the credit scheduler.
The four patches that follow are organized as follows:

1/4: Implements the sched-group sub-op within the domctl scheduling
     hypercall.
2/4: The credit scheduler implementation of scheduling groups.
3/4: Tools updates to support scheduling groups.
4/4: sched-group, a C utility program to add and remove domains to and
     from groups.

Mike

--
Mike D. Day
Virtualization Architect and Sr. Technical Staff Member, IBM LTC
Cell: 919 412-3900
ST: mdday@us.ibm.com | AIM: ncmikeday | Yahoo IM: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hi, Mike

I have one question about this issue. Are any performance improvement
results available?

Thanks
Atsushi SAKAI

"Mike D. Day" <ncmike@us.ibm.com> wrote:
> Scheduling Groups
[snip]
> Scheduling groups might also be used to support real-time
> guests. All real-time guests could be placed in one group which is
> scheduled pre-emptively and at a finer granularity than other domains.
>
> This patchset is limited to groups sharing the master domain's
> timeslices in the credit scheduler.
On 11/05/07 09:06 +0900, Atsushi SAKAI wrote:
>Hi, Mike
>
>I have one question about this issue.
>Are any performance improvement results available?
>
>Thanks
>Atsushi SAKAI

Sakai,

I will perform some micro-benchmarks with the patch and without, and
with grouped domains and no groups.

I expect there will be no performance benefit and no detriment.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Hi, Mike

Then I have a question. In what cases does your scheduler patch
improve performance? (For example, improvement of various latencies
for a driver domain, etc.)

Thanks
Atsushi SAKAI

"Mike D. Day" <ncmike@us.ibm.com> wrote:
> On 11/05/07 09:06 +0900, Atsushi SAKAI wrote:
> >Hi, Mike
> >
> >I have one question about this issue.
> >Are any performance improvement results available?
>
> Sakai,
>
> I will perform some micro-benchmarks with the patch and without, and
> with grouped domains and no groups.
>
> I expect there will be no performance benefit and no detriment.
>
> Mike
On 11/05/07 22:22 +0900, Atsushi SAKAI wrote:
>Hi, Mike
>
>Then I have a question. In what cases does your scheduler patch
>improve performance? (For example, improvement of various latencies
>for a driver domain, etc.)

Sakai,

After these first patches get accepted I plan on implementing a
feature where one domain can tell the scheduler it would like to run
*after* a different domain in the same group.

Xen already does this for dom0, but we also think it will improve
performance for HVM domains when they use a stub domain for device
emulation.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
On 11/5/07 20:41, "Mike D. Day" <ncmike@us.ibm.com> wrote:
> After these first patches get accepted I plan on implementing a
> feature where one domain can tell the scheduler it would like to run
> *after* a different domain in the same group.
>
> Xen already does this for dom0

Does it?

 -- Keir
On 11/05/07 22:11 +0100, Keir Fraser wrote:
>On 11/5/07 20:41, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>
>> After these first patches get accepted I plan on implementing a
>> feature where one domain can tell the scheduler it would like to run
>> *after* a different domain in the same group.
>>
>> Xen already does this for dom0
>
>Does it?

When an HVM domain does an emulated I/O, Xen creates an event which is
delivered to the idle domain (Xen), which in turn triggers a soft
interrupt on domain 0. In effect dom0 is immediately scheduled. Dom0
then emulates the I/O while the HVM domain is sleeping. I assume the
same holds for other I/O events to domain zero, although I haven't
looked at that code.

This is the kind of scheduling effect that would be nice to have for a
stub domain in relation to an HVM domain. But the stub domain doesn't
have any special status that schedules it to run immediately.

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
On 12/5/07 01:16, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>>> After these first patches get accepted I plan on implementing a
>>> feature where one domain can tell the scheduler it would like to run
>>> *after* a different domain in the same group.
>>>
>>> Xen already does this for dom0
>>
>> Does it?
>
> When an HVM domain does an emulated I/O, Xen creates an event which is
> delivered to the idle domain (Xen), which in turn triggers a soft
> interrupt on domain 0. In effect dom0 is immediately scheduled. Dom0
> then emulates the I/O while the HVM domain is sleeping.

I thought we decoded the instruction to be emulated in Xen (in the
context of the HVM domain, so current==hvm-domain), then packaged it up
in a shared-memory page and notified qemu-dm in dom0 via an event
channel. To my knowledge there's no special treatment of this event
channel or of dom0: the notification is treated just like any wake-up
of any arbitrary domain.

 -- Keir
On 12/05/07 08:43 +0100, Keir Fraser wrote:
>On 12/5/07 01:16, "Mike D. Day" <ncmike@us.ibm.com> wrote:
>
>I thought we decoded the instruction to be emulated in Xen (in the
>context of the HVM domain, so current==hvm-domain), then packaged it up
>in a shared-memory page and notified qemu-dm in dom0 via an event
>channel. To my knowledge there's no special treatment of this event
>channel or of dom0: the notification is treated just like any wake-up
>of any arbitrary domain.

Hi Keir, thanks for clarifying. Has anyone considered giving priority
to dom0 (or any domain) when a domU is blocking on completion of an
event being handled by dom0?

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
> >I thought we decoded the instruction to be emulated in Xen (in the
> >context of the HVM domain, so current==hvm-domain) then packaged it
> >up in a shared-memory page and notified qemu-dm in dom0 via an event
> >channel. To my knowledge there's no special treatment of this event
> >channel or of dom0: the notification is treated just like any
> >wake-up of any arbitrary domain.
>
> Hi Keir, thanks for clarifying. Has anyone considered giving priority
> to dom0 (or any domain) when a domU is blocking on completion of an
> event being handled by dom0?

It's not entirely clear that eagerly scheduling dom0 is the best thing
to do -- in fact, some of Lucy's data suggested that schedulers that
tended to be less eager to pre-empt promoted more batching and hence
better throughput.

My preferred way of tackling this is to make promotion of batching
more explicit by implementing either a 'deferred send' or 'lazy
receive' [*] for event channel notifications. If we have these, it
shouldn't matter if we're eager to schedule dom0 for h/w interrupts,
which is intuitively what we should be doing.

It might be instructive to implement strict priority for dom0 and
confirm that it makes things worse under at least some I/O workloads
(particularly network ones). It would then be nice to implement
deferred send or lazy receive to see whether this gives us the best of
both worlds.

Ian

[*] Deferred send would enable a domain to request that a notification
be sent in X nanoseconds, or whenever it blocks, whichever comes
sooner. If it has one of these deferred notifications outstanding it
can still trigger the notification immediately by using the normal
send-event call. [Only when the notification actually happens does the
receiving domain unblock and hence become eligible to be selected by
the scheduler.] Lazy receive has a similar effect to deferred send,
and is very similar to what modern NICs do. It would enable a domain
to say that for a particular event channel it wants to be unblocked X
nanoseconds after the event becomes pending rather than immediately.
There are pros and cons to both approaches, and they're not
necessarily mutually exclusive.
I'm happy to see other folks interested in the concept of domain
groups because it relates to some of my work. We seem to share many
common goals. In particular, we both made use of the hypervisor to
support groups of domains. My first impressions:

Master/Slave Roles
------------------
While the master/slave model is certainly popular, it is not the only
possible relationship between domains. Consider a group of peers. One
way to associate domains without an implicit hierarchy is to make
groups a first-class object, just like domains. You can anchor
scheduling information against the group instead of on a single
domain. This approach also has the benefit of allowing other projects
to use group information for more than scheduling.

Usability
---------
Assuming I read your patches correctly, the intended use is that
administrators must remember which domains are masters and which are
members (and do so by domid). Since it's easy to lose track of domids,
I suggest augmenting xm list and the supporting infrastructure to
provide cues to the administrator about group membership and roles.

Also, it would seem sched-group.c supports specification of members by
domid only. Support for using domain names and uuids would be useful
because domids do change. Assigning group names and group uuids
quickly becomes essential too. Imagine the chore of finding one member
domain among many groups using only the master's domid. Reboots,
migration, suspend/resume and save/restore all change the domid,
making it much more difficult. This leads me to my next questions...

Migration
---------
What happens when any of the group members migrate? Ditto for
suspend/resume and save/restore. Automatically rebuilding groups after
these events is crucial. So, it would be nice to see preservation of
the group association across the full lifecycle of domains.

-Chris
Hi, Mike

I plan to do this from the scheduler view, not the driver-domain view.
(But my plan is to investigate Lucy's GbE (web server) problem only.)

If just a dom0 boost is needed, the attached patch is the first place
to start this investigation. (It boosts dom0 only.) And I think we
should study a dynamic dispatch interval (instead of the fixed 30 msec)
before considering preemption. (I plan to do this.)

c.f. As for my scheduler issues, the weight-for-vcpu-pin problem
(already posted to this ML as an RFC) is first priority; dom0 tuning
is second priority. And my study is on SMP ia64, not UP x86.

Thanks
Atsushi SAKAI

"Mike D. Day" <ncmike@us.ibm.com> wrote:
> Hi Keir, thanks for clarifying. Has anyone considered giving priority
> to dom0 (or any domain) when a domU is blocking on completion of an
> event being handled by dom0?
Mike D. Day
2007-May-15 14:05 UTC
[Xen-devel] Re: 0/4 Xen Scheduling Groups - some microbenchmarks
On 10/05/07 17:33 -0400, Mike D. Day wrote:
>Scheduling Groups
>
>Scheduling groups provide the ability to combine domains into a group
>association. One domain is designated as the group master. The other
>domains are designated as group members. There may be only one master
>domain for each group, and one or more member domains. A single domain
>may be either a group master or group member, and never both at the
>same time.

I ran Rusty Russell's virtbench on xen-unstable without the scheduling
groups patches, and with the patches.

http://ozlabs.org/~rusty/virtbench/

Virtbench automatically creates four paravirtual guests and runs a
series of microbenchmarks, including some that test inter-guest
communication. I ran these benchmarks on an Athlon X2:

Linux svm 2.6.18-xen #1 SMP Tue Apr 24 11:01:39 EDT 2007 x86_64 GNU/Linux

The results are mixed - the unpatched credit scheduler is faster on
some microbenchmarks and slower on others. Smaller == better.

UNPATCHED RESULTS

Bringing up machines..
Time for one context switch via pipe: 6082 (6014 - 6160)
Time for one Copy-on-Write fault: 5628 (5472 - 5783)
Time to exec client once: 581656 (562984 - 598656)
Time for one fork/exit/wait: 357625 (353625 - 365812)
Time to send 4 MB from host: 21401750 (20443000 - 31179250)
Time for one syscall via libc: 684 (684 - 685)
Time to walk linear 64 MB: 686312 (682437 - 689125)
Time to walk random 64 MB: 945250 (940750 - 948875)
Time for one outb PIO operation: 298 (298 - 300)
Time for two PTE updates: 5250 (5228 - 5264)
Time to read from disk (256 kB): 7879125 (5765500 - 18967062)
Time for one disk read: 25246 (24839 - 35062)
Time to send 4 MB between guests: 16729375 (14412437 - 30318562)
Time for inter-guest pingpong: 70152 (68960 - 104968)
Time to sendfile 4 MB between guests: 46993000 (16927000 - 380530000)
Time to receive 1000 1k UDPs between guests: 47102000 (11709000 - 3427813000)

WITH SCHEDULING GROUP PATCHES

Bringing up machines..
Time for one context switch via pipe: 6117 (6016 - 6279)
Time for one Copy-on-Write fault: 5566 (5438 - 5740)
Time to exec client once: 584031 (569593 - 595093)
Time for one fork/exit/wait: 332406 (321125 - 350375)
Time to send 4 MB from host: 21453250 (20431500 - 29317750)
Time for one syscall via libc: 683 (683 - 690)
Time to walk linear 64 MB: 688312 (684000 - 691625)
Time to walk random 64 MB: 931125 (929250 - 933125)
Time for one outb PIO operation: 282 (282 - 292)
Time for two PTE updates: 5184 (5174 - 5204)
Time to read from disk (256 kB): 7963437 (5850375 - 17305687)
Time for one disk read: 25332 (24867 - 65437)
Time to send 4 MB between guests: 15529187 (13590187 - 34647875)
Time for inter-guest pingpong: 81570 (68687 - 98175)
Time to sendfile 4 MB between guests: 36325000 (16996000 - 503692000)
Time to receive 1000 1k UDPs between guests: 43529000 (10414000 - 228768000)

The only code that the patches add to the fast path is a conditional:

static inline struct csched_dom *master_dom(struct csched_dom *d)
{
    if ( d->is_member )
        return d->master;
    return d;
}

I'm going to make a change to remove this conditional from the fast
path and retest.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
On 14/05/07 18:27 -0400, Chris wrote:
>I'm happy to see other folks interested in the concept of domain
>groups because it relates to some of my work. We seem to share many
>common goals. In particular, we both made use of the hypervisor to
>support groups of domains. My first impressions:
>
>Master/Slave Roles
>------------------
>While the master/slave model is certainly popular, it is not the only
>possible relationship between domains. Consider a group of peers. One
>way to associate domains without an implicit hierarchy is to make
>groups a first-class object, just like domains. You can anchor
>scheduling information against the group instead of on a single
>domain. This approach also has the benefit of allowing other projects
>to use group information for more than scheduling.

This is a good idea. When I made my first attempt at scheduling groups
I put the infrastructure in place to make groups a first-class object.
However, I decided to rewrite the code to push most of the changes into
the credit scheduler because doing so dramatically reduced the size of
the patches, and it made the patches more discrete and modular.

I feel that starting out with a minimal patchset that meets the
specific requirements (accurate scheduling of stub domains) was the
right first step. Extending the concept of groups within Xen is a good
next step.

>Usability
>---------
>Assuming I read your patches correctly, the intended use is that
>administrators must remember which domains are masters and which are
>members (and do so by domid). Since it's easy to lose track of domids,
>I suggest augmenting xm list and the supporting infrastructure to
>provide cues to the administrator about group membership and roles.

Yes, that is definitely needed.

>Also, it would seem sched-group.c supports specification of members by
>domid only. Support for using domain names and uuids would be useful
>because domids do change.

Yes, very true. This is probably best implemented in the tools
themselves.

>Assigning group names and group uuids quickly becomes essential too.
>Imagine the chore of finding one member domain among many groups using
>only the master's domid. Reboots, migration, suspend/resume and
>save/restore all change the domid, making it much more difficult. This
>leads me to my next questions...
>
>Migration
>---------
>What happens when any of the group members migrate? Ditto for
>suspend/resume and save/restore. Automatically rebuilding groups after
>these events is crucial. So, it would be nice to see preservation of
>the group association across the full lifecycle of domains.

Right now groups are an ephemeral object. They are destroyed when the
group master domain is destroyed (or migrated). Groups are easy to
re-compose using a hypercall, and I haven't seen a use case that would
disallow re-composition of groups after a short interval rather than
requiring absolute persistence. Without the requirement of absolute
persistence, I'm not sure this too cannot best be implemented using
tools.

What are you using domain groups for?

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
Mike D. Day wrote:
>> Also, it would seem sched-group.c supports specification of members
>> by domid only. Support for using domain names and uuids would be
>> useful because domids do change.
>
> Yes, very true. This is probably best implemented in the tools
> themselves.

Agreed. The vast majority of the implementation should be in the
tools. As a matter of identifying groups from within the VMM, it is
important for groups to have a uuid in the VMM. It is useful for the
VMM to know that a migrated group is indeed the same as the original.

>> What happens when any of the group members migrate? Ditto for
>> suspend/resume and save/restore. Automatically rebuilding groups
>> after these events is crucial. So, it would be nice to see
>> preservation of the group association across the full lifecycle of
>> domains.
>
> Right now groups are an ephemeral object. They are destroyed when the
> group master domain is destroyed (or migrated). Groups are easy to
> re-compose using a hypercall, and I haven't seen a use case that would
> disallow re-composition of groups after a short interval rather than
> requiring absolute persistence.

Much of my rationale for persistence is an effort to reduce the burden
on system administrators. Although manually reconstructing groups is
easy with a few small groups, it still creates extra work. Consider
life as a data center administrator with upcoming planned downtime.
When migrating a rack full of domains, the administrator doesn't want
to remember which domains belong together. Migration of the entire
group as a single unit is a major win for the administrator in terms
of reducing the number of moving parts he has to care about.

Another reason for making groups a first-class object is that the
administrator doesn't need to care which scheduler the migration
target VMM is using.

> What are you using domain groups for?

We're building a virtualization platform in which groups play a
central role. Not too long ago I submitted a set of patches with a
generic domain group implementation that strives to address the issues
I've begun to raise. Scheduling is a natural place for groups. Groups
are also applicable to security policy. VMM support for a generic
group abstraction allows simplifications of the security policy and
the architecture that supports it. I believe groups provide the
biggest value as a generic abstraction that's flexible enough for many
purposes.

-Chris