Hi,

When running on a dual/quad core system, does the Xen scheduler take
into account the physical layout of the cores? For example, if a VM has
two vcpus and there are 4 physical cpus free, will it take care to
assign the 2 vcpus (from a VM) to 2 pcpus on the same socket?

Thanks
- Prabha
On 4/21/07, pak333@comcast.net <pak333@comcast.net> wrote:
> Hi,
>
> When running on a dual/quad core system, does the Xen scheduler take
> into account the physical layout of the cores?
> For example, if a VM has two vcpus and there are 4 physical cpus free,
> will it take care to assign the 2 vcpus (from a VM) to 2 pcpus on the
> same socket?
>
> Thanks
> - Prabha

Shouldn't this be concealed? You may never know which physical CPU is
actually running a vcpu. It may get scheduled onto a different physical
CPU depending on the load on the other CPUs. CMIIW.

~psr

--
---
pradeep singh rautela

"Genius is 1% inspiration, and 99% perspiration" - not me :)
On 21/04/07 06:03 +0000, pak333@comcast.net wrote:
> When running on a dual/quad core system, does the Xen scheduler take
> into account the physical layout of the cores?
> For example, if a VM has two vcpus and there are 4 physical cpus free,
> will it take care to assign the 2 vcpus (from a VM) to 2 pcpus on the
> same socket?

The scheduler only knows the affinity of vcpus for physical cpus. The
affinity is determined by a userspace application and can be modified
using a domain control hypercall. Look in xen/common/domctl.c around
line 568 for the following:

    case XEN_DOMCTL_setvcpuaffinity:
    case XEN_DOMCTL_getvcpuaffinity:

When the credit scheduler migrates a vcpu to a pcpu, it only considers
pcpus for which the affinity bit is set. If the userspace application
sets the affinity such that only the bits for pcpus on the same socket
are set, then the vcpu will only run on pcpus sharing that socket.

Mike

--
Mike D. Day
IBM LTC
Cell: 919 412-3900
Sametime: ncmike@us.ibm.com AIM: ncmikeday Yahoo: ultra.runner
PGP key: http://www.ncultra.org/ncmike/pubkey.asc
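A minimal sketch of the userspace side, for concreteness. It pins both
vcpus of a domain to pcpus 0 and 1, which are assumed - not guaranteed -
to share a socket on the machine in question. It also assumes a Xen
3.x-era libxc in which xc_vcpu_setaffinity() takes a plain 64-bit
cpumap; the signature has changed across Xen versions, so treat this as
illustrative rather than definitive. The same effect is available from
the command line via "xm vcpu-pin <domain> <vcpu> <cpus>".

/* Sketch, not authoritative: pin both vcpus of a domain to pcpus 0-1.
 * Assumes a Xen 3.x-era libxc where xc_vcpu_setaffinity() takes a
 * uint64_t cpumap; other versions use a different signature. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <xenctrl.h>

int main(int argc, char *argv[])
{
    uint32_t domid = (argc > 1) ? (uint32_t)atoi(argv[1]) : 1;
    int xc = xc_interface_open();
    if (xc < 0) {
        perror("xc_interface_open");
        return 1;
    }

    /* Bits 0 and 1 set: the vcpu may only run on pcpu 0 or pcpu 1,
     * assumed here to be the two cores of socket 0. */
    uint64_t cpumap = (1ULL << 0) | (1ULL << 1);

    for (int vcpu = 0; vcpu < 2; vcpu++)
        if (xc_vcpu_setaffinity(xc, domid, vcpu, cpumap) != 0)
            fprintf(stderr, "setaffinity failed for vcpu %d\n", vcpu);

    xc_interface_close(xc);
    return 0;
}

Once the mask is set, the scheduler will only migrate those vcpus among
pcpus whose affinity bits are set, exactly as described above.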
Thanks. A little more clarification. Here is an example.

I have multiple VMs, each with 2 vcpus. There is no user affinity, so I
will let the vcpus run wherever the Xen scheduler chooses. My system
has 2 dual-core sockets.

If all 4 pcpus are idle, will the scheduler assign the vcpus of a VM to
pcpus on the same socket?

If, while running, 2 pcpus from different sockets become available, the
scheduler will assign 2 vcpus to those two pcpus. Does the scheduler do
any optimization such as moving the vcpus of a VM to the same socket,
or does it just assign the vcpus as they become ready?

Or if 3 pcpus are idle, will the scheduler assign vcpus from a VM to
the same socket?

Basically all my questions boil down to this: does the scheduler know
about the pcpu layout (same socket), and does it do any scheduling
based on that?

Thanks
Prabha

-------------- Original message --------------
From: "Mike D. Day" <ncmike@us.ibm.com>

> The scheduler only knows the affinity of vcpus for physical cpus. The
> affinity is determined by a userspace application and can be modified
> using a domain control hypercall.
> [...]
> When the credit scheduler migrates a vcpu to a pcpu, it only considers
> pcpus for which the affinity bit is set. If the userspace application
> sets the affinity such that only the bits for pcpus on the same socket
> are set, then the vcpu will only run on pcpus sharing that socket.
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> pak333@comcast.net
> Sent: 23 April 2007 20:34
> To: ncmike@us.ibm.com
> Cc: Mike D. Day; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: Xen scheduler
>
> I have multiple VMs, each with 2 vcpus. There is no user affinity, so
> I will let the vcpus run wherever the Xen scheduler chooses. My
> system has 2 dual-core sockets.
>
> If all 4 pcpus are idle, will the scheduler assign the vcpus of a VM
> to pcpus on the same socket?
> If, while running, 2 pcpus from different sockets become available,
> the scheduler will assign 2 vcpus to those two pcpus. Does the
> scheduler do any optimization such as moving the vcpus of a VM to the
> same socket, or does it just assign the vcpus as they become ready?

Xen's current schedulers don't have any clue about CPU cores and their
relationship to sockets, memory locations or any other such things. So
whether you get two VCPUs from the same domain or from two different
domains on one of your physical dual-core CPUs is entirely random, and
will remain so any time a VCPU is rescheduled. Whichever PCPU happens
to be ready when the VCPU is scheduled will be used, unless you
specifically restrict the VCPU to a (set of) PCPU(s).

> Or if 3 pcpus are idle, will the scheduler assign vcpus from a VM to
> the same socket?

It will assign "any VCPU to any PCPU that is allowed for that VCPU",
and it doesn't really care which VM or which socket any particular
VCPU/PCPU combination belongs to.

> Basically all my questions boil down to this: does the scheduler know
> about the pcpu layout (same socket), and does it do any scheduling
> based on that?

Not at present. There's been some discussion of this, and whilst it's
easy to solve some of the obvious cases, there are also some harder
nuts to crack. What do you do when the system is really busy and
there's not a "good" PCPU to schedule a particular VCPU on - do you
wait for the ideal PCPU to become available, or do you schedule the
VCPU on a less ideal one? How long do you allow the wait for that
ideal PCPU?

Whilst it's easy to say "just do it right", solving the rather hairy
problems of congestion and making the right "judgement" of the
situation is much harder.

--
Mats
On Apr 23, 2007, at 21:33, pak333@comcast.net wrote:
> Basically all my questions boil down to this: does the scheduler know
> about the pcpu layout (same socket), and does it do any scheduling
> based on that?

Yes, but not how you suggested.

The scheduler actually tries to schedule VCPUs across multiple sockets
before it "co-schedules" a socket. The idea behind this is to maximize
the achievable memory bandwidth.

On hyperthreaded systems, the scheduler will also attempt to schedule
across cores before it co-schedules hyperthreads. This is to maximize
achievable cycles.

At this time, no attempt is made to schedule 2 VCPUs of the same VM any
differently than 2 VCPUs of distinct VMs.

If you feel two VCPUs would do better co-scheduled on a core or socket,
you'd currently have to use cpumasks -- as Mike suggested -- to
restrict where they can run manually. I'd be curious to know of
real-world cases where doing this increases performance significantly.

Hope this helps.

Cheers,
Emmanuel.
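To make the spread-first behaviour concrete, here is a small standalone
model of the idea. It is not the actual credit scheduler code (see
xen/common/sched_credit.c for that), and the topology and idle state
are invented for the example: among idle pcpus, it prefers one on the
socket with the fewest busy pcpus, so work spreads across sockets
before any socket is co-scheduled.

/* Illustrative model only -- not the actual credit scheduler code. */
#include <stdio.h>

#define NR_PCPUS 4

/* Hypothetical 2-socket, dual-core box: pcpus 0,1 on socket 0;
 * pcpus 2,3 on socket 1. */
static const int socket_of[NR_PCPUS] = { 0, 0, 1, 1 };

static int pick_pcpu(const int idle[NR_PCPUS])
{
    int best = -1, best_busy = NR_PCPUS + 1;

    for (int p = 0; p < NR_PCPUS; p++) {
        if (!idle[p])
            continue;
        /* Count busy pcpus sharing this candidate's socket. */
        int busy = 0;
        for (int q = 0; q < NR_PCPUS; q++)
            if (q != p && socket_of[q] == socket_of[p] && !idle[q])
                busy++;
        /* Prefer the candidate on the least-loaded socket. */
        if (busy < best_busy) {
            best_busy = busy;
            best = p;
        }
    }
    return best; /* -1 if no pcpu is idle */
}

int main(void)
{
    /* pcpu 0 is busy; 1, 2 and 3 are idle. A naive "first idle" pick
     * would choose pcpu 1 and co-schedule socket 0; spread-first
     * chooses pcpu 2 on the fully idle socket instead. */
    int idle[NR_PCPUS] = { 0, 1, 1, 1 };
    printf("picked pcpu %d\n", pick_pcpu(idle));
    return 0;
}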
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> Emmanuel Ackaouy
> Sent: 24 April 2007 15:35
> To: pak333@comcast.net
> Cc: ncmike@us.ibm.com; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: Xen scheduler
>
> If you feel two VCPUs would do better co-scheduled on a core or
> socket, you'd currently have to use cpumasks -- as Mike suggested --
> to restrict where they can run manually. I'd be curious to know of
> real-world cases where doing this increases performance
> significantly.

If you have data sharing between the apps on the same socket, a shared
L2 or L3 cache, and the application/data fits in the cache, I could see
that it would help. [And of course the OS, for example, will have some
data and code sharing between CPUs - so an application where a lot of
time is spent in the OS itself would benefit from "socket sharing".]

For other applications, having better memory bandwidth is most likely
better.

Of course, for ideal performance, it would also have to be taken into
account which CPU owns the memory being used, as the latency of
transferring memory from one CPU to another in a NUMA system can
affect performance quite noticeably.

--
Mats
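If one does want to experiment with such "socket sharing", the helper
below shows one hedged way to build the cpumap covering a whole socket,
suitable for feeding into the affinity mechanism discussed earlier. It
assumes pcpus are enumerated socket-major (socket s owning pcpus from
s * cpus_per_socket upward), which is firmware-dependent and should be
verified on the actual box; cores_per_socket and threads_per_core are
reported by "xm info".

/* Hedged helper: build a cpumap covering every pcpu on one socket,
 * assuming socket-major pcpu enumeration. Real enumeration order is
 * firmware-dependent, so verify it against your own machine. */
#include <stdint.h>
#include <stdio.h>

static uint64_t socket_cpumap(int socket, int cores_per_socket,
                              int threads_per_core)
{
    int cpus_per_socket = cores_per_socket * threads_per_core;
    uint64_t mask = 0;
    for (int i = 0; i < cpus_per_socket; i++)
        mask |= 1ULL << (socket * cpus_per_socket + i);
    return mask;
}

int main(void)
{
    /* 2 sockets x 2 cores, no hyperthreading: socket 1 -> pcpus 2,3,
     * i.e. cpumap 0xc. */
    printf("socket 1 cpumap: 0x%llx\n",
           (unsigned long long)socket_cpumap(1, 2, 1));
    return 0;
}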
Mats,

As I've just explained in my previous post to the mailing list, what
you say is not totally correct.

On partially idle systems, the credit scheduler does schedule across
sockets and cores before co-scheduling them. At least it used to, but
I've not looked at the code in a while, so perhaps someone changed
this?

On a system with no idle PCPU, when time slicing becomes necessary,
what you say is true: the scheduler doesn't care which PCPU hosts which
VCPU. On NUMA systems, this means a VCPU may not always run closest to
the memory it is using at the time.

On Apr 24, 2007, at 13:07, Petersson, Mats wrote:
> Xen's current schedulers don't have any clue about CPU cores and
> their relationship to sockets, memory locations or any other such
> things. So whether you get two VCPUs from the same domain or from two
> different domains on one of your physical dual-core CPUs is entirely
> random, and will remain so any time a VCPU is rescheduled.
> [...]
> -----Original Message-----
> From: Emmanuel Ackaouy [mailto:ackaouy@gmail.com]
> Sent: 24 April 2007 15:47
> To: Petersson, Mats
> Cc: pak333@comcast.net; xen-devel@lists.xensource.com;
> ncmike@us.ibm.com
> Subject: Re: [Xen-devel] Re: Xen scheduler
>
> As I've just explained in my previous post to the mailing list, what
> you say is not totally correct.

Yes, I realized from your previous post that what I said was not
entirely correct.

> On partially idle systems, the credit scheduler does schedule across
> sockets and cores before co-scheduling them. At least it used to, but
> I've not looked at the code in a while, so perhaps someone changed
> this?

Not that I'm aware of. I apologize for the misinformation.

> On a system with no idle PCPU, when time slicing becomes necessary,
> what you say is true: the scheduler doesn't care which PCPU hosts
> which VCPU. On NUMA systems, this means a VCPU may not always run
> closest to the memory it is using at the time.
> [...]

--
Mats
On Apr 24, 2007, at 16:42, Petersson, Mats wrote:
> If you have data sharing between the apps on the same socket, a
> shared L2 or L3 cache, and the application/data fits in the cache, I
> could see that it would help.
> [...]
> Of course, for ideal performance, it would also have to be taken into
> account which CPU owns the memory being used, as the latency of
> transferring memory from one CPU to another in a NUMA system can
> affect performance quite noticeably.

I understand in theory what would do better scheduled in either of
these ways. What I'm interested in learning about is actual
applications that people use which exhibit the type of L2/L3 cache
sharing that would make it significantly better to co-schedule the
VCPUs in question on whole sockets rather than across them.