After the original announcement of plans to do some work on csched there wasn't much activity, so I'd like to ask about some observations that I made with the current implementation, and whether it would be expected that the planned changes would take care of them.

On a lightly loaded many-core, non-hyperthreaded system (e.g. a single CPU-bound process in one VM, and only some background load elsewhere), I see this CPU-bound vCPU permanently switch between sockets, which is a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle" sockets. It would seem that some minimal latency consideration might be useful to add here, so that a very brief interruption by another vCPU doesn't result in unnecessary migration.

As a consequence of that eager moving, in the vast majority of cases the vCPU in question then (within a very short period of time) either triggers a cascade of other vCPU migrations, or begins a series of ping-pongs between (usually two) pCPU-s - until things settle again for a while. Again, some minimal latency added here might help avoid that.

Finally, in the complete inverse scenario of severely overcommitted systems (more than two fully loaded vCPU-s per pCPU) I frequently see Linux's softlockup watchdog kick in, now and then even resulting in the VM hanging. I had always thought that starvation of a vCPU for several seconds shouldn't be an issue that early - am I wrong here?

Jan
On Fri, Oct 9, 2009 at 3:53 PM, Jan Beulich <JBeulich@novell.com> wrote:
> After the original announcement of plans to do some work on csched there
> wasn't much activity, so I'd like to ask about some observations that I made
> with the current implementation, and whether it would be expected that
> the planned changes would take care of them.

There has been activity, but nothing worth sharing yet. :-) I'm working on the new "fairness" algorithm (perhaps called credits, perhaps not), which is a prerequisite for any further work such as load-balancing, power consumption, and so on. Unfortunately, I haven't been able to work on it for more than a week at a time for the last several months before being interrupted by other work-related tasks. :-(

Re the items you bring up below: I believe that my planned changes to load-balancing should address the first. First, I plan on making all cores which share an L2 cache share a runqueue. This will automatically share work among those cores without needing any special load-balancing to be done. Then, I plan on actually calculating:
* the per-runqueue load over the last time period;
* the amount each vcpu is contributing to that load.

Then load balancing won't be a matter of looking at the instantaneous runqueue lengths (as it is currently) but at the actual amount of "business" the runqueue has had over a period of time. Load balancing will be just that: actually moving vcpus around to make the loads more balanced. Balancing operations will happen at fixed intervals, rather than "whenever a runqueue is idle".

But those are just plans now; not a line of code has been written, and schedulers especially are notorious for the Law of Unexpected Consequences.

Re soft-lockups: that really shouldn't be possible with the current scheduler; if it happens, it's a bug. Have you pulled from xen-unstable recently? There was a bug introduced a few weeks ago that would cause problems; Keir checked in a fix for that one last week. Otherwise, if you're sure it's not a long-hypercall issue, there must be a bug somewhere. The new scheduler will be an almost complete re-write, so it will probably erase this bug and introduce its own. However, I doubt it will be ready by 3.5, so it's probably worth tracking down and fixing if we can.

Hope that answers your question. :-)

 -George

> On a lightly loaded many-core, non-hyperthreaded system (e.g. a single
> CPU-bound process in one VM, and only some background load elsewhere),
> I see this CPU-bound vCPU permanently switch between sockets, which is
> a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
> sockets. It would seem that some minimal latency consideration might be
> useful to add here, so that a very brief interruption by another
> vCPU doesn't result in unnecessary migration.
>
> As a consequence of that eager moving, in the vast majority of cases
> the vCPU in question then (within a very short period of time) either
> triggers a cascade of other vCPU migrations, or begins a series of
> ping-pongs between (usually two) pCPU-s - until things settle again for
> a while. Again, some minimal latency added here might help avoid that.
>
> Finally, in the complete inverse scenario of severely overcommitted
> systems (more than two fully loaded vCPU-s per pCPU) I frequently
> see Linux's softlockup watchdog kick in, now and then even resulting
> in the VM hanging. I had always thought that starvation of a vCPU
> for several seconds shouldn't be an issue that early - am I wrong
> here?
>
> Jan
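To make the bookkeeping George describes concrete, here is a minimal sketch in plain C of windowed per-runqueue load accounting and per-vcpu contributions. All names and numbers are invented for illustration; this is not existing or planned Xen code.

#include <stdint.h>

#define LOAD_WINDOW_NS  (100ull * 1000 * 1000)  /* 100ms sampling window (arbitrary) */

/* Hypothetical accounting structures, not Xen's real ones. */
struct demo_vcpu {
    uint64_t ran_ns;      /* how long this vcpu ran in the current window */
};

struct demo_runq {
    uint64_t busy_ns;     /* total vcpu runtime accumulated this window */
    uint32_t nr_cpus;     /* pcpus sharing this runqueue (e.g. one L2 cache) */
    uint32_t load_pct;    /* busy_ns scaled to 0..100*nr_cpus at window close */
};

/* Charge the time a vcpu just ran to both the vcpu and its runqueue;
 * this would be called from the deschedule/accounting path. */
static void demo_account_run(struct demo_vcpu *v, struct demo_runq *rq,
                             uint64_t ran_ns)
{
    v->ran_ns   += ran_ns;
    rq->busy_ns += ran_ns;
}

/* Close a sampling window: convert accumulated runtime into a load figure
 * and reset.  A fixed-interval balancer would compare load_pct across
 * runqueues and move the vcpus with the largest ran_ns from the busiest
 * runqueue to the least busy one. */
static void demo_close_window(struct demo_runq *rq)
{
    rq->load_pct = (uint32_t)(rq->busy_ns * 100 /
                              (LOAD_WINDOW_NS * rq->nr_cpus));
    rq->busy_ns = 0;
}

The point of the windowed figure is exactly what the mail describes: a runqueue that happened to be momentarily empty no longer looks "idle" if it was busy for most of the window.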
> From: Jan Beulich
> Sent: October 9, 2009 22:54
>
> After the original announcement of plans to do some work on csched there
> wasn't much activity, so I'd like to ask about some observations that I made
> with the current implementation, and whether it would be expected that
> the planned changes would take care of them.
>
> On a lightly loaded many-core, non-hyperthreaded system (e.g. a single
> CPU-bound process in one VM, and only some background load elsewhere),
> I see this CPU-bound vCPU permanently switch between sockets, which is
> a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
> sockets. It would seem that some minimal latency consideration might be
> useful to add here, so that a very brief interruption by another
> vCPU doesn't result in unnecessary migration.

There's a migration delay (default is 1ms) to judge cache hotness and thus avoid unnecessary migration. However, so far it's only checked when one cpu wants to steal vcpus from another runqueue. Possibly it makes sense to add this check to csched_vcpu_acct too, as a cold cache and a cascade of other VCPU migrations could easily beat the benefit of a "more idle" socket.

Thanks,
Kevin
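As a rough sketch of the check Kevin is referring to: treat a vcpu's cache as still "hot" if it last ran on its current pcpu less than vcpu_migration_delay microseconds ago, and skip the migration in that case. The names below are invented stand-ins for illustration, not the actual credit-scheduler code.

#include <stdint.h>

extern uint64_t now_ns(void);              /* stand-in for the hypervisor's current time */
extern unsigned int vcpu_migration_delay;  /* boot option, in microseconds */

struct demo_sched_vcpu {
    uint64_t last_run_ns;   /* when this vcpu last ran on its current pcpu */
};

/* Cache-hotness test: has the vcpu run here within the configured delay? */
static int demo_vcpu_is_cache_hot(const struct demo_sched_vcpu *v)
{
    return (now_ns() - v->last_run_ns) <
           (uint64_t)vcpu_migration_delay * 1000u;   /* us -> ns */
}

/* The suggestion in this thread: consult the same test before the periodic
 * accounting path picks a "more idle" socket, not only when stealing work. */
static int demo_migration_worthwhile(const struct demo_sched_vcpu *v,
                                     int target_is_idler)
{
    return target_is_idler && !demo_vcpu_is_cache_hot(v);
}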
>>> "Tian, Kevin" <kevin.tian@intel.com> 10.10.09 10:03 >>>
>> From: Jan Beulich
>> On a lightly loaded many-core, non-hyperthreaded system (e.g. a single
>> CPU-bound process in one VM, and only some background load elsewhere),
>> I see this CPU-bound vCPU permanently switch between sockets, which is
>> a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
>> sockets. It would seem that some minimal latency consideration might be
>> useful to add here, so that a very brief interruption by another
>> vCPU doesn't result in unnecessary migration.
>
> There's a migration delay (default is 1ms) to judge cache hotness and
> thus avoid unnecessary migration. However, so far it's only checked
> when one cpu wants to steal vcpus from another runqueue. Possibly it
> makes sense to add this check to csched_vcpu_acct too, as a cold cache
> and a cascade of other VCPU migrations could easily beat the benefit
> of a "more idle" socket.

Where do you see this 1ms delay - I can't seem to spot it...

Jan
>>> George Dunlap <George.Dunlap@eu.citrix.com> 09.10.09 17:59 >>>
> Re soft-lockups: that really shouldn't be possible with the current
> scheduler; if it happens, it's a bug. Have you pulled from
> xen-unstable recently? There was a bug introduced a few weeks ago
> that would cause problems; Keir checked in a fix for that one last
> week. Otherwise, if you're sure it's not a long-hypercall issue,
> there must be a bug somewhere.

The testing that had exposed this was done in late July, on 3.3.1. I'll have to re-do this on up-to-date -unstable then, and post results if the issue does reproduce there.

Jan
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: October 12, 2009 15:28
>
> >>> "Tian, Kevin" <kevin.tian@intel.com> 10.10.09 10:03 >>>
> >> From: Jan Beulich
> >> On a lightly loaded many-core, non-hyperthreaded system (e.g. a single
> >> CPU-bound process in one VM, and only some background load elsewhere),
> >> I see this CPU-bound vCPU permanently switch between sockets, which is
> >> a result of csched_cpu_pick() eagerly moving vCPU-s to "more idle"
> >> sockets. It would seem that some minimal latency consideration might be
> >> useful to add here, so that a very brief interruption by another
> >> vCPU doesn't result in unnecessary migration.
> >
> > There's a migration delay (default is 1ms) to judge cache hotness and
> > thus avoid unnecessary migration. However, so far it's only checked
> > when one cpu wants to steal vcpus from another runqueue. Possibly it
> > makes sense to add this check to csched_vcpu_acct too, as a cold cache
> > and a cascade of other VCPU migrations could easily beat the benefit
> > of a "more idle" socket.
>
> Where do you see this 1ms delay - I can't seem to spot it...

Sorry, that 1ms is just the default value from my memory. Taking a look at the code, however, doesn't bear it out:

/*
 * Delay, in microseconds, between migrations of a VCPU between PCPUs.
 * This prevents rapid fluttering of a VCPU between CPUs, and reduces the
 * implicit overheads such as cache-warming. 1ms (1000) has been measured
 * as a good value.
 */
static unsigned int vcpu_migration_delay;
integer_param("vcpu_migration_delay", vcpu_migration_delay);

It's just the comment saying that; the option itself is never given a non-zero default. You may add that boot option and give it a try. :-)

Thanks,
Kevin
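For reference, integer_param() registers vcpu_migration_delay as a Xen boot-time option, so enabling the 1ms delay just means appending it to the hypervisor line in the bootloader configuration. The entry below is only an example (GRUB legacy syntax; file names and paths will differ per installation):

# In /boot/grub/menu.lst (example paths), on the Xen hypervisor line:
kernel /boot/xen.gz vcpu_migration_delay=1000 <other xen options>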
On 12/10/2009 08:27, "Jan Beulich" <JBeulich@novell.com> wrote:
>> There's a migration delay (default is 1ms) to judge cache hotness and
>> thus avoid unnecessary migration. However, so far it's only checked
>> when one cpu wants to steal vcpus from another runqueue. Possibly it
>> makes sense to add this check to csched_vcpu_acct too, as a cold cache
>> and a cascade of other VCPU migrations could easily beat the benefit
>> of a "more idle" socket.
>
> Where do you see this 1ms delay - I can't seem to spot it...

The option of interest is vcpu_migration_delay, but it defaults to zero (disabled). Intel explicitly set it to 1000 (1ms) for their tests.

 -- Keir
On Fri, 9 Oct 2009 16:59:25 +0100 George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> There has been activity, but nothing worth sharing yet. :-) I'm
> working on the new "fairness" algorithm (perhaps called credits,
> perhaps not), which is a prerequisite for any further work such as
> load-balancing, power consumption, and so on. Unfortunately, I
> haven't been able to work on it for more than a week at a time for the
> last several months before being interrupted by other work-related
> tasks. :-(

Incidentally, I've been thinking of a scheduler plugin for databases and other similar apps. Preliminary DB benchmarks on Xen vs. bare metal are not as good. Of course, I am not blaming the scheduler for it. As with most big user apps, the DB is very multi-threaded, and as such it does a lot of tricks to get the OS scheduler to play things its way.

I may be getting my hands on a large box in a few weeks to bring up Xen and do some scalability work. Hope to find some low-hanging fruit.

thanks
Mukesh
On Sat, Oct 17, 2009 at 1:16 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Incidentally, I've been thinking of a scheduler plugin for databases and
> other similar apps. Preliminary DB benchmarks on Xen vs. bare metal are
> not as good. Of course, I am not blaming the scheduler for it. As with most
> big user apps, the DB is very multi-threaded, and as such it does a lot of
> tricks to get the OS scheduler to play things its way.
> I may be getting my hands on a large box in a few weeks to bring up Xen
> and do some scalability work. Hope to find some low-hanging fruit.

Scalability for Xen past 8 logical processors (where 1 hyperthread is 1 schedulable unit) is likely to be poor, due to the load-balancing algorithm.

Regarding a Xen scheduler plug-in for DB applications, it seems to me it would be best to understand the characteristics of the DB workload and how they respond to different kinds of contention. There may be a few surprises; for example, a workload that you assumed was CPU-bound may in fact be making many qemu-handled operations, so it's really blocking thousands of times per second. If we can make the default scheduler handle DB workloads well without making a special plug-in, that would be preferable.

Would you be willing, if you have the time, to help "beta-test" a new scheduler with a DB workload and compare it to the old one?

 -George
On Mon, 19 Oct 2009 10:34:16 +0100 George Dunlap <George.Dunlap@eu.citrix.com> wrote:
....
> Scalability for Xen past 8 logical processors (where 1 hyperthread is
> 1 schedulable unit) is likely to be poor, due to the load-balancing
> algorithm.

Yeah, I've been thinking in the back of my mind of some sort of multiple runqueues, with vcpu priority adjustments based on cpu usage, but at the same time allowing certain vcpus to maintain a certain minimum level of usage. This is for cases where an app has pinned a high-priority thread to a vcpu. I've not looked at the existing Xen schedulers, so maybe they already do that. More runqueues means less locking but more load balancing, so maybe something that's tunable on the fly. Some thoughts at a high level....

> Regarding a Xen scheduler plug-in for DB applications, it seems to me
> it would be best to understand the characteristics of the DB workload
> and how they respond to different kinds of contention. There may be a
> few surprises; for example, a workload that you assumed was CPU-bound
> may in fact be making many qemu-handled operations, so it's really
> blocking thousands of times per second. If we can make the default
> scheduler handle DB workloads well without making a special plug-in,
> that would be preferable.

Agree. I'm hoping to collect all that information over the next couple/few months. The last attempt, made a year ago, didn't yield a whole lot of information because of problems with 32-bit tools and 64-bit guest app interaction.

In a nutshell, there's tremendous smarts in the DB, and so I think it prefers a simplified scheduler/OS that it can provide hints to and interact a little with. Ideally, it would like the ability for a privileged thread to tell the OS/hypervisor: I want to yield the cpu to thread #xyz.

Moreover, my focus is large systems, 32 to 128 logical processors, with 1/2 to 1 TB of memory. As such, I also want to address VCPUs being confined to a logical block of physical CPUs, taking into consideration that licenses are per physical cpu core. Also, it's important for a cluster heartbeat thread to get the cpu at expected times; otherwise it starts to freak out. Apparently we are seeing some of that during live migrations. Waiting on more info on that myself.

> Would you be willing, if you have the time, to help "beta-test" a new
> scheduler with a DB workload and compare it to the old one?

Yeah, sure. I hope to have a setup in a few weeks.

thanks,
Mukesh
On Tue, Oct 20, 2009 at 1:01 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Yeah, I've been thinking in the back of my mind of some sort of multiple
> runqueues

There already are multiple runqueues; the overhead comes from the "steal work" method of moving vcpus between them, which works fine for a low number of cpus but doesn't scale well.

Hmm, I thought I had written up my plans for load-balancing in an e-mail to the list, but I can't seem to find them now. Stand by for a description sometime. :-)

> Agree. I'm hoping to collect all that information over the next couple/few
> months. The last attempt, made a year ago, didn't yield a whole lot
> of information because of problems with 32-bit tools and 64-bit guest app
> interaction.

I have some good tools for collecting scheduling activity and analyzing it, using xentrace and xenalyze. When you get things set up, let me know and I'll post some information about using xentrace / xenalyze to characterize a workload's scheduling.

> In a nutshell, there's tremendous smarts in the DB, and so I think it
> prefers a simplified scheduler/OS that it can provide hints to and interact
> a little with. Ideally, it would like the ability for a privileged thread
> to tell the OS/hypervisor: I want to yield the cpu to thread #xyz.

If the thread is not scheduled on a vcpu by the OS, then when the DB says to yield to that thread, the OS can switch it onto the running vcpu - no changes needed.

The only potential modification would be if the DB wants to yield to a thread which is scheduled on another vcpu, but that vcpu is not currently running. Then the guest OS *may* want to be able to ask the HV to yield the currently running vcpu to the other vcpu. That interface is worth thinking about.

> Moreover, my focus is large systems, 32 to 128 logical processors, with 1/2
> to 1 TB of memory. As such, I also want to address VCPUs being confined
> to a logical block of physical CPUs, taking into consideration that
> licenses are per physical cpu core.

This sounds like it would benefit from the "CPU pools" patch submitted by Juergen Gross.

 -George
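Purely as a strawman for the "ask the HV to yield to another vcpu" interface mentioned above: a guest-visible directed yield could be modelled on the existing SCHEDOP_* hypercalls. Everything below - the sub-op number, the argument structure, and the helper - is invented for illustration and is not part of the real Xen ABI.

#include <stdint.h>

/* Hypothetical sub-op and argument; NOT an existing Xen interface. */
#define SCHEDOP_yield_to   8     /* invented sub-op number */

struct sched_yield_to {
    uint32_t vcpu_id;            /* target vcpu of the calling domain */
};

/* Stand-in declaration; a guest's own hypercall header would provide the
 * real HYPERVISOR_sched_op used for SCHEDOP_yield, SCHEDOP_block, etc. */
extern int HYPERVISOR_sched_op(int cmd, void *arg);

/* Guest-side helper: give up the rest of this vcpu's timeslice in favour of
 * the named sibling vcpu, if the hypervisor can run it right away. */
static inline int yield_to_vcpu(uint32_t vcpu_id)
{
    struct sched_yield_to arg = { .vcpu_id = vcpu_id };
    return HYPERVISOR_sched_op(SCHEDOP_yield_to, &arg);
}

The interesting policy question is on the hypervisor side: whether the target vcpu should merely be boosted in its own runqueue or pulled onto the yielding pcpu, and how to keep such a call from becoming a fairness loophole.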
George Dunlap wrote:
> On Tue, Oct 20, 2009 at 1:01 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> Moreover, my focus is large systems, 32 to 128 logical processors, with 1/2
>> to 1 TB of memory. As such, I also want to address VCPUs being confined
>> to a logical block of physical CPUs, taking into consideration that
>> licenses are per physical cpu core.
>
> This sounds like it would benefit from the "CPU pools" patch submitted
> by Juergen Gross.

Indeed. Same problem, so the same solution should be fine :-)

Mukesh, if you need more info about cpu pools, I would be glad to help.

Juergen

--
Juergen Gross                     Principal Developer Operating Systems
TSP ES&S SWE OS6                  Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions      e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6                  Internet: ts.fujitsu.com
D-81739 Muenchen                  Company details: ts.fujitsu.com/imprint.html
On Tue, 20 Oct 2009 10:37:19 +0100 George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Tue, Oct 20, 2009 at 1:01 AM, Mukesh Rathor
> <mukesh.rathor@oracle.com> wrote:
>> Yeah, I've been thinking in the back of my mind of some sort of
>> multiple runqueues
>
> There already are multiple runqueues; the overhead comes from the
> "steal work" method of moving vcpus between them, which works fine for
> a low number of cpus but doesn't scale well.

Exactly - we'd have to study it to find the contention points and address those. Hopefully I can take what you've got, tinker around a bit, and send the changes to see what you think.

> Hmm, I thought I had written up my plans for load-balancing in an
> e-mail to the list, but I can't seem to find them now. Stand by
> for a description sometime. :-)

Actually, I think you posted it on the list and I saved it somewhere; I plan on reading it and figuring it out once I get closer to doing the work.

>> Agree. I'm hoping to collect all that information over the next
>> couple/few months. The last attempt, made a year ago, didn't yield
>> a whole lot of information because of problems with 32-bit tools
>> and 64-bit guest app interaction.
>
> I have some good tools for collecting scheduling activity and
> analyzing it, using xentrace and xenalyze. When you get things set up,
> let me know and I'll post some information about using xentrace /
> xenalyze to characterize a workload's scheduling.

Great, thanks.

>> In a nutshell, there's tremendous smarts in the DB, and so I think
>> it prefers a simplified scheduler/OS that it can provide hints to
>> and interact a little with. Ideally, it would like the ability for a
>> privileged thread to tell the OS/hypervisor: I want to yield the cpu
>> to thread #xyz.
>
> If the thread is not scheduled on a vcpu by the OS, then when the DB
> says to yield to that thread, the OS can switch it onto the running
> vcpu - no changes needed.
>
> The only potential modification would be if the DB wants to yield to a
> thread which is scheduled on another vcpu, but that vcpu is not
> currently running. Then the guest OS *may* want to be able to ask the
> HV to yield the currently running vcpu to the other vcpu. That
> interface is worth thinking about.

Yup, precisely.

>> Moreover, my focus is large systems, 32 to 128 logical processors, with
>> 1/2 to 1 TB of memory. As such, I also want to address VCPUs being
>> confined to a logical block of physical CPUs, taking into consideration
>> that licenses are per physical cpu core.
>
> This sounds like it would benefit from the "CPU pools" patch submitted
> by Juergen Gross.

Yes, I saw that on the list also, and when I get closer to doing the work I will take a closer look. Right now I am still trying to round up the hardware; then I will have to round up folks familiar with the benchmarks to set them up. Then the easier part begins :)...

thanks
Mukesh
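Since the tools quoted above come up again here, the kind of capture/analysis session being referred to looks roughly like the following. This is written from memory; option names and the scheduler event mask should be double-checked against the xentrace man page and the xenalyze README for the tree in use.

# In dom0, capture scheduler-class trace records (the TRC_SCHED class is
# 0x0002f000) while the DB workload runs, then stop xentrace with Ctrl-C:
xentrace -e 0x0002f000 /tmp/sched.trace

# Post-process the binary trace; xenalyze's summary mode reports per-domain,
# per-vcpu runtime and scheduling statistics:
xenalyze -s /tmp/sched.trace

Correlating that summary with the DB's own throughput numbers is usually enough to tell whether vcpus are being starved, migrating excessively, or blocking far more often than expected.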