Keir (and community),

Any thoughts on Juergen Gross's patch on cpu pools?

As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.

For my part, cpu pools seem like they should be OK. The main thing I don't like is the ugliness related to continue_hypercall_on_cpu(), described below.

Juergen, could you remind us what the advantages of pools in the hypervisor were, versus just having affinity masks (with maybe sugar in the toolstack)?

Re the ugly part of the patch, relating to continue_hypercall_on_cpu():

Domains are assigned to a pool, so if continue_hypercall_on_cpu() is called for a cpu not in the domain's pool, you can't just run it normally. Juergen's solution (IIRC) was to pause all domains in the other pool, temporarily move the cpu in question to the calling domain's pool, finish the hypercall, then move the cpu in question back to the other pool.

Since there are a lot of antecedents in that, let's take an example:

Two pools; pool A has cpus 0 and 1, pool B has cpus 2 and 3.

Domain 0 is running in pool A, domain 1 is running in pool B.

Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.

Cpu 2 is in pool B, so Juergen's patch:
* Pauses domain 1
* Moves cpu 2 to pool A
* Finishes the hypercall
* Moves cpu 2 back to pool B
* Unpauses domain 1

That seemed a bit ugly to me, but I'm not familiar enough with the use cases or the code to know if there's a cleaner solution.

 -George
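[To make the sequence above concrete, here is a small standalone sketch. All names and types are invented for illustration; this is not the patch's code, only a toy model of the ordering of operations.]

    /* Toy model of the borrow/return sequence described above.
     * All names and types are invented; this is not the patch's code. */
    #include <stdio.h>

    struct pool { const char *name; };

    static void pause_all_domains_in(struct pool *p)      { printf("pause all domains in %s\n", p->name); }
    static void unpause_all_domains_in(struct pool *p)    { printf("unpause all domains in %s\n", p->name); }
    static void move_cpu_to_pool(int cpu, struct pool *p) { printf("move cpu %d to %s\n", cpu, p->name); }

    int main(void)
    {
        struct pool A = { "pool A" }, B = { "pool B" };
        int cpu = 2;                          /* cpu 2 lives in pool B            */

        pause_all_domains_in(&B);             /* 1. quiesce the other pool        */
        move_cpu_to_pool(cpu, &A);            /* 2. "borrow" cpu 2 into pool A    */
        printf("run hypercall continuation on cpu %d\n", cpu);
        move_cpu_to_pool(cpu, &B);            /* 3. give cpu 2 back to pool B     */
        unpause_all_domains_in(&B);           /* 4. let domain 1 run again        */
        return 0;
    }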
On 27/07/2009 16:20, "George Dunlap" <dunlapg@umich.edu> wrote:

> Keir (and community),
>
> Any thoughts on Juergen Gross's patch on cpu pools?
>
> As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.

My own opinion, if it was not clear before, is that I'm not personally super excited about this feature. I'd like to know how interested users are in it before we spend developer effort on polishing it for inclusion.

 -- Keir
George Dunlap wrote:
> Keir (and community),
>
> Any thoughts on Juergen Gross's patch on cpu pools?
>
> As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.
>
> For my part, cpu pools seem like they should be OK. The main thing I don't like is the ugliness related to continue_hypercall_on_cpu(), described below.
>
> Juergen, could you remind us what the advantages of pools in the hypervisor were, versus just having affinity masks (with maybe sugar in the toolstack)?
>
> Re the ugly part of the patch, relating to continue_hypercall_on_cpu():
>
> Domains are assigned to a pool, so if continue_hypercall_on_cpu() is called for a cpu not in the domain's pool, you can't just run it normally. Juergen's solution (IIRC) was to pause all domains in the other pool, temporarily move the cpu in question to the calling domain's pool, finish the hypercall, then move the cpu in question back to the other pool.
>
> Since there are a lot of antecedents in that, let's take an example:
>
> Two pools; pool A has cpus 0 and 1, pool B has cpus 2 and 3.
>
> Domain 0 is running in pool A, domain 1 is running in pool B.
>
> Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.
>
> Cpu 2 is in pool B, so Juergen's patch:
> * Pauses domain 1
> * Moves cpu 2 to pool A
> * Finishes the hypercall
> * Moves cpu 2 back to pool B
> * Unpauses domain 1
>
> That seemed a bit ugly to me, but I'm not familiar enough with the use cases or the code to know if there's a cleaner solution.

A use case from me: I want a pool that passes pcpus through to the mission-critical domains. A scheduling algorithm will map vcpus to pcpus one by one in this pool. That will implement a reliable hard partitioning, although it will lose some benefit of virtualization. And we still want a pool using the credit scheduler for common domains.

thanks,

zhigang

> -George
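[A minimal sketch of the one-to-one vcpu-to-pcpu mapping Zhigang describes. It is purely illustrative; the names are invented and this is not an actual Xen scheduler.]

    /* Toy illustration of a 1:1 vcpu -> pcpu mapping inside a pool.
     * Names are invented; this is not Xen code. */
    #include <stdio.h>

    int main(void)
    {
        int pool_pcpus[] = { 2, 3 };       /* pcpus assigned to the pool        */
        int nr = sizeof(pool_pcpus) / sizeof(pool_pcpus[0]);

        /* vcpu i of the critical domain always runs on the i-th pcpu of the pool */
        for ( int i = 0; i < nr; i++ )
            printf("vcpu %d -> pcpu %d\n", i, pool_pcpus[i]);
        return 0;
    }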
George Dunlap wrote:
> Keir (and community),
>
> Any thoughts on Juergen Gross's patch on cpu pools?
>
> As a reminder, the idea is to allow "pools" of cpus that would have separate schedulers. Physical cpus and domains can be moved from one pool to another only by an explicit command. The main purpose Fujitsu seems to have is to allow a simple machine "partitioning" that is more robust than using simple affinity masks. Another potential advantage would be the ability to use different schedulers for different purposes.
>
> For my part, cpu pools seem like they should be OK. The main thing I don't like is the ugliness related to continue_hypercall_on_cpu(), described below.
>
> Juergen, could you remind us what the advantages of pools in the hypervisor were, versus just having affinity masks (with maybe sugar in the toolstack)?

Sure. Our main reason for introducing pools was the inability of the current scheduler(s) to schedule domains according to their weights while restricting the domains to a subset of the physical processors using pinning. I think it is virtually impossible to find a general solution for this problem without some sort of pooling (if somebody proves me wrong here, I'm completely glad to take this "perfect" scheduler instead of pools :-) ).

So while the reason for the pools was a lack of functionality in the first place, there are some more benefits:
+ the possibility to use different schedulers for different domains on the same machine (do you remember the discussion with bcredit?). Zhigang has posted a request for this feature already.
+ fewer lock conflicts on huge machines with many processors
+ pools could be a good base for NUMA-aware scheduling policies

> Re the ugly part of the patch, relating to continue_hypercall_on_cpu():
>
> Domains are assigned to a pool, so if continue_hypercall_on_cpu() is called for a cpu not in the domain's pool, you can't just run it normally. Juergen's solution (IIRC) was to pause all domains in the other pool, temporarily move the cpu in question to the calling domain's pool, finish the hypercall, then move the cpu in question back to the other pool.
>
> Since there are a lot of antecedents in that, let's take an example:
>
> Two pools; pool A has cpus 0 and 1, pool B has cpus 2 and 3.
>
> Domain 0 is running in pool A, domain 1 is running in pool B.
>
> Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.
>
> Cpu 2 is in pool B, so Juergen's patch:
> * Pauses domain 1
> * Moves cpu 2 to pool A
> * Finishes the hypercall
> * Moves cpu 2 back to pool B
> * Unpauses domain 1
>
> That seemed a bit ugly to me, but I'm not familiar enough with the use cases or the code to know if there's a cleaner solution.

Some thoughts on this topic:

The continue_hypercall_on_cpu() function is needed on x86 for loading new microcode into the processor. The source buffer of the new microcode is located in dom0 memory, so dom0 has to run on the physical processor the new code is loaded into (otherwise it wouldn't be accessible). We could avoid the whole continue_hypercall_on_cpu() mechanism if the microcode were copied into a hypervisor buffer and applied with on_selected_cpus() instead. Other users (cpu hotplug and acpi_enter_sleep) would have to switch to other solutions as well.
BTW: continue_hypercall_on_cpu() exists on x86 only, and it isn't really much better than my usage of it:
- remember the old pinning state of the current vcpu
- pin it temporarily to the cpu it should continue on
- continue the hypercall
- remove the temporary pinning
- re-establish the old pinning (if any)

Pretty much the same as my solution above ;-)

So I would suggest eliminating continue_hypercall_on_cpu() completely if you are feeling uneasy with my solution.

Juergen
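[A rough sketch of the on_selected_cpus() alternative Juergen mentions for the microcode case. This is an assumption about how it could look, not code from the patch or the Xen tree; microcode_write() is a placeholder, and the copy_from_guest()/on_selected_cpus() signatures are abbreviated and may not match the current tree exactly.]

    /* Sketch only: copy the microcode blob out of dom0 memory into a
     * Xen-owned buffer first, then apply it on each cpu from IPI context,
     * so no vcpu has to be moved or borrowed. */
    static void apply_ucode(void *info)
    {
        /* Runs on each selected cpu; 'info' points at the Xen-owned copy. */
        microcode_write(info);                  /* hypothetical per-cpu apply */
    }

    static long microcode_update_all(XEN_GUEST_HANDLE(void) buf, unsigned long len)
    {
        void *ucode = xmalloc_bytes(len);       /* hypervisor-owned buffer */

        if ( ucode == NULL )
            return -ENOMEM;
        if ( copy_from_guest(ucode, buf, len) ) /* copy out of dom0 memory */
        {
            xfree(ucode);
            return -EFAULT;
        }
        on_selected_cpus(&cpu_online_map, apply_ucode, ucode, 1 /* wait */);
        xfree(ucode);
        return 0;
    }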
On 28/07/2009 06:40, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

> BTW: continue_hypercall_on_cpu() exists on x86 only, and it isn't really much better than my usage of it:
> - remember the old pinning state of the current vcpu
> - pin it temporarily to the cpu it should continue on
> - continue the hypercall
> - remove the temporary pinning
> - re-establish the old pinning (if any)
> Pretty much the same as my solution above ;-)

If your solution locks the pinning, as we do already, so that it cannot be changed while the continue_hypercall_on_cpu() is running, then that is fine. If it's not locked then it's not safe.

 -- Keir
At 01:41 +0100 on 28 Jul (1248745277), Zhigang Wang wrote:
> A use case from me: I want a pool that passes pcpus through to the mission-critical domains. A scheduling algorithm will map vcpus to pcpus one by one in this pool. That will implement a reliable hard partitioning, although it will lose some benefit of virtualization.

That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.

Tim.
Tim Deegan wrote:
> At 01:41 +0100 on 28 Jul (1248745277), Zhigang Wang wrote:
>> A use case from me: I want a pool that passes pcpus through to the mission-critical domains. A scheduling algorithm will map vcpus to pcpus one by one in this pool. That will implement a reliable hard partitioning, although it will lose some benefit of virtualization.
>
> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.

More or less.
You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.
You won't have reliable scheduling weights any more.

Juergen
Keir Fraser wrote:
> On 28/07/2009 06:40, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>> BTW: continue_hypercall_on_cpu() exists on x86 only, and it isn't really much better than my usage of it:
>> - remember the old pinning state of the current vcpu
>> - pin it temporarily to the cpu it should continue on
>> - continue the hypercall
>> - remove the temporary pinning
>> - re-establish the old pinning (if any)
>> Pretty much the same as my solution above ;-)
>
> If your solution locks the pinning, as we do already, so that it cannot be changed while the continue_hypercall_on_cpu() is running, then that is fine. If it's not locked then it's not safe.

Locking in my solution should be okay.

Juergen
On Tue, Jul 28, 2009 at 11:15 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> Tim Deegan wrote:
>> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.
>
> More or less.
> You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.
> You won't have reliable scheduling weights any more.

I think this is the key thing: scheduling algorithms normally ignore pinning. Both credit1 and (atm) credit2 assume that a VM with higher credit will be able to run before a VM of lower credit. But if there are some VMs pinned to a subset of pcpus, and other VMs pinned to another subset (or not pinned at all), this breaks that assumption.

The algorithms might be extended to account for pinning, but it would make things a lot more complicated. That means the algorithm is harder to understand and modify, which means it's likely to break in the future as people try to extend it. It also means any future updates or rewrites have to take this pin-to-partition case into account and do the "right thing".

Given that people want to partition a machine, I think cpu pools make the most sense:
* From a user perspective it's easier; no need to pin every VM, simply assign which pool it starts in.
* From a scheduler perspective, it makes thinking about the algorithms easier. It's OK to build in the assumption that each VM can run anywhere. Other than partitioning, there's no real need to adjust the scheduling algorithm to do it.

 -George
At 13:50 +0100 on 28 Jul (1248789008), George Dunlap wrote:
> On Tue, Jul 28, 2009 at 11:15 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>> Tim Deegan wrote:
>>> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.
>> More or less.
>> You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.

Bah. You have to set the CPU pool of all domains to achieve the same thing; in any case this kind of thing is what toolstacks are good at. :)

>> You won't have reliable scheduling weights any more.

That's a much more interesting argument. It seems to me that in this simple case the scheduling weights will work out OK, but I can see that in the general case it gets entertaining.

> Given that people want to partition a machine, I think cpu pools make the most sense:
> * From a user perspective it's easier; no need to pin every VM, simply assign which pool it starts in.

I'll say it again because I think it's important: policy belongs in the tools. User-friendly abstractions don't have to extend into the hypervisor interfaces unless...

> * From a scheduler perspective, it makes thinking about the algorithms easier. It's OK to build in the assumption that each VM can run anywhere. Other than partitioning, there's no real need to adjust the scheduling algorithm to do it.

...unless there's a benefit to keeping the hypervisor simple. Which this certainly looks like.

Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that

- It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.

- It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.

- dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.

Cheers,

Tim.
Tim Deegan wrote:
> At 13:50 +0100 on 28 Jul (1248789008), George Dunlap wrote:
>> On Tue, Jul 28, 2009 at 11:15 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>>> Tim Deegan wrote:
>>>> That's easily done by setting affinity masks in the tools, without needing any mechanism in Xen.
>>> More or less.
>>> You have to set the affinity masks for ALL domains to avoid scheduling on the "special" cpus.
>
> Bah. You have to set the CPU pool of all domains to achieve the same thing; in any case this kind of thing is what toolstacks are good at. :)

No. If I have a dedicated pool for my "special domain" and all other domains are running in the default pool 0, I only have to set the pool of my special domain. Nothing else.

>>> You won't have reliable scheduling weights any more.
>
> That's a much more interesting argument. It seems to me that in this simple case the scheduling weights will work out OK, but I can see that in the general case it gets entertaining.

Even in the relatively simple case of 2 disjoint subsets of domains/cpus (e.g. 2 domains on cpus 0+1 and 2 domains on cpus 2+3) the consumed time of the domains does not reflect their weights correctly.

>> Given that people want to partition a machine, I think cpu pools make the most sense:
>> * From a user perspective it's easier; no need to pin every VM, simply assign which pool it starts in.
>
> I'll say it again because I think it's important: policy belongs in the tools. User-friendly abstractions don't have to extend into the hypervisor interfaces unless...
>
>> * From a scheduler perspective, it makes thinking about the algorithms easier. It's OK to build in the assumption that each VM can run anywhere. Other than partitioning, there's no real need to adjust the scheduling algorithm to do it.
>
> ...unless there's a benefit to keeping the hypervisor simple. Which this certainly looks like.
>
> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>
> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>
> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>
> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.

You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today. Pinning still works in each pool as today.

If a user has domains with different scheduling requirements (e.g. sedf and credit are to be used), he can use one partitioned machine instead of two dedicated machines. And he can shift resources between the domains (e.g. devices, memory, single cores or even threads). He can't do that without pools today.

With pools you have more possibilities without losing any function you have today. The only restriction is that you might not be able to use ALL features together with pools (e.g. complete load balancing), but the alternative would be either to lose some other functionality (scheduling weights) or to use different machines, which won't give you load balancing either.

Juergen
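[A small worked example of the weight problem being discussed; the numbers are invented for illustration. Take two pcpus and three domains of equal weight 256: domain A pinned to pcpu 0, domains B and C pinned to pcpu 1. By weight, each domain should receive a third of the machine (about 67% of one pcpu), but with the masks in place A gets all of pcpu 0 (100%) while B and C share pcpu 1 (50% each), so the equal weights are not honoured. A scheduler that accounts per pool instead of across the whole machine would at least give B and C their correct share relative to each other.]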
At 14:24 +0100 on 28 Jul (1248791073), Juergen Gross wrote:
>> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>>
>> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>>
>> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>>
>> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.
>
> You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today.

Yep, all I'm saying is you can't do both. If the people who want this feature (so far I count two of you) want to do both, then this solution's not good enough, and we should think about that before going ahead with it.

Cheers,

Tim.
Tim Deegan wrote:
> At 14:24 +0100 on 28 Jul (1248791073), Juergen Gross wrote:
>>> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>>>
>>> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>>>
>>> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>>>
>>> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.
>> You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today.
>
> Yep, all I'm saying is you can't do both. If the people who want this feature (so far I count two of you) want to do both, then this solution's not good enough, and we should think about that before going ahead with it.

Okay.

I think your first point is the most important one.
It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
But it would be a nice project :-)

Juergen
On Tue, Jul 28, 2009 at 2:31 PM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 14:24 +0100 on 28 Jul (1248791073), Juergen Gross wrote:
>>> Does strict partitioning of CPUs like this satisfy everyone's requirements? Bearing in mind that
>>>
>>> - It's not work-conserving, i.e. it doesn't allow best-effort scheduling of pool A's vCPUs on the idle CPUs of pool B.
>>>
>>> - It restricts the maximum useful number of vCPUs per guest to the size of a pool rather than the size of the machine.
>>>
>>> - dom0 would be restricted to a subset of CPUs. That seems OK to me but occasionally people talk about having dom0's vCPUs pinned 1-1 on the physical CPUs.
>>
>> You don't have to define other pools. You can just live with the default pool extended to all cpus and everything is as today.
>
> Yep, all I'm saying is you can't do both. If the people who want this feature (so far I count two of you) want to do both, then this solution's not good enough, and we should think about that before going ahead with it.

Yes, if you have more than one pool, then dom0 can't run on all cpus; but it can still run with dom0's vcpus pinned 1-1 on the physical cpus in its pool. I'm not sure why someone who wants to partition a machine would simultaneously want dom0 to run across all cpus...

As Juergen says, for people who don't use the feature, it shouldn't have any real effect. The patch is pretty straightforward, except for the "continue_hypercall_on_cpu()" bit.

 -George
On Tue, Jul 28, 2009 at 2:39 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> I think your first point is the most important one.
> It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
> But it would be a nice project :-)

If I recall your use case, Juergen, I thought the whole point was to keep some set of VMs limited to just a subset of CPUs? So the first point is a feature for you, not a bug. :-)

If we ever do find someone who wants cpu pools, perhaps to use different schedulers, but wants to be able to dynamically adjust pool size, then they can work on such a project. Until then, no point spending time on something no one's going to use.

 -George
On 28/07/2009 14:41, "George Dunlap" <dunlapg@umich.edu> wrote:

> As Juergen says, for people who don't use the feature, it shouldn't have any real effect. The patch is pretty straightforward, except for the "continue_hypercall_on_cpu()" bit.

Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.

Another thing I noted is that sched_tick_suspend/resume are pointlessly changed to take a cpu parameter, which is smp_processor_id(). I swear at the screen whenever I see people trying to slip that kind of nonsense in. It makes it look like the functions can operate on an arbitrary cpu when in fact I'll wager they cannot (and I doubt the author of such changes has checked). It's a nasty, nasty interface change.

 -- Keir
George Dunlap wrote:
> On Tue, Jul 28, 2009 at 2:39 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>> I think your first point is the most important one.
>> It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
>> But it would be a nice project :-)
>
> If I recall your use case, Juergen, I thought the whole point was to keep some set of VMs limited to just a subset of CPUs? So the first point is a feature for you, not a bug. :-)

Indeed.
I just like to think about further enhancements, even if my company isn't requiring them...

> If we ever do find someone who wants cpu pools, perhaps to use different schedulers, but wants to be able to dynamically adjust pool size, then they can work on such a project. Until then, no point spending time on something no one's going to use.

Absolutely true.
OTOH I see pools as an interesting way to support large NUMA systems in an effective way. And for this usage you would need such a project :-)
I think it is very important to check the possible future enhancements, as they might influence decisions today.

Juergen
Sorry for the late join...

I wonder if cpu pools helps with the following problem:

Some large software company that shall remain nameless continues to license their high-value applications on a per-pcpu basis rather than on a per-vcpu basis. As a result, VMs running these applications must be restricted to specific pcpus which are "licensed" to run the software.

Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)

For example, assume you have an 8-vcpu VM and it must be restricted to a 2-pcpu license on a 4-pcpu server. Ideally, you'd like any of the 8 vcpus to be assigned to either pcpu at any time, so you don't want to pin, for example, even vcpus to pcpu#0 and odd vcpus to pcpu#1. And, if all vcpus are idle, you'd like pcpu#0 and pcpu#1 to be free to run other VMs.

Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

Also in a data center, do cpu pools make it possible / easier for tools to pre-assign a subset of processors on ALL servers in the data center to serve a certain licensed class of VMs? For example, perhaps one would like to upgrade some of the machines in one's virtual data center from dual-core to quad-core but not pay for additional per-pcpu app licenses (i.e. the additional pcpus will be used for other non-licensed VMs). Tools could assign two pcpus on each server to be part of the "DB pool", thus restricting execution (and license fees) but still allowing easy migration.

Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

If the answer to these questions is yes, then I suspect one large software company might be very interested in cpu pools.

> -----Original Message-----
> From: Juergen Gross [mailto:juergen.gross@ts.fujitsu.com]
> Sent: Tuesday, July 28, 2009 7:57 AM
> To: George Dunlap
> Cc: xen-devel@lists.xensource.com; Zhigang Wang; Tim Deegan; Keir Fraser
> Subject: Re: [Xen-devel] Cpu pools discussion
>
> George Dunlap wrote:
> > On Tue, Jul 28, 2009 at 2:39 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> >> I think your first point is the most important one.
> >> It might be possible to build a load balancing scheme to shift cpus between pools dynamically, but this should be step 2, I think :-)
> >> But it would be a nice project :-)
> >
> > If I recall your use case, Juergen, I thought the whole point was to keep some set of VMs limited to just a subset of CPUs? So the first point is a feature for you, not a bug. :-)
>
> Indeed.
> I just like to think about further enhancements, even if my company isn't requiring them...
>
> > If we ever do find someone who wants cpu pools, perhaps to use different schedulers, but wants to be able to dynamically adjust pool size, then they can work on such a project. Until then, no point spending time on something no one's going to use.
>
> Absolutely true.
> OTOH I see pools as an interesting way to support large NUMA systems in an effective way. And for this usage you would need such a project :-)
> I think it is very important to check the possible future enhancements, as they might influence decisions today.
>
> Juergen
On 28/07/2009 16:29, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)

Yes it does. VCPUs only run on PCPUs in their affinity masks.

 -- Keir
Or to put it another way, "pinning" is shorthand for an affinity that contains only one cpu.

 -George

On Tue, Jul 28, 2009 at 4:49 PM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 28/07/2009 16:29, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)
>
> Yes it does. VCPUs only run on PCPUs in their affinity masks.
>
> -- Keir
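[In plain bitmask terms, an illustration only, using a generic C bitmask rather than Xen's cpumask type:]

    /* Affinity as a bitmask of allowed pcpus; "pinning" is just the
     * degenerate case of a mask with a single bit set.  Illustrative only. */
    unsigned long pinned_to_pcpu2   = 1UL << 2;                /* may run only on pcpu 2 */
    unsigned long affine_to_2_and_3 = (1UL << 2) | (1UL << 3); /* may run on pcpu 2 or 3 */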
Keir Fraser wrote:
> On 28/07/2009 16:29, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)
>
> Yes it does. VCPUs only run on PCPUs in their affinity masks.
>
> -- Keir

I'm wondering whether there is some performance difference between these two scenarios:

1) vcpu0 pinned to pcpu0, vcpu1 pinned to pcpu1.
2) vcpu0 and vcpu1 affined to pcpu0 and pcpu1 but not pinned.

Currently we have to explicitly pin *every* vcpu to get true hard partitioning. We are seeking a better solution, whether it is in the hypervisor or just user-space tools. But it seems the cpu pool concept is attractive.

thanks,

zhigang
Dan Magenheimer wrote:
> Sorry for the late join...
>
> I wonder if cpu pools helps with the following problem:
>
> Some large software company that shall remain nameless continues to license their high-value applications on a per-pcpu basis rather than on a per-vcpu basis. As a result, VMs running these applications must be restricted to specific pcpus which are "licensed" to run the software.
>
> Currently this is done with pinning, but pinning does restrict the flexibility of a multi-vcpu VM. Affinity seems like it should help, but affinity doesn't restrict the VM from running on a non-affinitive pcpu (does it?)
>
> For example, assume you have an 8-vcpu VM and it must be restricted to a 2-pcpu license on a 4-pcpu server. Ideally, you'd like any of the 8 vcpus to be assigned to either pcpu at any time, so you don't want to pin, for example, even vcpus to pcpu#0 and odd vcpus to pcpu#1. And, if all vcpus are idle, you'd like pcpu#0 and pcpu#1 to be free to run other VMs.
>
> Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

Pools will restrict the assigned domains to the assigned pcpus. This can be done by affinity masks as well. But pools won't allow domains of pool B to run on idle pcpus of pool A.

> Also in a data center, do cpu pools make it possible / easier for tools to pre-assign a subset of processors on ALL servers in the data center to serve a certain licensed class of VMs? For example, perhaps one would like to upgrade some of the machines in one's virtual data center from dual-core to quad-core but not pay for additional per-pcpu app licenses (i.e. the additional pcpus will be used for other non-licensed VMs). Tools could assign two pcpus on each server to be part of the "DB pool", thus restricting execution (and license fees) but still allowing easy migration.
>
> Can this be done with cpu pools (easier than / more flexibly than / or not at all) with current pinning and affinity?

This is easily doable with pools. We are doing this for our BS2000 system.

> If the answer to these questions is yes, then I suspect one large software company might be very interested in cpu pools.

Is one "yes" enough? :-)

Juergen
Keir Fraser wrote:
> On 28/07/2009 14:41, "George Dunlap" <dunlapg@umich.edu> wrote:
>
>> As Juergen says, for people who don't use the feature, it shouldn't have any real effect. The patch is pretty straightforward, except for the "continue_hypercall_on_cpu()" bit.
>
> Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.

I checked the use cases.
All calls leading to cpupool_borrow_cpu() are done under the domctl lock. The same applies to all cpupool operations.
I can add an explicit check not to unassign borrowed cpus, if you like.

> Another thing I noted is that sched_tick_suspend/resume are pointlessly changed to take a cpu parameter, which is smp_processor_id(). I swear at the screen whenever I see people trying to slip that kind of nonsense in. It

Sorry, this seems to be an artefact of an earlier version of my changes. I'll remove this one...

> makes it look like the functions can operate on an arbitrary cpu when in fact I'll wager they cannot (and I doubt the author of such changes has checked). It's a nasty, nasty interface change.

I'm pretty sure they could indeed work on any cpu. At least I tried to use them on other cpus, but I ran into other problems, leading to the current solution not requiring the cpu parameter any more.

Juergen
On 29/07/2009 07:14, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.
>
> I checked the use cases.
> All calls leading to cpupool_borrow_cpu() are done under the domctl lock. The same applies to all cpupool operations.

Uhhh... How did you figure that one out? I don't think one single caller of continue_hypercall_on_cpu() holds the domctl_lock. The callers are all sysctls and platform_ops.

> I can add an explicit check not to unassign borrowed cpus, if you like.

Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.

This is all not to say that I've been convinced we should accept the feature at all...

 -- Keir
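[For illustration, the kind of assertion Keir is asking for might look like the sketch below. The lock name and the surrounding function signature are assumptions, not the patch's actual code; ASSERT() and spin_is_locked() are existing Xen primitives.]

    /* Sketch only: document and assert the locking requirement instead of
     * relying on callers to get it right silently.  'cpupool_lock' is an
     * assumed name, not necessarily what the patch uses. */
    static void cpupool_borrow_cpu(struct domain *d, unsigned int cpu)
    {
        /* The caller must hold the cpupool lock so that the cpu cannot be
         * moved to another pool while it is borrowed. */
        ASSERT(spin_is_locked(&cpupool_lock));

        /* ... borrow logic as in the patch ... */
    }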
Keir Fraser wrote:
> On 29/07/2009 07:14, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> Just pulled up the patch. Actually cpupool_borrow_cpu() does not seem to lock down the cpu-pool-vcpu relationship while continue_hypercall_on_cpu() is running. In particular, it is clear that it does nothing if the vcpu is already part of the pool that the domain is running in. But then what if the cpu is removed from the pool during the borrow_cpu()/return_cpu() critical region? It hardly inspires confidence.
>> I checked the use cases.
>> All calls leading to cpupool_borrow_cpu() are done under the domctl lock. The same applies to all cpupool operations.
>
> Uhhh... How did you figure that one out? I don't think one single caller of continue_hypercall_on_cpu() holds the domctl_lock. The callers are all sysctls and platform_ops.

Sigh. I just recalled it from memory. Seems I was wrong.

>> I can add an explicit check not to unassign borrowed cpus, if you like.
>
> Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.

You are right.
I will add a check to ensure borrowed cpus are not allowed to change the pool.

Juergen
On 29/07/2009 09:52, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> I can add an explicit check not to unassign borrowed cpus, if you like.
>>
>> Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.
>
> You are right.
> I will add a check to ensure borrowed cpus are not allowed to change the pool.

A couple more comments.

It is not safe to domain_pause() while you hold locks. It can deadlock, as domain_pause() waits for the domain to be descheduled, but it could be spinning on a lock you hold. Also it looks like a domain can be moved away from a pool while the pool is paused, and then you would leak a pause refcount.

Secondly, I think that the cpupool_borrow/return calls should be embedded within vcpu_{lock,unlock,locked_change}_affinity(); also I see no need to have cpupool_return_cpu() return anything, as you should be able to make a decision to move onto another CPU on the next scheduling round anyway (which can always be forced by setting SCHEDULE_SOFTIRQ).

Really I dislike this patch greatly, as you can tell. ;-) The patchset as a whole is *ginormous*, the Xen patch by itself is pretty big and complicated, and I believe full of races and deadlocks. I've just picked up on a few obvious ones from a very brief read.

 -- Keir
Keir Fraser wrote:
> On 29/07/2009 09:52, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> I can add an explicit check not to unassign borrowed cpus, if you like.
>>> Your new interface ought to be responsible for its own synchronisation needs. And if it's not, you should implement the appropriate assertions regarding e.g. spin_is_locked(), plus a code comment. It's simple negligence to do neither.
>> You are right.
>> I will add a check to ensure borrowed cpus are not allowed to change the pool.
>
> A couple more comments.
>
> It is not safe to domain_pause() while you hold locks. It can deadlock, as domain_pause() waits for the domain to be descheduled, but it could be spinning on a lock you hold. Also it looks like a domain can be moved away from a pool while the pool is paused, and then you would leak a pause refcount.
>
> Secondly, I think that the cpupool_borrow/return calls should be embedded within vcpu_{lock,unlock,locked_change}_affinity(); also I see no need to have cpupool_return_cpu() return anything, as you should be able to make a decision to move onto another CPU on the next scheduling round anyway (which can always be forced by setting SCHEDULE_SOFTIRQ).
>
> Really I dislike this patch greatly, as you can tell. ;-) The patchset as a whole is *ginormous*, the Xen patch by itself is pretty big and complicated, and I believe full of races and deadlocks. I've just picked up on a few obvious ones from a very brief read.

The main problems you mention here are related to the cpupool_borrow stuff, which is George's main objection, too (it's not my favourite part of the patch, either).

Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.

I tried not to change any interfaces which are not directly related to the pools in the first run. If the result of this approach forces you to reject the patch, I would be happy to change it. I agree with you that it would be better not to need that borrow stuff, but I don't know whether you would like the continue_hypercall_on_cpu elimination more (or which solution would cause less pain).

The next step after that would be to split up the xen patch into logical pieces. I would suggest changing the scheduler internals in a separate patch (mainly the elimination of the local variables) to make the functional changes required for the pools more obvious. This should reduce the pure pool-related patch by a factor of 2.

Regarding races: I tested the "normal" pool interfaces (cpu add/remove, domain create/destroy/move) rather intensively (multiple concurrent scripts running for several hours). The cpu borrow stuff was NOT tested very much. There are 3 use cases for this interface:
- cpu microcode loading is run at system boot (this was my favourite test case)
- entering deep sleep only continues on cpu 0, which I removed only occasionally from pool 0
- I don't think I could test cpu hotplug...

Juergen
On 29/07/2009 12:06, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.

Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.

 -- Keir
Keir Fraser wrote:
> On 29/07/2009 12:06, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.
>
> Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.

The alternative would be a tasklet set up in irq context. And we are speaking of 3 users.
I could try a patch and then we could compare the two solutions. What do you think?

Juergen
On 29/07/2009 13:33, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.
>>
>> Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.
>
> The alternative would be a tasklet set up in irq context. And we are speaking of 3 users.
> I could try a patch and then we could compare the two solutions. What do you think?

This would work for a couple of callers, but some really need to be running in dom0 context. Or, more precisely, not the context of some other domain (softirqs synchronously preempt execution of a vcpu context). This can lead to subtle deadlocks, for example in freeze_domains() and in __cpu_die(), because we may need the vcpu we have synchronously preempted to make some progress for ourselves to be able to get past a spin loop.

Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.

Thanks,
Keir
Keir Fraser wrote:
> On 29/07/2009 13:33, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> Would you feel better if I tried to eliminate the reason for cpupool_borrow? This function is needed only for continue_hypercall_on_cpu outside of the current pool. I think it should be possible to replace those calls by on_selected_cpus with less impact on the whole system.
>>> Some of the stuff in the continuation handlers cannot be executed in irq context. 'Fixing' that would make many of the users ugly and less maintainable, so getting borrow/return right is the better answer I think.
>> The alternative would be a tasklet set up in irq context. And we are speaking of 3 users.
>> I could try a patch and then we could compare the two solutions. What do you think?
>
> This would work for a couple of callers, but some really need to be running in dom0 context. Or, more precisely, not the context of some other domain (softirqs synchronously preempt execution of a vcpu context). This can lead to subtle deadlocks, for example in freeze_domains() and in __cpu_die(), because we may need the vcpu we have synchronously preempted to make some progress for ourselves to be able to get past a spin loop.

Okay.

> Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.

There should be an easy solution for this: what you are suggesting here sounds like a "hypervisor domain" similar to the idle domain, but with high priority and normally all vcpus blocked.

The interactions of this domain with cpupools would be the same as for the idle domain.

I think this approach could be attractive, but the question is whether the pros outweigh the cons. OTOH such a domain could open interesting opportunities.

Juergen
On 30/07/2009 06:46, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.
>
> There should be an easy solution for this: what you are suggesting here sounds like a "hypervisor domain" similar to the idle domain, but with high priority and normally all vcpus blocked.
>
> The interactions of this domain with cpupools would be the same as for the idle domain.
>
> I think this approach could be attractive, but the question is whether the pros outweigh the cons. OTOH such a domain could open interesting opportunities.

I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.

CPU hotplug raises a question in relation to cpupools, by the way. What pool does a cpu get added to when it is brought online? And what do you do when someone offlines a CPU (e.g., especially when it is the last in its pool)? In that latter case, have you not considered it, or do you refuse the offline, or do you somehow break the pool affinity so that domains belonging to it can run elsewhere?

 -- Keir
Keir Fraser wrote:
> CPU hotplug raises a question in relation to cpupools, by the way. What pool does a cpu get added to when it is brought online? And what do you do when someone offlines a CPU (e.g., especially when it is the last in its pool)? In that latter case, have you not considered it, or do you refuse the offline, or do you somehow break the pool affinity so that domains belonging to it can run elsewhere?

These cases are already covered by my patch.

A new cpu is always added to the "free pool". It can then be assigned to any pool. Perhaps it would be better to add it to pool 0, but that's a minor detail, I think.

Offlining the last cpu of a pool with active domains is refused.

Juergen
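[A sketch of the refusal policy Juergen describes; the structure and field names here are assumptions for illustration and are not taken from the patch.]

    /* Illustrative only: refuse to offline a cpu if it is the last one in a
     * pool that still has domains assigned.  Names are invented. */
    static int cpupool_cpu_offline_allowed(struct cpupool *c)
    {
        if ( c->n_cpus == 1 && c->n_domains > 0 )
            return -EBUSY;      /* last cpu of a pool with active domains */
        return 0;
    }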
Keir Fraser wrote:
> On 30/07/2009 06:46, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> Another alternative might be to create a 'hypervisor thread', either dynamically, or a per-cpu worker thread, and do the work in that. Of course that has its own complexities, and these threads would also have their own interactions with cpu pools to keep them pinned on the appropriate physical cpu. I don't know whether this would really work out simpler.
>> There should be an easy solution for this: what you are suggesting here sounds like a "hypervisor domain" similar to the idle domain, but with high priority and normally all vcpus blocked.
>>
>> The interactions of this domain with cpupools would be the same as for the idle domain.
>>
>> I think this approach could be attractive, but the question is whether the pros outweigh the cons. OTOH such a domain could open interesting opportunities.
>
> I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.

I'll try to set up a patch to add a hypervisor domain. Given all the problems I had with switching cpus between pools (avoiding running on the cpu to be switched, etc.), this solution could make life much easier.

And George would be happy to see all the borrow-cpu stuff vanish :-)

Juergen
On 30/07/2009 13:51, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.
>
> I'll try to set up a patch to add a hypervisor domain. Given all the problems I had with switching cpus between pools (avoiding running on the cpu to be switched, etc.), this solution could make life much easier.

I'm inclined actually to think a hypervisor domain is not necessary, and we can get by with softirqs. I actually think cpu offline can be reimplemented without softirqs or continue_hypercall_on_cpu(), and I would imagine cpupool changes then could use a similar technique. I will take a look at that, and you can take your cues from it if I find an elegant solution along those lines.

> And George would be happy to see all the borrow-cpu stuff vanish :-)

Yes, well I think we can get rid of that, regardless of a decision regarding hypervisor domains. And we get rid of vcpu_lock_affinity too, which is nice.

 -- Keir
Keir Fraser wrote:
> On 30/07/2009 13:51, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> I think especially if cpupools are added into the mix then this becomes more attractive than the current approach. The other alternative is to modify the two existing problematic callers to work okay from softirq context (or not need continue_hypercall_on_cpu() at all, which might be possible at least in the case of CPU hotplug). I would be undecided between these two just now -- it depends on how easily those two callers can be fixed up.
>> I'll try to set up a patch to add a hypervisor domain. Given all the problems I had with switching cpus between pools (avoiding running on the cpu to be switched, etc.), this solution could make life much easier.
>
> I'm inclined actually to think a hypervisor domain is not necessary, and we can get by with softirqs. I actually think cpu offline can be reimplemented without softirqs or continue_hypercall_on_cpu(), and I would imagine cpupool changes then could use a similar technique. I will take a look at that, and you can take your cues from it if I find an elegant solution along those lines.

Thanks, that's great!

Juergen