Hello,

Does the latest version of Xen (3.4.1) support NUMA machines? Is there a PDF or a link that can give me some more details about that? I work on a project on Xen performance on NUMA machines, and in Xen 3.3.0 this performance isn't good. Has something changed in the latest version?

Thanks in advance,
Papagiannis Anastasios

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Add the Xen boot parameter 'numa=on' to enable NUMA detection. Then it's up to you to, for example, pin domains to specific nodes, using the 'cpus=...' option in the domain config file. See /etc/xen/xmexample1 for an example of its usage.

 -- Keir

On 04/11/2009 12:02, "Papagiannis Anastasios" <apapag@ics.forth.gr> wrote:
> does the latest version of Xen (3.4.1) support NUMA machines? Is there a .pdf
> or a link that can give me some more details about that? [...]
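For reference, a minimal sketch of what Keir describes might look like the following in a domain config file (xm config files use Python syntax; the guest name, kernel path, and memory values here are invented for illustration, following the xmexample1 convention):

```python
# Hypothetical guest config: restrict all vCPUs of this domain to the
# pCPUs of one NUMA node (here CPUs 0-3, assumed to be node 0), so the
# scheduler keeps the guest next to its node-local memory.
kernel = "/boot/vmlinuz-2.6.18-xen"
memory = 1024
name = "numa-guest"
vcpus = 2
cpus = "0-3"   # pin to node 0's pCPUs; adjust to your topology
```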
VMware has the notion of a "cell", where VMs can be scheduled only within a cell, not across cells. Cell boundaries are determined by VMware by default, though certain settings can override them.

An interesting project might be to implement "numa=cell" for Xen... or maybe something similar is already in George Dunlap's scheduler plans?

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wednesday, November 04, 2009 5:33 AM
> To: Papagiannis Anastasios; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Xen 3.4.1 NUMA support
>
> Add Xen boot parameter 'numa=on' to enable NUMA detection.
> Then it's up to you to, for example, pin domains to specific nodes,
> using the 'cpus=...' option in the domain config file. [...]
I haven't had time to look at NUMA stuff at all. I probably will look at it eventually, if no one else does, but I'd be happy if someone else could pursue it.

 -George

Dan Magenheimer wrote:
> VMware has the notion of a "cell", where VMs can be
> scheduled only within a cell, not across cells.
>
> An interesting project might be to implement
> "numa=cell" for Xen.... or maybe something similar
> is already in George Dunlap's scheduler plans? [...]
George,

What's the current scope and status of your scheduler work? Is it going to look similar to the Linux scheduler (with scheduling domains, et al.)? In that case, topology is already accounted for, to a large extent. It would be good to know so that I can work on something that doesn't overlap.

-dulloor

On Mon, Nov 9, 2009 at 6:33 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> I haven't had time to look at NUMA stuff at all. I probably will look at it
> eventually, if no one else does, but I'd be happy if someone else could
> pursue it. [...]
Cpupools? :-)

NUMA was a topic I wanted to look at as soon as cpupools are officially accepted. Keir wanted to propose a way to get rid of the function continue_hypercall_on_cpu(), which was causing most of the issues leading to the objection to cpupools. I guess Keir had some higher-priority jobs. :-)

So I will try a new patch for cpupools without continue_hypercall_on_cpu() and perhaps with NUMA support.

George, would this be okay for you? I think your scheduler will still have problems with domain weights as long as domains are restricted to some processors, right?


Juergen

George Dunlap wrote:
> I haven't had time to look at NUMA stuff at all. I probably will look
> at it eventually, if no one else does, but I'd be happy if someone else
> could pursue it. [...]

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html
On Mon, Nov 9, 2009 at 11:44 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> George, would this be okay for you? I think your scheduler will still have
> problems with domain weights as long as domains are restricted to some
> processors, right?

Hmm, this may be a point of discussion at some point. My plan was actually to have one runqueue per L2 processor cache. Thus as many as 4 cores (and possibly 8 hyperthreads) would be sharing the same runqueue; doing CPU pinning within the same runqueue would be problematic.

I was planning on having credits work mainly within one runqueue, and then do load balancing between runqueues. In that case, pinning to a specific runqueue shouldn't cause a problem, because credits of one runqueue wouldn't affect credits of another one.

However, I haven't implemented or tested this idea yet; it's possible that keeping credits distinct and doing load balancing between runqueues will cause unacceptable levels of unfairness. I expect it to be fine (especially since Linux's scheduler does this kind of load balancing, but doesn't share runqueues between logical processors), but without implementation and testing I can't say for sure.

Thoughts are welcome at this point, but it will probably be better to have a real discussion once I've posted some patches.

 -George
On Mon, Nov 9, 2009 at 11:39 AM, Dulloor <dulloor@gmail.com> wrote:
> What's the current scope and status of your scheduler work? Is it
> going to look similar to the Linux scheduler (with scheduling domains,
> et al.)? In that case, topology is already accounted for, to a large
> extent. It would be good to know so that I can work on something that
> doesn't overlap.

My plan was to do something similar to Linux, but with this difference: instead of having one runqueue per logical processor (as both Xen and Linux currently do), and having "domains" all the way up (as Linux currently does), I had planned on having one runqueue per L2 processor cache. The main reason to avoid migration is to preserve a warm cache; but since L1s are replaced so quickly, there should be little impact from a VM migrating between different threads and cores which share the same L2.

Above the L2s I was planning on having an idea similar to the Linux "domains" (although obviously it would need a different name to avoid confusion), and doing explicit load-balancing between them. But as I have not had a chance to test this kind of load balancing yet, the plan may change somewhat before then.

Problems to solve wrt NUMA, as I understand it, are to balance the performance cost of sharing a busy local CPU against the performance cost of non-local memory accesses. This would involve adding the NUMA logic to the load-balancing algorithm. Which I guess would depend in part on having a load-balancing algorithm to begin with. :-)

Once I have the basic credit patches in working order, would you be interested in working on the load-balancing between runqueues? I can then work on further testing of the credit algorithm. My ultimate goal would be to have a basic regression test that people could use to measure how their changes to the scheduler affect a wide variety of workloads.

 -George
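The scheme George describes (one shared runqueue per L2 cache, with explicit balancing between runqueues) can be sketched roughly as follows. This is a hypothetical illustration, not Xen code; the topology mapping and load metric are invented for the example:

```python
# Sketch: group logical CPUs into one runqueue per shared L2 cache,
# then pick a (busiest, idlest) pair of runqueues to balance between.
from collections import defaultdict

def build_runqueues(cpu_to_l2):
    """Map {cpu: l2_cache_id} to {l2_cache_id: [cpus sharing that L2]}."""
    runqueues = defaultdict(list)
    for cpu, l2 in cpu_to_l2.items():
        runqueues[l2].append(cpu)
    return dict(runqueues)

def balance_pair(load_per_runqueue):
    """Choose the pair to balance: migrate work from busiest to idlest."""
    busiest = max(load_per_runqueue, key=load_per_runqueue.get)
    idlest = min(load_per_runqueue, key=load_per_runqueue.get)
    return busiest, idlest

# Four cores, two cores per L2: cores 0-1 share L2 #0, cores 2-3 share L2 #1.
rq = build_runqueues({0: 0, 1: 0, 2: 1, 3: 1})   # {0: [0, 1], 1: [2, 3]}
pair = balance_pair({0: 7, 1: 2})                # (0, 1)
```

Within a runqueue, migration between its CPUs is free (the L2 stays warm); credits would be accounted per runqueue, and only the inter-runqueue step needs an explicit policy.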
On 09/11/2009 11:44, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
> NUMA was a topic I wanted to look at as soon as cpupools are officially
> accepted. Keir wanted to propose a way to get rid of the function
> continue_hypercall_on_cpu() which was causing most of the stuff leading
> to the objection of cpupools.
> I guess Keir had some higher priority jobs. :-)

Well, I forgot about it. I think the plan was to perhaps keep something like continue_hypercall_on_cpu(), but not need to actually run the vcpu itself 'over there'; instead, schedule a tasklet or somesuch, and sleep on its completion. That would get rid of the skanky affinity hacks you had to do to support continue_hypercall_on_cpu(). I'll have a look back at what we discussed.

 -- Keir
Sure! Let me know when you have the patches ready. Also, that might be a good time to see if runq-per-L2 works better.

-dulloor

On Mon, Nov 9, 2009 at 7:29 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> My plan was to do something similar to Linux, but with this
> difference: Instead of having one runqueue per logical processor (as
> both Xen and Linux currently do), and having "domains" all the way up
> (as Linux currently does), I had planned on having one runqueue per L2
> processor cache. [...]
>
> Once I have the basic credit patches in working order, would you be
> interested in working on the load-balancing between runqueues? I can
> then work on further testing of the credit algorithm. [...]
Dan Magenheimer wrote:
> VMware has the notion of a "cell" where VMs can be
> scheduled only within a cell, not across cells.
> Cell boundaries are determined by VMware by
> default, though certain settings can override them.

Well, if I got this right, then you are describing the current behaviour of Xen. It has had a similar feature for some time now (since 3.3, I guess). When you launch a domain on a numa=on machine, it will pick the least busy node (which can hold the requested memory) and restrict the domain to that node (by only allowing CPUs of that node). This is in XendDomainInfo.py (c/s 17131, 17247, 17709).

Looks like this (kernel xen.gz numa=on dom0_mem=6144M dom0_max_vcpus=6 dom0_vcpus_pin):

# xm create opensuse.hvm
# xm create opensuse2.hvm
# xm vcpu-list
Name       ID  VCPU  CPU  State  Time(s)  CPU Affinity
001-LTP     1     0    6  -b-       17.8  6-11
001-LTP     1     1    7  -b-        6.3  6-11
002-LTP     2     0   12  -b-       19.0  12-17
002-LTP     2     1   16  -b-        1.6  12-17
002-LTP     2     2   17  -b-        1.7  12-17
002-LTP     2     3   14  -b-        1.6  12-17
002-LTP     2     4   16  -b-        1.6  12-17
002-LTP     2     5   15  -b-        1.5  12-17
002-LTP     2     6   12  -b-        1.3  12-17
002-LTP     2     7   13  -b-        1.8  12-17
Domain-0    0     0    0  -b-       12.6  0
Domain-0    0     1    1  -b-        7.6  1
Domain-0    0     2    2  -b-        8.0  2
Domain-0    0     3    3  -b-       14.6  3
Domain-0    0     4    4  r--        1.4  4
Domain-0    0     5    5  -b-        0.9  5

# xm debug-keys U
(XEN) Domain 0 (total: 2097152):
(XEN)     Node 0: 2097152
(XEN)     Node 1: 0
(XEN)     Node 2: 0
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0
(XEN) Domain 1 (total: 394219):
(XEN)     Node 0: 0
(XEN)     Node 1: 394219
(XEN)     Node 2: 0
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0
(XEN) Domain 2 (total: 394219):
(XEN)     Node 0: 0
(XEN)     Node 1: 0
(XEN)     Node 2: 394219
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0

Note that there were no cpus= lines in the config files; Xen did that automatically. Domains can be localhost-migrated to another node:

# xm migrate --node=4 1 localhost

The only issue is with domains larger than a node. If someone has a useful use-case, I can start rebasing my old patches for NUMA-aware HVM domains to Xen unstable.

Regards,
Andre.

BTW: Shouldn't we finally set numa=on as the default value?

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Andre Przywara wrote:
> BTW: Shouldn't we finally set numa=on as the default value?

Is there any data to support the idea that this helps significantly on common systems?

 -George
>>> Andre Przywara <andre.przywara@amd.com> 09.11.09 16:02 >>>
> BTW: Shouldn't we finally set numa=on as the default value?

I'd say no, at least until the default confinement of a guest to a single node gets fixed to properly deal with guests having more vCPUs than a node's worth of pCPUs (i.e. I take it for granted that the benefits of not overcommitting CPUs outweigh the drawbacks of cross-node memory accesses, at the very least for CPU-bound workloads).

Jan
George Dunlap wrote:
> Andre Przywara wrote:
>> BTW: Shouldn't we finally set numa=on as the default value?
>
> Is there any data to support the idea that this helps significantly on
> common systems?

I don't have any numbers handy, but I will see if I can generate some.

Looking from a high-level perspective, it is a shame that it's not the default: with numa=off the Xen domain loader will allocate physical memory from some node (maybe even from several nodes) and will schedule the guest on some other (even rapidly changing) nodes. According to Murphy's law you will end up with _all_ of a guest's memory accesses being remote. But in fact a NUMA architecture is really beneficial for virtualization: as there are close to zero cross-domain memory accesses (except for Dom0), each node is more or less self-contained and each guest can use the node's memory controller almost exclusively.

But this is all spoiled as most people don't know about Xen's NUMA capabilities and don't set numa=on. Making this the default would solve that.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 488-3567-12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
> >>> Andre Przywara <andre.przywara@amd.com> 09.11.09 16:02 >>>
> > BTW: Shouldn't we finally set numa=on as the default value?
>
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPUs than
> a node's worth of pCPUs [...]

What default confinement? I thought guests had an all-pCPUs affinity mask by default?

I suspect we would get benefits enabling NUMA even if all the guests have all-pCPUs affinity masks: all guests will have memory striped across all nodes, which is likely better than allocating from one node and then the other. Obviously, assigning VMs to node(s) and allocating memory accordingly is the best plan.

Ian
I am not finding this. Can you please point to the code?

numa=on/off is only for setting up NUMA in Xen (similar to the Linux knob, but turned off by default). The allocation of memory from a single node (that you observe) could be because of the way alloc_heap_pages is implemented (trying to allocate from all the heaps of a node before trying the next one); try looking at dump_numa output. And affinities are not set anywhere based on the node from which allocation happens.

-dulloor

On Mon, Nov 9, 2009 at 5:51 PM, Andre Przywara <andre.przywara@amd.com> wrote:
> Looking from a high level perspective it is a shame that it's not the
> default: With numa=off the Xen domain loader will allocate physical memory
> from some node (maybe even from several nodes) and will schedule the guest
> on some other (even rapidly changing) nodes. [...]
> But this is all spoiled as most people don't know about Xen's NUMA
> capabilities and don't set numa=on. Using this as a default would solve
> this. [...]
Dulloor wrote:
> I am not finding this. Can you please point to the code?

tools/python/xen/xend/XendDomainInfo.py (around line 2600), with the core code being:

-------------
index = nodeload.index( min(nodeload) )
cpumask = info['node_to_cpu'][index]
for v in range(0, self.info['VCPUs_max']):
    xc.vcpu_setaffinity(self.domid, v, cpumask)
-------------

The code got introduced with c/s 17131 and later got refined with c/s 17247 and c/s 17709.

> numa=on/off is only for setting up numa in xen (similar to the linux
> knob, but turned off by default). The allocation of memory from a
> single node (that you observe) could be because of the way
> alloc_heap_pages is implemented (trying to allocate from all the heaps
> from a node, before trying the next one)

Yes, but if the domain is pinned before it allocates its memory, then the natural behaviour of Xen is to take memory from this local node.

> try looking at dump_numa
> output. And, affinities are not set anywhere based on the node from
> which allocation happens.

It is the other way round: first the domain is pinned, later the memory is allocated (based on the node to which the currently scheduled CPU belongs).

Regards,
Andre.
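A standalone sketch of the selection logic quoted from XendDomainInfo.py might look like the following. The function name, the memory-fit check, and the inputs are hypothetical additions for illustration; the real xend code selects purely by the nodeload minimum:

```python
# Hypothetical standalone version of xend's node selection: pick the
# least-loaded node that can hold the requested memory, and return the
# pCPU mask all of the domain's vCPUs would be pinned to.
def pick_node(nodeload, node_free_mem, node_to_cpu, needed_mem):
    candidates = [n for n in range(len(nodeload))
                  if node_free_mem[n] >= needed_mem]
    if not candidates:
        return None, []   # no single node fits; fall back to no pinning
    best = min(candidates, key=lambda n: nodeload[n])
    return best, node_to_cpu[best]

# Two-node example: node 1 is less loaded and has enough free memory,
# so the domain would be confined to CPUs 4-7.
node, cpumask = pick_node(nodeload=[10, 3],
                          node_free_mem=[4096, 8192],
                          node_to_cpu=[[0, 1, 2, 3], [4, 5, 6, 7]],
                          needed_mem=2048)
```

Because this pinning happens before the domain's memory is allocated, the allocator then naturally takes node-local memory, which is the ordering Andre describes.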
>>> Ian Pratt <Ian.Pratt@eu.citrix.com> 10.11.09 02:46 >>>
> What default confinement? I thought guests had an all-pCPUs affinity
> mask by default?

Not with numa=on (see also Andre's post to this effect): the guest will get assigned to a node, and its affinity set to that node's CPUs.

Jan
On 10/11/2009 08:51, "Jan Beulich" <JBeulich@novell.com> wrote:
>> What default confinement? I thought guests had an all-pCPUs affinity
>> mask by default?
>
> Not with numa=on (see also Andre's post to this effect): The guest will
> get assigned to a node, and its affinity set to that node's CPUs.

...And if it didn't, striping would not happen. In fact, iirc, the default NUMA allocation policy for an all-pCPUs domain is in some respects pessimal: vcpu0's initial node gets drained of memory first. I.e., you get *less* 'striping' than you could with numa=off, where you might at least get lucky.

 -- Keir
On 09/11/2009 15:19, "Jan Beulich" <JBeulich@novell.com> wrote:
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPU-s than
> a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
> not overcommitting CPUs outweigh the drawbacks of cross-node memory
> accesses at the very least for CPU-bound workloads).

If this were fixed (e.g., turn off node locality entirely by default for domains which will not fit into a single node), then I think we could consider numa=on by default.

 -- Keir
George Dunlap wrote:

> Andre Przywara wrote:
>> BTW: Shouldn't we finally set numa=on as the default value?
>
> Is there any data to support the idea that this helps significantly on
> common systems?

I did some tests on an 8-node machine. I will retry this later on 4-node and 2-node systems, but I expect similar numbers. I used multiple guests in parallel, each running bw_mem from lmbench, which is admittedly quite NUMA-sensitive. I cannot publish real numbers (yet?), but the results were dramatic: with numa=on I got the same result for each guest (equal to the native result) as long as the number of guests was less than or equal to the number of nodes (since each guest got its own memory controller). If I disabled NUMA-aware placement, either by explicitly specifying cpus="0-31" in the config file or by booting with numa=off, the values dropped by a factor of 3-5 (!) even for a few guests, with some variance due to the random nature of the core-to-memory mapping.
Overcommitting the nodes (letting multiple guests use each node) lowered the values to about 80% for two guests and 60% for three guests per node, but they never got anywhere close to the numa=off values.
So these results encourage me again to opt for numa=on as the default value.
Keir, I will check whether dropping the node containment in the CPU overcommitment case is an option, but what would be the right strategy in that case?
Warn the user?
Don't contain at all?
Contain to more than one node?

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
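For reference, the two setups compared in that test boil down to one hypervisor boot parameter (numa=on vs. numa=off) and one line of domain configuration. A minimal sketch of the config side follows; the guest name and memory size are made up for illustration, and the CPU range matches the 32-core, 8-node machine described above:

```python
# Fragment of an xm domain config file (xm configs are Python; see
# /etc/xen/xmexample1 for a full example).  With numa=on on the Xen
# command line and no 'cpus' line, xend places the guest on a single
# node automatically.
name   = "guest1"       # illustrative
memory = 1024           # illustrative
vcpus  = 4
# Pinning the guest across all 32 pCPUs defeats the NUMA-aware
# placement, reproducing the slow (numa=off-like) case measured above:
cpus   = "0-31"
```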
> Overcommitting the nodes (letting multiple guests use each node) lowered
> the values to about 80% for two guests and 60% for three guests per
> node, but they never got anywhere close to the numa=off values.
> So these results encourage me again to opt for numa=on as the default
> value.
> Keir, I will check whether dropping the node containment in the CPU
> overcommitment case is an option, but what would be the right strategy
> in that case?
> Warn the user?
> Don't contain at all?
> Contain to more than one node?

In the case where a VM is asking for more vCPUs than there are pCPUs in a node, we should contain the guest to multiple nodes. (I presume we favour nodes according to the number of vCPUs they already have committed to them?)

We should turn off automatic node containment of any kind if the total number of pCPUs in the system is <= 8 -- on such systems the statistical multiplexing gain of having access to more pCPUs likely outweighs the NUMA placement benefit, and memory striping will be a better strategy. I'm inclined to believe that may be true for 2-node systems with <= 16 pCPUs too, under many workloads.

I'd really like to see us enumerate pCPUs in a sensible order so that it's easier to see the topology. It should be nodes.sockets.cores{.threads}, leaving gaps for missing execution units due to hot plug or non-power-of-two packing. Right now we're inconsistent in the enumeration order, depending on how the BIOS has set things up. It would be great if someone could volunteer to fix this...

Ian
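The placement rules proposed above can be sketched roughly as follows. This is hypothetical code, not the Xen toolstack's actual placement logic; it just encodes the stated heuristics: spill to multiple nodes when a guest's vCPUs exceed one node's pCPUs, favour the least-committed nodes, and skip containment entirely on small systems where striping wins.

```python
def pick_nodes(vcpus, pcpus_per_node, nodes, committed):
    """Return the list of node IDs to confine a new guest to.

    vcpus          -- number of vCPUs the guest asks for
    pcpus_per_node -- pCPUs per node (assumed uniform for simplicity)
    nodes          -- list of node IDs
    committed      -- dict: node ID -> vCPUs already committed there

    Returns None when automatic containment should be disabled.
    """
    total_pcpus = pcpus_per_node * len(nodes)
    # Small systems (<= 8 pCPUs): statistical multiplexing gain
    # outweighs NUMA placement, so don't contain at all.
    if total_pcpus <= 8:
        return None
    # How many nodes are needed to cover the guest's vCPUs?
    needed = -(-vcpus // pcpus_per_node)  # ceiling division
    if needed > len(nodes):
        return None  # guest can't fit even across all nodes
    # Favour the nodes with the fewest vCPUs already committed.
    by_load = sorted(nodes, key=lambda n: committed.get(n, 0))
    return by_load[:needed]
```

Whether "already-committed vCPUs" or a measured CPU load is the better sort key is exactly the open question in the follow-up mails.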
On 13/11/2009 14:14, "Andre Przywara" <andre.przywara@amd.com> wrote:

> Keir, I will check whether dropping the node containment in the CPU
> overcommitment case is an option, but what would be the right strategy
> in that case?
> Warn the user?
> Don't contain at all?
> Contain to more than one node?

I would suggest that simply not containing at all (i.e., keeping the equivalent of numa=off behaviour) would be safest.

 -- Keir
Ian Pratt wrote:

> In the case where a VM is asking for more vCPUs than there are pCPUs in
> a node, we should contain the guest to multiple nodes. (I presume we
> favour nodes according to the number of vCPUs they already have
> committed to them?)

Seems like CPU load might be a better measure. Xen doesn't calculate load currently, but it's on my list of things to do.

 -George
On 13/11/2009 14:29, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:

> I'd really like to see us enumerate pCPUs in a sensible order so that it's
> easier to see the topology. It should be nodes.sockets.cores{.threads},
> leaving gaps for missing execution units due to hot plug or
> non-power-of-two packing.
> Right now we're inconsistent in the enumeration order, depending on how
> the BIOS has set things up. It would be great if someone could volunteer
> to fix this...

Even better would be to have pCPUs addressable and listable explicitly as dotted tuples. That can be implemented entirely within the toolstack, and could even allow wildcarding of tuple components to efficiently express cpumasks.

 -- Keir
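The wildcarded-tuple idea can be illustrated with a small toolstack-side sketch. The function name is made up, and it assumes cpuids enumerate in nodes.sockets.cores.threads order -- exactly the sensible order the previous mail asks for, not what a BIOS-dependent enumeration gives today:

```python
def tuple_to_cpumask(spec, shape):
    """Expand a dotted-tuple pCPU spec like '1.0.*.*' into a set of
    linear cpuids.

    spec  -- 'node.socket.core.thread', each field an index or '*'
    shape -- (nodes, sockets_per_node, cores_per_socket,
              threads_per_core), assumed uniform for the sketch
    """
    parts = spec.split('.')
    assert len(parts) == len(shape) == 4
    # Each position contributes either the one given index or all of them.
    choices = [range(dim) if p == '*' else [int(p)]
               for p, dim in zip(parts, shape)]
    mask = set()
    for n in choices[0]:
        for s in choices[1]:
            for c in choices[2]:
                for t in choices[3]:
                    # Linear cpuid under node.socket.core.thread ordering.
                    cpuid = ((n * shape[1] + s) * shape[2] + c) \
                            * shape[3] + t
                    mask.add(cpuid)
    return mask
```

For example, on a 2-node, 2-socket, 2-core, 2-thread box, '1.*.*.*' selects the second node's eight cpuids.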
> Ian Pratt wrote:
>> In the case where a VM is asking for more vCPUs than there are pCPUs
>> in a node, we should contain the guest to multiple nodes. (I presume
>> we favour nodes according to the number of vCPUs they already have
>> committed to them?)
>
> Seems like CPU load might be a better measure. Xen doesn't calculate
> load currently, but it's on my list of things to do.

I'd rather get this stuff fixed now than wait for the new scheduler. It's not clear that instantaneous CPU load is any better than just counting the number of vCPUs. The XCP xapi stack also records good historical data, and would be in a better position to do the placement. Further work.

Ian
>> Keir, I will check whether dropping the node containment in the CPU
>> overcommitment case is an option, but what would be the right strategy
>> in that case?
>> Warn the user?
>> Don't contain at all?
>> Contain to more than one node?
>
> I would suggest that simply not containing at all (i.e., keeping the
> equivalent of numa=off behaviour) would be safest.

I disagree. In systems with 2 nodes it will use all nodes, which is the same as what you propose [*]. In systems with more nodes it will do placement to some subset. Note that systems with >2 nodes generally have stronger NUMA effects, and these are exactly the systems where node placement is a good thing.

[*] Note that numa=off is quite different from just disabling node placement. If node placement is disabled we still get the benefit of memory striping across nodes, which at least avoids some performance cliffs.

Ian
>> I'd really like to see us enumerate pCPUs in a sensible order so that
>> it's easier to see the topology. It should be
>> nodes.sockets.cores{.threads}, leaving gaps for missing execution units
>> due to hot plug or non-power-of-two packing.
>> Right now we're inconsistent in the enumeration order, depending on how
>> the BIOS has set things up. It would be great if someone could
>> volunteer to fix this...
>
> Even better would be to have pCPUs addressable and listable explicitly
> as dotted tuples. That can be implemented entirely within the toolstack,
> and could even allow wildcarding of tuple components to efficiently
> express cpumasks.

Yes, I'd certainly like to see the toolstack support dotted-tuple notation.

However, I just don't trust the toolstack to get this right unless Xen has already set it up nicely for it, with a sensible enumeration and defined sockets-per-node, cores-per-socket and threads-per-core parameters. Xen should provide a clean interface to the toolstack in this respect.

Ian
On 13/11/2009 15:40, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:

>> Even better would be to have pCPUs addressable and listable explicitly
>> as dotted tuples. That can be implemented entirely within the
>> toolstack, and could even allow wildcarding of tuple components to
>> efficiently express cpumasks.
>
> Yes, I'd certainly like to see the toolstack support dotted-tuple
> notation.
>
> However, I just don't trust the toolstack to get this right unless Xen
> has already set it up nicely for it, with a sensible enumeration and
> defined sockets-per-node, cores-per-socket and threads-per-core
> parameters. Xen should provide a clean interface to the toolstack in
> this respect.

Xen provides a topology-interrogation hypercall which should suffice for the tools to build up a {node,socket,core,thread} <-> cpuid mapping table.

 -- Keir
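A toolstack-side sketch of building that mapping table: it assumes the topology interrogation yields one (cpuid, node, socket, core, thread) record per pCPU, which is an assumption about the record layout made purely for illustration, not the hypercall's actual structure.

```python
def build_topology_map(cpu_records):
    """Build both directions of the {node,socket,core,thread} <-> cpuid
    mapping from per-pCPU topology records.

    cpu_records -- iterable of (cpuid, node, socket, core, thread)
                   tuples (hypothetical layout), e.g. as a toolstack
                   might assemble from Xen's topology hypercall.
    """
    fwd = {}   # (node, socket, core, thread) -> cpuid
    rev = {}   # cpuid -> (node, socket, core, thread)
    for cpu, node, socket, core, thread in cpu_records:
        fwd[(node, socket, core, thread)] = cpu
        rev[cpu] = (node, socket, core, thread)
    return fwd, rev
```

With such a table in hand, the dotted-tuple notation and wildcard expansion discussed above reduce to dictionary lookups, regardless of how the BIOS enumerated the cpuids.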
Andre Przywara
2009-Nov-30 15:40 UTC
[Xen-devel] [PATCH] tools: avoid over-commitment if numa=on
Jan Beulich wrote:

>>>> Andre Przywara <andre.przywara@amd.com> 09.11.09 16:02 >>>
>> BTW: Shouldn't we finally set numa=on as the default value?
>
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPUs than
> a node's worth of pCPUs (i.e. I take it for granted that the benefits of
> not overcommitting CPUs outweigh the drawbacks of cross-node memory
> accesses, at the very least for CPU-bound workloads).

That sounds reasonable. Attached is a patch that lifts the restriction of one node per guest if the number of VCPUs is greater than the number of cores per node. This isn't optimal (the best way would be to inform the guest about it, but that is another patchset ;-), but it should address the above concerns.

Please apply,
Andre.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany