Hi Keir,

Attached is the patch which exposes the host NUMA information to dom0. With the patch, the "xm info" command now also reports the CPU topology and host NUMA information. This will later be used to build guest NUMA support.

The patch changes the physinfo sysctl, adds topology_info and numa_info sysctls, and updates the Python and libxc code accordingly.

Please apply.

Thanks & Regards,
Nitin
I'll apply post 4.0.0. Please also supply a signed-off-by line.

 -- Keir

On 29/01/2010 23:05, "Kamble, Nitin A" <nitin.a.kamble@intel.com> wrote:
> Hi Keir,
> Attached is the patch which exposes the host NUMA information to dom0.
> [...]
Thanks Keir,

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>

Regards,
Nitin

-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Saturday, January 30, 2010 12:09 AM
To: Kamble, Nitin A; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] Host NUMA information in dom0

I'll apply post 4.0.0. Please also supply a signed-off-by line.

 -- Keir
Kamble, Nitin A wrote:
> Hi Keir,
>
> Attached is the patch which exposes the host NUMA information to dom0.
> With the patch, the "xm info" command now also reports the CPU topology
> and host NUMA information. This will later be used to build guest NUMA
> support.

What information are you missing from the current physinfo? As far as I can see, only the total amount of memory per node is not provided. But one could get this info from parsing the SRAT table in Dom0, which is at least mapped into Dom0's memory.
Or do you want to provide NUMA information to all PV guests (but then it cannot be a sysctl)? This would be helpful, as it would avoid having to enable ACPI parsing in PV Linux for NUMA guest support.

Beside that, I have to oppose the introduction of sockets_per_node again. Future AMD processors will feature _two_ nodes on _one_ socket, so this variable would have to hold 1/2, which will be rounded down to zero. I think this information is pretty useless anyway, as the number of sockets is mostly interesting for licensing purposes, where a single number is sufficient. For scheduling purposes, cache topology is more important.

My NUMA guest patches (currently for HVM only) are doing fine; I will try to send out an RFC patch series this week. I think they don't interfere with this patch, but if you have other patches in development, we should sync on this.
The scope of my patches is to let the user (or xend) describe a guest's topology, either by specifying only the number of guest nodes in the config file or by explicitly describing the whole NUMA topology. Some code will assign host nodes to the guest nodes (I am not sure yet whether this really belongs in xend, as it currently does, or is better done in libxc, where libxenlight would also benefit).
Then libxc's hvm_build_* will pass that info into the hvm_info_table, where code in the hvmloader will generate an appropriate SRAT table.
An extension of this would be to let Xen automatically decide whether a split of the resources is necessary (because there is not enough memory available (anymore) on one node).

Looking forward to comments...

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
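To illustrate the two configuration styles mentioned above, a guest description could look roughly like the fragment below. The option names (guestnodes, numa_topology) are hypothetical placeholders -- the RFC patches had not been posted at this point -- but xm config files are plain Python assignments, so the syntax is at least plausible:

# Hypothetical xm config fragment -- option names are illustrative only.

# Simple form: only give the number of guest NUMA nodes; the toolstack
# would pick host nodes and split memory and VCPUs evenly between them.
vcpus      = 4
memory     = 4096
guestnodes = 2

# Explicit form: describe each guest node (VCPUs, memory in MB, and
# optionally the host node that should back it).
numa_topology = [
    {'vcpus': [0, 1], 'memory': 2048, 'hostnode': 0},
    {'vcpus': [2, 3], 'memory': 2048, 'hostnode': 1},
]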
> Beside that I have to oppose the introduction of sockets_per_node again.
> Future AMD processors will feature _two_ nodes on _one_ socket, so this
> variable should hold 1/2, but this will be rounded to zero. I think this
> information is pretty useless anyway, as the number of sockets is mostly
> interesting for licensing purposes, where a single number is sufficient.

I sent a similar patch (I was using it to list pcpu-tuples and in vcpu-pin/unpin) and I didn't pursue it because of this same argument. When we talk of CPU topology, that's how it is currently: node-socket-cpu-core. Don't sockets also figure in the cache and interconnect hierarchy? What would be the hierarchy in those future AMD processors? Even Keir and Ian Pratt initially wanted the pcpu-tuples to be listed that way. So it would be helpful to make a call and move ahead.

-dulloor

On Mon, Feb 1, 2010 at 5:23 AM, Andre Przywara <andre.przywara@amd.com> wrote:
> [...]
Dulloor wrote:
>> Beside that I have to oppose the introduction of sockets_per_node again.
>> Future AMD processors will feature _two_ nodes on _one_ socket, so this
>> variable should hold 1/2, but this will be rounded to zero. I think this
>> information is pretty useless anyway, as the number of sockets is mostly
>> interesting for licensing purposes, where a single number is sufficient.
>
> I sent a similar patch (I was using it to list pcpu-tuples and in
> vcpu-pin/unpin) and I didn't pursue it because of this same argument.
> When we talk of CPU topology, that's how it is currently:
> node-socket-cpu-core. Don't sockets also figure in the cache and
> interconnect hierarchy?

Not necessarily. Think of Intel's Core2Quad: it has two separate L2 caches, each associated with two of the four cores in one socket. If you move from core0 to core2, then AFAIK the cost would be very similar to moving to another processor socket. So in fact the term "socket" does not help here. The situation is similar with the new AMD CPUs, just that "L2 cache" is replaced by "node" (aka shared memory controller, which also matches the shared L3 cache). In fact the cost of moving from one node to the neighbour in the same socket is exactly the same as moving to another socket.

> What would be the hierarchy in those future AMD processors? Even Keir
> and Ian Pratt initially wanted the pcpu-tuples to be listed that way.
> So it would be helpful to make a call and move ahead.

You could create variables like cores_per_socket and cores_per_node; this would solve the issue for now. Actually better would be an array mapping cores (or threads) to {nodes, sockets, L[123] caches}, as this would allow asymmetrical configurations (useful for guests).

In the past there once was a sockets_per_node value in physinfo, but it has been removed. It was not used anywhere, and multiplying the whole chain of x_per_y values sometimes ended up in wrong results anyway. Anyway, if you insist on this value it will hold bogus values for the upcoming processors: if it is zero, you end up in trouble when multiplying or dividing by it, and letting it be one is also wrong.

I am sorry to spoil this whole game, but that's how it is. If you or Nitin show me how the sockets_per_node variable should be used, we can maybe find a pleasing solution.

Regards,
Andre.
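To make the "array mapping cores to nodes, sockets and caches" idea concrete, a per-CPU topology table could look like the Python sketch below. The field names and ID values are invented for illustration (two nodes sharing one socket) and are not a proposed interface:

# Hypothetical per-thread topology map; IDs are made up for illustration.
cpu_topology = {
    0: {'node': 0, 'socket': 0, 'l2': 0, 'l3': 0},
    1: {'node': 0, 'socket': 0, 'l2': 1, 'l3': 0},
    2: {'node': 1, 'socket': 0, 'l2': 2, 'l3': 1},
    3: {'node': 1, 'socket': 0, 'l2': 3, 'l3': 1},
}

def cpus_sharing(level, cpu, topo=cpu_topology):
    # All CPUs that share the given resource level (e.g. 'l3', 'node',
    # 'socket') with the given CPU -- works for asymmetrical configurations.
    return [c for c, t in sorted(topo.items()) if t[level] == topo[cpu][level]]

# cpus_sharing('node', 0)   -> [0, 1]
# cpus_sharing('socket', 0) -> [0, 1, 2, 3]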
Andre, Dulloor,
  Some of us are also busy cooking guest NUMA patches for Xen. I think we should sync up so that it works well for both. And sockets_per_node can be taken out if it is an issue for you. It was added to assist the user in specifying the NUMA topology for the guest. It is not strictly required, and can be taken out without any harm.

Thanks & Regards,
Nitin

-----Original Message-----
From: Andre Przywara [mailto:andre.przywara@amd.com]
Sent: Monday, February 01, 2010 1:40 PM
To: Dulloor
Cc: Kamble, Nitin A; xen-devel@lists.xensource.com; Keir Fraser
Subject: Re: [Xen-devel] Host NUMA information in dom0

[...]
> Attached is the patch which exposes the host NUMA information to dom0.
> With the patch, the "xm info" command now also reports the CPU topology
> and host NUMA information. This will later be used to build guest NUMA
> support.
>
> The patch changes the physinfo sysctl, adds topology_info and numa_info
> sysctls, and updates the Python and libxc code accordingly.

It would be good to have a discussion about how we should expose NUMA information to guests.

I believe we can control the desired allocation of memory from nodes and the creation of guest NUMA tables using VCPU affinity masks combined with a new boolean option to enable exposure of NUMA information to guests.

For each guest VCPU, we should inspect its affinity mask to see which nodes the VCPU is able to run on, thus building a set of 'allowed node' masks. We should then compare all the 'allowed node' masks to see how many unique node masks there are -- this corresponds to the number of NUMA nodes that we wish to expose to the guest if this guest has NUMA enabled. We would apportion the guest's pseudo-physical memory equally between these virtual NUMA nodes.

If guest NUMA is disabled, we just use a single node mask which is the union of the per-VCPU node masks.

Where allowed node masks span more than one physical node, we should allocate memory to the guest's virtual node by pseudo-randomly striping memory allocations (in 2MB chunks) across the specified physical nodes. [Pseudo-random is probably better than round robin.]

Make sense? I can provide some worked examples.

As regards the socket vs node terminology, I agree the variables are probably badly named and would perhaps best be called 'node' and 'supernode'. The key thing is that the toolstack should allow hierarchy to be expressed when specifying CPUs (using a dotted notation) rather than having to specify the enumerated CPU number.

Best,
Ian
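A rough sketch of the mask-derivation step described above, in toolstack-style Python. The cpu_to_node mapping is assumed to come from the new numa_info sysctl; the function name and interface are illustrative only, not actual xend code:

def guest_virtual_nodes(vcpu_affinity, cpu_to_node, numa_enabled):
    # vcpu_affinity: list of per-VCPU sets of physical CPUs the VCPU may run on.
    # cpu_to_node:   dict mapping physical CPU number to host NUMA node.
    # Returns one host-node mask per virtual NUMA node to expose to the guest.
    allowed = [frozenset(cpu_to_node[c] for c in mask) for mask in vcpu_affinity]

    if not numa_enabled:
        # Guest NUMA disabled: a single node mask, the union of the per-VCPU masks.
        return [frozenset().union(*allowed)]

    # One virtual node per unique 'allowed node' mask; guest pseudo-physical
    # memory would then be apportioned equally between these virtual nodes.
    return sorted(set(allowed), key=sorted)

# Example: four VCPUs pinned pairwise to CPUs on host nodes 0 and 1 yield two
# virtual nodes, [frozenset([0]), frozenset([1])].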
It would be good if the discussion includes how guest NUMA works with (or is exclusive of) migration/save/restore. Also, the discussion should include the interaction with (or exclusivity from) the various Xen RAM utilization technologies -- tmem, page sharing/swapping, and PoD. Obviously it would be great if Xen could provide both optimal affinity/performance and optimal flexibility and resource utilization, but I suspect that will be a VERY difficult combination.

> -----Original Message-----
> From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
> Sent: Friday, February 05, 2010 10:39 AM
> To: Kamble, Nitin A; xen-devel@lists.xensource.com
> Cc: Ian Pratt
> Subject: [Xen-devel] RE: Host NUMA information in dom0
>
> [...]
Dan Magenheimer wrote on Fri, 5 Feb 2010 at 12:33:19:

> It would be good if the discussion includes how guest NUMA works with
> (or is exclusive of) migration/save/restore.
> [...]

I think migration/save/restore should be excluded at this point, to keep the design/implementation simple; it's a performance/scalability feature.

Jun
___
Intel Open Source Technology Center
Ian Pratt wrote on Fri, 5 Feb 2010 at 09:39:09:

> It would be good to have a discussion about how we should expose NUMA
> information to guests.
>
> I believe we can control the desired allocation of memory from nodes and
> the creation of guest NUMA tables using VCPU affinity masks combined with
> a new boolean option to enable exposure of NUMA information to guests.

I agree.

> For each guest VCPU, we should inspect its affinity mask to see which
> nodes the VCPU is able to run on, thus building a set of 'allowed node'
> masks. We should then compare all the 'allowed node' masks to see how
> many unique node masks there are -- this corresponds to the number of
> NUMA nodes that we wish to expose to the guest if this guest has NUMA
> enabled. We would apportion the guest's pseudo-physical memory equally
> between these virtual NUMA nodes.

Right.

> If guest NUMA is disabled, we just use a single node mask which is the
> union of the per-VCPU node masks.
>
> Where allowed node masks span more than one physical node, we should
> allocate memory to the guest's virtual node by pseudo-randomly striping
> memory allocations (in 2MB chunks) across the specified physical nodes.
> [Pseudo-random is probably better than round robin.]

Do we really want to support this? I don't think the allowed node masks should span more than one physical NUMA node. We also need to look at I/O devices as well.

> Make sense? I can provide some worked examples.

Examples are appreciated.

Thanks,
Jun
___
Intel Open Source Technology Center
While I am in agreement in general, my point is that we need to avoid misleading virtualization users: we should somehow make it clear that "pinning NUMA memory" to get performance advantages results in significant losses in flexibility. For example, it won't be intuitive to users/admins that starting guest A and then starting guest B may result in a very different performance profile for A's applications than starting guest B and then starting guest A. This may be obvious for other flexibility limiters such as PCI passthrough, but I suspect the vast majority of users (at least outside of the HPC community) for the next few years are not going to accept that one chunk of memory is *that* different from another chunk of memory.

> -----Original Message-----
> From: Nakajima, Jun [mailto:jun.nakajima@intel.com]
> Sent: Tuesday, February 09, 2010 3:03 PM
> To: Dan Magenheimer; Ian Pratt; Kamble, Nitin A;
> xen-devel@lists.xensource.com; Andre Przywara
> Subject: RE: [Xen-devel] RE: Host NUMA information in dom0
>
> I think migration/save/restore should be excluded at this point, to
> keep the design/implementation simple; it's a performance/scalability
> feature.
>
> Jun
> > If guest NUMA is disabled, we just use a single node mask which is the
> > union of the per-VCPU node masks.
> >
> > Where allowed node masks span more than one physical node, we should
> > allocate memory to the guest's virtual node by pseudo-randomly striping
> > memory allocations (in 2MB chunks) across the specified physical nodes.
> > [Pseudo-random is probably better than round robin.]
>
> Do we really want to support this? I don't think the allowed node masks
> should span more than one physical NUMA node. We also need to look at I/O
> devices as well.

Given that we definitely need this striping code in the case where the guest is non-NUMA, I'd be inclined to still allow it to be used even if the guest has multiple NUMA nodes. It could come in handy where there is a hierarchy between physical NUMA nodes, enabling for example striping to be used between a pair of 'close' nodes, while the higher-level topology of the paired-node sets is exposed to the guest.

Ian
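A toy illustration of the pseudo-random striping suggested above, again in Python. The 2MB granularity comes from the proposal; everything else (function name, seed handling) is made up for the example and is not the eventual allocator code:

import random

CHUNK_MB = 2  # striping granularity proposed above

def stripe_plan(vnode_memory_mb, host_nodes, seed=0):
    # Choose a host node for every 2MB chunk of a virtual node's memory,
    # pseudo-randomly rather than round-robin.
    rng = random.Random(seed)  # fixed seed keeps the plan reproducible
    chunks = vnode_memory_mb // CHUNK_MB
    return [rng.choice(host_nodes) for _ in range(chunks)]

# e.g. striping a 1024MB virtual node across host nodes 0 and 1:
# plan = stripe_plan(1024, [0, 1])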
I'm getting a python traceback when I try to start a VM that I tracked down (I think) to this patch:

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/xen/util/xmlrpclib2.py", line 131, in _marshaled_dispatch
    response = self._dispatch(method, params)
  File "/usr/lib/python2.6/SimpleXMLRPCServer.py", line 418, in _dispatch
    return func(*params)
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/server/XMLRPCServer.py", line 80, in domain_create
    info = XendDomain.instance().domain_create(config)
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/XendDomain.py", line 982, in domain_create
    dominfo = XendDomainInfo.create(config)
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/XendDomainInfo.py", line 106, in create
    vm.start()
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/XendDomainInfo.py", line 470, in start
    XendTask.log_progress(0, 30, self._constructDomain)
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/XendTask.py", line 209, in log_progress
    retval = func(*args, **kwds)
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/XendDomainInfo.py", line 2530, in _constructDomain
    balloon.free(16*1024, self) # 16MB should be plenty
  File "/usr/local/lib/python2.6/dist-packages/xen/xend/balloon.py", line 187, in free
    nodenum = xc.numainfo()['cpu_to_node'][cpu]
KeyError: 'cpu_to_node'

release            : 2.6.32.12
version            : #1 SMP Wed May 5 21:52:23 PDT 2010
machine            : x86_64
nr_cpus            : 16
nr_nodes           : 2
cores_per_socket   : 4
threads_per_core   : 2
cpu_mhz            : 2533
hw_caps            : bfebfbff:28100800:00000000:00001b40:009ce3bd:00000000:00000001:00000000
virt_caps          : hvm hvm_directio
total_memory       : 12277
free_memory        : 11629
free_cpus          : 0
xen_major          : 4
xen_minor          : 1
xen_extra          : -unstable
xen_caps           : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler      : credit
xen_pagesize       : 4096
platform_params    : virt_start=0xffff800000000000
xen_changeset      : Sat May 22 06:36:41 2010 +0100 21446:93410e5e4ad8
xen_commandline    : dummy=dummy console=com1 115200,8n1 dom0_mem=512M dom0_max_vcpus=1 dom0_vcpus_pin=true iommu=1,passthrough,no-intremap loglvl=all loglvl_guest=all loglevl=10 debug acpi=force apic=on apic_verbosity=verbose numa=on
cc_compiler        : gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
cc_compile_by      : bedge
cc_compile_domain  :
cc_compile_date    : Tue May 25 14:51:02 PDT 2010
xend_config_format : 4

-Bruce

On Fri, Jan 29, 2010 at 4:05 PM, Kamble, Nitin A <nitin.a.kamble@intel.com> wrote:
> Hi Keir,
> Attached is the patch which exposes the host NUMA information to dom0.
> [...]