Andre Przywara
2007-Aug-13 10:02 UTC
[Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser
2007-Aug-13 10:30 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On 13/8/07 11:02, "Andre Przywara" <andre.przywara@amd.com> wrote:

> @@ -35,6 +35,7 @@
>  #define XENMEM_increase_reservation 0
>  #define XENMEM_decrease_reservation 1
>  #define XENMEM_populate_physmap     6
> +#define XENMEM_DEFAULT_CPU ((unsigned int)-1)
>  struct xen_memory_reservation {
>
>  /*
> @@ -66,6 +67,7 @@ struct xen_memory_reservation {
>       * Unprivileged domains can specify only DOMID_SELF.
>       */
>      domid_t domid;
> +    unsigned int cpu;
>  };

We cannot change the size of existing hypercall structures. In this case we
could steal bits from the address_bits field and create a pair of 16-bit
fields from it.

Also, a physical cpu id is not a great fit for this hypercall -- it is
meaningless to most guests, who do not see the physical cpu map. Better to
pass a vcpu_id and let Xen work out the most appropriate physical cpu id
based on the vcpu's affinity. Or have a concept of per-guest 'virtual node
identifiers' and pass a 'uint16_t vnodeid'. The latter might actually be a
nice abstraction -- it'd be good to know other people's thoughts on this?

 -- Keir
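[Editorial note: as a rough sketch of the bit-stealing idea above, assuming
the xen_memory_reservation layout of this era, a size-preserving re-layout
might look like the following. The vnode field and the XENMEM_DEFAULT_VNODE
name are purely illustrative, not existing Xen code.]

    #define XENMEM_DEFAULT_VNODE ((uint16_t)-1)   /* "no preference" */

    struct xen_memory_reservation {
        XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
        xen_ulong_t  nr_extents;
        unsigned int extent_order;
        /*
         * The former 32-bit address_bits field split into two 16-bit
         * halves, so the overall size and the offset of domid stay
         * unchanged for existing guests.
         */
        uint16_t     address_bits;
        uint16_t     vnode;        /* XENMEM_DEFAULT_VNODE if unused */
        domid_t      domid;
    };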
Christoph Egger
2007-Aug-13 12:59 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On Monday 13 August 2007 12:30:15 Keir Fraser wrote:
> On 13/8/07 11:02, "Andre Przywara" <andre.przywara@amd.com> wrote:
> > @@ -35,6 +35,7 @@
> >  #define XENMEM_increase_reservation 0
> >  #define XENMEM_decrease_reservation 1
> >  #define XENMEM_populate_physmap     6
> > +#define XENMEM_DEFAULT_CPU ((unsigned int)-1)
> >  struct xen_memory_reservation {
> >
> >  /*
> > @@ -66,6 +67,7 @@ struct xen_memory_reservation {
> >       * Unprivileged domains can specify only DOMID_SELF.
> >       */
> >      domid_t domid;
> > +    unsigned int cpu;
> >  };
>
> We cannot change the size of existing hypercall structures.

Except Xen bumps the major version number to 4? :-)

You are worrying about PV guests that lag behind with syncing public
headers, such as NetBSD/Xen?

> In this case we could steal bits from the address_bits field and create a
> pair of 16-bit fields from it. Also, a physical cpu id is not a great fit
> for this hypercall -- it is meaningless to most guests, who do not see the
> physical cpu map.
> Better to pass a vcpu_id and let Xen work out the most appropriate
> physical cpu id based on the vcpu's affinity. Or have a concept of
> per-guest 'virtual node identifiers' and pass a 'uint16_t vnodeid'. The
> latter might actually be a nice abstraction -- it'd be good to know other
> people's thoughts on this?

Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
It would additionally need a min_mfn and a vnodeid member.

Oh, and how should the guest query how many vnodes exist?

--
AMD Saxony, Dresden, Germany
Operating System Research Center
Isaku Yamahata
2007-Aug-13 14:00 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On Mon, Aug 13, 2007 at 02:59:31PM +0200, Christoph Egger wrote:
> > In this case we could steal bits from the address_bits field and create
> > a pair of 16-bit fields from it. Also, a physical cpu id is not a great
> > fit for this hypercall -- it is meaningless to most guests, who do not
> > see the physical cpu map.
> > Better to pass a vcpu_id and let Xen work out the most appropriate
> > physical cpu id based on the vcpu's affinity. Or have a concept of
> > per-guest 'virtual node identifiers' and pass a 'uint16_t vnodeid'. The
> > latter might actually be a nice abstraction -- it'd be good to know
> > other people's thoughts on this?
>
> Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
> It would additionally need a min_mfn and a vnodeid member.
>
> Oh, and how should the guest query how many vnodes exist?

Domain save/restore/dump-core also wants to know this information.
One approach is probably to introduce hypercalls, or to store the
information in xenstore. Another approach would be to introduce magic
pages like start_info and embed them as reserved pages.

--
yamahata
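[Editorial note: purely illustrative -- one way the xenstore variant
mentioned above could expose the topology; none of these paths exist, and
all names and values below are made up for the example.]

    /local/domain/<domid>/numa/nr_vnodes       = "2"
    /local/domain/<domid>/numa/vnode/0/memory  = "524288"    (KiB in vnode 0)
    /local/domain/<domid>/numa/vnode/0/vcpus   = "0-1"
    /local/domain/<domid>/numa/vnode/1/memory  = "524288"
    /local/domain/<domid>/numa/vnode/1/vcpus   = "2-3"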
Keir Fraser
2007-Aug-13 14:06 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On 13/8/07 13:59, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

>> We cannot change the size of existing hypercall structures.
>
> Except Xen bumps the major version number to 4? :-)
>
> You are worrying about PV guests that lag behind with syncing public
> headers, such as NetBSD/Xen?

It's not merely an API issue, it's an ABI compatibility issue. Existing
guests will provide structures that are too small (and thus have trailing
garbage, or potentially even cross over into an unmapped page, causing
copy_from_guest() to fail). Also, this particular structure is included
inside others (like struct xen_memory_exchange), and growing it would change
all their field offsets. Not good.

> Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
> It would additionally need a min_mfn and a vnodeid member.

Actually I think it can stay as is. Guests are supposed to be robust against
unmapped holes in the m2p table, so we can continue to have one big virtual
address range covering all valid MFNs. This only fails if virtual address
space is scarce compared with machine address space (e.g., we run up against
this in a mild way with x86 PAE).

> Oh, and how should the guest query how many vnodes exist?

I think we should add topology discovery hypercalls. Xen needs to know this
stuff anyway, so we just provide a mechanism for guests to extract it. An
alternative is to start exporting virtual ACPI tables to PV guests.

 -- Keir
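[Editorial note: no such interface exists at this point; the following is
only a sketch of what a topology discovery hypercall argument could look
like. Every structure, field and subcommand name here is hypothetical.]

    /* Hypothetical subcommand of, e.g., the memory_op hypercall;
     * the subcommand number is deliberately left unspecified. */
    struct xen_vnode_topology {
        /* IN */
        domid_t  domid;          /* DOMID_SELF for an unprivileged guest   */
        /* IN: capacity of the arrays below; OUT: entries actually written */
        uint32_t nr_vnodes;
        uint32_t nr_vcpus;
        /* OUT: arrays filled in by Xen */
        XEN_GUEST_HANDLE(uint)  vcpu_to_vnode;  /* one entry per vcpu      */
        XEN_GUEST_HANDLE(ulong) vnode_pages;    /* pages of RAM per vnode  */
    };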
Ryan Harper
2007-Aug-13 20:49 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
* Keir Fraser <keir@xensource.com> [2007-08-13 09:08]:
> On 13/8/07 13:59, "Christoph Egger" <Christoph.Egger@amd.com> wrote:
>
> >> We cannot change the size of existing hypercall structures.
> >
> > Except Xen bumps the major version number to 4? :-)
> >
> > You are worrying about PV guests that lag behind with syncing public
> > headers, such as NetBSD/Xen?
>
> It's not merely an API issue, it's an ABI compatibility issue. Existing
> guests will provide structures that are too small (and thus have trailing
> garbage, or potentially even cross over into an unmapped page, causing
> copy_from_guest() to fail). Also, this particular structure is included
> inside others (like struct xen_memory_exchange), and growing it would
> change all their field offsets. Not good.
>
> > Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
> > It would additionally need a min_mfn and a vnodeid member.
>
> Actually I think it can stay as is. Guests are supposed to be robust
> against unmapped holes in the m2p table, so we can continue to have one
> big virtual address range covering all valid MFNs. This only fails if
> virtual address space is scarce compared with machine address space
> (e.g., we run up against this in a mild way with x86 PAE).
>
> > Oh, and how should the guest query how many vnodes exist?
>
> I think we should add topology discovery hypercalls. Xen needs to know
> this stuff anyway, so we just provide a mechanism for guests to extract
> it. An alternative is to start exporting virtual ACPI tables to PV guests.

One concern has been the static nature of the ACPI SRAT data versus the
dynamic nature of the vcpu-to-cpu mapping. If the scheduler is migrating the
guest's vcpus across cpus, then the SRAT information is likely to be
incorrect. That said, if one creates a vnode and it sufficiently restricts
the vcpu affinity, then accurate SRAT information can be exported for the
guest to utilize.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
Andre Przywara
2007-Aug-15 10:12 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
Ryan Harper wrote:
> One concern has been the static nature of the ACPI SRAT data versus the
> dynamic nature of the vcpu-to-cpu mapping. If the scheduler is migrating
> the guest's vcpus across cpus, then the SRAT information is likely to be
> incorrect.

I think this is a problem even for native OSes when you think of CPU and/or
memory hotplugging. Although Linux can do CPU hotplugging, AFAIK NUMA isn't
currently considered in this process. I think the most feasible approach
would be to rebuild all affected structures when the hotplug event occurs.
Such events are probably rare enough that the rebuild can be comparatively
costly, but it is not something you want to do every time Xen decides to
reschedule a VCPU. So IMHO pinning VCPUs to a certain node (actually to all
cores within this node) is OK for now.

> That said, if one creates a vnode and it sufficiently restricts the vcpu
> affinity, then accurate SRAT information can be exported for the guest to
> utilize.

My patch does this automatically: CPU affinity information from the config
file is ignored, and each VCPU's affinity is set to match the NUMA topology.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
Andre Przywara
2007-Aug-15 10:13 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
Keir Fraser wrote:
> I think we should add topology discovery hypercalls. Xen needs to know
> this stuff anyway, so we just provide a mechanism for guests to extract
> it. An alternative is to start exporting virtual ACPI tables to PV guests.

I will look at this next. The HVM approach seemed easier from my POV, but
this NUMA propagation is also beneficial for PV guests. Maybe one should
solve the whole NUMA-ballooning issue while looking at this.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
Keir Fraser
2007-Aug-15 10:43 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On 15/8/07 11:13, "Andre Przywara" <andre.przywara@amd.com> wrote:

>> I think we should add topology discovery hypercalls. Xen needs to know
>> this stuff anyway, so we just provide a mechanism for guests to extract
>> it. An alternative is to start exporting virtual ACPI tables to PV
>> guests.
>
> I will look at this next. The HVM approach seemed easier from my POV, but
> this NUMA propagation is also beneficial for PV guests. Maybe one should
> solve the whole NUMA-ballooning issue while looking at this.

Topology discovery hypercalls are the way to go imo. And to fix ballooning,
it's just going to have to become NUMA-aware, by hooking into whatever NUMA
APIs Linux provides internally.

 -- Keir
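[Editorial note: a minimal sketch, under stated assumptions, of what hooking
the balloon driver into Linux's internal NUMA API could mean. Only
alloc_pages_node() is the stock kernel allocator; the helper name and its
integration with the balloon driver's reservation path are hypothetical.]

    #include <linux/errno.h>
    #include <linux/gfp.h>
    #include <linux/list.h>
    #include <linux/mm.h>

    /*
     * Take 'nr_pages' pages from a specific Linux NUMA node so they can be
     * handed back to Xen (and thus to the matching physical node) via a
     * decrease_reservation call -- the caller would do that part.
     */
    static int balloon_take_pages_from_node(int nid, unsigned long nr_pages,
                                            struct list_head *pages)
    {
        unsigned long i;

        for (i = 0; i < nr_pages; i++) {
            /* Allocate order-0 pages restricted to node 'nid'. */
            struct page *page = alloc_pages_node(nid, GFP_HIGHUSER, 0);
            if (page == NULL)
                return -ENOMEM;          /* node is out of memory */
            list_add(&page->lru, pages); /* queue for decrease_reservation */
        }
        return 0;
    }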
Andi Kleen
2007-Aug-15 11:18 UTC
[Xen-devel] Re: [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
"Andre Przywara" <andre.przywara@amd.com> writes:> Ryan Harper wrote: > > One concern has been the static nature of the ACPI SRAT data versus the > > dynamic ability of the vcpu to cpu mapping. If the scheduler is > > migrating the guest vcpu to various cpus, then the SRAT information is > > likely to be incorrect. > I think this is a problem even for the native OSes when you think of > CPU- and/or memory-hotplugging. Although Linux can do CPU hotplugging, > AFAIK NUMA isn''t currently considered in this process.IA64 (and I think PPC) Linux support node hotplug. Node hot unplug is currently missing because the memory hotunplug support is not finished yet. There is no interface to notify NUMA aware user space of topology changes though. x86 Linux currently doesn''t but will assign new CPUs to existing nodes as reported in SRAT.> I think the > most feasible approach would be to rebuild all affected structures > when the hotplug event occurs. This will probably considered quite > rare and thus could be potentially more costly, so I this is not > something you want to do every time Xen decides to reschedule a > VCPU.In the current Linux implementation just report all nodes at boot up (even if they have little or no memory) and then you can add/remove CPUs to them as needed. When you migrate to another box with more nodes that likely won''t work, but that could be probably made configurable. -Andi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel