Andre Przywara
2007-Aug-13 10:02 UTC
[Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser
2007-Aug-13 10:30 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On 13/8/07 11:02, "Andre Przywara" <andre.przywara@amd.com> wrote:

> @@ -35,6 +35,7 @@
>  #define XENMEM_increase_reservation 0
>  #define XENMEM_decrease_reservation 1
>  #define XENMEM_populate_physmap     6
> +#define XENMEM_DEFAULT_CPU ((unsigned int)-1)
>  struct xen_memory_reservation {
>
>  /*
> @@ -66,6 +67,7 @@ struct xen_memory_reservation {
>       * Unprivileged domains can specify only DOMID_SELF.
>       */
>      domid_t domid;
> +    unsigned int cpu;
>  };

We cannot change the size of existing hypercall structures. In this case we
could steal bits from the address_bits field and create a pair of 16-bit
fields from it.

Also, a physical cpu id is not a great fit for this hypercall -- it is
meaningless to most guests, who do not see the physical cpu map. Better to
pass a vcpu_id and let Xen work out the most appropriate physical cpu id
based on the vcpu's affinity. Or have a concept of per-guest 'virtual node
identifiers' and pass a 'uint16_t vnodeid'. The latter might actually be a
nice abstraction -- it'd be good to know other people's thoughts on this?

 -- Keir
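[Editorial note: as a rough sketch of the bit-stealing idea above, assuming
the xen_memory_reservation layout of this era, a size-preserving re-layout
might look like the following. The vnode field and the XENMEM_DEFAULT_VNODE
name are purely illustrative, not existing Xen code.]

    #define XENMEM_DEFAULT_VNODE ((uint16_t)-1)   /* "no preference" */

    struct xen_memory_reservation {
        XEN_GUEST_HANDLE(xen_pfn_t) extent_start;
        xen_ulong_t  nr_extents;
        unsigned int extent_order;
        /*
         * The former 32-bit address_bits field split into two 16-bit
         * halves, so the overall size and the offset of domid stay
         * unchanged for existing guests.
         */
        uint16_t     address_bits;
        uint16_t     vnode;        /* XENMEM_DEFAULT_VNODE if unused */
        domid_t      domid;
    };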
Christoph Egger
2007-Aug-13 12:59 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On Monday 13 August 2007 12:30:15 Keir Fraser wrote:
> On 13/8/07 11:02, "Andre Przywara" <andre.przywara@amd.com> wrote:
> > @@ -35,6 +35,7 @@
> >  #define XENMEM_increase_reservation 0
> >  #define XENMEM_decrease_reservation 1
> >  #define XENMEM_populate_physmap     6
> > +#define XENMEM_DEFAULT_CPU ((unsigned int)-1)
> >  struct xen_memory_reservation {
> >
> >  /*
> > @@ -66,6 +67,7 @@ struct xen_memory_reservation {
> >       * Unprivileged domains can specify only DOMID_SELF.
> >       */
> >      domid_t domid;
> > +    unsigned int cpu;
> >  };
>
> We cannot change the size of existing hypercall structures.

Except Xen bumps the major version number to 4? :-)

You are worrying about PV guests that lag behind with syncing public
headers, such as NetBSD/Xen?

> In this case we could steal bits from the address_bits field and create a
> pair of 16-bit fields from it. Also, a physical cpu id is not a great fit
> for this hypercall -- it is meaningless to most guests, who do not see the
> physical cpu map.
> Better to pass a vcpu_id and let Xen work out the most appropriate
> physical cpu id based on the vcpu's affinity. Or have a concept of
> per-guest 'virtual node identifiers' and pass a 'uint16_t vnodeid'. The
> latter might actually be a nice abstraction -- it'd be good to know other
> people's thoughts on this?

Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
It would additionally need a min_mfn and a vnodeid member.

Oh, and how should the guest query how many vnodes exist?

--
AMD Saxony, Dresden, Germany
Operating System Research Center
Isaku Yamahata
2007-Aug-13 14:00 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On Mon, Aug 13, 2007 at 02:59:31PM +0200, Christoph Egger wrote:
> > In this case we could steal bits from the address_bits field and create
> > a pair of 16-bit fields from it. Also, a physical cpu id is not a great
> > fit for this hypercall -- it is meaningless to most guests, who do not
> > see the physical cpu map.
> > Better to pass a vcpu_id and let Xen work out the most appropriate
> > physical cpu id based on the vcpu's affinity. Or have a concept of
> > per-guest 'virtual node identifiers' and pass a 'uint16_t vnodeid'. The
> > latter might actually be a nice abstraction -- it'd be good to know
> > other people's thoughts on this?
>
> Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
> It would additionally need a min_mfn and a vnodeid member.
>
> Oh, and how should the guest query how many vnodes exist?

Domain save/restore/dump-core also wants to know this information.
One approach is probably to introduce hypercalls, or to store the
information in xenstore. Another approach would be to introduce magic
pages like start_info and embed them as reserved pages.

--
yamahata
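[Editorial note: purely illustrative -- one way the xenstore variant
mentioned above could expose the topology; none of these paths exist, and
all names and values below are made up for the example.]

    /local/domain/<domid>/numa/nr_vnodes       = "2"
    /local/domain/<domid>/numa/vnode/0/memory  = "524288"    (KiB in vnode 0)
    /local/domain/<domid>/numa/vnode/0/vcpus   = "0-1"
    /local/domain/<domid>/numa/vnode/1/memory  = "524288"
    /local/domain/<domid>/numa/vnode/1/vcpus   = "2-3"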
Keir Fraser
2007-Aug-13 14:06 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On 13/8/07 13:59, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

>> We cannot change the size of existing hypercall structures.
>
> Except Xen bumps the major version number to 4? :-)
>
> You are worrying about PV guests that lag behind with syncing public
> headers, such as NetBSD/Xen?

It's not merely an API issue, it's an ABI compatibility issue. Existing
guests will provide structures that are too small (and thus have trailing
garbage, or potentially even cross over into an unmapped page, causing
copy_from_guest() to fail). Also, this particular structure is included
inside others (like struct xen_memory_exchange), and growing it would change
all their field offsets. Not good.

> Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
> It would additionally need a min_mfn and a vnodeid member.

Actually I think it can stay as is. Guests are supposed to be robust against
unmapped holes in the m2p table, so we can continue to have one big virtual
address range covering all valid MFNs. This only fails if virtual address
space is scarce compared with machine address space (e.g., we run up against
this in a mild way with x86 PAE).

> Oh, and how should the guest query how many vnodes exist?

I think we should add topology discovery hypercalls. Xen needs to know this
stuff anyway, so we just provide a mechanism for guests to extract it. An
alternative is to start exporting virtual ACPI tables to PV guests.

 -- Keir
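[Editorial note: no such interface exists at this point; the following is
only a sketch of what a topology discovery hypercall argument could look
like. Every structure, field and subcommand name here is hypothetical.]

    /* Hypothetical subcommand of, e.g., the memory_op hypercall;
     * the subcommand number is deliberately left unspecified. */
    struct xen_vnode_topology {
        /* IN */
        domid_t  domid;          /* DOMID_SELF for an unprivileged guest   */
        /* IN: capacity of the arrays below; OUT: entries actually written */
        uint32_t nr_vnodes;
        uint32_t nr_vcpus;
        /* OUT: arrays filled in by Xen */
        XEN_GUEST_HANDLE(uint)  vcpu_to_vnode;  /* one entry per vcpu      */
        XEN_GUEST_HANDLE(ulong) vnode_pages;    /* pages of RAM per vnode  */
    };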
Ryan Harper
2007-Aug-13 20:49 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
* Keir Fraser <keir@xensource.com> [2007-08-13 09:08]:
> On 13/8/07 13:59, "Christoph Egger" <Christoph.Egger@amd.com> wrote:
>
> >> We cannot change the size of existing hypercall structures.
> >
> > Except Xen bumps the major version number to 4? :-)
> >
> > You are worrying about PV guests that lag behind with syncing public
> > headers, such as NetBSD/Xen?
>
> It's not merely an API issue, it's an ABI compatibility issue. Existing
> guests will provide structures that are too small (and thus have trailing
> garbage, or potentially even cross over into an unmapped page, causing
> copy_from_guest() to fail). Also, this particular structure is included
> inside others (like struct xen_memory_exchange), and growing it would
> change all their field offsets. Not good.
>
> > Making struct xen_machphys_mapping NUMA-aware is also a no-go, right?
> > It would additionally need a min_mfn and a vnodeid member.
>
> Actually I think it can stay as is. Guests are supposed to be robust
> against unmapped holes in the m2p table, so we can continue to have one
> big virtual address range covering all valid MFNs. This only fails if
> virtual address space is scarce compared with machine address space
> (e.g., we run up against this in a mild way with x86 PAE).
>
> > Oh, and how should the guest query how many vnodes exist?
>
> I think we should add topology discovery hypercalls. Xen needs to know
> this stuff anyway, so we just provide a mechanism for guests to extract
> it. An alternative is to start exporting virtual ACPI tables to PV guests.

One concern has been the static nature of the ACPI SRAT data versus the
dynamic nature of the vcpu-to-cpu mapping. If the scheduler is migrating the
guest's vcpus across cpus, then the SRAT information is likely to be
incorrect. That said, if one creates a vnode and it sufficiently restricts
the vcpu affinity, then accurate SRAT information can be exported for the
guest to utilize.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
Andre Przywara
2007-Aug-15 10:12 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
Ryan Harper wrote:
> One concern has been the static nature of the ACPI SRAT data versus the
> dynamic nature of the vcpu-to-cpu mapping. If the scheduler is migrating
> the guest's vcpus across cpus, then the SRAT information is likely to be
> incorrect.

I think this is a problem even for native OSes when you think of CPU and/or
memory hotplugging. Although Linux can do CPU hotplugging, AFAIK NUMA isn't
currently considered in this process. I think the most feasible approach
would be to rebuild all affected structures when the hotplug event occurs.
Such events are probably rare enough that the rebuild can be comparatively
costly, but it is not something you want to do every time Xen decides to
reschedule a VCPU. So IMHO pinning VCPUs to a certain node (actually to all
cores within this node) is OK for now.

> That said, if one creates a vnode and it sufficiently restricts the vcpu
> affinity, then accurate SRAT information can be exported for the guest to
> utilize.

My patch does this automatically: CPU affinity information from the config
file is ignored, and each VCPU's affinity is set to match the NUMA topology.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
Andre Przywara
2007-Aug-15 10:13 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
Keir Fraser wrote:
> I think we should add topology discovery hypercalls. Xen needs to know
> this stuff anyway, so we just provide a mechanism for guests to extract
> it. An alternative is to start exporting virtual ACPI tables to PV guests.

I will look at this next. The HVM approach seemed easier from my POV, but
this NUMA propagation is also beneficial for PV guests. Maybe one should
solve the whole NUMA-ballooning issue while looking at this.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
Keir Fraser
2007-Aug-15 10:43 UTC
Re: [Xen-devel] [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
On 15/8/07 11:13, "Andre Przywara" <andre.przywara@amd.com> wrote:

>> I think we should add topology discovery hypercalls. Xen needs to know
>> this stuff anyway, so we just provide a mechanism for guests to extract
>> it. An alternative is to start exporting virtual ACPI tables to PV
>> guests.
>
> I will look at this next. The HVM approach seemed easier from my POV, but
> this NUMA propagation is also beneficial for PV guests. Maybe one should
> solve the whole NUMA-ballooning issue while looking at this.

Topology discovery hypercalls are the way to go imo. And to fix ballooning,
it's just going to have to become NUMA-aware, by hooking into whatever NUMA
APIs Linux provides internally.

 -- Keir
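[Editorial note: a minimal sketch, under stated assumptions, of what hooking
the balloon driver into Linux's internal NUMA API could mean. Only
alloc_pages_node() is the stock kernel allocator; the helper name and its
integration with the balloon driver's reservation path are hypothetical.]

    #include <linux/errno.h>
    #include <linux/gfp.h>
    #include <linux/list.h>
    #include <linux/mm.h>

    /*
     * Take 'nr_pages' pages from a specific Linux NUMA node so they can be
     * handed back to Xen (and thus to the matching physical node) via a
     * decrease_reservation call -- the caller would do that part.
     */
    static int balloon_take_pages_from_node(int nid, unsigned long nr_pages,
                                            struct list_head *pages)
    {
        unsigned long i;

        for (i = 0; i < nr_pages; i++) {
            /* Allocate order-0 pages restricted to node 'nid'. */
            struct page *page = alloc_pages_node(nid, GFP_HIGHUSER, 0);
            if (page == NULL)
                return -ENOMEM;          /* node is out of memory */
            list_add(&page->lru, pages); /* queue for decrease_reservation */
        }
        return 0;
    }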
Andi Kleen
2007-Aug-15 11:18 UTC
[Xen-devel] Re: [PATCH 2/4] [HVM] introduce CPU affinity for allocate_physmap call
"Andre Przywara" <andre.przywara@amd.com> writes:> Ryan Harper wrote: > > One concern has been the static nature of the ACPI SRAT data versus the > > dynamic ability of the vcpu to cpu mapping. If the scheduler is > > migrating the guest vcpu to various cpus, then the SRAT information is > > likely to be incorrect. > I think this is a problem even for the native OSes when you think of > CPU- and/or memory-hotplugging. Although Linux can do CPU hotplugging, > AFAIK NUMA isn''t currently considered in this process.IA64 (and I think PPC) Linux support node hotplug. Node hot unplug is currently missing because the memory hotunplug support is not finished yet. There is no interface to notify NUMA aware user space of topology changes though. x86 Linux currently doesn''t but will assign new CPUs to existing nodes as reported in SRAT.> I think the > most feasible approach would be to rebuild all affected structures > when the hotplug event occurs. This will probably considered quite > rare and thus could be potentially more costly, so I this is not > something you want to do every time Xen decides to reschedule a > VCPU.In the current Linux implementation just report all nodes at boot up (even if they have little or no memory) and then you can add/remove CPUs to them as needed. When you migrate to another box with more nodes that likely won''t work, but that could be probably made configurable. -Andi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel