There are a couple of issues with building the Hypervisor with
max_phys_cpus=128 for x86_64. (Note that this was on a 3.1 base, but
unstable appears to have the same issue, at least with the first part.)

First is a build assertion: the page_info and shadow_page_info
structures get out of sync in size due to the presence of cpumask_t in
the page_info structure.

A possible fix is to tack the following onto the end of the
shadow_page_info structure:

--- xen/arch/x86/mm/shadow/private.h.orig  2007-12-06 12:48:38.000000000 -0500
+++ xen/arch/x86/mm/shadow/private.h       2008-08-12 12:52:49.000000000 -0400
@@ -243,6 +243,12 @@ struct shadow_page_info
         /* For non-pinnable shadows, a higher entry that points at us */
         paddr_t up;
     };
+#if NR_CPUS > 64
+    /* Need to add some padding to match struct page_info size,
+     * if cpumask_t is larger than a long
+     */
+    u8 padding[sizeof(cpumask_t)-sizeof(long)];
+#endif
 };
 
 /* The structure above *must* be the same size as a struct page_info

The other issue is at runtime: a fault when trying to bring up cpu 126.
It seems the GDT space reserved is not quite enough to hold the per-CPU
entries. Crude fix (awaiting test results, so not sure that this is
sufficient):

--- xen/include/asm-x86/desc.h.orig  2007-12-06 12:48:39.000000000 -0500
+++ xen/include/asm-x86/desc.h       2008-07-31 13:19:52.000000000 -0400
@@ -5,7 +5,11 @@
  * Xen reserves a memory page of GDT entries.
  * No guest GDT entries exist beyond the Xen reserved area.
  */
+#if MAX_PHYS_CPUS > 64
+#define NR_RESERVED_GDT_PAGES 2
+#else
 #define NR_RESERVED_GDT_PAGES 1
+#endif
 #define NR_RESERVED_GDT_BYTES (NR_RESERVED_GDT_PAGES * PAGE_SIZE)
 #define NR_RESERVED_GDT_ENTRIES (NR_RESERVED_GDT_BYTES / 8)

Bill

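For context, a rough sketch of the arithmetic behind the cpu 126
failure, assuming each CPU consumes four 8-byte GDT slots (a 16-byte
TSS descriptor plus a 16-byte LDT descriptor on x86_64) and that the
per-CPU range starts a few slots into the reserved page; the PER_CPU_*
constants below are assumptions for illustration, not the real desc.h
values:

/* Illustrative only: base offset and per-CPU slot count are assumed. */
#define GDT_SLOTS_PER_PAGE  (PAGE_SIZE / 8)  /* 512 slots in one 4kB page */
#define PER_CPU_GDT_BASE    8                /* fixed Xen descriptors     */
#define PER_CPU_GDT_SLOTS   4                /* 16-byte TSS + 16-byte LDT */

/* CPU n fits only if
 *     PER_CPU_GDT_BASE + PER_CPU_GDT_SLOTS * (n + 1) <= GDT_SLOTS_PER_PAGE
 * i.e. n <= 125 with the numbers above -- which would explain a fault
 * at cpu 126 when NR_RESERVED_GDT_PAGES is 1. */
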
Both seem to be hacks to get to 128 CPUs, without consideration of how
to go beyond that, or perhaps even drop the fixed (compile-time) limit
altogether. Since we have to expect to be run on larger systems not too
far into the future, I think it rather needs to be explored how to
address these issues (and any potential others) in a fully scalable
way.

Jan

>>> Bill Burns <bburns@redhat.com> 12.08.08 20:41 >>>
There are a couple of issues with building the Hypervisor with
max_phys_cpus=128 for x86_64. [...]

At 14:41 -0400 on 12 Aug (1218552070), Bill Burns wrote:
> First is a build assertion: the page_info and
> shadow_page_info structures get out of sync in size
> due to the presence of cpumask_t in the page_info
> structure.
>
> A possible fix is to tack the following onto
> the end of the shadow_page_info structure:

Yep, that'll sort it out fine. I don't think the #if is even needed,
because a cpumask is always at least the size of a long. Or you could
add a "cpumask_t _unused;" to the union with mbz in it, since that's
where the sizes get out of sync.

Cheers,

Tim.

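A minimal sketch of that second option, showing only the relevant
union; the other members of shadow_page_info are elided and the exact
placement is illustrative:

    union {
        /* Must be zero: the build-time check requires this to sit at
         * the same offset as the owner field in struct page_info. */
        unsigned long mbz;
        /* Dummy member, sized like the cpumask_t in struct page_info,
         * so this union (and hence the whole structure) grows in step
         * with page_info whenever cpumask_t is wider than a long. */
        cpumask_t _unused;
        /* ... existing members of this union elided ... */
    };
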
At 09:21 +0100 on 13 Aug (1218619274), Jan Beulich wrote:
> Both seem to be hacks to get to 128 CPUs, without consideration of how
> to go beyond that

I think the shadow_page_info one is a general fix for my implicit
assumption that sizeof(cpumask_t) == sizeof(long).

Tim.

On 13/8/08 09:22, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> At 09:21 +0100 on 13 Aug (1218619274), Jan Beulich wrote:
>> Both seem to be hacks to get to 128 CPUs, without consideration of how
>> to go beyond that
>
> I think the shadow_page_info one is a general fix for my implicit
> assumption that sizeof(cpumask_t) == sizeof(long).

Do some fields after the cpumask need to line up in both structures?
Placing a dummy cpumask in the shadow_page structure might make most
sense.

For the other one I'll have to think a bit. The need for GDT entries
per CPU currently obviously means scaling much past a few hundred CPUs
is going to be difficult.

 -- Keir

>>> Keir Fraser <keir.fraser@eu.citrix.com> 13.08.08 10:26 >>>
> On 13/8/08 09:22, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>> At 09:21 +0100 on 13 Aug (1218619274), Jan Beulich wrote:
>>> Both seem to be hacks to get to 128 CPUs, without consideration of how
>>> to go beyond that
>>
>> I think the shadow_page_info one is a general fix for my implicit
>> assumption that sizeof(cpumask_t) == sizeof(long).
>
> Do some fields after the cpumask need to line up in both structures?
> Placing a dummy cpumask in the shadow_page structure might make most
> sense.
>
> For the other one I'll have to think a bit. The need for GDT entries
> per CPU currently obviously means scaling much past a few hundred CPUs
> is going to be difficult.

But the cpumask-in-page_info is a scalability concern, too - systems
with many CPUs will tend to have a lot of memory, and the growing
overhead of the page_info array may become an issue then, too. Page
clustering may be an option to reduce/eliminate the growth, though I
didn't spend much thought on this or possible alternatives.

Jan

On 13/8/08 09:45, "Jan Beulich" <jbeulich@novell.com> wrote:
> But the cpumask-in-page_info is a scalability concern, too - systems
> with many CPUs will tend to have a lot of memory, and the growing
> overhead of the page_info array may become an issue then, too. Page
> clustering may be an option to reduce/eliminate the growth, though I
> didn't spend much thought on this or possible alternatives.

An extra 8 bytes per page per 64 CPUs is hardly a concern I think.
We're talking an overhead of 32 bytes per megabyte per CPU. The concern
over growing page_info array with growing memory is fallacious -- the
overhead is a constant fraction of total memory, if #CPUs is held
constant.

 -- Keir

On 13/8/08 09:47, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> On 13/8/08 09:45, "Jan Beulich" <jbeulich@novell.com> wrote:
>> But the cpumask-in-page_info is a scalability concern, too [...]
>
> An extra 8 bytes per page per 64 CPUs is hardly a concern I think.
> We're talking an overhead of 32 bytes per megabyte per CPU.

Put another way, at 512 CPUs the cpumasks would incur an overhead of
<2% of total memory (a 512-CPU cpumask is 64 bytes per 4kB page, i.e.
about 1.6%). It's only really beyond that threshold that I'd be
concerned. The fact is it'll be a good while before 512 CPUs is
concerning us, and we'll have plenty of other scalability concerns, no
doubt, by that point.

 -- Keir

Keir Fraser wrote:
> On 13/8/08 09:22, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>> At 09:21 +0100 on 13 Aug (1218619274), Jan Beulich wrote:
>>> Both seem to be hacks to get to 128 CPUs, without consideration of how
>>> to go beyond that
>>
>> I think the shadow_page_info one is a general fix for my implicit
>> assumption that sizeof(cpumask_t) == sizeof(long).
>
> Do some fields after the cpumask need to line up in both structures?
> Placing a dummy cpumask in the shadow_page structure might make most
> sense.

Yes, there is a check that a field of page_info and a field of
shadow_page_info are at the same offset. Both compile-time checks are
in private.h.

> For the other one I'll have to think a bit. The need for GDT entries
> per CPU currently obviously means scaling much past a few hundred CPUs
> is going to be difficult.

Yes, would like something better here. And as I said, we don't know yet
that just adding the additional page solves anything.

Bill

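For reference, a sketch of what those two compile-time checks amount
to, assuming a BUILD_BUG_ON-style macro and the mbz / u.inuse._domain
field names; treat it as an illustration of the checks rather than the
exact code in private.h:

static inline void shadow_check_page_struct_offsets(void)
{
    /* The two structures overlay the same frametable slots, so they
     * must be exactly the same size... */
    BUILD_BUG_ON(sizeof(struct shadow_page_info) !=
                 sizeof(struct page_info));
    /* ...and the must-be-zero field has to sit at the same offset as
     * the owner field of an ordinary page. */
    BUILD_BUG_ON(offsetof(struct shadow_page_info, mbz) !=
                 offsetof(struct page_info, u.inuse._domain));
}
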
On 13/8/08 11:23, "Bill Burns" <bburns@redhat.com> wrote:
>> For the other one I'll have to think a bit. The need for GDT entries
>> per CPU currently obviously means scaling much past a few hundred
>> CPUs is going to be difficult.
>
> Yes, would like something better here. And as I said, we don't know
> yet that just adding the additional page solves anything.

How many CPUs do you currently need/want to support?

 -- Keir

Keir Fraser wrote:
> On 13/8/08 11:23, "Bill Burns" <bburns@redhat.com> wrote:
>>> For the other one I'll have to think a bit. The need for GDT entries
>>> per CPU currently obviously means scaling much past a few hundred
>>> CPUs is going to be difficult.
>>
>> Yes, would like something better here. And as I said, we don't know
>> yet that just adding the additional page solves anything.
>
> How many CPUs do you currently need/want to support?

Currently just looking to get 128 working. But it would be nice to have
some proper sizing, or even detection of running out. There is a 'last'
GDT entry or some such #define that is never used (at least in the 3.1
code base).

Bill

On 13/8/08 11:53, "Bill Burns" <bburns@redhat.com> wrote:
>> How many CPUs do you currently need/want to support?
>
> Currently just looking to get 128 working. But it would be nice to
> have some proper sizing, or even detection of running out. There is a
> 'last' GDT entry or some such #define that is never used (at least in
> the 3.1 code base).

I think your 'two pages' change will probably work. Then we just need a
run-time check, when bringing a CPU online, that there is space in the
GDT for its entries.

 -- Keir

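Something along these lines, perhaps; a sketch only, where
cpu_gdt_last_slot() is a made-up helper standing in for however desc.h
actually computes the last descriptor slot a given CPU needs, and the
bound is meant to be the 'last reserved entry' #define mentioned above
(whatever its exact name):

/* Sketch of a bring-up check; cpu_gdt_last_slot() is hypothetical. */
static int check_gdt_space(unsigned int cpu)
{
    if ( cpu_gdt_last_slot(cpu) > LAST_RESERVED_GDT_ENTRY )
    {
        printk("CPU%u: reserved GDT area too small for its entries\n", cpu);
        return -ENOSPC;
    }
    return 0;
}
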