Rework populate-on-demand sweeping

Last summer I did some work on testing whether our PoD sweeping code was
achieving its goals: namely, never crashing unnecessarily, minimizing boot
time, and maximizing the number of superpages in the p2m table.  This is one
of the resulting patch series.  I'm posting it to make sure that maintainers
think it's still suitable for inclusion in 4.2.

The patches against 4.1 have been tested extensively in the XenServer testing
framework and have been in use by XenServer customers for over 9 months now.
But the p2m code has changed extensively in that time, so one could argue that
the testing doesn't give us the same degree of confidence in the patches
against 4.2 as against 4.1.  (On the other hand, the PoD code hasn't changed
that much.)

I haven't done more than compile-test it at this point, so please just
review ideas and "is this 4.2 material".  If I get positive feedback,
I'll do more testing and re-submit.
George Dunlap
2012-Jun-08 11:45 UTC
[PATCH 1 of 2 RFC] xen, pod: Zero-check recently populated pages (checklast)
When demand-populating pages due to guest accesses, check recently populated
pages to see if we can reclaim them for the cache.  This should keep the PoD
cache filled when the start-of-day scrubber is going through.

The number 128 was chosen by experiment.  Windows does its page
scrubbing in parallel; while a small number like 4 works well for
single VMs, it breaks down as multiple vcpus are scrubbing different
pages in parallel.  Increasing to 128 works well for higher numbers of
vcpus.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -909,6 +909,26 @@ p2m_pod_emergency_sweep_super(struct p2m
     p2m->pod.reclaim_super = i ? i - SUPERPAGE_PAGES : 0;
 }

+/* When populating a new superpage, look at recently populated superpages
+ * hoping that they've been zeroed.  This will snap up zeroed pages as soon as
+ * the guest OS is done with them. */
+static void
+p2m_pod_check_last_super(struct p2m_domain *p2m, unsigned long gfn_aligned)
+{
+    unsigned long check_gfn;
+
+    ASSERT(p2m->pod.last_populated_index < POD_HISTORY_MAX);
+
+    check_gfn = p2m->pod.last_populated[p2m->pod.last_populated_index];
+
+    p2m->pod.last_populated[p2m->pod.last_populated_index] = gfn_aligned;
+
+    p2m->pod.last_populated_index = ( p2m->pod.last_populated_index + 1 ) % POD_HISTORY_MAX;
+
+    p2m->pod.reclaim_super += p2m_pod_zero_check_superpage(p2m, check_gfn);
+}
+
+
 #define POD_SWEEP_STRIDE  16
 static void
 p2m_pod_emergency_sweep(struct p2m_domain *p2m)
@@ -1066,6 +1086,12 @@ p2m_pod_demand_populate(struct p2m_domai
         __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
     }

+    /* Check the last guest demand-populate */
+    if ( p2m->pod.entry_count > p2m->pod.count
+         && order == 9
+         && q & P2M_ALLOC )
+        p2m_pod_check_last_super(p2m, gfn_aligned);
+
     pod_unlock(p2m);
     return 0;
 out_of_memory:
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -287,6 +287,9 @@ struct p2m_domain {
         unsigned         reclaim_super;    /* Last gpfn of a scan */
         unsigned         reclaim_single;   /* Last gpfn of a scan */
         unsigned         max_guest;        /* gpfn of max guest demand-populate */
+#define POD_HISTORY_MAX 128
+        unsigned         last_populated[POD_HISTORY_MAX]; /* gpfn of last guest page demand-populated */
+        int              last_populated_index;
         mm_lock_t        lock;         /* Locking of private pod structs,   *
                                         * not relying on the p2m lock.      */
     } pod;
George Dunlap
2012-Jun-08 11:45 UTC
[PATCH 2 of 2 RFC] xen, pod: Only sweep in an emergency, and only for 4k pages
Testing has shown that doing sweeps for superpages slows down boot
significantly, but does not result in a significantly higher number of
superpages after boot.  Early sweeping for 4k pages causes superpages to be
broken up unnecessarily.

Only sweep if we're really out of memory.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -880,34 +880,6 @@ p2m_pod_zero_check(struct p2m_domain *p2
 }

 #define POD_SWEEP_LIMIT 1024
-static void
-p2m_pod_emergency_sweep_super(struct p2m_domain *p2m)
-{
-    unsigned long i, start, limit;
-
-    if ( p2m->pod.reclaim_super == 0 )
-    {
-        p2m->pod.reclaim_super = (p2m->pod.max_guest>>PAGE_ORDER_2M)<<PAGE_ORDER_2M;
-        p2m->pod.reclaim_super -= SUPERPAGE_PAGES;
-    }
-
-    start = p2m->pod.reclaim_super;
-    limit = (start > POD_SWEEP_LIMIT) ? (start - POD_SWEEP_LIMIT) : 0;
-
-    for ( i=p2m->pod.reclaim_super ; i > 0 ; i -= SUPERPAGE_PAGES )
-    {
-        p2m_pod_zero_check_superpage(p2m, i);
-        /* Stop if we're past our limit and we have found *something*.
-         *
-         * NB that this is a zero-sum game; we're increasing our cache size
-         * by increasing our 'debt'.  Since we hold the p2m lock,
-         * (entry_count - count) must remain the same. */
-        if ( !page_list_empty(&p2m->pod.super) && i < limit )
-            break;
-    }
-
-    p2m->pod.reclaim_super = i ? i - SUPERPAGE_PAGES : 0;
-}

 /* When populating a new superpage, look at recently populated superpages
  * hoping that they've been zeroed.  This will snap up zeroed pages as soon as
@@ -1021,34 +993,18 @@ p2m_pod_demand_populate(struct p2m_domai
         return 0;
     }

-    /* Once we've ballooned down enough that we can fill the remaining
-     * PoD entries from the cache, don't sweep even if the particular
-     * list we want to use is empty: that can lead to thrashing zero pages
-     * through the cache for no good reason.  */
-    if ( p2m->pod.entry_count > p2m->pod.count )
-    {
+    /* Only sweep if we're actually out of memory.  Doing anything else
+     * causes unnecessary time and fragmentation of superpages in the p2m. */
+    if ( p2m->pod.count == 0 )
+        p2m_pod_emergency_sweep(p2m);

-        /* If we're low, start a sweep */
-        if ( order == PAGE_ORDER_2M && page_list_empty(&p2m->pod.super) )
-            /* Note that sweeps scan other ranges in the p2m.  In an scenario
-             * in which p2m locks are fine-grained, this may result in deadlock.
-             * Using trylock on the gfn's as we sweep would avoid it. */
-            p2m_pod_emergency_sweep_super(p2m);
-
-        if ( page_list_empty(&p2m->pod.single) &&
-             ( ( order == PAGE_ORDER_4K )
-               || (order == PAGE_ORDER_2M && page_list_empty(&p2m->pod.super) ) ) )
-            /* Same comment regarding deadlock applies */
-            p2m_pod_emergency_sweep(p2m);
-    }
+    if ( p2m->pod.count == 0 )
+        goto out_of_memory;

     /* Keep track of the highest gfn demand-populated by a guest fault */
     if ( gfn > p2m->pod.max_guest )
         p2m->pod.max_guest = gfn;

-    if ( p2m->pod.count == 0 )
-        goto out_of_memory;
-
     /* Get a page f/ the cache.  A NULL return value indicates that the
      * 2-meg range should be marked singleton PoD, and retried */
     if ( (p = p2m_pod_cache_get(p2m, order)) == NULL )
Jan Beulich
2012-Jun-08 12:02 UTC
Re: [PATCH 1 of 2 RFC] xen, pod: Zero-check recently populated pages (checklast)
>>> On 08.06.12 at 13:45, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> When demand-populating pages due to guest accesses, check recently populated
> pages to see if we can reclaim them for the cache.  This should keep the PoD
> cache filled when the start-of-day scrubber is going through.
>
> The number 128 was chosen by experiment.  Windows does its page
> scrubbing in parallel; while a small number like 4 works well for
> single VMs, it breaks down as multiple vcpus are scrubbing different
> pages in parallel.  Increasing to 128 works well for higher numbers of
> vcpus.
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -909,6 +909,26 @@ p2m_pod_emergency_sweep_super(struct p2m
>      p2m->pod.reclaim_super = i ? i - SUPERPAGE_PAGES : 0;
>  }
>
> +/* When populating a new superpage, look at recently populated superpages
> + * hoping that they've been zeroed.  This will snap up zeroed pages as soon as
> + * the guest OS is done with them. */
> +static void
> +p2m_pod_check_last_super(struct p2m_domain *p2m, unsigned long gfn_aligned)
> +{
> +    unsigned long check_gfn;
> +
> +    ASSERT(p2m->pod.last_populated_index < POD_HISTORY_MAX);
> +
> +    check_gfn = p2m->pod.last_populated[p2m->pod.last_populated_index];
> +
> +    p2m->pod.last_populated[p2m->pod.last_populated_index] = gfn_aligned;
> +
> +    p2m->pod.last_populated_index = ( p2m->pod.last_populated_index + 1 ) % POD_HISTORY_MAX;
> +
> +    p2m->pod.reclaim_super += p2m_pod_zero_check_superpage(p2m, check_gfn);
> +}
> +
> +
>  #define POD_SWEEP_STRIDE  16
>  static void
>  p2m_pod_emergency_sweep(struct p2m_domain *p2m)
> @@ -1066,6 +1086,12 @@ p2m_pod_demand_populate(struct p2m_domai
>          __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
>      }
>
> +    /* Check the last guest demand-populate */
> +    if ( p2m->pod.entry_count > p2m->pod.count
> +         && order == 9
> +         && q & P2M_ALLOC )
> +        p2m_pod_check_last_super(p2m, gfn_aligned);
> +
>      pod_unlock(p2m);
>      return 0;
>  out_of_memory:
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -287,6 +287,9 @@ struct p2m_domain {
>          unsigned         reclaim_super;    /* Last gpfn of a scan */
>          unsigned         reclaim_single;   /* Last gpfn of a scan */
>          unsigned         max_guest;        /* gpfn of max guest demand-populate */
> +#define POD_HISTORY_MAX 128
> +        unsigned         last_populated[POD_HISTORY_MAX]; /* gpfn of last guest page demand-populated */

unsigned long?

Also, wouldn't it be better to allocate this table dynamically, at
once allowing its size to scale with the number of vCPU-s in the
guest?

> +        int              last_populated_index;

'unsigned int' is generally better suited for array indexes (and
definitely on x86-64).

Jan

>          mm_lock_t        lock;         /* Locking of private pod structs,   *
>                                          * not relying on the p2m lock.      */
>      } pod;
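For illustration, a minimal sketch of the dynamic-allocation idea Jan describes, sizing the history with the guest's vcpu count at the point PoD is set up. The helper name, the 32-slots-per-vcpu factor, and the replacement struct fields are assumptions made for the sketch, not code from this series; xzalloc_array() and d->max_vcpus are the hypervisor's existing interfaces.

/* Illustrative only: size the populate history by vcpu count instead of a
 * fixed POD_HISTORY_MAX.  Assumes the pod struct gains
 *     unsigned long *last_populated;
 *     unsigned int last_populated_size, last_populated_index;
 * in place of the fixed array. */
static int p2m_pod_init_history(struct p2m_domain *p2m)
{
    struct domain *d = p2m->domain;

    /* 32 slots per vcpu is a guess, not a tested heuristic. */
    p2m->pod.last_populated_size = d->max_vcpus * 32;
    p2m->pod.last_populated = xzalloc_array(unsigned long,
                                            p2m->pod.last_populated_size);
    if ( p2m->pod.last_populated == NULL )
        return -ENOMEM;

    p2m->pod.last_populated_index = 0;
    return 0;
}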
Tim Deegan
2012-Jun-14 09:07 UTC
Re: [PATCH 1 of 2 RFC] xen, pod: Zero-check recently populated pages (checklast)
At 13:02 +0100 on 08 Jun (1339160536), Jan Beulich wrote:
> >>> On 08.06.12 at 13:45, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> > --- a/xen/include/asm-x86/p2m.h
> > +++ b/xen/include/asm-x86/p2m.h
> > @@ -287,6 +287,9 @@ struct p2m_domain {
> >          unsigned         reclaim_super;    /* Last gpfn of a scan */
> >          unsigned         reclaim_single;   /* Last gpfn of a scan */
> >          unsigned         max_guest;        /* gpfn of max guest demand-populate */
> > +#define POD_HISTORY_MAX 128
> > +        unsigned         last_populated[POD_HISTORY_MAX]; /* gpfn of last guest page demand-populated */

This is the gpfns of the last 128 order-9 superpages populated, right?

Also, this line is >80 columns - I think I saw a few others in this series.

> unsigned long?
>
> Also, wouldn't it be better to allocate this table dynamically, at
> once allowing its size to scale with the number of vCPU-s in the
> guest?

You could even make it a small per-vcpu array, assuming that the parallel
scrubbing will be symmetric across vcpus.

Cheers,

Tim.
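As a rough sketch of the per-vcpu variant, keeping a small fixed window per vcpu and recycling it exactly as the posted p2m_pod_check_last_super() does. The POD_HISTORY_PER_VCPU size and where the per-vcpu state would be stored are assumptions; none of this has been tested.

#define POD_HISTORY_PER_VCPU 8           /* assumed size, not benchmarked */

struct pod_vcpu_history {
    unsigned long gfn[POD_HISTORY_PER_VCPU];
    unsigned int index;
};

/* Same recycle-and-check logic as p2m_pod_check_last_super(), but against
 * the calling vcpu's own small ring rather than a global table. */
static void
p2m_pod_check_last_super_vcpu(struct p2m_domain *p2m,
                              struct pod_vcpu_history *h,
                              unsigned long gfn_aligned)
{
    unsigned long check_gfn = h->gfn[h->index];

    h->gfn[h->index] = gfn_aligned;
    h->index = (h->index + 1) % POD_HISTORY_PER_VCPU;

    p2m->pod.reclaim_super += p2m_pod_zero_check_superpage(p2m, check_gfn);
}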
Tim Deegan
2012-Jun-14 09:11 UTC
Re: [PATCH 2 of 2 RFC] xen, pod: Only sweep in an emergency, and only for 4k pages
At 11:45 +0000 on 08 Jun (1339155933), George Dunlap wrote:
> +    if ( p2m->pod.count == 0 )
> +        goto out_of_memory;
>
>      /* Keep track of the highest gfn demand-populated by a guest fault */
>      if ( gfn > p2m->pod.max_guest )
>          p2m->pod.max_guest = gfn;
>
> -    if ( p2m->pod.count == 0 )
> -        goto out_of_memory;
> -

Is this code motion just noise?  Since out_of_memory crashes the guest,
I assume so.

Cheers,

Tim.
Tim Deegan
2012-Jun-14 09:12 UTC
Re: [PATCH 0 of 2 RFC] Rework populate-on-demand sweeping
At 11:45 +0000 on 08 Jun (1339155931), George Dunlap wrote:
> I haven't done more than compile-test it at this point, so please just
> review ideas and "is this 4.2 material".  If I get positive feedback,
> I'll do more testing and re-submit.

Yes, AFAIC this is 4.2 material.

Tim.
George Dunlap
2012-Jun-14 12:42 UTC
Re: [PATCH 2 of 2 RFC] xen, pod: Only sweep in an emergency, and only for 4k pages
On Thu, Jun 14, 2012 at 10:11 AM, Tim Deegan <tim@xen.org> wrote:
> At 11:45 +0000 on 08 Jun (1339155933), George Dunlap wrote:
>> +    if ( p2m->pod.count == 0 )
>> +        goto out_of_memory;
>>
>>      /* Keep track of the highest gfn demand-populated by a guest fault */
>>      if ( gfn > p2m->pod.max_guest )
>>          p2m->pod.max_guest = gfn;
>>
>> -    if ( p2m->pod.count == 0 )
>> -        goto out_of_memory;
>> -
>
> Is this code motion just noise?  Since out_of_memory crashes the guest,
> I assume so.

It will have no practical effect, other than improving the efficiency
of crashing a guest by a few cycles, if that's what you're asking.
It's more about taste: it just seemed silly to make an assignment and
*then* see if you were going to crash. :-)

 -George
Tim Deegan
2012-Jun-14 13:13 UTC
Re: [PATCH 2 of 2 RFC] xen, pod: Only sweep in an emergency, and only for 4k pages
At 13:42 +0100 on 14 Jun (1339681375), George Dunlap wrote:
> On Thu, Jun 14, 2012 at 10:11 AM, Tim Deegan <tim@xen.org> wrote:
> > At 11:45 +0000 on 08 Jun (1339155933), George Dunlap wrote:
> >> +    if ( p2m->pod.count == 0 )
> >> +        goto out_of_memory;
> >>
> >>      /* Keep track of the highest gfn demand-populated by a guest fault */
> >>      if ( gfn > p2m->pod.max_guest )
> >>          p2m->pod.max_guest = gfn;
> >>
> >> -    if ( p2m->pod.count == 0 )
> >> -        goto out_of_memory;
> >> -
> >
> > Is this code motion just noise?  Since out_of_memory crashes the guest,
> > I assume so.
>
> It will have no practical effect, other than improving the efficiency
> of crashing a guest by a few cycles, if that's what you're asking.
> It's more about taste: it just seemed silly to make an assignment and
> *then* see if you were going to crash. :-)

Well, yes. :)  I don't think it quite falls under the description of the
patch, though.  Can you split it out as a cleanup patch?

Cheers,

Tim.
George Dunlap
2012-Jun-14 13:32 UTC
Re: [PATCH 2 of 2 RFC] xen, pod: Only sweep in an emergency, and only for 4k pages
[re-including xen-devel in the cc]

On Thu, Jun 14, 2012 at 2:13 PM, Tim Deegan <tim@xen.org> wrote:
> At 13:42 +0100 on 14 Jun (1339681375), George Dunlap wrote:
>> On Thu, Jun 14, 2012 at 10:11 AM, Tim Deegan <tim@xen.org> wrote:
>> > At 11:45 +0000 on 08 Jun (1339155933), George Dunlap wrote:
>> >> +    if ( p2m->pod.count == 0 )
>> >> +        goto out_of_memory;
>> >>
>> >>      /* Keep track of the highest gfn demand-populated by a guest fault */
>> >>      if ( gfn > p2m->pod.max_guest )
>> >>          p2m->pod.max_guest = gfn;
>> >>
>> >> -    if ( p2m->pod.count == 0 )
>> >> -        goto out_of_memory;
>> >> -
>> >
>> > Is this code motion just noise?  Since out_of_memory crashes the guest,
>> > I assume so.
>>
>> It will have no practical effect, other than improving the efficiency
>> of crashing a guest by a few cycles, if that's what you're asking.
>> It's more about taste: it just seemed silly to make an assignment and
>> *then* see if you were going to crash. :-)
>
> Well, yes. :)  I don't think it quite falls under the description of the
> patch, though.  Can you split it out as a cleanup patch?

If that's what you meant, you should have said so. :-)  I shall do so for
the next series.

 -George
George Dunlap
2012-Jun-14 14:24 UTC
Re: [PATCH 1 of 2 RFC] xen, pod: Zero-check recently populated pages (checklast)
On 14/06/12 10:07, Tim Deegan wrote:
>
> At 13:02 +0100 on 08 Jun (1339160536), Jan Beulich wrote:
>>>>> On 08.06.12 at 13:45, George Dunlap<george.dunlap@eu.citrix.com> wrote:
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -287,6 +287,9 @@ struct p2m_domain {
>>>          unsigned         reclaim_super;    /* Last gpfn of a scan */
>>>          unsigned         reclaim_single;   /* Last gpfn of a scan */
>>>          unsigned         max_guest;        /* gpfn of max guest demand-populate */
>>> +#define POD_HISTORY_MAX 128
>>> +        unsigned         last_populated[POD_HISTORY_MAX]; /* gpfn of last guest page demand-populated */
>
> This is the gpfns of the last 128 order-9 superpages populated, right?

Ah, yes -- just order 9.

> Also, this line is >80 columns - I think I saw a few others in this series.

I'll go through and check, thanks.

>
>> unsigned long?
>>
>> Also, wouldn't it be better to allocate this table dynamically, at
>> once allowing its size to scale with the number of vCPU-s in the
>> guest?
>
> You could even make it a small per-vcpu array, assuming that the parallel
> scrubbing will be symmetric across vcpus.

I can't remember exactly what I found here (this was last summer I was
doing the tests); it may be that Windows creates a bunch of tasks which
may migrate to various cpus.  If that were the case, a global list would
be better than per-vcpu lists.

The problem with dynamically scaling the list is that I don't have a
heuristic to hand for how to scale it.

In both cases, it's not unlikely that making a change without testing
will significantly reduce the effectiveness of the patch.  Would you
rather hold off and wait until I can get a chance to run my benchmarks
again (which may miss the 4.2 cycle), or accept a tidied-up version of
this patch first, and hope to get a revised method (using dynamic
scaling or per-vcpu arrays) in before 4.2, but for sure by 4.3?

 -George
Tim Deegan
2012-Jun-14 15:36 UTC
Re: [PATCH 1 of 2 RFC] xen, pod: Zero-check recently populated pages (checklast)
At 15:24 +0100 on 14 Jun (1339687486), George Dunlap wrote:
> >> Also, wouldn't it be better to allocate this table dynamically, at
> >> once allowing its size to scale with the number of vCPU-s in the
> >> guest?
> > You could even make it a small per-vcpu array, assuming that the parallel
> > scrubbing will be symmetric across vcpus.
> I can't remember exactly what I found here (this was last summer I was
> doing the tests); it may be that Windows creates a bunch of tasks which
> may migrate to various cpus.  If that were the case, a global list would
> be better than per-vcpu lists.
>
> The problem with dynamically scaling the list is that I don't have a
> heuristic to hand for how to scale it.
>
> In both cases, it's not unlikely that making a change without testing
> will significantly reduce the effectiveness of the patch.  Would you
> rather hold off and wait until I can get a chance to run my benchmarks
> again (which may miss the 4.2 cycle), or accept a tidied-up version of
> this patch first, and hope to get a revised method (using dynamic
> scaling or per-vcpu arrays) in before 4.2, but for sure by 4.3?

I'd be happy with the fixed size for 4.2.

Tim.