This patch solves the following problem. When a large VS terminates, the node locks up. The node locks up because the page_scrub_kick routine sends a softirq to all processors instructing them to run the page scrub code. There they interfere with each other as they serialize behind the page_scrub_lock.

The patch does two things:

(1) In page_scrub_kick, only a single cpu is interrupted. Some cpu other than the calling cpu is chosen (if available) because we assume the calling cpu has other higher priority work to do.

(2) In page_scrub_softirq, if more than one cpu is online, the first cpu to start scrubbing designates itself as the primary_scrubber. As such it is dedicated to scrubbing pages until the list is empty. Other cpus might call page_scrub_softirq but they spend only 1 msec scrubbing before returning to check for other higher priority work. But, with multiple cpus online, the node can afford to have one cpu dedicated to scrubbing when that work needs to be done.

Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
Signed-off-by: Ben Guthro <bguthro@virtualiron.com>
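[Editorial note: for readers unfamiliar with the code paths named above, here is a minimal sketch of the behaviour the changelog describes; it is not the actual patch. pick_other_online_cpu() and scrub_one_batch() are hypothetical placeholders, while the softirq plumbing (cpu_raise_softirq(), PAGE_SCRUB_SOFTIRQ, NOW(), MILLISECS()) follows the standard Xen interfaces of the era.]

```c
/* Illustrative sketch of the behaviour described above -- not the actual
 * patch.  pick_other_online_cpu() and scrub_one_batch() are hypothetical
 * helpers standing in for the real implementation. */

static unsigned long primary_scrubber_flag;  /* bit 0: a primary scrubber exists */

/* (1) Interrupt a single CPU, preferring one other than the caller. */
static void page_scrub_kick_sketch(void)
{
    unsigned int target = pick_other_online_cpu(smp_processor_id());

    if ( !list_empty(&page_scrub_list) )
        cpu_raise_softirq(target, PAGE_SCRUB_SOFTIRQ);
}

/* (2) The first CPU in becomes the dedicated "primary scrubber"; any other
 * CPU scrubs for at most ~1ms before going back to its own work. */
static void page_scrub_softirq_sketch(void)
{
    s_time_t deadline = NOW() + MILLISECS(1);
    int primary = 0;

    if ( (num_online_cpus() > 1) &&
         !test_and_set_bit(0, &primary_scrubber_flag) )
        primary = 1;

    /* scrub_one_batch() scrubs a few pages under page_scrub_lock and
     * returns 0 once the list is empty. */
    while ( scrub_one_batch() )
    {
        if ( !primary && (NOW() > deadline) )
            break;              /* non-primary CPUs give up after ~1ms */
    }

    if ( primary )
        clear_bit(0, &primary_scrubber_flag);
}
```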
Keir Fraser
2008-May-23 16:04 UTC
Re: [Xen-devel] [PATCH] scrub pages on guest termination
The aim of the loop was to scrub enough pages in a batch that lock contention is kept tolerably low. Even if 16 pages is not sufficient for that, I'm surprised a 'node' (you mean a whole system, presumably?) would appear to lock up. Maybe pages would be scrubbed slower than we'd like, but still CPUs should be able to get the spinlock often enough to evaluate whether they have spent 1ms in the loop and hence get out of there.

What sort of system were you seeing the lockup on? Does it have very many physical CPUs?

 -- Keir

On 23/5/08 16:00, "Ben Guthro" <bguthro@virtualiron.com> wrote:

> This patch solves the following problem. When a large VS terminates, the node locks
> up. The node locks up because the page_scrub_kick routine sends a softirq to
> all processors instructing them to run the page scrub code. There they interfere
> with each other as they serialize behind the page_scrub_lock.
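[Editorial note: the loop being referred to is the batch-and-deadline structure of page_scrub_softirq() in page_alloc.c at the time: take page_scrub_lock, peel off a small batch (16 pages) of the scrub list, drop the lock, scrub the batch, and repeat until roughly 1ms has elapsed. The sketch below is simplified from memory of that code; the list surgery and page mapping are replaced by placeholder helpers.]

```c
/* Simplified sketch of the pre-patch batch/deadline loop; the real code
 * does the list splicing and clear_page() work in full. */
static void page_scrub_softirq_prepatch_sketch(void)
{
    s_time_t start = NOW();

    do {
        spin_lock(&page_scrub_lock);

        if ( list_empty(&page_scrub_list) )
        {
            spin_unlock(&page_scrub_lock);
            return;
        }

        /* Peel up to 16 pages off the global list while holding the lock. */
        peel_batch_of_16();     /* placeholder for the real list surgery */

        spin_unlock(&page_scrub_lock);

        /* Scrub the peeled pages without holding the lock. */
        scrub_batch();          /* placeholder: clear_page() each one */

    } while ( (NOW() - start) < MILLISECS(1) );   /* ~1ms per invocation */
}
```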
Yes, sorry - should have removed our terminology from the description.

Node = physical machine
VS = HVM guest w/ pv-on-hvm drivers

Looking back at the original bug report - it seems to indicate it was migrating from a system with 2 processors to one with 8.

Specifically - from

Dell Precision WorkStation 380
Processor: Intel(R) Pentium(R) D CPU 2.80GHz
# of CPUs: 2
Speed: 2.8GHz

to

Supermicro X7DB8
Processor: Genuine Intel(R) CPU @ 2.13GHz
# of CPUs: 8
Speed: 2.133 GHz

Keir Fraser wrote:

> The aim of the loop was to scrub enough pages in a batch that lock
> contention is kept tolerably low. Even if 16 pages is not sufficient
> for that, I'm surprised a 'node' (you mean a whole system,
> presumably?) would appear to lock up. Maybe pages would be scrubbed
> slower than we'd like, but still CPUs should be able to get the
> spinlock often enough to evaluate whether they have spent 1ms in the
> loop and hence get out of there.
>
> What sort of system were you seeing the lockup on? Does it have very
> many physical CPUs?
>
> -- Keir
>
> On 23/5/08 16:00, "Ben Guthro" <bguthro@virtualiron.com> wrote:
>
> This patch solves the following problem. When a large VS
> terminates, the node locks
> up. The node locks up because the page_scrub_kick routine sends a
> softirq to
> all processors instructing them to run the page scrub code. There
> they interfere
> with each other as they serialize behind the page_scrub_lock.
Keir Fraser
2008-May-23 17:19 UTC
Re: [Xen-devel] [PATCH] scrub pages on guest termination
On 23/5/08 18:01, "Ben Guthro" <bguthro@virtualiron.com> wrote:

> Yes, sorry - should have removed our terminology from the description.
> Node=physical machine
> VS=HVM guest w/ pv-on-hvm drivers
> Looking back at the original bug report - it seems to indicate it was
> migrating from a system with 2 processors to one with 8

It's very surprising that lock contention would cause such a severe lack of progress on an 8-CPU system. If the lock is that hotly contended then even the usage of it in free_domheap_pages() has to be questionable. I'm inclined to say that if we want to address this then we should do it in one or more of the following ways:

1. Count CPUs into the scrub function with an atomic_t and beyond a limit all other CPUs bail straight out after re-setting their timer.
2. Increase scrub batch size to reduce proportion of time that each loop iteration holds the lock.
3. Turn the spin_lock() into a spin_trylock() so that the timeout check can be guaranteed to execute frequently.
4. Eliminate the global lock by building a lock-free linked list, or by maintaining per-CPU hashed work queues with work stealing, or... etc.

The patch as-is at least suffers from the issue that the 'primary scrubber' should be regularly checking for softirq work. But I'm not sure such a sizeable change to the scheduling policy for scrubbing (such as it is!) is necessary or desirable. Option 4 is on the morally highest ground but is of course the most work. :-)

 -- Keir
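[Editorial note: options 1 and 3 are small, localised changes and are easy to sketch. The sketch below is an illustrative combination of the two, not a proposed patch: an atomic_t caps how many CPUs scrub concurrently, and spin_trylock() keeps the deadline check from being starved by contention. MAX_SCRUB_CPUS, peel_batch() and scrub_batch() are hypothetical names.]

```c
/* Illustrative sketch of options 1 and 3 above -- not a proposed patch. */

#define MAX_SCRUB_CPUS 2                     /* hypothetical limit */
static atomic_t scrubbing_cpus = ATOMIC_INIT(0);

static void page_scrub_softirq_options_sketch(void)
{
    s_time_t start = NOW();

    /* Option 1: cap the number of CPUs that enter the scrub loop.
     * (The real version would also re-set the per-CPU scrub timer
     * before bailing out, as the option describes.) */
    if ( atomic_read(&scrubbing_cpus) >= MAX_SCRUB_CPUS )
        return;
    atomic_inc(&scrubbing_cpus);

    do {
        /* Option 3: never block on the lock, so the 1ms deadline check
         * below is guaranteed to run frequently even under contention. */
        if ( !spin_trylock(&page_scrub_lock) )
            continue;

        if ( list_empty(&page_scrub_list) )
        {
            spin_unlock(&page_scrub_lock);
            break;
        }

        /* Peel a batch under the lock, then scrub with it released
         * (placeholders for the existing list surgery and clear_page loop). */
        peel_batch();
        spin_unlock(&page_scrub_lock);
        scrub_batch();

    } while ( (NOW() - start) < MILLISECS(1) );

    atomic_dec(&scrubbing_cpus);
}
```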