I am confused by a problem: I have a blade with 64 physical CPUs and 64G of
physical RAM, and defined only one VM with 1 CPU and 40G RAM. The first time
I started the VM it took just 3s, but the second start took 30s. By adding
log prints I located a place in the hypervisor that costs most of the time,
about 98% of the whole start-up time:

xen/common/page_alloc.c

/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    ...
    if ( pg[i].u.free.need_tlbflush )
    {
        /* Add in extra CPUs that need flushing because of this page. */
        cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
        tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
        cpus_or(mask, mask, extra_cpus_mask);
    }
    ...
}

1. On the first start most pages have need_tlbflush unset, so the cost is
small; on the second start most of the RAM has already been used, so
need_tlbflush is true for most pages, and the cost is high.

2. But I repeated the same experiment on another blade with 16 physical CPUs
and 64G physical RAM, and there the second start took only 3s. Tracing the two
second starts, I found that the number of times the
pg[i].u.free.need_tlbflush branch is entered is the same; it is the number of
CPUs that makes the difference.

3. The code I pasted computes a mask that decides which CPUs' TLBs need to be
flushed. I traced the values during the start-up period:

cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
  // after: mask=0, cpu_online_map=0xFFFFFFFFFFFFFFFF,
  //        extra_cpus_mask=0xFFFFFFFFFFFFFFFF
tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
  // after: mask=0, extra_cpus_mask=0
cpus_or(mask, mask, extra_cpus_mask);
  // after: mask=0, extra_cpus_mask=0

Every time it starts with mask=0 and ends with mask=0, so no TLB flush is
actually issued; during the start-up process the whole computation seems
pointless.

4. The problem is that this seemingly pointless code is very time-consuming:
the more CPUs there are, the more time it costs. It takes 30s on my blade
with 64 CPUs, which could use some solution to improve the efficiency.

tupeng
Not sure I understand. With 16 CPUs you find domain startup takes 3s always.
With 64 CPUs you find it takes 3s the first time, then 30s in future? And
this is due to the cost of tlbflush_filter() (not actual TLB flushes, because
you always end up with mask=0)? If tlbflush_filter() were that expensive I'd
expect the 16-CPU case to have the slowdown after the first domain startup,
too.

 -- Keir

On 11/10/2012 16:18, "tupeng212" <tupeng212@gmail.com> wrote:
> [full problem report quoted above; snipped]
> With 16 CPUs you find domain startup takes 3s always.
No:
  With 16 CPUs: first start 0.3s, second start 3s.
  With 64 CPUs: first start 3s, second start 30s.

> With 64 CPUs you find it takes 3s first time, then 30s in future?
Yes.

> And this is due to cost of tlbflush_filter() (not actual TLB flushes,
> because you always end up with mask=0)?
Yes, most of the time is spent in tlbflush_filter() inside that branch. The
TLB flush itself is really very fast, it just sends an IPI to the related
CPUs. During the start-up allocations the computation always ends up with
mask=0, which seems needless.

> If tlbflush_filter() were that expensive I'd expect the 16-CPU case to have
> slowdown after the first domain startup, too.
Yes, you are right, the 16-CPU case also slows down after its first startup.
The reason is clear, and I have discussed it with others: there is no doubt
that tlbflush_filter() is inefficient here, but I don't know how to improve
it.

I also used xen oprofile and found the following two functions are called
most frequently:

  alloc_heap_pages: 40%
  __next_cpu:       20%
  everything else:   0.x%

The call chain is:

  alloc_heap_pages -> tlbflush_filter -> for_each_cpu_mask -> next_cpu -> __next_cpu

so it seems that walking across the CPUs is what is expensive.
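[For context: the per-CPU scaling tupeng is profiling is visible in the shape
of tlbflush_filter() itself. The following is an approximate sketch of the
macro as it appears in Xen of this era (include/asm-x86/flushtlb.h),
reproduced from memory, so details may differ:]

#define tlbflush_filter(mask, page_timestamp)                           \
do {                                                                    \
    unsigned int cpu;                                                   \
    /* One iteration per set bit: with 64 online CPUs and an initially  \
     * full mask, this runs 64 times for every page examined. */        \
    for_each_cpu_mask ( cpu, mask )                                     \
        if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) ) \
            cpu_clear(cpu, mask);                                       \
} while ( 0 )

The per-page cost is therefore O(number of online CPUs), which is why the
overhead grows with the CPU count and why __next_cpu shows up so prominently
in the profile: for_each_cpu_mask is built on next_cpu()/__next_cpu().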
On Fri, 12 Oct 2012 20:24:16 +0800 tupeng212 <tupeng212@gmail.com> wrote:
> With 16 CPUs: first start 0.3s, second start 3s.
> With 64 CPUs: first start 3s, second start 30s.

I had seen this a while ago; it was always page scrubbing. That algorithm was
changed IIRC, so not sure if it could still be the cause.

My 2 cents, thanks
Mukesh
On 12/10/2012 23:39, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> I had seen this a while ago; it was always page scrubbing. That algorithm
> was changed IIRC, so not sure if it could still be the cause.

Tupeng,

If the tlbflush_filter() and cpumask_or() lines are commented out from the
if(need_tlbflush) block in alloc_heap_pages(), what are the domain creation
times like then?

By the way, it looks like you are not using xen-unstable or xen-4.2; can you
try with one of these later versions of Xen?

 -- Keir
> What if you replace the tlbflush_filter() call with
> cpus_clear(&extra_cpus_mask)?
You mean just clear it? That seems a little crude... or would you like to do
it in some other place?

> I assume you see lots of looping in one of those two functions, but only
> single-page-at-a-time calls into alloc_domheap_pages()->alloc_heap_pages()?
In populate_physmap, all the allocations are 2M in size:

static void populate_physmap(struct memop_args *a)
{
    for ( i = a->nr_done; i < a->nr_extents; i++ )
    {
        /* a->extent_order = 9, so always 2M-sized allocations */
        page = alloc_domheap_pages(d, a->extent_order, a->memflags);
            /* -> alloc_heap_pages */
    }
    /* you mean move that block and the TLB flush here, to avoid the
       per-page loop? */
}

tupeng212

From: Keir Fraser <keir.xen@gmail.com>
Date: 2012-10-13 14:30
To: tupeng212 <tupeng212@gmail.com>
Subject: Re: [Xen-devel] alloc_heap_pages is low efficient with more CPUs

What if you replace the tlbflush_filter() call with
cpus_clear(&extra_cpus_mask)? Seems a bit silly to do, but I'd like to know
how much a few cpumask operations per page is costing (most are of course
much quicker than tlbflush_filter, as they operate on 64 bits per iteration
rather than one bit per iteration).

If that is suitably fast, I think we can have a go at fixing this by pulling
the TLB-flush logic out of alloc_heap_pages() and into the loops in
increase_reservation() and populate_physmap() in common/memory.c. I assume
you see lots of looping in one of those two functions, but only
single-page-at-a-time calls into alloc_domheap_pages()->alloc_heap_pages()?

 -- Keir

On 13/10/2012 07:21, "tupeng212" <tupeng212@gmail.com> wrote:

>> If the tlbflush_filter() and cpumask_or() lines are commented out from the
>> if(need_tlbflush) block in alloc_heap_pages(), what are the domain creation
>> times like then?
> You mean removing that code from alloc_heap_pages and trying it again? I
> didn't do exactly that, but I measured the whole time spent in the
> if(need_tlbflush) block with time1=NOW() ... block ... time2=NOW(),
> time=time2-time1 (NOW() is in ns; 1s = 10^9 ns). It accounts for most of
> the total time: the whole start takes 30s, and this block alone takes about
> 29s.
>
>> By the way it looks like you are not using xen-unstable or xen-4.2, can you
>> try with one of these later versions of Xen?
> Fortunately other groups here have to use xen-4.2, and we have repeated the
> experiment on that source tree too. It changed nothing; the second start is
> still very slow.
>
> tupeng
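[To make the suggestion above concrete, here is a rough, hypothetical sketch
of what pulling the TLB-flush logic out of alloc_heap_pages() and into the
populate_physmap() loop could look like. The helper
alloc_domheap_pages_no_flush() and its extra arguments are invented for
illustration; they are not existing Xen APIs:]

/* Hypothetical sketch only -- not an actual Xen interface. */
static void populate_physmap(struct memop_args *a)
{
    struct page_info *page;
    unsigned long i;
    uint32_t newest_stamp = 0;   /* newest tlbflush_timestamp in this batch */
    bool_t need_flush = 0;

    for ( i = a->nr_done; i < a->nr_extents; i++ )
    {
        /* Assumed allocator variant: skips the per-page tlbflush_filter()
         * work and instead reports the newest timestamp it saw. */
        page = alloc_domheap_pages_no_flush(a->domain, a->extent_order,
                                            a->memflags, &newest_stamp,
                                            &need_flush);
        /* ... assign the extent to the guest exactly as before ... */
    }

    if ( need_flush )
    {
        /* One filter pass and at most one IPI for the whole loop. */
        cpumask_t mask = cpu_online_map;
        tlbflush_filter(mask, newest_stamp);
        if ( !cpus_empty(mask) )
            flush_tlb_mask(&mask);
    }
}

[Filtering against only the newest timestamp is conservative (it can only add
CPUs to the flush mask, never miss one that is needed), so correctness is
preserved while the O(nr_cpus) work happens once per hypercall rather than
once per page.]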
/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    ...
    cpus_clear(mask);
    for ( i = 0; i < (1 << order); i++ )
    {
        if ( pg[i].u.free.need_tlbflush )
        {
            /* Add in extra CPUs that need flushing because of this page. */
            cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
            tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
            cpus_or(mask, mask, extra_cpus_mask);
        }
    }
    if ( unlikely(!cpus_empty(mask)) )
    {
        perfc_incr(need_flush_tlb_flush);
        flush_tlb_mask(&mask);
    }
    ...
    return pg;
}

Is this equivalent to looking only at the last page,
pg[(1 << order) - 1].u.free.need_tlbflush, computing the mask from it, and
flushing the TLBs in that mask?
If the allocations are 2M size, we can do better quite easily I think. Please
try the attached patch.

 -- Keir

On 13/10/2012 07:46, "tupeng212" <tupeng212@gmail.com> wrote:
> In populate_physmap, all the allocations are 2M in size
> (a->extent_order = 9 in every call).
> [rest of quoted exchange snipped]
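[The attached patch itself is not reproduced in this archive. A plausible
sketch of a change along the lines discussed, kept inside alloc_heap_pages()
(unlike the caller-side sketch earlier), is to remember only the newest
timestamp among the pages that need flushing and run tlbflush_filter() once
per allocation rather than once per page; this would be consistent with the
later remarks that the 4k fallback path gains nothing while 2M and 1G
allocations gain a lot. Sketch only, with a naive timestamp comparison that
ignores wraparound, which real code would have to handle:]

static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    cpumask_t mask;
    uint32_t newest_stamp = 0;
    bool_t need_tlbflush = 0;
    unsigned long i;
    ...
    for ( i = 0; i < (1 << order); i++ )
    {
        if ( pg[i].u.free.need_tlbflush )
        {
            /* Just remember the newest timestamp; no per-page cpumask work. */
            need_tlbflush = 1;
            if ( pg[i].tlbflush_timestamp > newest_stamp )
                newest_stamp = pg[i].tlbflush_timestamp;
        }
        /* ... existing per-page initialisation ... */
    }

    if ( need_tlbflush )
    {
        mask = cpu_online_map;
        tlbflush_filter(mask, newest_stamp);    /* one pass per allocation */
        if ( !cpus_empty(mask) )
        {
            perfc_incr(need_flush_tlb_flush);
            flush_tlb_mask(&mask);
        }
    }

    return pg;
}

[An order-9 (2M) allocation then does one O(nr_cpus) filter pass instead of
512, and an order-18 (1G) allocation one instead of 262144, while order-0
allocations behave exactly as before.]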
On 13/10/2012 08:20, "tupeng212" <tupeng212@gmail.com> wrote:
> Is this equivalent to looking only at the last page,
> pg[(1 << order) - 1].u.free.need_tlbflush, computing the mask from it, and
> flushing the TLBs in that mask?

No!

 -- Keir
> Please try the attached patch.
Great, you have done a good job: the needless time drops dramatically, to
about 1s.

If nobody has a better proposal, I suggest you commit this patch.
On 15/10/2012 14:27, "tupeng212" <tupeng212@gmail.com> wrote:
> Great, you have done a good job: the needless time drops dramatically, to
> about 1s.
>
> If nobody has a better proposal, I suggest you commit this patch.

I have applied it to xen-unstable. It probably makes sense to put it in 4.1
and 4.2 as well (cc'ed Jan, and attaching the backport for 4.1 again).

 -- Keir
>>> On 15.10.12 at 17:45, Keir Fraser <keir@xen.org> wrote:
> I have applied it to xen-unstable. It probably makes sense to put it in 4.1
> and 4.2 as well (cc'ed Jan, and attaching the backport for 4.1 again).

Will do, but do you have an explanation of how this simple, memory-only
operation (64 CPUs isn't that many) has this dramatic an effect on
performance? Are we bouncing cache lines this badly? If so, which one(s)? I
don't see what would be written frequently from multiple CPUs here -
tlbflush_filter() itself only reads global variables, but never writes them.

Jan
On 16/10/2012 08:51, "Jan Beulich" <JBeulich@suse.com> wrote:
> Will do, but do you have an explanation of how this simple, memory-only
> operation (64 CPUs isn't that many) has this dramatic an effect on
> performance? Are we bouncing cache lines this badly?

It's just the small factors multiplying up. A 40G domain is 10M page
allocations, each of which does 64x per-cpu cpumask bitops and timestamp
compares. That's getting on for a billion (10^9) trips round
tlbflush_filter()'s loop. Each iteration need only take a few CPU cycles for
the effect to actually be noticeable. If the stuff being touched were not in
the cache (which of course it is, since this path is so hot and not touching
much memory) it would probably take an hour to create the domain!

 -- Keir
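[Spelling that arithmetic out with illustrative, assumed figures (the cycle
count and clock speed below are not measurements from this thread):]

  40 GiB domain / 4 KiB per page      ~= 10.5 million pages allocated
  10.5M pages x 64 online CPUs        ~= 670 million tlbflush_filter iterations
  670M iterations x 10 cycles (say)   ~= 6.7e9 cycles
  6.7e9 cycles / 2.5 GHz (say)        ~= 2.7 seconds

[Even at an optimistic 10 cycles per iteration the filter loop alone costs
seconds; add the per-page cpus_andnot()/cpus_or() work and a less favourable
per-iteration cost, and the observed ~29s for this block is within reach.]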
>>> On 16.10.12 at 10:03, Keir Fraser <keir.xen@gmail.com> wrote:
> It's just the small factors multiplying up. A 40G domain is 10M page
> allocations, each of which does 64x per-cpu cpumask bitops and timestamp
> compares. That's getting on for a billion (10^9) trips round
> tlbflush_filter()'s loop.

Which means that your TODO remark in the change description is more than a
nice-to-have, since
- in the fallback case using 4k allocations your change doesn't win anything
- even larger domains would still suffer from this very problem (albeit in
  4.2 we try 1G allocations first, so your change should have an even more
  visible effect there, as long as falling back to smaller chunks isn't
  needed)

One question of course is whether, for sufficiently large allocation requests
on sufficiently large systems, trying to avoid the TLB flush is worth it at
all, considering that determining where to skip the flushes costs so much
more than the flushes themselves.

Jan
On 16/10/2012 09:28, "Jan Beulich" <JBeulich@suse.com> wrote:
> Which means that your TODO remark in the change description is more than a
> nice-to-have [...]

Agreed. This is a simple starting point which is also easy to backport,
though, and solves the immediate observed problem.

> One question of course is whether, for sufficiently large allocation
> requests on sufficiently large systems, trying to avoid the TLB flush is
> worth it at all, considering that determining where to skip the flushes
> costs so much more than the flushes themselves.

It might be a simpler option. I'll see how doing it the filtering way looks.

 -- Keir