Jan Beulich
2011-Oct-20 13:36 UTC
[Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
This patch set makes some first steps towards eliminating the old cpumask
accessors, replacing them with ones that don't require the full NR_CPUS
bits to be allocated (which obviously can be pretty wasteful when
NR_CPUS is high, but the actual number is low or moderate).

01: introduce and use nr_cpu_ids and nr_cpumask_bits
02: eliminate cpumask accessors referencing NR_CPUS
03: eliminate direct assignments of CPU masks
04: x86: allocate IRQ actions' cpu_eoi_map dynamically
05: allocate CPU sibling and core maps dynamically
06: allow efficient allocation of multiple CPU masks at once

One reason I put the following ones together was to reduce the
differences between the disassembly of hypervisors built for 4095
and 2047 CPUs, which I looked at to determine the places where
cpumask_t variables get copied without using cpumask_copy() (a
job where grep is of no help). Hence consider these patches optional,
but recommended.

07: cpufreq: allocate CPU masks dynamically
08: x86/p2m: allocate CPU masks dynamically
09: cpupools: allocate CPU masks dynamically
10: credit: allocate CPU masks dynamically
11: x86/hpet: allocate CPU masks dynamically
12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically

Signed-off-by: Jan Beulich <jbeulich@suse.com>
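For context, a minimal sketch of the distinction the series is built around: an embedded cpumask_t always carries NR_CPUS bits, while a cpumask_var_t carries only a pointer and is backed by a bitmap sized at run time. The names below follow the Linux-derived API that Xen adopts; the struct and function names foo_* are invented for illustration, and the exact Xen signatures may differ slightly, so treat this as an approximation rather than code from the patches.

    /* Illustrative sketch only -- not taken from the patches. */

    /* Old style: the full NR_CPUS-bit mask is embedded, e.g. 512 bytes
     * when building for 4096 CPUs, even if only a few CPUs exist. */
    struct foo_old {
        cpumask_t mask;
    };

    /* New style: only a pointer is embedded; the bitmap behind it is
     * sized by nr_cpumask_bits and allocated at run time. */
    struct foo_new {
        cpumask_var_t mask;
    };

    static int foo_init(struct foo_new *f)
    {
        if ( !zalloc_cpumask_var(&f->mask) )   /* allocation can fail */
            return -ENOMEM;
        cpumask_set_cpu(smp_processor_id(), f->mask);
        return 0;
    }

    static void foo_teardown(struct foo_new *f)
    {
        free_cpumask_var(f->mask);
    }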
Keir Fraser
2011-Oct-20 15:09 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:> This patch set makes some first steps towards eliminating the old cpumask > accessors, replacing them by such that don''t require the full NR_CPUS > bits to be allocated (which obviously can be pretty wasteful when > NR_CPUS is high, but the actual number is low or moderate). > > 01: introduce and use nr_cpu_ids and nr_cpumask_bits > 02: eliminate cpumask accessors referencing NR_CPUS > 03: eliminate direct assignments of CPU masks > 04: x86: allocate IRQ actions'' cpu_eoi_map dynamically > 05: allocate CPU sibling and core maps dynamicallyI''m not sure about this. We can save ~500 bytes per cpumask_t when NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t''s do we typically have dynamically allocated all at once? Let''s say we waste 2kB per VCPU and per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we''d save ~4MB in that extreme case. But such a large system probably actually will have a lot of CPUs. And also a lot of memory, such that 4MB is quite insignificant. I suppose there is a second argument that it shrinks the containing structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe helps reduce our order!=0 allocations? By the way, I think we could avoid the NR_CPUS copying overhead everywhere by having the cpumask.h functions respect nr_cpu_ids, but continuing to return NR_CPUS for sentinel value (e.g., end of loop; or no bit found)? This would not need to change tonnes of code. It only gets part of the benefit (reducing cpu time overhead) but is more palatable?> 06: allow efficient allocation of multiple CPU masks at onceThat is utterly hideous and for insignificant saving.> One reason I put the following ones together was to reduce the > differences between the disassembly of hypervisors built for 4095 > and 2047 CPUs, which I looked at to determine the places where > cpumask_t variables get copied without using cpumask_copy() (a > job where grep is of no help). Hence consider these patch optional, > but recommended. > > 07: cpufreq: allocate CPU masks dynamically > 08: x86/p2m: allocate CPU masks dynamically > 09: cpupools: allocate CPU masks dynamically > 10: credit: allocate CPU masks dynamically > 11: x86/hpet: allocate CPU masks dynamically > 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamicallyQuestionable. Any subsystem that allocates no more than a handful of cpumask_t''s is possibly just as well left alone... I''m not dead set against them if we deicde that 01-05 are actually worth pursuing, however. -- Keir> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Oct-20 15:19 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@gmail.com> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> This patch set makes some first steps towards eliminating the old cpumask
>> accessors, replacing them with ones that don't require the full NR_CPUS
>> bits to be allocated (which obviously can be pretty wasteful when
>> NR_CPUS is high, but the actual number is low or moderate).
>>
>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>> 02: eliminate cpumask accessors referencing NR_CPUS
>> 03: eliminate direct assignments of CPU masks
>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>> 05: allocate CPU sibling and core maps dynamically
>
> I'm not sure about this. We can save ~500 bytes per cpumask_t when
> NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
> have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
> per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
> save ~4MB in that extreme case. But such a large system probably actually
> will have a lot of CPUs. And also a lot of memory, such that 4MB is quite
> insignificant.

It's not only the memory savings, but the time savings in manipulating
less space.

> I suppose there is a second argument that it shrinks the containing
> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
> helps reduce our order!=0 allocations?

Yes - that's what made me start taking over these Linux bits. What I
sent here just continues on that route. I was really hoping that we
wouldn't leave this in a half-baked state.

> By the way, I think we could avoid the NR_CPUS copying overhead everywhere
> by having the cpumask.h functions respect nr_cpu_ids, but continuing to
> return NR_CPUS as the sentinel value (e.g., end of loop; or no bit found)?
> This would not need to change tonnes of code. It only gets part of the
> benefit (reducing CPU time overhead) but is more palatable?

That would be possible, but would again leave us in a somewhat
incomplete state. (Note that I did leave NR_CPUS in the stop-machine
logic.)

>> 06: allow efficient allocation of multiple CPU masks at once
>
> That is utterly hideous and for insignificant saving.

I was afraid you would say that, and I'm not fully convinced
either. But I wanted to give it a try to see how bad it is. The
more significant saving here really comes from not allocating
the CPU masks at all for unused irq_desc-s.

Jan
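The time-saving argument rests on the new accessors being bounded by nr_cpumask_bits rather than NR_CPUS, so mask operations only touch the words actually in use. A sketch patterned on the Linux-derived implementation the series imports; the exact Xen definition may differ in details such as the return type.

    /* Sketch: only nr_cpumask_bits bits are processed, however large
     * NR_CPUS was at build time. */
    static inline void cpumask_and(cpumask_t *dstp, const cpumask_t *src1p,
                                   const cpumask_t *src2p)
    {
        bitmap_and(dstp->bits, src1p->bits, src2p->bits, nr_cpumask_bits);
    }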
Jan Beulich
2011-Oct-20 15:25 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 20.10.11 at 17:19, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@gmail.com> wrote:
>> I suppose there is a second argument that it shrinks the containing
>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>> helps reduce our order!=0 allocations?
>
> Yes - that's what made me start taking over these Linux bits. What I
> sent here just continues on that route. I was really hoping that we
> wouldn't leave this in a half-baked state.

I forgot to say: The new accessors produce a little more legible code
with cpumask_var_t and cpumask_t * (the second of which we already
converted to wherever possible) than with a plain cpumask_t, because of
the need to use & on the plain variant (and as you will see they also
allow removing a number of ugly-looking * operators, which I always
considered only temporary).

Jan
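To illustrate the legibility point with a hypothetical field access (not an actual line from the tree):

    /* Embedded cpumask_t: '&' is needed on every access. */
    cpumask_copy(&desc->affinity, &cpu_online_map);

    /* cpumask_var_t / cpumask_t *: the accessor takes the pointer directly. */
    cpumask_copy(desc->affinity, &cpu_online_map);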
Andrew Cooper
2011-Oct-20 15:49 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
On 20/10/11 16:19, Jan Beulich wrote:
>>>> On 20.10.11 at 17:09, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>>> This patch set makes some first steps towards eliminating the old cpumask
>>> accessors, replacing them with ones that don't require the full NR_CPUS
>>> bits to be allocated (which obviously can be pretty wasteful when
>>> NR_CPUS is high, but the actual number is low or moderate).
>>>
>>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>>> 02: eliminate cpumask accessors referencing NR_CPUS
>>> 03: eliminate direct assignments of CPU masks
>>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>>> 05: allocate CPU sibling and core maps dynamically
>> I'm not sure about this. We can save ~500 bytes per cpumask_t when
>> NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
>> have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
>> per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
>> save ~4MB in that extreme case. But such a large system probably actually
>> will have a lot of CPUs. And also a lot of memory, such that 4MB is quite
>> insignificant.
> It's not only the memory savings, but the time savings in manipulating
> less space.
>
>> I suppose there is a second argument that it shrinks the containing
>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>> helps reduce our order!=0 allocations?
> Yes - that's what made me start taking over these Linux bits. What I
> sent here just continues on that route. I was really hoping that we
> wouldn't leave this in a half-baked state.
>
>> By the way, I think we could avoid the NR_CPUS copying overhead everywhere
>> by having the cpumask.h functions respect nr_cpu_ids, but continuing to
>> return NR_CPUS as the sentinel value (e.g., end of loop; or no bit found)?
>> This would not need to change tonnes of code. It only gets part of the
>> benefit (reducing CPU time overhead) but is more palatable?
> That would be possible, but would again leave us in a somewhat
> incomplete state. (Note that I did leave NR_CPUS in the stop-machine
> logic.)
>
>>> 06: allow efficient allocation of multiple CPU masks at once
>> That is utterly hideous and for insignificant saving.
> I was afraid you would say that, and I'm not fully convinced
> either. But I wanted to give it a try to see how bad it is. The
> more significant saving here really comes from not allocating
> the CPU masks at all for unused irq_desc-s.
>
> Jan

The saving of not allocating masks for unused irq_desc's (and irq_cfg's)
will be significant in the general case (3 * NR_UNUSED_IRQs *
sizeof(mask)), where the average system is wasting most of 224 IRQs per
CPU.

However, I am against moving the masks out of irq_desc (perhaps this is
the C++ coder inside me).

Would an acceptable alternative be to change irq_desc to use
cpumask_var_t's and allocate them on first use? (I have not spent long
thinking about this, so it is possible that the extra checks for NULL
pointers on the IRQ path might be counterproductive?)

~Andrew

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2011-Oct-20 15:57 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 20.10.11 at 17:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> The saving of not allocating masks for unused irq_desc's (and irq_cfg's)
> will be significant in the general case (3 * NR_UNUSED_IRQs *
> sizeof(mask)), where the average system is wasting most of 224 IRQs per
> CPU.
>
> However, I am against moving the masks out of irq_desc (perhaps this is
> the C++ coder inside me).

I didn't propose moving them out of there.

> Would an acceptable alternative be to change irq_desc to use
> cpumask_var_t's and allocate them on first use? (I have not spent long
> thinking about this, so it is possible that the extra checks for NULL
> pointers on the IRQ path might be counterproductive?)

That's what it does (or really it allocates them for all GSIs at boot,
and then in create_irq() for the dynamically allocated ones). There's no
reason to worry about NULL checks - any access to a never-initialized
irq_desc is a bug anyway.

Jan
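A rough sketch of what such per-descriptor allocation can look like; the function and field names here are illustrative rather than the exact ones used in the patches.

    /* Illustrative only: allocate a descriptor's masks when the
     * descriptor itself is brought into use, instead of embedding
     * NR_CPUS-sized bitmaps in every irq_desc. */
    static int init_irq_desc_masks(struct irq_desc *desc)
    {
        if ( !zalloc_cpumask_var(&desc->affinity) )
            return -ENOMEM;
        if ( !zalloc_cpumask_var(&desc->pending_mask) )
        {
            free_cpumask_var(desc->affinity);
            return -ENOMEM;
        }
        cpumask_setall(desc->affinity);
        return 0;
    }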
Keir Fraser
2011-Oct-20 16:04 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:>>> 06: allow efficient allocation of multiple CPU masks at once >> >> That is utterly hideous and for insignificant saving. > > I was afraid you would say that, and I''m not fully convinced > either. But I wanted to give it a try to see how bad it is. The > more significant saving here really comes from not allocating > the CPU masks at all for unused irq_desc-s.Aren''t we planning to dynamically allocate irq_desc-s? That would seem the nicer solution here. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Oct-20 16:06 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:

>> I suppose there is a second argument that it shrinks the containing
>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>> helps reduce our order!=0 allocations?
>
> Yes - that's what made me start taking over these Linux bits. What I
> sent here just continues on that route. I was really hoping that we
> wouldn't leave this in a half-baked state.

Has Linux gone wholesale down this route, then?

 -- Keir
Jan Beulich
2011-Oct-21 07:01 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 20.10.11 at 18:06, Keir Fraser <keir@xen.org> wrote:
> On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>> I suppose there is a second argument that it shrinks the containing
>>> structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
>>> helps reduce our order!=0 allocations?
>>
>> Yes - that's what made me start taking over these Linux bits. What I
>> sent here just continues on that route. I was really hoping that we
>> wouldn't leave this in a half-baked state.
>
> Has Linux gone wholesale down this route, then?

Yes - the old ones are only left in as long as nothing forces
CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS (which we can avoid
altogether).

Jan
Keir Fraser
2011-Oct-21 07:08 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:

> This patch set makes some first steps towards eliminating the old cpumask
> accessors, replacing them with ones that don't require the full NR_CPUS
> bits to be allocated (which obviously can be pretty wasteful when
> NR_CPUS is high, but the actual number is low or moderate).
>
> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
> 02: eliminate cpumask accessors referencing NR_CPUS
> 03: eliminate direct assignments of CPU masks
> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
> 05: allocate CPU sibling and core maps dynamically

01-05/07-12:
Acked-by: Keir Fraser <keir@xen.org>

> 06: allow efficient allocation of multiple CPU masks at once

Not this one.

 -- Keir

> One reason I put the following ones together was to reduce the
> differences between the disassembly of hypervisors built for 4095
> and 2047 CPUs, which I looked at to determine the places where
> cpumask_t variables get copied without using cpumask_copy() (a
> job where grep is of no help). Hence consider these patches optional,
> but recommended.
>
> 07: cpufreq: allocate CPU masks dynamically
> 08: x86/p2m: allocate CPU masks dynamically
> 09: cpupools: allocate CPU masks dynamically
> 10: credit: allocate CPU masks dynamically
> 11: x86/hpet: allocate CPU masks dynamically
> 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2011-Oct-21 07:10 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 20.10.11 at 18:04, Keir Fraser <keir@xen.org> wrote:
> On 20/10/2011 16:19, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>> 06: allow efficient allocation of multiple CPU masks at once
>>>
>>> That is utterly hideous and for insignificant saving.
>>
>> I was afraid you would say that, and I'm not fully convinced
>> either. But I wanted to give it a try to see how bad it is. The
>> more significant saving here really comes from not allocating
>> the CPU masks at all for unused irq_desc-s.
>
> Aren't we planning to dynamically allocate irq_desc-s? That would seem the
> nicer solution here.

Yes, I would hope so. But irrespective of that, allocating e.g. 512 bits
(times 4) just to use, say, 20-30 of them is bad - again, not so much
from a memory wasting point of view, but rather from the fact that this
needlessly causes a larger cache and TLB footprint.

I actually think that ultimately we should try to remove all
non-dynamically allocated CPU masks (including statics, per-CPU
ones, and local variables - the latter being particularly important as
they cause pretty big stack frames, despite there now being at
most one [with the rare exception of two] of them per function,
which will continue to grow with higher NR_CPUS values).

Jan
Jan Beulich
2011-Oct-21 07:14 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 21.10.11 at 09:08, Keir Fraser <keir.xen@gmail.com> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> This patch set makes some first steps towards eliminating the old cpumask
>> accessors, replacing them with ones that don't require the full NR_CPUS
>> bits to be allocated (which obviously can be pretty wasteful when
>> NR_CPUS is high, but the actual number is low or moderate).
>>
>> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
>> 02: eliminate cpumask accessors referencing NR_CPUS
>> 03: eliminate direct assignments of CPU masks
>> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
>> 05: allocate CPU sibling and core maps dynamically
>
> 01-05/07-12:
> Acked-by: Keir Fraser <keir@xen.org>
>
>> 06: allow efficient allocation of multiple CPU masks at once
>
> Not this one.

Okay, I'll re-work (and re-send) it then to use non-array allocations.

Jan

>> One reason I put the following ones together was to reduce the
>> differences between the disassembly of hypervisors built for 4095
>> and 2047 CPUs, which I looked at to determine the places where
>> cpumask_t variables get copied without using cpumask_copy() (a
>> job where grep is of no help). Hence consider these patches optional,
>> but recommended.
>>
>> 07: cpufreq: allocate CPU masks dynamically
>> 08: x86/p2m: allocate CPU masks dynamically
>> 09: cpupools: allocate CPU masks dynamically
>> 10: credit: allocate CPU masks dynamically
>> 11: x86/hpet: allocate CPU masks dynamically
>> 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2011-Oct-21 07:49 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 21.10.11 at 09:08, Keir Fraser <keir.xen@gmail.com> wrote:
> On 20/10/2011 14:36, "Jan Beulich" <JBeulich@suse.com> wrote:
> 01-05/07-12:
> Acked-by: Keir Fraser <keir@xen.org>
>
>> 07: cpufreq: allocate CPU masks dynamically

I didn't apply this one, as it wants
http://lists.xensource.com/archives/html/xen-devel/2011-10/msg01180.html
to be applied first, but there was no reaction from AMD so far. Mark?

Jan
Keir Fraser
2011-Oct-21 08:00 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
On 21/10/2011 08:10, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Aren't we planning to dynamically allocate irq_desc-s? That would seem the
>> nicer solution here.
>
> Yes, I would hope so. But irrespective of that, allocating e.g. 512 bits
> (times 4) just to use, say, 20-30 of them is bad - again, not so much
> from a memory wasting point of view, but rather from the fact that this
> needlessly causes a larger cache and TLB footprint.

Oh, I don't mind you changing irq_desc to use cpumask_var_t-s. It's the
extra step of using an array to save a few pointers that I reject.

> I actually think that ultimately we should try to remove all
> non-dynamically allocated CPU masks (including statics, per-CPU
> ones, and local variables - the latter being particularly important as
> they cause pretty big stack frames, despite there now being at
> most one [with the rare exception of two] of them per function,
> which will continue to grow with higher NR_CPUS values).

OTOH they are probably in some cases used in contexts where dynamic
allocation (and the possibility of failure) is not nice? And we can compare
this effort with just increasing per-CPU stack sizes, for example?

I'm not particularly against moving in this direction, but there might be
cases where it isn't worth the effort, or there are better solutions.

 -- Keir
Jan Beulich
2011-Oct-21 08:23 UTC
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements
>>> On 21.10.11 at 10:00, Keir Fraser <keir.xen@gmail.com> wrote:
> On 21/10/2011 08:10, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>> Aren't we planning to dynamically allocate irq_desc-s? That would seem the
>>> nicer solution here.
>>
>> Yes, I would hope so. But irrespective of that, allocating e.g. 512 bits
>> (times 4) just to use, say, 20-30 of them is bad - again, not so much
>> from a memory wasting point of view, but rather from the fact that this
>> needlessly causes a larger cache and TLB footprint.
>
> Oh, I don't mind you changing irq_desc to use cpumask_var_t-s. It's the
> extra step of using an array to save a few pointers that I reject.

That's what I understood.

>> I actually think that ultimately we should try to remove all
>> non-dynamically allocated CPU masks (including statics, per-CPU
>> ones, and local variables - the latter being particularly important as
>> they cause pretty big stack frames, despite there now being at
>> most one [with the rare exception of two] of them per function,
>> which will continue to grow with higher NR_CPUS values).
>
> OTOH they are probably in some cases used in contexts where dynamic
> allocation (and the possibility of failure) is not nice? And we can compare
> this effort with just increasing per-CPU stack sizes, for example?
>
> I'm not particularly against moving in this direction, but there might be
> cases where it isn't worth the effort, or there are better solutions.

Indeed, in some cases it may mean moving from stack variables to e.g.
per-CPU pre-allocated ones.

Jan
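A hedged sketch of the direction being described here: trading an on-stack mask (512 bytes at NR_CPUS=4096) for a per-CPU, pre-allocated one. All names below are invented for illustration, and a real user would additionally have to ensure the scratch mask cannot be used reentrantly (e.g. by restricting it to non-IRQ context).

    /* Before: with NR_CPUS=4096 this local eats 512 bytes of stack. */
    static void notify_cpus_stack(const cpumask_t *in)
    {
        cpumask_t mask;

        cpumask_and(&mask, in, &cpu_online_map);
        /* ... act on 'mask' ... */
    }

    /* After: a per-CPU scratch mask, allocated once (e.g. at CPU
     * bring-up) and reused instead of a stack object. */
    static DEFINE_PER_CPU(cpumask_var_t, scratch_mask);

    static int prepare_scratch_mask(unsigned int cpu)
    {
        return zalloc_cpumask_var(&per_cpu(scratch_mask, cpu)) ? 0 : -ENOMEM;
    }

    static void notify_cpus_percpu(const cpumask_t *in)
    {
        cpumask_t *mask = this_cpu(scratch_mask);

        cpumask_and(mask, in, &cpu_online_map);
        /* ... act on '*mask' ... */
    }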