Hi all,

There has been discussion on extending the number of event channels back in
September [0].

Regarding Jan's comment in [0], I don't think allowing the user to
specify an arbitrary number of levels is a good idea. Only the last
level should be shared among vcpus; the other levels should be in a
percpu struct to allow for quicker lookup. Letting the user specify the
number of levels would be too complicated to implement and would blow
up the percpu section (since the size grows exponentially). Three
levels should be quite enough. See the maths below.

Number of event channels:
 * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
 * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
Basically the third level is a new ABI, so I chose unsigned long long
here to get more event channels.

Pages occupied by the third level (if PAGE_SIZE=4k):
 * 32bit: 64k / 8 / 4k = 2
 * 64bit: 256k / 8 / 4k = 8

Making the second level percpu will incur overhead. In effect we move
the array in the shared info into the percpu struct:
 * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
 * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes

What concerns me is that the struct evtchn buckets are all allocated at
once during the initialization phase. To save memory inside Xen, the
internal allocation/free scheme for evtchn needs to be modified. Ian
suggested we allocate a small number of buckets at start of day and
then dynamically fault in more as required.

To sum up:
 1. The guest should allocate pages for the third-level bitmap.
 2. The guest should register the third-level pages via a new hypercall
    op.
 3. The hypervisor should set up the third-level bitmap in that
    hypercall op.
 4. Only the last level (third in this case) should be shared among
    vcpus.
 5. We need a flexible allocation/free scheme for struct evtchn.
 6. Debug dumping should use a snapshot to avoid holding the event lock
    for too long. (Jan's concern in [0])

Any comments are welcome.


Wei.

[0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
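[For concreteness, a minimal sketch of the guest-side lookup the proposed
three-level bitmap implies, assuming a per-vCPU first and second level and a
shared third level. All names and the flat array layout are illustrative,
not the proposed ABI. On 64-bit this gives 64 * 64 * 64 = 256k channels,
matching the maths above.]

    #include <limits.h>

    #define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

    struct vcpu_levels {
        unsigned long l1;                 /* 1st level: one word per vCPU */
        unsigned long l2[BITS_PER_WORD];  /* 2nd level: per-vCPU copy */
    };

    /* 3rd level: shared among vCPUs; one word per 2nd-level bit. */
    static unsigned long l3[BITS_PER_WORD * BITS_PER_WORD];

    static void scan_pending(const struct vcpu_levels *v,
                             void (*handle)(unsigned int port))
    {
        unsigned int i, j, k;

        for (i = 0; i < BITS_PER_WORD; i++) {
            if (!(v->l1 & (1UL << i)))
                continue;
            for (j = 0; j < BITS_PER_WORD; j++) {
                unsigned long w;

                if (!(v->l2[i] & (1UL << j)))
                    continue;
                /* Each 2nd-level bit selects one shared 3rd-level word. */
                w = l3[i * BITS_PER_WORD + j];
                for (k = 0; k < BITS_PER_WORD; k++)
                    if (w & (1UL << k))
                        handle((i * BITS_PER_WORD + j) * BITS_PER_WORD + k);
            }
        }
    }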
>>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> Regarding Jan's comment in [0], I don't think allowing the user to
> specify an arbitrary number of levels is a good idea. Only the last
> level should be shared among vcpus; the other levels should be in a
> percpu struct to allow for quicker lookup. Letting the user specify the
> number of levels would be too complicated to implement and would blow
> up the percpu section (since the size grows exponentially). Three
> levels should be quite enough. See the maths below.

I didn't ask to implement more than three levels, I just asked for
the interface to establish the number of levels a guest wants to
use, to allow for higher numbers (passing of which would result in
-EINVAL in your implementation).

> Number of event channels:
>  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
>  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> Basically the third level is a new ABI, so I chose unsigned long long
> here to get more event channels.

Please don't: This would make things less consistent to handle,
at least in the guest side code. And I don't see why you would
need to do so anyway (or else your argument above against
further levels would become questionable).

> Pages occupied by the third level (if PAGE_SIZE=4k):
>  * 32bit: 64k / 8 / 4k = 2
>  * 64bit: 256k / 8 / 4k = 8
>
> Making the second level percpu will incur overhead. In effect we move
> the array in the shared info into the percpu struct:
>  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
>  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
>
> What concerns me is that the struct evtchn buckets are all allocated at
> once during the initialization phase. To save memory inside Xen, the
> internal allocation/free scheme for evtchn needs to be modified. Ian
> suggested we allocate a small number of buckets at start of day and
> then dynamically fault in more as required.
>
> To sum up:
>  1. The guest should allocate pages for the third-level bitmap.
>  2. The guest should register the third-level pages via a new hypercall
>     op.

Doesn't the guest also need to set up space for the 2nd level?

Jan

>  3. The hypervisor should set up the third-level bitmap in that
>     hypercall op.
>  4. Only the last level (third in this case) should be shared among
>     vcpus.
>  5. We need a flexible allocation/free scheme for struct evtchn.
>  6. Debug dumping should use a snapshot to avoid holding the event lock
>     for too long. (Jan's concern in [0])
>
> Any comments are welcome.
>
>
> Wei.
>
> [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
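[The hypervisor-side check Jan asks for could be as simple as the following
sketch: the ABI carries the requested number of levels, and today's
implementation simply rejects anything other than 3. The function name is
hypothetical.]

    #include <errno.h>

    static int evtchn_register_nlevel(unsigned int levels)
    {
        /* The interface allows more levels; the implementation doesn't. */
        if (levels != 3)
            return -EINVAL;

        /* ... map and wire up the three-level bitmaps here ... */
        return 0;
    }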
On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
> Hi all,
>
> There has been discussion on extending the number of event channels
> back in September [0].

While we are changing the ABI anyway, it would be nice to consider the
possibility of some small-ish number of per-vcpu evtchns.

These would potentially be useful for things like IPI vectors and such,
which every vcpu has.

e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
64, which would be more than sufficient.

Ian.
>>> On 03.12.12 at 18:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
>> Hi all,
>>
>> There has been discussion on extending the number of event channels
>> back in September [0].
>
> While we are changing the ABI anyway, it would be nice to consider the
> possibility of some small-ish number of per-vcpu evtchns.
>
> These would potentially be useful for things like IPI vectors and
> such, which every vcpu has.

While I agree with this, ...

> e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
> 64, which would be more than sufficient.

... I would have hoped for this field to retain its use as the top
level selector.

Jan
On Mon, 2012-12-03 at 17:48 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 18:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
> >> Hi all,
> >>
> >> There has been discussion on extending the number of event channels
> >> back in September [0].
> >
> > While we are changing the ABI anyway, it would be nice to consider
> > the possibility of some small-ish number of per-vcpu evtchns.
> >
> > These would potentially be useful for things like IPI vectors and
> > such, which every vcpu has.
>
> While I agree with this, ...
>
> > e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32
> > or 64, which would be more than sufficient.
>
> ... I would have hoped for this field to retain its use as the top
> level selector.

Yes, I'm not sure why I thought it became unused...

Ian.
On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > Regarding Jan's comment in [0], I don't think allowing the user to
> > specify an arbitrary number of levels is a good idea. Only the last
> > level should be shared among vcpus; the other levels should be in a
> > percpu struct to allow for quicker lookup. Letting the user specify
> > the number of levels would be too complicated to implement and would
> > blow up the percpu section (since the size grows exponentially).
> > Three levels should be quite enough. See the maths below.
>
> I didn't ask to implement more than three levels, I just asked for
> the interface to establish the number of levels a guest wants to
> use, to allow for higher numbers (passing of which would result in
> -EINVAL in your implementation).
>

Ah, I understand now. How about something like this:

struct EVTCHNOP_reg_nlevel {
    int levels;
    void *level_specified_reg_struct;
};

> > Number of event channels:
> >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > Basically the third level is a new ABI, so I chose unsigned long long
> > here to get more event channels.
>
> Please don't: This would make things less consistent to handle,
> at least in the guest side code. And I don't see why you would
> need to do so anyway (or else your argument above against
> further levels would become questionable).
>

It was suggested by Ian to use unsigned long long. Ian, why do you
prefer unsigned long long to unsigned long?

> > Pages occupied by the third level (if PAGE_SIZE=4k):
> >  * 32bit: 64k / 8 / 4k = 2
> >  * 64bit: 256k / 8 / 4k = 8
> >
> > Making the second level percpu will incur overhead. In effect we move
> > the array in the shared info into the percpu struct:
> >  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
> >  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
> >
> > What concerns me is that the struct evtchn buckets are all allocated
> > at once during the initialization phase. To save memory inside Xen,
> > the internal allocation/free scheme for evtchn needs to be modified.
> > Ian suggested we allocate a small number of buckets at start of day
> > and then dynamically fault in more as required.
> >
> > To sum up:
> >  1. The guest should allocate pages for the third-level bitmap.
> >  2. The guest should register the third-level pages via a new
> >     hypercall op.
>
> Doesn't the guest also need to set up space for the 2nd level?
>

Yes. That will be embedded in the percpu struct vcpu_info, which will
also be registered via the same hypercall op.

Wei.

> Jan
>
> >  3. The hypervisor should set up the third-level bitmap in that
> >     hypercall op.
> >  4. Only the last level (third in this case) should be shared among
> >     vcpus.
> >  5. We need a flexible allocation/free scheme for struct evtchn.
> >  6. Debug dumping should use a snapshot to avoid holding the event
> >     lock for too long. (Jan's concern in [0])
> >
> > Any comments are welcome.
> >
> >
> > Wei.
> >
> > [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
On Mon, 2012-12-03 at 17:52 +0000, Wei Liu wrote:
> On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> > >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > > Regarding Jan's comment in [0], I don't think allowing the user to
> > > specify an arbitrary number of levels is a good idea. Only the last
> > > level should be shared among vcpus; the other levels should be in a
> > > percpu struct to allow for quicker lookup. Letting the user specify
> > > the number of levels would be too complicated to implement and
> > > would blow up the percpu section (since the size grows
> > > exponentially). Three levels should be quite enough. See the maths
> > > below.
> >
> > I didn't ask to implement more than three levels, I just asked for
> > the interface to establish the number of levels a guest wants to
> > use, to allow for higher numbers (passing of which would result in
> > -EINVAL in your implementation).
> >
>
> Ah, I understand now. How about something like this:
>
> struct EVTCHNOP_reg_nlevel {
>     int levels;
>     void *level_specified_reg_struct;
> };
>
> > > Number of event channels:
> > >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > > Basically the third level is a new ABI, so I chose unsigned long
> > > long here to get more event channels.
> >
> > Please don't: This would make things less consistent to handle,
> > at least in the guest side code. And I don't see why you would
> > need to do so anyway (or else your argument above against
> > further levels would become questionable).
> >
>
> It was suggested by Ian to use unsigned long long. Ian, why do you
> prefer unsigned long long to unsigned long?

I thought having 32 and 64 bit be the same might simplify some things,
but if not then that's fine.

Is 32k event channels going to be enough in the long run? I suppose any
system capable of running such a number of guests ought to be using 64
bit == 256k, which should at least last a bit longer.

> > > Pages occupied by the third level (if PAGE_SIZE=4k):
> > >  * 32bit: 64k / 8 / 4k = 2
> > >  * 64bit: 256k / 8 / 4k = 8
> > >
> > > Making the second level percpu will incur overhead. In effect we
> > > move the array in the shared info into the percpu struct:
> > >  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
> > >  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
> > >
> > > What concerns me is that the struct evtchn buckets are all
> > > allocated at once during the initialization phase. To save memory
> > > inside Xen, the internal allocation/free scheme for evtchn needs to
> > > be modified. Ian suggested we allocate a small number of buckets at
> > > start of day and then dynamically fault in more as required.
> > >
> > > To sum up:
> > >  1. The guest should allocate pages for the third-level bitmap.
> > >  2. The guest should register the third-level pages via a new
> > >     hypercall op.
> >
> > Doesn't the guest also need to set up space for the 2nd level?
> >
>
> Yes. That will be embedded in the percpu struct vcpu_info, which will
> also be registered via the same hypercall op.

NB that there is already a vcpu info placement hypercall. I have no
problem making this be a prerequisite for this work.

Ian.
>>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
>> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
>> > Regarding Jan's comment in [0], I don't think allowing the user to
>> > specify an arbitrary number of levels is a good idea. Only the last
>> > level should be shared among vcpus; the other levels should be in a
>> > percpu struct to allow for quicker lookup. Letting the user specify
>> > the number of levels would be too complicated to implement and would
>> > blow up the percpu section (since the size grows exponentially).
>> > Three levels should be quite enough. See the maths below.
>>
>> I didn't ask to implement more than three levels, I just asked for
>> the interface to establish the number of levels a guest wants to
>> use, to allow for higher numbers (passing of which would result in
>> -EINVAL in your implementation).
>>
>
> Ah, I understand now. How about something like this:
>
> struct EVTCHNOP_reg_nlevel {
>     int levels;
>     void *level_specified_reg_struct;
> };

Yes, just "unsigned int" please.

>> > To sum up:
>> >  1. The guest should allocate pages for the third-level bitmap.
>> >  2. The guest should register the third-level pages via a new
>> >     hypercall op.
>>
>> Doesn't the guest also need to set up space for the 2nd level?
>>
>
> Yes. That will be embedded in the percpu struct vcpu_info, which will
> also be registered via the same hypercall op.

"struct vcpu_info"? Same hypercall? Or are you mixing up types?

Jan
On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> >> > Regarding Jan's comment in [0], I don't think allowing the user to
> >> > specify an arbitrary number of levels is a good idea. Only the
> >> > last level should be shared among vcpus; the other levels should
> >> > be in a percpu struct to allow for quicker lookup. Letting the
> >> > user specify the number of levels would be too complicated to
> >> > implement and would blow up the percpu section (since the size
> >> > grows exponentially). Three levels should be quite enough. See the
> >> > maths below.
> >>
> >> I didn't ask to implement more than three levels, I just asked for
> >> the interface to establish the number of levels a guest wants to
> >> use, to allow for higher numbers (passing of which would result in
> >> -EINVAL in your implementation).
> >>
> >
> > Ah, I understand now. How about something like this:
> >
> > struct EVTCHNOP_reg_nlevel {
> >     int levels;
> >     void *level_specified_reg_struct;
> > };
>
> Yes, just "unsigned int" please.
>

Right, "unsigned int".

> >> > To sum up:
> >> >  1. The guest should allocate pages for the third-level bitmap.
> >> >  2. The guest should register the third-level pages via a new
> >> >     hypercall op.
> >>
> >> Doesn't the guest also need to set up space for the 2nd level?
> >>
> >
> > Yes. That will be embedded in the percpu struct vcpu_info, which will
> > also be registered via the same hypercall op.
>
> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
>

What I meant was that the second level will be embedded in struct
vcpu_info, and the 2nd level will be registered via some hypercall (not
the struct vcpu_info).

Wei.

> Jan
>
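[Putting the corrections so far together, the op's payload might look
something like the sketch below: "unsigned int" for the level count, plus a
list of guest frames for the shared leaf level. The field names and the use
of a guest handle here are assumptions for illustration, not a settled ABI;
XEN_GUEST_HANDLE is Xen's usual public-header wrapper for guest pointers.]

    struct evtchn_register_nlevel {
        unsigned int levels;      /* IN: number of levels, e.g. 3 */
        unsigned int nr_frames;   /* IN: number of frames in the list */
        /* IN: guest frames holding the shared leaf-level bitmap. */
        XEN_GUEST_HANDLE(xen_pfn_t) leaf_frames;
    };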
On Mon, 2012-12-03 at 17:57 +0000, Ian Campbell wrote:
> On Mon, 2012-12-03 at 17:52 +0000, Wei Liu wrote:
> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> > > >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > > > Regarding Jan's comment in [0], I don't think allowing the user
> > > > to specify an arbitrary number of levels is a good idea. Only the
> > > > last level should be shared among vcpus; the other levels should
> > > > be in a percpu struct to allow for quicker lookup. Letting the
> > > > user specify the number of levels would be too complicated to
> > > > implement and would blow up the percpu section (since the size
> > > > grows exponentially). Three levels should be quite enough. See
> > > > the maths below.
> > >
> > > I didn't ask to implement more than three levels, I just asked for
> > > the interface to establish the number of levels a guest wants to
> > > use, to allow for higher numbers (passing of which would result in
> > > -EINVAL in your implementation).
> > >
> >
> > Ah, I understand now. How about something like this:
> >
> > struct EVTCHNOP_reg_nlevel {
> >     int levels;
> >     void *level_specified_reg_struct;
> > };
> >
> > > > Number of event channels:
> > > >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > > >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > > > Basically the third level is a new ABI, so I chose unsigned long
> > > > long here to get more event channels.
> > >
> > > Please don't: This would make things less consistent to handle,
> > > at least in the guest side code. And I don't see why you would
> > > need to do so anyway (or else your argument above against
> > > further levels would become questionable).
> > >
> >
> > It was suggested by Ian to use unsigned long long. Ian, why do you
> > prefer unsigned long long to unsigned long?
>
> I thought having 32 and 64 bit be the same might simplify some things,
> but if not then that's fine.
>
> Is 32k event channels going to be enough in the long run? I suppose
> any system capable of running such a number of guests ought to be
> using 64 bit == 256k, which should at least last a bit longer.
>

I think 32k is quite enough for 32bit machines. And I agree with
"system capable of running such a number of guests ought to be using 64
bit == 256k" ;-)

> > > > Pages occupied by the third level (if PAGE_SIZE=4k):
> > > >  * 32bit: 64k / 8 / 4k = 2
> > > >  * 64bit: 256k / 8 / 4k = 8
> > > >
> > > > Making the second level percpu will incur overhead. In effect we
> > > > move the array in the shared info into the percpu struct:
> > > >  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
> > > >  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
> > > >
> > > > What concerns me is that the struct evtchn buckets are all
> > > > allocated at once during the initialization phase. To save memory
> > > > inside Xen, the internal allocation/free scheme for evtchn needs
> > > > to be modified. Ian suggested we allocate a small number of
> > > > buckets at start of day and then dynamically fault in more as
> > > > required.
> > > >
> > > > To sum up:
> > > >  1. The guest should allocate pages for the third-level bitmap.
> > > >  2. The guest should register the third-level pages via a new
> > > >     hypercall op.
> > >
> > > Doesn't the guest also need to set up space for the 2nd level?
> > >
> >
> > Yes. That will be embedded in the percpu struct vcpu_info, which will
> > also be registered via the same hypercall op.
>
> NB that there is already a vcpu info placement hypercall. I have no
> problem making this be a prerequisite for this work.
>

I saw that one. But that's something down to implementation, so I
didn't go into details.

Wei.
On 03/12/12 16:29, Wei Liu wrote:
> Hi all,
>
> There has been discussion on extending the number of event channels
> back in September [0].

It seems that the decision has been made to go for this N-level
approach. Were any other methods considered?

Would a per-VCPU ring of pending events work? The ABI will be easier to
extend in the future for more event channels. The guest side code will
be simpler. It will be easier to fairly service the events as they will
be processed in the order they were raised.

The complexity would be in ensuring that events were not lost due to
lack of space in the ring. This may make the ring prohibitively large
or require complex or expensive tracking of pending events inside Xen.

> Regarding Jan's comment in [0], I don't think allowing the user to
> specify an arbitrary number of levels is a good idea. Only the last
> level should be shared among vcpus; the other levels should be in a
> percpu struct to allow for quicker lookup. Letting the user specify the
> number of levels would be too complicated to implement and would blow
> up the percpu section (since the size grows exponentially). Three
> levels should be quite enough. See the maths below.
>
> Number of event channels:
>  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
>  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> Basically the third level is a new ABI, so I chose unsigned long long
> here to get more event channels.

32-bit guests will have to treat the unsigned long long as two separate
words and iterate over them individually.

This is easy to do -- we have an experimental build of Xen and the
kernel that extends the number of event channels for 32-bit dom0 to
4096 by having each selector bit select a group of 4 words.

David

> Pages occupied by the third level (if PAGE_SIZE=4k):
>  * 32bit: 64k / 8 / 4k = 2
>  * 64bit: 256k / 8 / 4k = 8
>
> Making the second level percpu will incur overhead. In effect we move
> the array in the shared info into the percpu struct:
>  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
>  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
>
> What concerns me is that the struct evtchn buckets are all allocated at
> once during the initialization phase. To save memory inside Xen, the
> internal allocation/free scheme for evtchn needs to be modified. Ian
> suggested we allocate a small number of buckets at start of day and
> then dynamically fault in more as required.
>
> To sum up:
>  1. The guest should allocate pages for the third-level bitmap.
>  2. The guest should register the third-level pages via a new hypercall
>     op.
>  3. The hypervisor should set up the third-level bitmap in that
>     hypercall op.
>  4. Only the last level (third in this case) should be shared among
>     vcpus.
>  5. We need a flexible allocation/free scheme for struct evtchn.
>  6. Debug dumping should use a snapshot to avoid holding the event lock
>     for too long. (Jan's concern in [0])
>
> Any comments are welcome.
>
>
> Wei.
>
> [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
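[A sketch of the "each selector bit selects a group of words" scheme David
describes: with 32 selector bits and groups of 4 32-bit words, 32 * 4 * 32 =
4096 event channels. Constants and names are illustrative, not taken from
the experimental build he mentions.]

    #include <limits.h>

    #define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
    #define WORDS_PER_GROUP 4

    static unsigned long pending[BITS_PER_WORD * WORDS_PER_GROUP];

    static void scan_groups(unsigned long sel,
                            void (*handle)(unsigned int port))
    {
        unsigned int bit, w, k;

        for (bit = 0; bit < BITS_PER_WORD; bit++) {
            if (!(sel & (1UL << bit)))
                continue;
            /* One selector bit covers WORDS_PER_GROUP consecutive words. */
            for (w = 0; w < WORDS_PER_GROUP; w++) {
                unsigned long word = pending[bit * WORDS_PER_GROUP + w];

                for (k = 0; k < BITS_PER_WORD; k++)
                    if (word & (1UL << k))
                        handle((bit * WORDS_PER_GROUP + w) * BITS_PER_WORD + k);
            }
        }
    }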
On Mon, 2012-12-03 at 18:52 +0000, David Vrabel wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all,
> >
> > There has been discussion on extending the number of event channels
> > back in September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>

Not yet. The discussion is still open.

> Would a per-VCPU ring of pending events work? The ABI will be easier
> to extend in the future for more event channels. The guest side code
> will be simpler. It will be easier to fairly service the events as
> they will be processed in the order they were raised.
>

Will there be scenarios where we need to raise some evtchns' priority?
The ring approach is completely fair.

> The complexity would be in ensuring that events were not lost due to
> lack of space in the ring. This may make the ring prohibitively large
> or require complex or expensive tracking of pending events inside Xen.
>

This also needs to be considered and evaluated...

> > Regarding Jan's comment in [0], I don't think allowing the user to
> > specify an arbitrary number of levels is a good idea. Only the last
> > level should be shared among vcpus; the other levels should be in a
> > percpu struct to allow for quicker lookup. Letting the user specify
> > the number of levels would be too complicated to implement and would
> > blow up the percpu section (since the size grows exponentially).
> > Three levels should be quite enough. See the maths below.
> >
> > Number of event channels:
> >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > Basically the third level is a new ABI, so I chose unsigned long long
> > here to get more event channels.
>
> 32-bit guests will have to treat the unsigned long long as two
> separate words and iterate over them individually.
>
> This is easy to do -- we have an experimental build of Xen and the
> kernel that extends the number of event channels for 32-bit dom0 to
> 4096 by having each selector bit select a group of 4 words.
>

4096 is not enough. But this idea of each selector bit selecting more
words could be useful. Can you send me your patches? Then I will see
what I can do.

Wei.
On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all,
> >
> > There has been discussion on extending the number of event channels
> > back in September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
> Would a per-VCPU ring of pending events work? The ABI will be easier
> to extend in the future for more event channels. The guest side code
> will be simpler. It will be easier to fairly service the events as
> they will be processed in the order they were raised.
>
> The complexity would be in ensuring that events were not lost due to
> lack of space in the ring. This may make the ring prohibitively large
> or require complex or expensive tracking of pending events inside Xen.
>

If I understand correctly, in the ring model an event will always be
queued up for processing. Won't this be overkill? What if event
generation is much faster than processing?

In the current implementation, an event channel can be raised multiple
times but is only processed once.

Wei.
>>> On 03.12.12 at 19:09, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
>> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
>> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
>> >> Doesn't the guest also need to set up space for the 2nd level?
>> >>
>> >
>> > Yes. That will be embedded in the percpu struct vcpu_info, which
>> > will also be registered via the same hypercall op.
>>
>> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
>>
>
> What I meant was that the second level will be embedded in struct
> vcpu_info, and the 2nd level will be registered via some hypercall (not
> the struct vcpu_info).

I would strongly recommend against embedding this in struct vcpu_info,
particularly in the context of intending to allow for further levels in
the future.

Plus I don't think you really can embed it - there's just not enough
space left (I'm sure you're aware that you can't extend the structure
in size); in fact there's no space left at all in the architecture
independent part of the structure.

The only option you have is to declare the array you need to add to
immediately follow the structure when the placement hypercall has been
used. That would probably be acceptable for the second from the top
level, but you'd again run into (space) issues when wanting more than 3
levels (as you validly said, all but the leaf level ought to be
per-vCPU).

Jan
On Tue, 2012-12-04 at 08:05 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 19:09, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
> >> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> >> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >> >> Doesn't the guest also need to set up space for the 2nd level?
> >> >>
> >> >
> >> > Yes. That will be embedded in the percpu struct vcpu_info, which
> >> > will also be registered via the same hypercall op.
> >>
> >> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
> >>
> >
> > What I meant was that the second level will be embedded in struct
> > vcpu_info, and the 2nd level will be registered via some hypercall
> > (not the struct vcpu_info).
>
> I would strongly recommend against embedding this in struct vcpu_info,
> particularly in the context of intending to allow for further levels
> in the future.
>
> Plus I don't think you really can embed it - there's just not enough
> space left (I'm sure you're aware that you can't extend the structure
> in size); in fact there's no space left at all in the architecture
> independent part of the structure.
>
> The only option you have is to declare the array you need to add to
> immediately follow the structure when the placement hypercall has been
> used. That would probably be acceptable for the second from the top
> level, but you'd again run into (space) issues when wanting more than
> 3 levels (as you validly said, all but the leaf level ought to be
> per-vCPU).

The amount of space needed for levels 1..N-1 ought to be pretty
formulaic for any N, so we could choose to require that enough space be
available after vcpu_info for levels 1..N-1.

With larger N that might become multiple pages; is that a problem? It
might be preferable to have an explicit call which takes potentially
non-contiguous addresses.

Ian.
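[The formula Ian alludes to, sketched out: level k holds BITS_PER_WORD^(k-1)
words, so levels 1..N-1 need the sum of those per vCPU. For N=3 on 64-bit
that is 1 + 64 = 65 words (520 bytes), matching the 8 + 512 bytes counted
earlier in the thread; for N=4 it is 1 + 64 + 4096 words, which is indeed
multiple pages. Illustrative only.]

    #include <limits.h>

    static unsigned long nonleaf_words(unsigned int nr_levels)
    {
        unsigned long total = 0, level = 1;   /* level 1 is a single word */
        unsigned int k;

        for (k = 1; k < nr_levels; k++) {
            total += level;
            level *= sizeof(unsigned long) * CHAR_BIT;
        }
        return total;
    }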
>>> On 04.12.12 at 10:30, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> The amount of space needed for levels 1..N-1 ought to be pretty
> formulaic for any N, so we could choose to require that enough space
> be available after vcpu_info for levels 1..N-1.
>
> With larger N that might become multiple pages; is that a problem? It
> might be preferable to have an explicit call which takes potentially
> non-contiguous addresses.

Yes, that was my point about the space restriction. The current
registration call allows only for a single page.

Additionally, if we add something variably sized to vcpu_info, I don't
think we should make this immediately successive - that would prevent
fixed size extensions in the future. Instead, we should have the guest
tell us at what offset from the base of the structure the bitmaps
start.

Jan
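[One possible shape for what Jan suggests: reuse the vcpu_info placement
mechanism, but have the guest state explicitly where the per-vCPU bitmaps
start relative to the structure's base. Names and layout are hypothetical,
not an existing Xen interface.]

    #include <stdint.h>

    struct vcpu_register_evtchn_area {
        uint64_t mfn;           /* page holding struct vcpu_info */
        uint32_t vcpu_info_off; /* offset of struct vcpu_info in the page */
        uint32_t bitmap_off;    /* offset of the per-vCPU non-leaf bitmaps
                                   from the base of struct vcpu_info */
    };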
On Mon, Dec 3, 2012 at 6:52 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all,
> >
> > There has been discussion on extending the number of event channels
> > back in September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
> Would a per-VCPU ring of pending events work? The ABI will be easier
> to extend in the future for more event channels. The guest side code
> will be simpler. It will be easier to fairly service the events as
> they will be processed in the order they were raised.

Not having looked in depth at either solution, it does seem like having
something like this, rather than an ever-growing array of bits that
needs to be examined, would be simpler and more scalable.

 -George
On 03/12/12 20:56, Wei Liu wrote:
> On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
>> On 03/12/12 16:29, Wei Liu wrote:
>>> Hi all,
>>>
>>> There has been discussion on extending the number of event channels
>>> back in September [0].
>>
>> It seems that the decision has been made to go for this N-level
>> approach. Were any other methods considered?
>>
>> Would a per-VCPU ring of pending events work? The ABI will be easier
>> to extend in the future for more event channels. The guest side code
>> will be simpler. It will be easier to fairly service the events as
>> they will be processed in the order they were raised.
>>
>> The complexity would be in ensuring that events were not lost due to
>> lack of space in the ring. This may make the ring prohibitively large
>> or require complex or expensive tracking of pending events inside Xen.
>>
>
> If I understand correctly, in the ring model an event will always be
> queued up for processing. Won't this be overkill? What if event
> generation is much faster than processing?
>
> In the current implementation, an event channel can be raised multiple
> times but is only processed once.

There would have to be something in Xen to ensure an event was only
added to the ring once, i.e., if it's already in the ring, it doesn't
get added again. This is the tricky bit and I can't immediately think
of how this would work.

David
>>> On 04.12.12 at 12:29, George Dunlap <dunlapg@umich.edu> wrote:
> On Mon, Dec 3, 2012 at 6:52 PM, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 03/12/12 16:29, Wei Liu wrote:
>> > Hi all,
>> >
>> > There has been discussion on extending the number of event channels
>> > back in September [0].
>>
>> It seems that the decision has been made to go for this N-level
>> approach. Were any other methods considered?
>>
>> Would a per-VCPU ring of pending events work? The ABI will be easier
>> to extend in the future for more event channels. The guest side code
>> will be simpler. It will be easier to fairly service the events as
>> they will be processed in the order they were raised.
>
> Not having looked in depth at either solution, it does seem like having
> something like this, rather than an ever-growing array of bits that
> needs to be examined, would be simpler and more scalable.

Surely more scalable, but simpler?

Jan
At 11:35 +0000 on 04 Dec (1354620912), David Vrabel wrote:
> On 03/12/12 20:56, Wei Liu wrote:
> > On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
> >> On 03/12/12 16:29, Wei Liu wrote:
> >>> Hi all,
> >>>
> >>> There has been discussion on extending the number of event channels
> >>> back in September [0].
> >>
> >> It seems that the decision has been made to go for this N-level
> >> approach. Were any other methods considered?
> >>
> >> Would a per-VCPU ring of pending events work? The ABI will be easier
> >> to extend in the future for more event channels. The guest side code
> >> will be simpler. It will be easier to fairly service the events as
> >> they will be processed in the order they were raised.
> >>
> >> The complexity would be in ensuring that events were not lost due to
> >> lack of space in the ring. This may make the ring prohibitively
> >> large or require complex or expensive tracking of pending events
> >> inside Xen.
> >>
> >
> > If I understand correctly, in the ring model an event will always be
> > queued up for processing. Won't this be overkill? What if event
> > generation is much faster than processing?
> >
> > In the current implementation, an event channel can be raised
> > multiple times but is only processed once.
>
> There would have to be something in Xen to ensure an event was only
> added to the ring once, i.e., if it's already in the ring, it doesn't
> get added again. This is the tricky bit and I can't immediately think
> of how this would work.

Well, Xen could keep a bitmap of which events it had inserted in the
ring, and then the guest could clear bits from the bitmap as it
serviced the events... :)

More seriously, though, I prefer the existing bitmap interface. It
handles masking and merging naturally, and it lets the guest service
events in whatever order it chooses (not having to process 1 word per
event of potentially uninteresting events off the ring to get to the
one it wants).

Tim.
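[Tim's half-joking bitmap-plus-ring scheme does bound the ring: if Xen only
enqueues a port whose "in ring" bit is clear, and the guest clears the bit
as it services the event, each port occupies at most one slot, so a ring of
NR_PORTS entries can never overflow. A sketch under that assumption; names
and sizes are illustrative, and real code would need atomic test-and-set
and memory barriers between producer and consumer.]

    #include <limits.h>
    #include <stdint.h>

    #define NR_PORTS 4096
    #define BPW      (sizeof(unsigned long) * CHAR_BIT)

    static uint32_t ring[NR_PORTS];          /* one slot per port suffices */
    static unsigned int prod, cons;          /* Xen advances prod, guest cons */
    static unsigned long inring[NR_PORTS / BPW];

    /* Xen side: raise a port, merging if it is already queued. */
    static void evtchn_raise(uint32_t port)
    {
        if (inring[port / BPW] & (1UL << (port % BPW)))
            return;                          /* already pending in the ring */
        inring[port / BPW] |= 1UL << (port % BPW);
        ring[prod++ % NR_PORTS] = port;
    }

    /* Guest side: service events in the order they were raised. */
    static void evtchn_poll(void (*handle)(uint32_t port))
    {
        while (cons != prod) {
            uint32_t port = ring[cons++ % NR_PORTS];

            inring[port / BPW] &= ~(1UL << (port % BPW));
            handle(port);
        }
    }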