Hi all,

There has been discussion on extending the number of event channels back in
September [0].

Regarding Jan's comment in [0], I don't think allowing the user to
specify an arbitrary number of levels is a good idea. Only the last
level should be shared among vcpus; the other levels should be in a
percpu struct to allow for quicker lookup. Letting the user specify the
number of levels would be too complicated to implement and would blow
up the percpu section (since the size grows exponentially). Three
levels should be quite enough. See the maths below.

Number of event channels:
 * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
 * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
Basically the third level is a new ABI, so I chose unsigned long long
here to get more event channels.

Pages occupied by the third level (if PAGE_SIZE=4k):
 * 32bit: 64k / 8 / 4k = 2
 * 64bit: 256k / 8 / 4k = 8

Making the second level percpu will incur overhead. In effect we move
the array in the shared info into the percpu struct:
 * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
 * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes

What concerns me is that the struct evtchn buckets are all allocated at
once during the initialization phase. To save memory inside Xen, the
internal allocation/free scheme for evtchn needs to be modified. Ian
suggested we allocate a small number of buckets at start of day and
then dynamically fault in more as required.

To sum up:
 1. The guest should allocate pages for the third-level bitmap.
 2. The guest should register the third-level pages via a new hypercall
    op.
 3. The hypervisor should set up the third-level bitmap in that
    hypercall op.
 4. Only the last level (third in this case) should be shared among
    vcpus.
 5. We need a flexible allocation/free scheme for struct evtchn.
 6. Debug dumping should use a snapshot to avoid holding the event lock
    for too long. (Jan's concern in [0])

Any comments are welcome.


Wei.

[0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
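[For concreteness, a minimal sketch of the guest-side lookup the proposed
three-level bitmap implies, assuming a per-vCPU first and second level and a
shared third level. All names and the flat array layout are illustrative,
not the proposed ABI. On 64-bit this gives 64 * 64 * 64 = 256k channels,
matching the maths above.]

    #include <limits.h>

    #define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

    struct vcpu_levels {
        unsigned long l1;                 /* 1st level: one word per vCPU */
        unsigned long l2[BITS_PER_WORD];  /* 2nd level: per-vCPU copy */
    };

    /* 3rd level: shared among vCPUs; one word per 2nd-level bit. */
    static unsigned long l3[BITS_PER_WORD * BITS_PER_WORD];

    static void scan_pending(const struct vcpu_levels *v,
                             void (*handle)(unsigned int port))
    {
        unsigned int i, j, k;

        for (i = 0; i < BITS_PER_WORD; i++) {
            if (!(v->l1 & (1UL << i)))
                continue;
            for (j = 0; j < BITS_PER_WORD; j++) {
                unsigned long w;

                if (!(v->l2[i] & (1UL << j)))
                    continue;
                /* Each 2nd-level bit selects one shared 3rd-level word. */
                w = l3[i * BITS_PER_WORD + j];
                for (k = 0; k < BITS_PER_WORD; k++)
                    if (w & (1UL << k))
                        handle((i * BITS_PER_WORD + j) * BITS_PER_WORD + k);
            }
        }
    }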
>>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> Regarding Jan's comment in [0], I don't think allowing the user to
> specify an arbitrary number of levels is a good idea. Only the last
> level should be shared among vcpus; the other levels should be in a
> percpu struct to allow for quicker lookup. Letting the user specify the
> number of levels would be too complicated to implement and would blow
> up the percpu section (since the size grows exponentially). Three
> levels should be quite enough. See the maths below.

I didn't ask to implement more than three levels, I just asked for
the interface to establish the number of levels a guest wants to
use, to allow for higher numbers (passing of which would result in
-EINVAL in your implementation).

> Number of event channels:
>  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
>  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> Basically the third level is a new ABI, so I chose unsigned long long
> here to get more event channels.

Please don't: This would make things less consistent to handle,
at least in the guest side code. And I don't see why you would
need to do so anyway (or else your argument above against
further levels would become questionable).

> Pages occupied by the third level (if PAGE_SIZE=4k):
>  * 32bit: 64k / 8 / 4k = 2
>  * 64bit: 256k / 8 / 4k = 8
>
> Making the second level percpu will incur overhead. In effect we move
> the array in the shared info into the percpu struct:
>  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
>  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
>
> What concerns me is that the struct evtchn buckets are all allocated at
> once during the initialization phase. To save memory inside Xen, the
> internal allocation/free scheme for evtchn needs to be modified. Ian
> suggested we allocate a small number of buckets at start of day and
> then dynamically fault in more as required.
>
> To sum up:
>  1. The guest should allocate pages for the third-level bitmap.
>  2. The guest should register the third-level pages via a new hypercall
>     op.

Doesn't the guest also need to set up space for the 2nd level?

Jan

>  3. The hypervisor should set up the third-level bitmap in that
>     hypercall op.
>  4. Only the last level (third in this case) should be shared among
>     vcpus.
>  5. We need a flexible allocation/free scheme for struct evtchn.
>  6. Debug dumping should use a snapshot to avoid holding the event lock
>     for too long. (Jan's concern in [0])
>
> Any comments are welcome.
>
>
> Wei.
>
> [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
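[The hypervisor-side check Jan asks for could be as simple as the following
sketch: the ABI carries the requested number of levels, and today's
implementation simply rejects anything other than 3. The function name is
hypothetical.]

    #include <errno.h>

    static int evtchn_register_nlevel(unsigned int levels)
    {
        /* The interface allows more levels; the implementation doesn't. */
        if (levels != 3)
            return -EINVAL;

        /* ... map and wire up the three-level bitmaps here ... */
        return 0;
    }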
On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
> Hi all,
>
> There has been discussion on extending the number of event channels
> back in September [0].

While we are changing the ABI anyway, it would be nice to consider the
possibility of some small-ish number of per-vcpu evtchns.

These would potentially be useful for things like IPI vectors and such,
which every vcpu has.

e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
64, which would be more than sufficient.

Ian.
>>> On 03.12.12 at 18:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
>> Hi all,
>>
>> There has been discussion on extending the number of event channels
>> back in September [0].
>
> While we are changing the ABI anyway, it would be nice to consider the
> possibility of some small-ish number of per-vcpu evtchns.
>
> These would potentially be useful for things like IPI vectors and
> such, which every vcpu has.

While I agree with this, ...

> e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32 or
> 64, which would be more than sufficient.

... I would have hoped for this field to retain its use as the top
level selector.

Jan
On Mon, 2012-12-03 at 17:48 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 18:43, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Mon, 2012-12-03 at 16:29 +0000, Wei Liu wrote:
> >> Hi all,
> >>
> >> There has been discussion on extending the number of event channels
> >> back in September [0].
> >
> > While we are changing the ABI anyway, it would be nice to consider
> > the possibility of some small-ish number of per-vcpu evtchns.
> >
> > These would potentially be useful for things like IPI vectors and
> > such, which every vcpu has.
>
> While I agree with this, ...
>
> > e.g. reusing the per-vcpu space in evtchn_pending_sel would give 32
> > or 64, which would be more than sufficient.
>
> ... I would have hoped for this field to retain its use as the top
> level selector.

Yes, I'm not sure why I thought it became unused...

Ian.
On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > Regarding Jan's comment in [0], I don't think allowing the user to
> > specify an arbitrary number of levels is a good idea. Only the last
> > level should be shared among vcpus; the other levels should be in a
> > percpu struct to allow for quicker lookup. Letting the user specify
> > the number of levels would be too complicated to implement and would
> > blow up the percpu section (since the size grows exponentially).
> > Three levels should be quite enough. See the maths below.
>
> I didn't ask to implement more than three levels, I just asked for
> the interface to establish the number of levels a guest wants to
> use, to allow for higher numbers (passing of which would result in
> -EINVAL in your implementation).
>

Ah, I understand now. How about something like this:

struct EVTCHNOP_reg_nlevel {
    int levels;
    void *level_specified_reg_struct;
};

> > Number of event channels:
> >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > Basically the third level is a new ABI, so I chose unsigned long long
> > here to get more event channels.
>
> Please don't: This would make things less consistent to handle,
> at least in the guest side code. And I don't see why you would
> need to do so anyway (or else your argument above against
> further levels would become questionable).
>

It was suggested by Ian to use unsigned long long. Ian, why do you
prefer unsigned long long to unsigned long?

> > Pages occupied by the third level (if PAGE_SIZE=4k):
> >  * 32bit: 64k / 8 / 4k = 2
> >  * 64bit: 256k / 8 / 4k = 8
> >
> > Making the second level percpu will incur overhead. In effect we move
> > the array in the shared info into the percpu struct:
> >  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
> >  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
> >
> > What concerns me is that the struct evtchn buckets are all allocated
> > at once during the initialization phase. To save memory inside Xen,
> > the internal allocation/free scheme for evtchn needs to be modified.
> > Ian suggested we allocate a small number of buckets at start of day
> > and then dynamically fault in more as required.
> >
> > To sum up:
> >  1. The guest should allocate pages for the third-level bitmap.
> >  2. The guest should register the third-level pages via a new
> >     hypercall op.
>
> Doesn't the guest also need to set up space for the 2nd level?
>

Yes. That will be embedded in the percpu struct vcpu_info, which will
also be registered via the same hypercall op.

Wei.

> Jan
>
> >  3. The hypervisor should set up the third-level bitmap in that
> >     hypercall op.
> >  4. Only the last level (third in this case) should be shared among
> >     vcpus.
> >  5. We need a flexible allocation/free scheme for struct evtchn.
> >  6. Debug dumping should use a snapshot to avoid holding the event
> >     lock for too long. (Jan's concern in [0])
> >
> > Any comments are welcome.
> >
> >
> > Wei.
> >
> > [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
On Mon, 2012-12-03 at 17:52 +0000, Wei Liu wrote:
> On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> > >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > > Regarding Jan's comment in [0], I don't think allowing the user to
> > > specify an arbitrary number of levels is a good idea. Only the last
> > > level should be shared among vcpus; the other levels should be in a
> > > percpu struct to allow for quicker lookup. Letting the user specify
> > > the number of levels would be too complicated to implement and
> > > would blow up the percpu section (since the size grows
> > > exponentially). Three levels should be quite enough. See the maths
> > > below.
> >
> > I didn't ask to implement more than three levels, I just asked for
> > the interface to establish the number of levels a guest wants to
> > use, to allow for higher numbers (passing of which would result in
> > -EINVAL in your implementation).
> >
>
> Ah, I understand now. How about something like this:
>
> struct EVTCHNOP_reg_nlevel {
>     int levels;
>     void *level_specified_reg_struct;
> };
>
> > > Number of event channels:
> > >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > > Basically the third level is a new ABI, so I chose unsigned long
> > > long here to get more event channels.
> >
> > Please don't: This would make things less consistent to handle,
> > at least in the guest side code. And I don't see why you would
> > need to do so anyway (or else your argument above against
> > further levels would become questionable).
> >
>
> It was suggested by Ian to use unsigned long long. Ian, why do you
> prefer unsigned long long to unsigned long?

I thought having 32 and 64 bit be the same might simplify some things,
but if not then that's fine.

Is 32k event channels going to be enough in the long run? I suppose any
system capable of running such a number of guests ought to be using 64
bit == 256k, which should at least last a bit longer.

> > > Pages occupied by the third level (if PAGE_SIZE=4k):
> > >  * 32bit: 64k / 8 / 4k = 2
> > >  * 64bit: 256k / 8 / 4k = 8
> > >
> > > Making the second level percpu will incur overhead. In effect we
> > > move the array in the shared info into the percpu struct:
> > >  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
> > >  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
> > >
> > > What concerns me is that the struct evtchn buckets are all
> > > allocated at once during the initialization phase. To save memory
> > > inside Xen, the internal allocation/free scheme for evtchn needs to
> > > be modified. Ian suggested we allocate a small number of buckets at
> > > start of day and then dynamically fault in more as required.
> > >
> > > To sum up:
> > >  1. The guest should allocate pages for the third-level bitmap.
> > >  2. The guest should register the third-level pages via a new
> > >     hypercall op.
> >
> > Doesn't the guest also need to set up space for the 2nd level?
> >
>
> Yes. That will be embedded in the percpu struct vcpu_info, which will
> also be registered via the same hypercall op.

NB that there is already a vcpu info placement hypercall. I have no
problem making this be a prerequisite for this work.

Ian.
>>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
>> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
>> > Regarding Jan's comment in [0], I don't think allowing the user to
>> > specify an arbitrary number of levels is a good idea. Only the last
>> > level should be shared among vcpus; the other levels should be in a
>> > percpu struct to allow for quicker lookup. Letting the user specify
>> > the number of levels would be too complicated to implement and would
>> > blow up the percpu section (since the size grows exponentially).
>> > Three levels should be quite enough. See the maths below.
>>
>> I didn't ask to implement more than three levels, I just asked for
>> the interface to establish the number of levels a guest wants to
>> use, to allow for higher numbers (passing of which would result in
>> -EINVAL in your implementation).
>>
>
> Ah, I understand now. How about something like this:
>
> struct EVTCHNOP_reg_nlevel {
>     int levels;
>     void *level_specified_reg_struct;
> };

Yes, just "unsigned int" please.

>> > To sum up:
>> >  1. The guest should allocate pages for the third-level bitmap.
>> >  2. The guest should register the third-level pages via a new
>> >     hypercall op.
>>
>> Doesn't the guest also need to set up space for the 2nd level?
>>
>
> Yes. That will be embedded in the percpu struct vcpu_info, which will
> also be registered via the same hypercall op.

"struct vcpu_info"? Same hypercall? Or are you mixing up types?

Jan
On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >> >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> >> > Regarding Jan's comment in [0], I don't think allowing the user to
> >> > specify an arbitrary number of levels is a good idea. Only the
> >> > last level should be shared among vcpus; the other levels should
> >> > be in a percpu struct to allow for quicker lookup. Letting the
> >> > user specify the number of levels would be too complicated to
> >> > implement and would blow up the percpu section (since the size
> >> > grows exponentially). Three levels should be quite enough. See the
> >> > maths below.
> >>
> >> I didn't ask to implement more than three levels, I just asked for
> >> the interface to establish the number of levels a guest wants to
> >> use, to allow for higher numbers (passing of which would result in
> >> -EINVAL in your implementation).
> >>
> >
> > Ah, I understand now. How about something like this:
> >
> > struct EVTCHNOP_reg_nlevel {
> >     int levels;
> >     void *level_specified_reg_struct;
> > };
>
> Yes, just "unsigned int" please.
>

Right, "unsigned int".

> >> > To sum up:
> >> >  1. The guest should allocate pages for the third-level bitmap.
> >> >  2. The guest should register the third-level pages via a new
> >> >     hypercall op.
> >>
> >> Doesn't the guest also need to set up space for the 2nd level?
> >>
> >
> > Yes. That will be embedded in the percpu struct vcpu_info, which will
> > also be registered via the same hypercall op.
>
> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
>

What I meant was that the second level will be embedded in struct
vcpu_info, and the 2nd level will be registered via some hypercall (not
the struct vcpu_info).

Wei.

> Jan
>
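[Putting the corrections so far together, the op's payload might look
something like the sketch below: "unsigned int" for the level count, plus a
list of guest frames for the shared leaf level. The field names and the use
of a guest handle here are assumptions for illustration, not a settled ABI;
XEN_GUEST_HANDLE is Xen's usual public-header wrapper for guest pointers.]

    struct evtchn_register_nlevel {
        unsigned int levels;      /* IN: number of levels, e.g. 3 */
        unsigned int nr_frames;   /* IN: number of frames in the list */
        /* IN: guest frames holding the shared leaf-level bitmap. */
        XEN_GUEST_HANDLE(xen_pfn_t) leaf_frames;
    };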
On Mon, 2012-12-03 at 17:57 +0000, Ian Campbell wrote:
> On Mon, 2012-12-03 at 17:52 +0000, Wei Liu wrote:
> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> > > >>> On 03.12.12 at 17:29, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > > > Regarding Jan's comment in [0], I don't think allowing the user
> > > > to specify an arbitrary number of levels is a good idea. Only the
> > > > last level should be shared among vcpus; the other levels should
> > > > be in a percpu struct to allow for quicker lookup. Letting the
> > > > user specify the number of levels would be too complicated to
> > > > implement and would blow up the percpu section (since the size
> > > > grows exponentially). Three levels should be quite enough. See
> > > > the maths below.
> > >
> > > I didn't ask to implement more than three levels, I just asked for
> > > the interface to establish the number of levels a guest wants to
> > > use, to allow for higher numbers (passing of which would result in
> > > -EINVAL in your implementation).
> > >
> >
> > Ah, I understand now. How about something like this:
> >
> > struct EVTCHNOP_reg_nlevel {
> >     int levels;
> >     void *level_specified_reg_struct;
> > };
> >
> > > > Number of event channels:
> > > >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> > > >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > > > Basically the third level is a new ABI, so I chose unsigned long
> > > > long here to get more event channels.
> > >
> > > Please don't: This would make things less consistent to handle,
> > > at least in the guest side code. And I don't see why you would
> > > need to do so anyway (or else your argument above against
> > > further levels would become questionable).
> > >
> >
> > It was suggested by Ian to use unsigned long long. Ian, why do you
> > prefer unsigned long long to unsigned long?
>
> I thought having 32 and 64 bit be the same might simplify some things,
> but if not then that's fine.
>
> Is 32k event channels going to be enough in the long run? I suppose
> any system capable of running such a number of guests ought to be
> using 64 bit == 256k, which should at least last a bit longer.
>

I think 32k is quite enough for 32bit machines. And I agree with
"system capable of running such a number of guests ought to be using 64
bit == 256k" ;-)

> > > > Pages occupied by the third level (if PAGE_SIZE=4k):
> > > >  * 32bit: 64k / 8 / 4k = 2
> > > >  * 64bit: 256k / 8 / 4k = 8
> > > >
> > > > Making the second level percpu will incur overhead. In effect we
> > > > move the array in the shared info into the percpu struct:
> > > >  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
> > > >  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
> > > >
> > > > What concerns me is that the struct evtchn buckets are all
> > > > allocated at once during the initialization phase. To save memory
> > > > inside Xen, the internal allocation/free scheme for evtchn needs
> > > > to be modified. Ian suggested we allocate a small number of
> > > > buckets at start of day and then dynamically fault in more as
> > > > required.
> > > >
> > > > To sum up:
> > > >  1. The guest should allocate pages for the third-level bitmap.
> > > >  2. The guest should register the third-level pages via a new
> > > >     hypercall op.
> > >
> > > Doesn't the guest also need to set up space for the 2nd level?
> > >
> >
> > Yes. That will be embedded in the percpu struct vcpu_info, which will
> > also be registered via the same hypercall op.
>
> NB that there is already a vcpu info placement hypercall. I have no
> problem making this be a prerequisite for this work.
>

I saw that one. But that's something down to implementation, so I
didn't go into details.

Wei.
On 03/12/12 16:29, Wei Liu wrote:
> Hi all,
>
> There has been discussion on extending the number of event channels
> back in September [0].

It seems that the decision has been made to go for this N-level
approach. Were any other methods considered?

Would a per-VCPU ring of pending events work? The ABI will be easier to
extend in the future for more event channels. The guest side code will
be simpler. It will be easier to fairly service the events as they will
be processed in the order they were raised.

The complexity would be in ensuring that events were not lost due to
lack of space in the ring. This may make the ring prohibitively large
or require complex or expensive tracking of pending events inside Xen.

> Regarding Jan's comment in [0], I don't think allowing the user to
> specify an arbitrary number of levels is a good idea. Only the last
> level should be shared among vcpus; the other levels should be in a
> percpu struct to allow for quicker lookup. Letting the user specify the
> number of levels would be too complicated to implement and would blow
> up the percpu section (since the size grows exponentially). Three
> levels should be quite enough. See the maths below.
>
> Number of event channels:
>  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
>  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> Basically the third level is a new ABI, so I chose unsigned long long
> here to get more event channels.

32-bit guests will have to treat the unsigned long long as two separate
words and iterate over them individually.

This is easy to do -- we have an experimental build of Xen and the
kernel that extends the number of event channels for 32-bit dom0 to
4096 by having each selector bit select a group of 4 words.

David

> Pages occupied by the third level (if PAGE_SIZE=4k):
>  * 32bit: 64k / 8 / 4k = 2
>  * 64bit: 256k / 8 / 4k = 8
>
> Making the second level percpu will incur overhead. In effect we move
> the array in the shared info into the percpu struct:
>  * 32bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 128 bytes
>  * 64bit: sizeof(unsigned long) * 8 * sizeof(unsigned long) = 512 bytes
>
> What concerns me is that the struct evtchn buckets are all allocated at
> once during the initialization phase. To save memory inside Xen, the
> internal allocation/free scheme for evtchn needs to be modified. Ian
> suggested we allocate a small number of buckets at start of day and
> then dynamically fault in more as required.
>
> To sum up:
>  1. The guest should allocate pages for the third-level bitmap.
>  2. The guest should register the third-level pages via a new hypercall
>     op.
>  3. The hypervisor should set up the third-level bitmap in that
>     hypercall op.
>  4. Only the last level (third in this case) should be shared among
>     vcpus.
>  5. We need a flexible allocation/free scheme for struct evtchn.
>  6. Debug dumping should use a snapshot to avoid holding the event lock
>     for too long. (Jan's concern in [0])
>
> Any comments are welcome.
>
>
> Wei.
>
> [0] http://thread.gmane.org/gmane.comp.emulators.xen.devel/139921
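[A sketch of the "each selector bit selects a group of words" scheme David
describes: with 32 selector bits and groups of 4 32-bit words, 32 * 4 * 32 =
4096 event channels. Constants and names are illustrative, not taken from
the experimental build he mentions.]

    #include <limits.h>

    #define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
    #define WORDS_PER_GROUP 4

    static unsigned long pending[BITS_PER_WORD * WORDS_PER_GROUP];

    static void scan_groups(unsigned long sel,
                            void (*handle)(unsigned int port))
    {
        unsigned int bit, w, k;

        for (bit = 0; bit < BITS_PER_WORD; bit++) {
            if (!(sel & (1UL << bit)))
                continue;
            /* One selector bit covers WORDS_PER_GROUP consecutive words. */
            for (w = 0; w < WORDS_PER_GROUP; w++) {
                unsigned long word = pending[bit * WORDS_PER_GROUP + w];

                for (k = 0; k < BITS_PER_WORD; k++)
                    if (word & (1UL << k))
                        handle((bit * WORDS_PER_GROUP + w) * BITS_PER_WORD + k);
            }
        }
    }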
On Mon, 2012-12-03 at 18:52 +0000, David Vrabel wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all,
> >
> > There has been discussion on extending the number of event channels
> > back in September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>

Not yet. The discussion is still open.

> Would a per-VCPU ring of pending events work? The ABI will be easier
> to extend in the future for more event channels. The guest side code
> will be simpler. It will be easier to fairly service the events as
> they will be processed in the order they were raised.
>

Will there be scenarios where we need to raise some evtchns' priority?
The ring approach is completely fair.

> The complexity would be in ensuring that events were not lost due to
> lack of space in the ring. This may make the ring prohibitively large
> or require complex or expensive tracking of pending events inside Xen.
>

This also needs to be considered and evaluated...

> > Regarding Jan's comment in [0], I don't think allowing the user to
> > specify an arbitrary number of levels is a good idea. Only the last
> > level should be shared among vcpus; the other levels should be in a
> > percpu struct to allow for quicker lookup. Letting the user specify
> > the number of levels would be too complicated to implement and would
> > blow up the percpu section (since the size grows exponentially).
> > Three levels should be quite enough. See the maths below.
> >
> > Number of event channels:
> >  * 32bit: 1024 * sizeof(unsigned long long) * BITS_PER_BYTE = 64k
> >  * 64bit: 4096 * sizeof(unsigned long long) * BITS_PER_BYTE = 256k
> > Basically the third level is a new ABI, so I chose unsigned long long
> > here to get more event channels.
>
> 32-bit guests will have to treat the unsigned long long as two
> separate words and iterate over them individually.
>
> This is easy to do -- we have an experimental build of Xen and the
> kernel that extends the number of event channels for 32-bit dom0 to
> 4096 by having each selector bit select a group of 4 words.
>

4096 is not enough. But this idea of each selector bit selecting more
words could be useful. Can you send me your patches? Then I will see
what I can do.

Wei.
On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all,
> >
> > There has been discussion on extending the number of event channels
> > back in September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
> Would a per-VCPU ring of pending events work? The ABI will be easier
> to extend in the future for more event channels. The guest side code
> will be simpler. It will be easier to fairly service the events as
> they will be processed in the order they were raised.
>
> The complexity would be in ensuring that events were not lost due to
> lack of space in the ring. This may make the ring prohibitively large
> or require complex or expensive tracking of pending events inside Xen.
>

If I understand correctly, in the ring model an event will always be
queued up for processing. Won't this be overkill? What if event
generation is much faster than processing?

In the current implementation, an event channel can be raised multiple
times but is only processed once.

Wei.
>>> On 03.12.12 at 19:09, Wei Liu <Wei.Liu2@citrix.com> wrote:
> On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
>> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
>> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
>> >> Doesn't the guest also need to set up space for the 2nd level?
>> >>
>> >
>> > Yes. That will be embedded in the percpu struct vcpu_info, which
>> > will also be registered via the same hypercall op.
>>
>> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
>>
>
> What I meant was that the second level will be embedded in struct
> vcpu_info, and the 2nd level will be registered via some hypercall (not
> the struct vcpu_info).

I would strongly recommend against embedding this in struct vcpu_info,
particularly in the context of intending to allow for further levels in
the future.

Plus I don't think you really can embed it - there's just not enough
space left (I'm sure you're aware that you can't extend the structure
in size); in fact there's no space left at all in the architecture
independent part of the structure.

The only option you have is to declare the array you need to add to
immediately follow the structure when the placement hypercall has been
used. That would probably be acceptable for the second from the top
level, but you'd again run into (space) issues when wanting more than 3
levels (as you validly said, all but the leaf level ought to be
per-vCPU).

Jan
On Tue, 2012-12-04 at 08:05 +0000, Jan Beulich wrote:
> >>> On 03.12.12 at 19:09, Wei Liu <Wei.Liu2@citrix.com> wrote:
> > On Mon, 2012-12-03 at 18:00 +0000, Jan Beulich wrote:
> >> >>> On 03.12.12 at 18:52, Wei Liu <Wei.Liu2@citrix.com> wrote:
> >> > On Mon, 2012-12-03 at 17:35 +0000, Jan Beulich wrote:
> >> >> Doesn't the guest also need to set up space for the 2nd level?
> >> >>
> >> >
> >> > Yes. That will be embedded in the percpu struct vcpu_info, which
> >> > will also be registered via the same hypercall op.
> >>
> >> "struct vcpu_info"? Same hypercall? Or are you mixing up types?
> >>
> >
> > What I meant was that the second level will be embedded in struct
> > vcpu_info, and the 2nd level will be registered via some hypercall
> > (not the struct vcpu_info).
>
> I would strongly recommend against embedding this in struct vcpu_info,
> particularly in the context of intending to allow for further levels
> in the future.
>
> Plus I don't think you really can embed it - there's just not enough
> space left (I'm sure you're aware that you can't extend the structure
> in size); in fact there's no space left at all in the architecture
> independent part of the structure.
>
> The only option you have is to declare the array you need to add to
> immediately follow the structure when the placement hypercall has been
> used. That would probably be acceptable for the second from the top
> level, but you'd again run into (space) issues when wanting more than
> 3 levels (as you validly said, all but the leaf level ought to be
> per-vCPU).

The amount of space needed for levels 1..N-1 ought to be pretty
formulaic for any N, so we could choose to require that enough space be
available after vcpu_info for levels 1..N-1.

With larger N that might become multiple pages; is that a problem? It
might be preferable to have an explicit call which takes potentially
non-contiguous addresses.

Ian.
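[The formula Ian alludes to, sketched out: level k holds BITS_PER_WORD^(k-1)
words, so levels 1..N-1 need the sum of those per vCPU. For N=3 on 64-bit
that is 1 + 64 = 65 words (520 bytes), matching the 8 + 512 bytes counted
earlier in the thread; for N=4 it is 1 + 64 + 4096 words, which is indeed
multiple pages. Illustrative only.]

    #include <limits.h>

    static unsigned long nonleaf_words(unsigned int nr_levels)
    {
        unsigned long total = 0, level = 1;   /* level 1 is a single word */
        unsigned int k;

        for (k = 1; k < nr_levels; k++) {
            total += level;
            level *= sizeof(unsigned long) * CHAR_BIT;
        }
        return total;
    }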
>>> On 04.12.12 at 10:30, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> The amount of space needed for levels 1..N-1 ought to be pretty
> formulaic for any N, so we could choose to require that enough space
> be available after vcpu_info for levels 1..N-1.
>
> With larger N that might become multiple pages; is that a problem? It
> might be preferable to have an explicit call which takes potentially
> non-contiguous addresses.

Yes, that was my point about the space restriction. The current
registration call allows only for a single page.

Additionally, if we add something variably sized to vcpu_info, I don't
think we should make this immediately successive - that would prevent
fixed size extensions in the future. Instead, we should have the guest
tell us at what offset from the base of the structure the bitmaps
start.

Jan
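[One possible shape for what Jan suggests: reuse the vcpu_info placement
mechanism, but have the guest state explicitly where the per-vCPU bitmaps
start relative to the structure's base. Names and layout are hypothetical,
not an existing Xen interface.]

    #include <stdint.h>

    struct vcpu_register_evtchn_area {
        uint64_t mfn;           /* page holding struct vcpu_info */
        uint32_t vcpu_info_off; /* offset of struct vcpu_info in the page */
        uint32_t bitmap_off;    /* offset of the per-vCPU non-leaf bitmaps
                                   from the base of struct vcpu_info */
    };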
On Mon, Dec 3, 2012 at 6:52 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 03/12/12 16:29, Wei Liu wrote:
> > Hi all,
> >
> > There has been discussion on extending the number of event channels
> > back in September [0].
>
> It seems that the decision has been made to go for this N-level
> approach. Were any other methods considered?
>
> Would a per-VCPU ring of pending events work? The ABI will be easier
> to extend in the future for more event channels. The guest side code
> will be simpler. It will be easier to fairly service the events as
> they will be processed in the order they were raised.

Not having looked in depth at either solution, it does seem like having
something like this, rather than an ever-growing array of bits that
needs to be examined, would be simpler and more scalable.

 -George
On 03/12/12 20:56, Wei Liu wrote:
> On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
>> On 03/12/12 16:29, Wei Liu wrote:
>>> Hi all,
>>>
>>> There has been discussion on extending the number of event channels
>>> back in September [0].
>>
>> It seems that the decision has been made to go for this N-level
>> approach. Were any other methods considered?
>>
>> Would a per-VCPU ring of pending events work? The ABI will be easier
>> to extend in the future for more event channels. The guest side code
>> will be simpler. It will be easier to fairly service the events as
>> they will be processed in the order they were raised.
>>
>> The complexity would be in ensuring that events were not lost due to
>> lack of space in the ring. This may make the ring prohibitively large
>> or require complex or expensive tracking of pending events inside Xen.
>>
>
> If I understand correctly, in the ring model an event will always be
> queued up for processing. Won't this be overkill? What if event
> generation is much faster than processing?
>
> In the current implementation, an event channel can be raised multiple
> times but is only processed once.

There would have to be something in Xen to ensure an event was only
added to the ring once, i.e., if it's already in the ring, it doesn't
get added again. This is the tricky bit and I can't immediately think
of how this would work.

David
>>> On 04.12.12 at 12:29, George Dunlap <dunlapg@umich.edu> wrote:
> On Mon, Dec 3, 2012 at 6:52 PM, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 03/12/12 16:29, Wei Liu wrote:
>> > Hi all,
>> >
>> > There has been discussion on extending the number of event channels
>> > back in September [0].
>>
>> It seems that the decision has been made to go for this N-level
>> approach. Were any other methods considered?
>>
>> Would a per-VCPU ring of pending events work? The ABI will be easier
>> to extend in the future for more event channels. The guest side code
>> will be simpler. It will be easier to fairly service the events as
>> they will be processed in the order they were raised.
>
> Not having looked in depth at either solution, it does seem like having
> something like this, rather than an ever-growing array of bits that
> needs to be examined, would be simpler and more scalable.

Surely more scalable, but simpler?

Jan
At 11:35 +0000 on 04 Dec (1354620912), David Vrabel wrote:
> On 03/12/12 20:56, Wei Liu wrote:
> > On Mon, Dec 03, 2012 at 06:52:41PM +0000, David Vrabel wrote:
> >> On 03/12/12 16:29, Wei Liu wrote:
> >>> Hi all,
> >>>
> >>> There has been discussion on extending the number of event channels
> >>> back in September [0].
> >>
> >> It seems that the decision has been made to go for this N-level
> >> approach. Were any other methods considered?
> >>
> >> Would a per-VCPU ring of pending events work? The ABI will be easier
> >> to extend in the future for more event channels. The guest side code
> >> will be simpler. It will be easier to fairly service the events as
> >> they will be processed in the order they were raised.
> >>
> >> The complexity would be in ensuring that events were not lost due to
> >> lack of space in the ring. This may make the ring prohibitively
> >> large or require complex or expensive tracking of pending events
> >> inside Xen.
> >>
> >
> > If I understand correctly, in the ring model an event will always be
> > queued up for processing. Won't this be overkill? What if event
> > generation is much faster than processing?
> >
> > In the current implementation, an event channel can be raised
> > multiple times but is only processed once.
>
> There would have to be something in Xen to ensure an event was only
> added to the ring once, i.e., if it's already in the ring, it doesn't
> get added again. This is the tricky bit and I can't immediately think
> of how this would work.

Well, Xen could keep a bitmap of which events it had inserted in the
ring, and then the guest could clear bits from the bitmap as it
serviced the events... :)

More seriously, though, I prefer the existing bitmap interface. It
handles masking and merging naturally, and it lets the guest service
events in whatever order it chooses (not having to process 1 word per
event of potentially uninteresting events off the ring to get to the
one it wants).

Tim.
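[Tim's half-joking bitmap-plus-ring scheme does bound the ring: if Xen only
enqueues a port whose "in ring" bit is clear, and the guest clears the bit
as it services the event, each port occupies at most one slot, so a ring of
NR_PORTS entries can never overflow. A sketch under that assumption; names
and sizes are illustrative, and real code would need atomic test-and-set
and memory barriers between producer and consumer.]

    #include <limits.h>
    #include <stdint.h>

    #define NR_PORTS 4096
    #define BPW      (sizeof(unsigned long) * CHAR_BIT)

    static uint32_t ring[NR_PORTS];          /* one slot per port suffices */
    static unsigned int prod, cons;          /* Xen advances prod, guest cons */
    static unsigned long inring[NR_PORTS / BPW];

    /* Xen side: raise a port, merging if it is already queued. */
    static void evtchn_raise(uint32_t port)
    {
        if (inring[port / BPW] & (1UL << (port % BPW)))
            return;                          /* already pending in the ring */
        inring[port / BPW] |= 1UL << (port % BPW);
        ring[prod++ % NR_PORTS] = port;
    }

    /* Guest side: service events in the order they were raised. */
    static void evtchn_poll(void (*handle)(uint32_t port))
    {
        while (cons != prod) {
            uint32_t port = ring[cons++ % NR_PORTS];

            inring[port / BPW] &= ~(1UL << (port % BPW));
            handle(port);
        }
    }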