Guy Zana
2007-Aug-09 17:45 UTC
[Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
We propose the following method to support interdomain interrupt
sharing, where one of the domains is an HVM assigned a pass-through
device. The method is limited in that it supports sharing between just
two domains: dom0 and an HVM. It is based on changing polarity.

Terminology
===========
Change polarity algorithm (CPA) - Algorithm in which polarity inversion
is used for EOI recognition. For details see
http://lists.xensource.com/archives/html/xen-devel/2007-05/msg01148.html
PLINE - Physical Line. Xen's reflection of the physical line. By
changing polarity we know the physical line's status.
VLINE - Virtual Line. The HVM's virtual line.
PT Device - A pass-through PCI device assigned to the HVM.
Dom0 Device - A PCI device assigned to dom0 (by default).
Interrupt Sharing - Two or more PCI devices whose INTx lines are
connected to the same IOAPIC pin (OR-wired) and assigned to different
domains.
Re-occurring interrupts - The PLINE is held asserted while the IOAPIC
fires interrupts continuously.
Spurious interrupt - Within a domain's context, an interrupt that
passed through the ISR chain without being handled.

NOTE: A single PCI device cannot be assigned to more than one domain
simultaneously.

When a single device is assigned to an HVM, using CPA, we update the
HVM's VLINE according to the PLINE state (both hold the same value),
thus providing a complete reflection. It is trivial to see how more
than one device sharing the same line could be assigned to the HVM
(using the same CPA).

In general, we should consider the situation where N devices from dom0
share the same line with M devices from the HVM. Three cases are
possible:

1. N=0, i.e. this line belongs to HVM devices. This case is already
solved with CPA.
2. M=0, i.e. this line belongs to dom0 devices. This is basic dom0
functionality.
3. N != 0, M != 0. This is the situation we want to handle now; from
now on we'll refer to it as an interdomain shared interrupt.

Our method could, however, be extended to handle all of the above
cases.

Problems related to Interdomain Interrupt Sharing
=================================================
* Spurious interrupts.
* Interrupt starvation.
* When we use CPA, we do not get re-occurring interrupts; this must be
taken into account.
* Even if a shared interrupt was handled by a domain-specific ISR, it
is not guaranteed that the PLINE will be deasserted.
* Interrupt storming - _physical_ storming is solved transparently by
CPA.

Goals
=====
* Give both the HVM and dom0 a chance to handle the interrupt.
* Update the HVM's VLINE correctly when sharing an interrupt.
* Avoid spurious interrupts, or at least minimize the number of such
interrupts injected into the HVM.
* Maintain a reasonable interrupt latency.

Proposed Method
===============
1. We track the shared line's assertion state using CPA; at each
assert/deassert event we save the line's state.
2. We perform most of the logic in a periodic timer module.

Modules
=======
1. Timer module. A periodic callback that does all the logic
processing.
2. Xen interrupt handler. The handler is replaced by CPA, which updates
the PLINE.
3. Dom0 ISR chain. At the end of the chain we know whether the
interrupt was handled or not, and report the status to Xen using a
hypercall.

States
======
1. Idle. The PLINE is deasserted. This is the "relaxed" state; we are
awaiting the next interrupt.
2. In Dom0. The interrupt is currently handled by dom0: the event was
sent into dom0 and dom0's ISR is processing it.
3. Process Interrupt. The interrupt was handled by dom0, which got back
to us with the result of the handling. Now we need to decide what to do
next. This state can be reached only from state [2].

State machine
=============
The timer callback implements the state machine; it freezes when we are
in the Idle state. The "events" described below are polled by the
timer. We also make changes in dom0's ISR chain in order to generate
these "events". A sketch of the resulting logic appears after this
message.

The following events are handled:

A. PLINE is deasserted. This event moves the state machine to _Idle_
from any state. It can happen in one of two cases:
1. Initialization.
2. As a result of PLINE deassertion. If the PLINE went down, it means
we're done.

B. Idle state and PLINE is asserted. The interrupt is injected into
dom0 and the state machine moves to "In Dom0". We always let dom0 try
to handle the interrupt first, thus logically creating an interdomain
ISR chain beginning with dom0.

C. "In Dom0" and PLINE is asserted (we read the status from the timer).
Do nothing; we don't know what to do with this interrupt yet.

D. "Process Interrupt" and PLINE is asserted. A few cases are possible:
1. If dom0 successfully handled the last interrupt and the interrupt
was not injected into the HVM, inject the interrupt into dom0 and move
to state "In Dom0". This is a dom0 interrupt; keep injecting into dom0.
2. If dom0 successfully handled the last interrupt and the interrupt
was injected into the HVM, deassert the HVM VLINE and re-inject the
interrupt into dom0. Move to state "In Dom0". (This solves the case
where the HVM was handling the interrupt but the line did not get
deasserted, because a dom0 device asserted it before a PT device
deasserted it as a result of the HVM's handling. In this case we assume
the HVM is done with it and now it's dom0's turn.)
3. If dom0 did not successfully handle the last interrupt and the
interrupt was not injected into the HVM, inject the interrupt into the
HVM and stay in the same state. This is an HVM interrupt; dom0 rejected
it.
4. If dom0 did not successfully handle the last interrupt and the
interrupt was injected into the HVM, inject the interrupt into dom0 and
move to state "In Dom0". The HVM is not done with the current interrupt
yet.

E. "Process Interrupt" and PLINE is deasserted. Deassert the HVM
interrupt (if necessary) and move to Idle. We handled the interrupt;
prepare for the next one.

The main idea is to inject the interrupt into dom0 when we don't know
what to do with it. If dom0 takes ownership, let it handle the
interrupt; if not, we inject it into the HVM. We recognize that none of
the PT devices are asserting the line either by PLINE deassertion or by
dom0 taking ownership back.

Any ideas and comments are welcome.

Best regards,
Alex Novik,
Neocleus.
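[Editor's sketch] To make the event flow above easier to follow, here
is a minimal C sketch of the timer callback implementing events A-E.
Every identifier here (shared_irq_timer, inject_into_dom0, and so on)
is a hypothetical stand-in, not Xen or Neocleus code, and the hypercall
path that moves the machine from "In Dom0" to "Process Interrupt" is
only described in a comment.

    /* Hypothetical sketch of the proposed timer-driven state machine.
     * None of these names come from Xen; they only illustrate the logic. */
    #include <stdbool.h>

    typedef enum { ST_IDLE, ST_IN_DOM0, ST_PROCESS_IRQ } irq_state_t;

    struct shared_irq {
        irq_state_t state;
        bool pline;         /* physical line state, updated by CPA        */
        bool dom0_handled;  /* result reported by dom0's ISR hypercall    */
        bool in_hvm;        /* was the last interrupt injected into HVM?  */
    };

    /* Stubs standing in for the real injection/deassertion primitives. */
    static void inject_into_dom0(struct shared_irq *s) { s->state = ST_IN_DOM0; }
    static void inject_into_hvm(struct shared_irq *s)  { s->in_hvm = true; }
    static void deassert_hvm_vline(struct shared_irq *s) { s->in_hvm = false; }

    /* Periodic timer callback: polls PLINE and dispatches events A-E.
     * dom0's ISR-chain hypercall moves ST_IN_DOM0 -> ST_PROCESS_IRQ and
     * sets dom0_handled; that path is not shown here. */
    static void shared_irq_timer(struct shared_irq *s)
    {
        if (!s->pline) {                      /* events A and E           */
            if (s->in_hvm)
                deassert_hvm_vline(s);        /* E: done, clean up        */
            s->state = ST_IDLE;
            return;
        }

        switch (s->state) {
        case ST_IDLE:                         /* event B: dom0 goes first */
            inject_into_dom0(s);
            break;

        case ST_IN_DOM0:                      /* event C: still pending   */
            break;                            /* wait for dom0's verdict  */

        case ST_PROCESS_IRQ:                  /* event D                  */
            if (s->dom0_handled) {
                if (s->in_hvm)                /* D.2: HVM assumed done    */
                    deassert_hvm_vline(s);
                inject_into_dom0(s);          /* D.1/D.2: dom0's interrupt */
            } else if (!s->in_hvm) {
                inject_into_hvm(s);           /* D.3: dom0 rejected it    */
            } else {
                inject_into_dom0(s);          /* D.4: let dom0 re-check   */
            }
            break;
        }
    }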
Tian, Kevin
2007-Aug-10 02:58 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
Hi, Guy,

Thanks for the very good description. Basically I think this should
work, but with the following concerns:

- How to choose the timeout value? A small timeout may result in more
spurious injections and a performance penalty, while a large timeout
may not satisfy driver expectations for high-speed devices.

- How to cope with the existing irq sharing mechanism for PV driver
domains? The existing approach between a PV driver domain and dom0 is
based on a trigger point, i.e. guest EOI: keep an insertion count and
track guest responses. The timeout mechanism is different, and I guess
the two paths will be difficult to merge. How about a mixed sharing
case, say among dom0, a PV domain and an HVM domain?

- Interrupt delay within the HVM may be exaggerated under some special
conditions: if the HVM is not ready to handle the injection at D.3
(e.g. blocked in I/O emulation), then D.4 will cancel the previous
injection at the next timeout. Only at the next D.3 does the HVM get a
re-injection, and it may or may not be delayed again depending on the
status at that time. Did you run some heavy workload and observe any
complaints?

But, anyway, I think a timeout is the only way to support the shared
irq case (if without MSI and we do want to allow it), though with a
performance cost. :-)

Thanks,
Kevin
Keir Fraser
2007-Aug-10 07:01 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
On 9/8/07 18:45, "Guy Zana" <guy@neocleus.com> wrote:

> The main idea is to inject the interrupt into dom0 when we don't know
> what to do with it. If dom0 takes ownership, let it handle the
> interrupt; if not, we inject it into the HVM. We recognize that none
> of the PT devices are asserting the line either by PLINE deassertion
> or by dom0 taking ownership back.

This needs dom0 kernel changes and does not solve the general sharing
problem (among multiple HVM domains, or among HVM domains and PV
domains other than dom0). Could you somehow track which guest is most
likely to handle the interrupt, deliver to it first, and then detect
the immediate re-interrupt if it EOIs without handling? Plus have a
timeout if it does not EOI in reasonable time?

 -- Keir
Keir Fraser
2007-Aug-10 07:04 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
On 10/8/07 08:01, "Keir Fraser" <keir@xensource.com> wrote:

> This needs dom0 kernel changes and does not solve the general sharing
> problem (among multiple HVM domains, or among HVM domains and PV
> domains other than dom0). Could you somehow track which guest is most
> likely to handle the interrupt, deliver to it first, and then detect
> the immediate re-interrupt if it EOIs without handling? Plus have a
> timeout if it does not EOI in reasonable time?

My thought here is a simple priority list with move-to-back of the
frontmost domain when we deliver him the interrupt but he does not
deassert the line either in reasonable time or by the time he EOIs the
interrupt. This is simple generic logic needing no PV guest changes.

 -- Keir
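[Editor's sketch] A minimal C illustration of the move-to-back priority
list Keir describes, under invented names and a fixed-size array; this
is not Xen code, just the shape of the logic.

    /* Sketch of the move-to-back priority list (hypothetical names). */
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_SHARERS 8

    struct irq_prio_list {
        int domid[MAX_SHARERS];   /* domid[0] is tried first */
        size_t count;
    };

    /* Stub for the real delivery primitive. */
    static void deliver_to(int domid) { (void)domid; }

    static void move_front_to_back(struct irq_prio_list *l)
    {
        int front = l->domid[0];
        for (size_t i = 1; i < l->count; i++)
            l->domid[i - 1] = l->domid[i];
        l->domid[l->count - 1] = front;
    }

    /* Called when the frontmost domain EOIed (or its timeout expired).
     * If the line is still asserted it did not service the interrupt:
     * demote it and try the next candidate. */
    static void frontmost_done(struct irq_prio_list *l, bool line_asserted)
    {
        if (!line_asserted)
            return;               /* serviced; the ordering is kept */
        move_front_to_back(l);
        deliver_to(l->domid[0]);
    }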
Tian, Kevin
2007-Aug-10 07:15 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
>From: Keir Fraser
>Sent: 10 August 2007 15:05
>
>My thought here is a simple priority list with move-to-back of the
>frontmost domain when we deliver him the interrupt but he does not
>deassert the line either in reasonable time or by the time he EOIs
>the interrupt. This is simple generic logic needing no PV guest
>changes.
>
> -- Keir

How is the priority defined? What is a reasonable time for different
device requirements?

PV irq sharing takes responses from all sharing sides, and Guy's RFC
only takes dom0's response. Your suggestion is much simpler, relying on
timeout only, but what do you expect the final performance to be?

Thanks,
Kevin
Keir Fraser
2007-Aug-10 07:37 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
On 10/8/07 08:15, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> How is the priority defined?

It is defined dynamically by the move-to-back policy of the priority
list.

> What is a reasonable time for different device requirements?

For the timeout? Actually I'm not sure how important having a timeout
is -- unless in the worst case it can reset the PCI device and ensure
the line is quiesced that way. Otherwise a non-responsive guest is
unlikely to deassert its device, and hence you cannot time out and
re-enable the interrupt line anyway. I consider this a secondary issue
in implementing shared interrupts, and it can reasonably be left until
later.

> PV irq sharing takes responses from all sharing sides, and Guy's RFC
> only takes dom0's response. Your suggestion is much simpler, relying
> on timeout only, but what do you expect the final performance to be?

The timeout isn't part of this method's normal operation. The usual
case will be that we deliver to just one guest -- at the front of our
priority list -- and it was the correct single guest to deliver the
interrupt to. In that case the list does not change, and if using the
polarity-change method from Neocleus we would take the usual two
interrupts per device assertion (one on the +ve edge, one on the -ve
edge), or just one interrupt if we use the existing Xen late-EOI method
or Intel's dummy-EOI method.

We take potentially two interrupts if the highest-priority domain is
not the service domain for this particular interrupt. In this case we
move the domain to the back of the list and continue to deliver until
the line is deasserted. The Neocleus polarity-change method works
really nicely here, because we take no second interrupt until the
physical INTx line is actually deasserted (and hence the interrupt is
serviced, and our delivery algorithm terminates). Using the Xen/Intel
methods of EOIing, we have to somehow detect the immediate re-interrupt
on EOI (which will happen because the physical INTx line is still
asserted).

The worst case is where multiple devices are issuing interrupts
simultaneously, of course. In this case we truly *need* to issue the
interrupt to multiple guests. This will work, but be a bit slow. I
think this is true of the Neocleus algorithm too, however.

In conclusion, my algorithm works well when I run through it in my
head. :-)

 -- Keir
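[Editor's sketch] Since the polarity-change method is central to both
this message and the RFC, here is a rough C sketch of the idea as the
RFC's terminology section describes it. ioapic_set_polarity() and the
other names are invented stand-ins, not real Xen functions.

    /* Illustration of the change-polarity algorithm (CPA): every
     * interrupt reveals one edge of the physical line, and flipping the
     * RTE polarity arms the opposite edge. Hence the "two interrupts per
     * assertion" (+ve edge, then -ve edge) mentioned above. */
    #include <stdbool.h>

    struct cpa_line {
        bool active_low;   /* current IOAPIC RTE polarity          */
        bool pline;        /* inferred physical line state (PLINE) */
    };

    static void ioapic_set_polarity(bool active_low) { (void)active_low; }

    static void cpa_interrupt_handler(struct cpa_line *l)
    {
        /* Active-high polarity fires when the line rises; active-low
         * polarity fires when it falls. Either way we now know PLINE. */
        l->pline = !l->active_low;

        /* Flip polarity so the opposite transition raises the next
         * interrupt; no re-occurring interrupts while the line is held. */
        l->active_low = !l->active_low;
        ioapic_set_polarity(l->active_low);
    }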
Tian, Kevin
2007-Aug-10 08:02 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: 10 August 2007 15:37
>
>> How is the priority defined?
>
>It is defined dynamically by the move-to-back policy of the priority
>list.

Considering sharing between a high-speed device and a low-speed device,
a simple move-to-back policy (once per EOI) is not the most efficient.
At least we could take interrupt frequency as one factor of the
priority too (see the sketch after this message).

>For the timeout? Actually I'm not sure how important having a timeout
>is -- unless in the worst case it can reset the PCI device and ensure
>the line is quiesced that way. Otherwise a non-responsive guest is
>unlikely to deassert its device, and hence you cannot time out and
>re-enable the interrupt line anyway. I consider this a secondary
>issue in implementing shared interrupts, and it can reasonably be
>left until later.

You seem to be talking about a bogus case where the guest is not
willing to handle the injection (e.g. driver unloaded) but leaves the
device in the asserted state. Yes, should such a bogus condition
happen, there is nothing to do except disable the physical RTE. My
question, though, is about the efficiency of the timeout under
different conditions. Say the top of the list is an HVM domain, and the
HVM domain has its vRTE masked (driver unloaded, or the previous
injection is still being handled); in this case we may not want to
inject now and wait the same 'reasonable time' for a non-response;
instead, a move-to-back could take effect immediately.

>The timeout isn't part of this method's normal operation. The usual
>case will be that we deliver to just one guest -- at the front of our
>priority list -- and it was the correct single guest to deliver the
>interrupt to.

This is hard to tell, since there is no clue to check whether it is the
right one, due to the randomness of interrupt occurrence.

>The worst case is where multiple devices are issuing interrupts
>simultaneously, of course. In this case we truly *need* to issue the
>interrupt to multiple guests. This will work, but be a bit slow. I
>think this is true of the Neocleus algorithm too, however.
>
>In conclusion, my algorithm works well when I run through it in my
>head. :-)

Definitely, this is a workable approach and can be applied to both
solutions. My concern is just how it behaves performance-wise. :-)

Thanks,
Kevin
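[Editor's sketch] One way interrupt frequency could enter the priority,
as Kevin suggests: keep a decaying per-domain service score and try the
highest-scoring domain first. All names and constants are invented for
illustration; nothing here is from Xen.

    /* Sketch: bias delivery order by observed service rate instead of a
     * pure move-to-back. */
    struct sharer_stats {
        int domid;
        unsigned int score;   /* decaying count of serviced interrupts */
    };

    /* Update after each delivery attempt: decay everyone a little and
     * reward the domain that actually deasserted the line. */
    static void account(struct sharer_stats *s, int serviced)
    {
        s->score -= s->score / 8;      /* decay: score *= 7/8  */
        if (serviced)
            s->score += 32;            /* reward the handler   */
    }

    /* Delivery order: pick the entry with the highest score first,
     * falling back to move-to-back among equal scores. */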
Keir Fraser
2007-Aug-10 08:16 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
On 10/8/07 09:02, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Considering sharing between a high-speed device and a low-speed
> device, a simple move-to-back policy (once per EOI) is not the most
> efficient. At least we could take interrupt frequency as one factor
> of the priority too.

My assumption would be that any given interrupt is due to only one
device, and that in this case it is always most probable that the
interrupting device is the high-speed one. Whenever a low-speed device
interrupt occurs, that will slow things down, because we will deliver
to the high-speed driver first, wait for unmask/EOI, then see the line
is not deasserted, then move the high-speed device to the back, and
re-deliver to the low-speed device. Plus, on the next interrupt you
will deliver to the low-speed device first even though it is most
likely a high-speed device interrupt.

Clearly we could be smarter here (only move-to-back after N failures,
for example -- see the sketch after this message). I'm not convinced
the extra complexity is worth it, though; I think this kind of scenario
is rare enough. I'd like to see a simple sharing method measured and
found wanting before adding extra heuristics.

> Say the top of the list is an HVM domain, and the HVM domain has its
> vRTE masked (driver unloaded, or the previous injection is still
> being handled); in this case we may not want to inject now and wait
> the same 'reasonable time' for a non-response; instead, a
> move-to-back could take effect immediately.

Okay, yes, the driver-unloaded case at least needs to be handled. But
it seems to me that the timeout here could be in the hundreds of
milliseconds, minimum. It should be an extremely occasional event that
the timeout is needed.

>> The timeout isn't part of this method's normal operation. The usual
>> case will be that we deliver to just one guest -- at the front of
>> our priority list -- and it was the correct single guest to deliver
>> the interrupt to.
>
> This is hard to tell, since there is no clue to check whether it is
> the right one, due to the randomness of interrupt occurrence.

Well, yes. My interest here is in working well for one active device at
a time (i.e. other devices are basically quiescent). Or, if there are
multiple devices active at a time, only one is delivering a really
significant number of interrupts. If you have multiple high-speed
devices and want maximum performance, I think people know to avoid
shared interrupts for those devices if possible, by shuffling PCI cards
and so on.

 -- Keir
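[Editor's sketch] The "move-to-back only after N failures" refinement
Keir mentions could look like this; DEMOTE_AFTER is a made-up tuning
knob and the names are hypothetical.

    /* Sketch of demoting the frontmost domain only after N consecutive
     * unhandled deliveries, rather than on the first one. */
    #include <stdbool.h>

    #define DEMOTE_AFTER 3   /* invented threshold */

    struct prio_entry {
        int domid;
        unsigned int misses;   /* consecutive deliveries that left the
                                  line asserted */
    };

    static bool should_demote(struct prio_entry *e, bool line_asserted)
    {
        if (!line_asserted) {      /* serviced: reset the streak */
            e->misses = 0;
            return false;
        }
        return ++e->misses >= DEMOTE_AFTER;
    }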
Tian, Kevin
2007-Aug-10 08:41 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: 10 August 2007 16:16
>
>Clearly we could be smarter here (only move-to-back after N failures,
>for example). I'm not convinced the extra complexity is worth it,
>though; I think this kind of scenario is rare enough. I'd like to see
>a simple sharing method measured and found wanting before adding
>extra heuristics.

Sure, let's start simple first. Just a reminder: for drivers that run
timeout checks on expected interrupt delivery, the slow condition may
increase the chance of complaints, though that is also unsolved in the
non-shared case.

>Okay, yes, the driver-unloaded case at least needs to be handled. But
>it seems to me that the timeout here could be in the hundreds of
>milliseconds, minimum. It should be an extremely occasional event
>that the timeout is needed.

I can agree with 'occasional' but not 'extremely occasional'. :-) The
HVM, if at the head of the list, may be blocked waiting for Qemu to
respond, while at the same time Qemu may be waiting for a driver (like
disk r/w) and the driver may be waiting for an interrupt. In such a
condition, the first injection into the HVM will cause a timeout
anyway, and only the next injection can get handled after dom0 gets its
interrupt. I just think such inter-domain dependency may make the case
worse...

>Well, yes. My interest here is in working well for one active device
>at a time (i.e. other devices are basically quiescent). Or, if there
>are multiple devices active at a time, only one is delivering a
>really significant number of interrupts. If you have multiple
>high-speed devices and want maximum performance, I think people know
>to avoid shared interrupts for those devices if possible, by
>shuffling PCI cards and so on.

If we are clear about keeping that assumption, then the simplest is the
best, after warning the user. :-)

Thanks,
Kevin
Keir Fraser
2007-Aug-10 08:52 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
On 10/8/07 09:41, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> I can agree with 'occasional' but not 'extremely occasional'. :-) The
> HVM, if at the head of the list, may be blocked waiting for Qemu to
> respond, while at the same time Qemu may be waiting for a driver
> (like disk r/w) and the driver may be waiting for an interrupt. In
> such a condition, the first injection into the HVM will cause a
> timeout anyway, and only the next injection can get handled after
> dom0 gets its interrupt. I just think such inter-domain dependency
> may make the case worse...

Oh, I see. That's another separate case to deal with. We'd attempt
delivery to the HVM, time out to dom0, then we would see the interrupt
is still asserted and... I guess we'd re-set the timeout on the HVM
guest a few times, perhaps with some backoff. This case is a bit of a
pain. :-(

 -- Keir
Guy Zana
2007-Aug-10 10:10 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
Thanks Kevin for all of your comments; I agree with them all. First,
most of the work here was done by Alex Novik, not me :) More comments
below...

Thanks,
Guy.

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Friday, August 10, 2007 5:59 AM
>
> - How to choose the timeout value? A small timeout may result in more
> spurious injections and a performance penalty, while a large timeout
> may not satisfy driver expectations for high-speed devices.

That's a good point. The spurious-vs-starving trade-off is exactly
opposite between the HVM and dom0. For an HVM that holds a VLINE, a
large timeout value results in more spurious interrupts, since you hold
the line asserted.

The timeout value could be adaptive: increased (made slower) any time
the timer fires and decides to do nothing, and decreased any time it
takes a decision (a sketch of this idea follows this message). This may
complicate things even further, though. Does the IOAPIC have a timeout
mechanism to fire an interrupt when the line is held asserted? Is using
that feasible? Freezing the timer is logically the same as masking the
IOAPIC pin.

> - How to cope with the existing irq sharing mechanism for PV driver
> domains? How about a mixed sharing case, say among dom0, a PV domain
> and an HVM domain?

Sharing is problematic between multiple domains, at least when an HVM
is involved. I guess it will be infrequent that you'll want to assign
two or more devices sharing the same line to different domains other
than dom0; I look at the devices left to dom0 more as a nuisance. I
haven't given a lot of thought to it, but you could probably allow PV
domains in the proposed shared interdomain ISR chain: inject the
interrupt into all of the PV domains and dom0 (simultaneously), OR
their handling status results, and take action based on that value.
Sharing a line between two or more HVMs is much more difficult to
solve.

> - Interrupt delay within the HVM may be exaggerated under some
> special conditions: if the HVM is not ready to handle the injection
> at D.3 (e.g. blocked in I/O emulation), then D.4 will cancel the
> previous injection at the next timeout.

I'm not sure I understood. In a D.3 -> D.4 -> D.3 event cycle, the
HVM's VLINE stays asserted. Dom0 always gets a chance to check whether
the interrupt is its own, but the VLINE stays asserted until dom0 has
handled it or until the PLINE is deasserted. The HVM will be ready when
it unmasks the IOAPIC pin and its VCPU is executing; it doesn't matter
whether you choose to assert or deassert its VLINE. In the meantime the
timer will fire, and that will eventually create spurious interrupts in
dom0. But an assumption we took is that we can't avoid spurious
interrupts altogether, and we would rather get them in dom0.

> Did you run some heavy workload and observe any complaints?

We didn't implement it yet :-)

Thanks for the great comments!
Guy.
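[Editor's sketch] The adaptive timer period Guy floats above might look
like this in C; the bounds and the doubling/halving factors are
invented for illustration only.

    /* Sketch of an adaptive polling period: back off while ticks take
     * no action, speed up while they do. Constants are made up. */
    #define PERIOD_MIN_US   100UL
    #define PERIOD_MAX_US 10000UL

    static unsigned long adapt_period(unsigned long period_us,
                                      int took_action)
    {
        if (took_action)
            period_us /= 2;    /* line is busy: poll faster       */
        else
            period_us *= 2;    /* idle tick: slow down, save work */

        if (period_us < PERIOD_MIN_US)
            period_us = PERIOD_MIN_US;
        if (period_us > PERIOD_MAX_US)
            period_us = PERIOD_MAX_US;
        return period_us;
    }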
Guy Zana
2007-Aug-10 10:22 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Friday, August 10, 2007 10:05 AM
>
> My thought here is a simple priority list with move-to-back of the
> frontmost domain when we deliver him the interrupt but he does not
> deassert the line either in reasonable time or by the time he EOIs
> the interrupt. This is simple generic logic needing no PV guest
> changes.

Even if the HVM handled the interrupt successfully, it doesn't mean
that the PLINE will be deasserted (another device, assigned to another
domain, may have asserted it while the HVM processed the interrupt).
You can't tell whether the HVM handled the interrupt successfully or
not. How does this method overcome that?

Btw, with the method we proposed you could add PV domains to the
interdomain ISR chain, but it may not contain more than one HVM.

Thanks,
Guy.
Keir Fraser
2007-Aug-10 11:21 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
On 10/8/07 11:22, "Guy Zana" <guy@neocleus.com> wrote:

> Even if the HVM handled the interrupt successfully, it doesn't mean
> that the PLINE will be deasserted (another device, assigned to
> another domain, may have asserted it while the HVM processed the
> interrupt). You can't tell whether the HVM handled the interrupt
> successfully or not. How does this method overcome that?

It would cycle through the priority list, moving the frontmost domain
to the back at each stage, until the line is deasserted.

> Btw, with the method we proposed you could add PV domains to the
> interdomain ISR chain, but it may not contain more than one HVM.

Well, that kind of sucks, doesn't it? And yet your method is
significantly more complicated than my approach, at least as described
in your email. Simple and more general wins the day, unless your
approach handles more cases or has better performance?

 -- Keir
Guy Zana
2007-Aug-10 11:50 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Friday, August 10, 2007 2:22 PM
>
> It would cycle through the priority list, moving the frontmost domain
> to the back at each stage, until the line is deasserted.

1. When will you deassert the HVM VLINE?
2. How do you avoid HVM spurious interrupts? Will you raise the line
again?

It is still getting complicated, and doesn't handle all cases.

> Well, that kind of sucks, doesn't it? And yet your method is
> significantly more complicated than my approach, at least as
> described in your email. Simple and more general wins the day, unless
> your approach handles more cases or has better performance?

I'm really here to find the best method. With your method you just
don't avoid HVM spurious interrupts, and I think that is a _must_. The
priority list is a good addition for PV guests, but if you want to
avoid spurious interrupts in the HVM, the HVM must be last in the list,
which is what we did, though we started simple (with dom0 and a single
HVM).

If you tell me that HVM spurious interrupts are not that important,
I'll agree to go with your method.
Keir Fraser
2007-Aug-10 13:18 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
On 10/8/07 12:50, "Guy Zana" <guy@neocleus.com> wrote:

>> It would cycle through the priority list, moving the frontmost
>> domain to the back at each stage, until the line is deasserted.
>
> 1. When will you deassert the HVM VLINE?

I would turn VLINE assertions into pulses: the line would be asserted
only instantaneously, to get latched by the VPIC/VIOAPIC (see the
sketch after this message). Actually, I think this question is quite
separate from whatever method we use for interrupt sharing: when would
you deassert the VLINE when the interrupt is *not* shared? Whatever
method we choose should be extendable to the shared case, and applied
to whichever HVM guest we are currently choosing to deliver the
interrupt to. So, whether the interrupt is shared or not, I see no
value in modelling the state of the level-triggered VLINE.

> 2. How do you avoid HVM spurious interrupts?

I avoid most of them by the fact that an HVM guest that is not handling
interrupts will get pushed down the priority list. Of course this won't
get rid of all spurious interrupts, but I'd expect it to get rid of
enough (e.g., at least 50% even in some worst cases I can think of). So
the question is: how sensitive is Windows to spurious interrupts? I
know that Linux needs something like 99% of interrupts to be spurious
before it generates a warning. If Windows is similar then my approach
would work just fine.

 -- Keir
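[Editor's sketch] What "turning assertions into pulses" might look
like; set_vline_level() is a stand-in for whatever the VPIC/VIOAPIC
entry point would be, not a real Xen call.

    /* Sketch: deliver a level-triggered source as an instantaneous
     * pulse. The virtual interrupt controller latches the rising edge,
     * so no held vline state needs to be modelled. */
    static void set_vline_level(int domid, unsigned int gsi, int level)
    {
        (void)domid; (void)gsi; (void)level;  /* VIOAPIC stand-in */
    }

    static void pulse_vline(int domid, unsigned int gsi)
    {
        set_vline_level(domid, gsi, 1);  /* assert: latched by VPIC/VIOAPIC */
        set_vline_level(domid, gsi, 0);  /* deassert immediately            */
    }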
Guy Zana
2007-Aug-10 15:51 UTC
RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Friday, August 10, 2007 4:18 PM
>
> I would turn VLINE assertions into pulses: the line would be asserted
> only instantaneously, to get latched by the VPIC/VIOAPIC. So, whether
> the interrupt is shared or not, I see no value in modelling the state
> of the level-triggered VLINE.

Sounds good actually :-)

> I avoid most of them by the fact that an HVM guest that is not
> handling interrupts will get pushed down the priority list. Of course
> this won't get rid of all spurious interrupts, but I'd expect it to
> get rid of enough (e.g., at least 50% even in some worst cases I can
> think of). So the question is: how sensitive is Windows to spurious
> interrupts? I know that Linux needs something like 99% of interrupts
> to be spurious before it generates a warning. If Windows is similar
> then my approach would work just fine.

From what I saw, Windows XP is not that sensitive to spurious
interrupts (at least for ISA interrupts). In general, Windows tries
hard to survive :-)

We'll have to check whether a prioritized list will suffice; it would
be simple, I agree. But you are still doing bad stuff and hoping it
goes unnoticed, which sounds like a recipe for voodoo. It should be
well tested at least.

Thanks,
Guy.
Keir Fraser
2007-Aug-10 16:00 UTC
Re: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
On 10/8/07 16:51, "Guy Zana" <guy@neocleus.com> wrote:

> From what I saw, Windows XP is not that sensitive to spurious
> interrupts (at least for ISA interrupts). In general, Windows tries
> hard to survive :-)
> We'll have to check whether a prioritized list will suffice; it would
> be simple, I agree. But you are still doing bad stuff and hoping it
> goes unnoticed, which sounds like a recipe for voodoo. It should be
> well tested at least.

This whole PCI passthru feature is a recipe for voodoo ;-)

 -- Keir