I've just realized a few days ago, when I got back to xen/plan 9, that I'm
not getting interrupts after the first few. This is with a very recent pull.
What's amazing is that it got as far as it did, but I am processing pending
interrupt stuff in spllo(), so that explains a lot. What I'm not getting is
the asynchronous calls to evtchn_do_upcall.

The mask is zero. I've enabled VIRQ_TIMER. Yet I'm only getting one set of
interrupts and it looks like no more. My loop for picking up events is
pretty much the same as the linux loop -- I just took that code. I am
clearing out evtchn_upcall_pending and evtchn_pending_sel. I am clearing
the mask to 0 at the end of the interrupt.

What's a reasonable set of things to look for? I'm stumped.

ron

--
LANL CCS-1 email flavor:
***** Correspondence []
***** DUSA LACSI-HW [ ]
***** DUSA LACSI-OS [x ]
***** DUSA LACSI-CS [ ]

-------------------------------------------------------
This SF.Net email is sponsored by OSTG. Have you noticed the changes on
Linux.com, ITManagersJournal and NewsForge in the past few weeks? Now,
one more big change to announce. We are now OSTG- Open Source Technology
Group. Come see the changes on the new OSTG site. www.ostg.com
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
> I've just realized a few days ago, when I get back to xen/plan 9, that I'm
> not getting interrupts after the first few. This with a very recent pull.
> What's amazing that it got as far as it did, but I am processing pending
> interrupt stuff in spllo() so that explains a lot. What I'm not getting is
> the asynchronous calls to evtchn_do_upcall.
>
> The mask is zero. I've enabled VIRQ_TIMER. Yet I'm only getting one set of
> interrupts and it looks like no more. My loop for picking up events is
> pretty much the same as the linux loop -- I just took that code. I am
> clearing out evtchn_upcall_pending and evtchn_pending_sel. I am clearing
> the mask to 0 at the end of the interrupt.
>
> What's a reasonable set of things to look for? I'm stumped.

The Linux code sets the evtchn_mask before clearing evtchn_pending,
then clears the evtchn_mask after calling the interrupt handler. Are
you doing the setting but forgetting the clearing?

The order that Linux has for this stuff, to avoid races, is:

 1. Test-and-clear evtchn_upcall_pending flag
 2. Read-and-clear (XCHG) the evtchn_pending_sel
 3. For each set bit @i in the sel:
 4.   Read evtchn_pending[@i]
 5.   For each set bit @j in the word:
 6.     Set evtchn_mask[@i*32+@j]
 7.     Clear evtchn_pending[@i*32+@j]
 8.     ....do interrupt work...
 9.     Clear evtchn_mask[@i*32+@j]

The fact that step 2 is a real XCHG instruction is important, as it
also acts as a memory barrier (not important if you're not running on
an SMP machine). Also, all your bit-munging instructions must have the
LOCK prefix if you're running on an SMP machine.

Unmasking evtchn_upcall_mask and evtchn_mask[] needs special attention
because a pending interrupt will not automatically get raised as it
would on real hardware -- think of it like an edge-triggered
interrupt where you lost the edge because the line was masked. So what
we do in Linux is:

Clearing evtchn_upcall_mask:
 1. Clear evtchn_upcall_mask
 2. Barrier [just a compiler barrier, not a CPU barrier]
 3. If ( evtchn_upcall_pending ) do_evtchn_processing()

Clearing evtchn_mask[]:
 A bit more involved; see unmask_evtchn() in include/asm-xen/evtchn.h

Sticking close to the Linux code, and making sure the underlying
bitops are LOCKed, is important! I guess yours is unlikely to be a
subtle race if you only ever receive precisely one VIRQ. :-)

 -- Keir
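The ordering described in this message can be sketched in C roughly as follows. This is a minimal illustration, not the Linux source: `struct shared_info_sketch`, `record_handler` and `enable_upcalls` are hypothetical names, the GCC/Clang `__atomic` builtins stand in for XCHG and LOCK-prefixed bit operations, and skipping channels that are already masked in step 4 is an assumption added for the sketch.

```c
#include <stdint.h>

#define NR_EVTCHN_WORDS 32   /* 32 words x 32 bits, as implied by @i*32+@j */

/* Hypothetical stand-in for the shared-info page; field names follow the thread. */
struct shared_info_sketch {
    uint8_t  evtchn_upcall_pending;
    uint8_t  evtchn_upcall_mask;
    uint32_t evtchn_pending_sel;
    uint32_t evtchn_pending[NR_EVTCHN_WORDS];
    uint32_t evtchn_mask[NR_EVTCHN_WORDS];
};

typedef void (*evtchn_handler_t)(unsigned int port);

/* Example handler (hypothetical): records what was delivered. */
static unsigned int last_port;
static unsigned int handled_count;
static void record_handler(unsigned int port)
{
    last_port = port;
    handled_count++;
}

static void do_evtchn_processing(struct shared_info_sketch *s,
                                 evtchn_handler_t handler)
{
    /* 1. Test-and-clear the upcall-pending flag. */
    if (!__atomic_exchange_n(&s->evtchn_upcall_pending, 0, __ATOMIC_SEQ_CST))
        return;

    /* 2. Read-and-clear the selector with an atomic exchange. */
    uint32_t sel = __atomic_exchange_n(&s->evtchn_pending_sel, 0,
                                       __ATOMIC_SEQ_CST);

    while (sel != 0) {                          /* 3. each set bit i */
        unsigned int i = (unsigned int)__builtin_ctz(sel);
        sel &= sel - 1;

        /* 4. Read the pending word (assumption: ignore masked channels). */
        uint32_t word = __atomic_load_n(&s->evtchn_pending[i], __ATOMIC_SEQ_CST)
                      & ~__atomic_load_n(&s->evtchn_mask[i], __ATOMIC_SEQ_CST);

        while (word != 0) {                     /* 5. each set bit j */
            unsigned int j = (unsigned int)__builtin_ctz(word);
            uint32_t bit = (uint32_t)1 << j;
            word &= word - 1;

            __atomic_fetch_or(&s->evtchn_mask[i], bit, __ATOMIC_SEQ_CST);      /* 6 */
            __atomic_fetch_and(&s->evtchn_pending[i], ~bit, __ATOMIC_SEQ_CST); /* 7 */
            handler(i * 32 + j);                                               /* 8 */
            /* 9. A bare clear like this loses an edge that arrived during
             *    step 8; the message points at unmask_evtchn() for the
             *    re-check that a real guest needs here. */
            __atomic_fetch_and(&s->evtchn_mask[i], ~bit, __ATOMIC_SEQ_CST);
        }
    }
}

/* "Clearing evtchn_upcall_mask": clear, compiler barrier, then re-check. */
static void enable_upcalls(struct shared_info_sketch *s,
                           evtchn_handler_t handler)
{
    s->evtchn_upcall_mask = 0;
    __asm__ __volatile__("" ::: "memory");   /* compiler barrier only */
    if (s->evtchn_upcall_pending)
        do_evtchn_processing(s, handler);
}
```

The point of the last function is that nothing re-raises an event that became pending while upcalls were masked, so the guest has to look for it itself immediately after unmasking.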
On Mon, 9 Aug 2004, Keir Fraser wrote:
> The Linux code sets the evtchn_mask before clearing evtchn_pending,
> then clears the evtchn_mask after calling the interrupt handler. Are
> you doing the setting but forgetting the clearing?

Sadly, no.

> The order that Linux has for this stuff, to avoid races, is:
>
>  1. Test-and-clear evtchn_upcall_pending flag
>  2. Read-and-clear (XCHG) the evtchn_pending_sel
>  3. For each set bit @i in the sel:
>  4.   Read evtchn_pending[@i]
>  5.   For each set bit @j in the word:
>  6.     Set evtchn_mask[@i*32+@j]
>  7.     Clear evtchn_pending[@i*32+@j]
>  8.     ....do interrupt work...
>  9.     Clear evtchn_mask[@i*32+@j]

Yeah, I'm actually using that code. Oh well.

ron
On Mon, 9 Aug 2004, Keir Fraser wrote:
> Unmasking evtchn_upcall_mask and evtchn_mask[] need special attention
> because a pending interrupt will not automatically get raised as it
> would on real hardware -- think of it like an edge-triggered
> interrupt where you lost the edge because the line was masked. So what
> we do in Linux is:

In any event, if clock interrupts are enabled, shouldn't I get an
interrupt 500 ms later? Or am I misreading the code?

ron
> On Mon, 9 Aug 2004, Keir Fraser wrote:
>
> > Unmasking evtchn_upcall_mask and evtchn_mask[] need special attention
> > because a pending interrupt will not automatically get raised as it
> > would on real hardware -- think of it like an edge-triggered
> > interrupt where you lost the edge because the line was masked. So what
> > we do in Linux is:
>
> in any event, if clock interrupts are enabled, shouldn't I get an
> interrupt 500 ms later? Or am I misreading the code.

Yes, you'll get the interrupt sometime later, the next time Xen is
returning execution to you.

 -- Keir
On Mon, 9 Aug 2004, Keir Fraser wrote:
> > in any event, if clock interrupts are enabled, shouldn't I get an
> > interrupt 500 ms later? Or am I misreading the code.
>
> Yes, you'll get the interrupt sometime later, the next time Xen is
> returning execution to you.

So that's the really weird problem. I'm not worried at this point that
they are delayed: the mask is 0 and they're not happening at all.

Anyway, I've got a way to dump some info, so more later.

ron
OK, some more data.

In a timer interrupt (first and only) I see this:

xentimerread: hz 0 cpu_hz 598736060 shadow_system_time 333ac5398180
settin timer to 0x333bfecedc00

So the time is 333ac5398180, and I set the timer to 0x333bfecedc00 via
HYPERVISOR_set_timer_op. I'm assuming that once the system time gets to
0x333bfecedc00 then I'll get an interrupt.

Now we wait ... in this loop, I'm printing the time and the values of the
mask, the pending, and the pending_sel values in shared info:

ipending: @0x333bfe262600 islo 0x0, pending 0x0, pending_sel 0x0
ipending: @0x333bfebebc80 islo 0x0, pending 0x0, pending_sel 0x0
ipending: @0x333bff575300 islo 0x0, pending 0x0, pending_sel 0x0
ipending: @0x333bffefe980 islo 0x0, pending 0x0, pending_sel 0x0
ipending: @0x333c00888000 islo 0x0, pending 0x0, pending_sel 0x0

Note that there is no change in pending as we cross the time for an
interrupt to have happened. I would expect to see pending get set to
some non-zero value once the system time had passed 0x333bfecedc00.

My reading of HYPERVISOR_set_timer_op is that you set an absolute value,
and when the time is greater than that value, you get called with an
interrupt; is that wrong?

ron
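To make the "crossing the deadline" observation concrete, the small C sketch below finds the first printed sample that is at or past the programmed deadline. The numbers are copied from the log in this message; the function and variable names are hypothetical. The samples turn out to be exactly 10,000,000 units apart, which would be 10 ms if system time is counted in nanoseconds (an assumption; the message does not state the unit).

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first sample at or past the deadline, or n if none is. */
static size_t first_sample_past(const uint64_t *samples, size_t n,
                                uint64_t deadline)
{
    for (size_t i = 0; i < n; i++)
        if (samples[i] >= deadline)
            return i;
    return n;
}

/* Absolute deadline passed to HYPERVISOR_set_timer_op in the log. */
static const uint64_t timer_deadline = 0x333bfecedc00ULL;

/* Times printed by the "ipending:" lines in the log. */
static const uint64_t ipending_samples[] = {
    0x333bfe262600ULL,
    0x333bfebebc80ULL,
    0x333bff575300ULL,
    0x333bffefe980ULL,
    0x333c00888000ULL,
};
#define NR_SAMPLES (sizeof ipending_samples / sizeof ipending_samples[0])
```

With these values the deadline falls between the second and third printed lines, so the third line onward is where a non-zero pending value would have been expected.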
This is correct, and you should take an interrupt every 10ms in any
case.

Interesting values to print are the master mask, master pending,
pending_sel, and the words in the pend and mask arrays that contain
the bits for the event channel that you are interested in.
(Remember that the event channel will have a different index from the
VIRQ number.)

 -- Keir

> OK, some more data.
>
> in a timer interrupt (first and only) I see this:
> xentimerread: hz 0 cpu_hz 598736060 shadow_system_time 333ac5398180
> settin timer to 0x333bfecedc00
>
> So the time is 333ac5398180, I set the timer to 0x333bfecedc00 via
> HYPERVISOR_set_timer_op. I'm assuming that once the system time gets to
> 0x333bfecedc00 then I'll get an interrupt.
>
> Now we wait ... in this loop, I'm print time and the values of the mask,
> the pending, and the pending_sel values in shared info:
>
> ipending: @0x333bfe262600 islo 0x0, pending 0x0, pending_sel 0x0
> ipending: @0x333bfebebc80 islo 0x0, pending 0x0, pending_sel 0x0
> ipending: @0x333bff575300 islo 0x0, pending 0x0, pending_sel 0x0
> ipending: @0x333bffefe980 islo 0x0, pending 0x0, pending_sel 0x0
> ipending: @0x333c00888000 islo 0x0, pending 0x0, pending_sel 0x0
>
> note that there is no changing in pending as we cross the time for an
> interrupt to have happened. I would expect to see pending to get set to
> some non-zero value once the system time had passed 0x333bfecedc00.
>
> My reading of HYPERVISOR_set_timer_op is that you set an absolute value,
> and when the time is > than that value, you get called with an interrupt;
> is that wrong?
>
> ron
Thanks again, you cleared it up for me; pilot error. I'm amazed it got
this far, as in "How did this *EVER* work". Oh well, off to work, fix it
tonight.

ron
On Mon, 9 Aug 2004, Keir Fraser wrote:
> Interesting values to print are the master mask, master pending,
> pending_sel, and the words in the pend and mask arrays that contain
> the bits for the event channel that you are interested in.
> (Remember that the event channel will have a different index to the
> VIRQ number).

I understand my confusion better.

cli on linux on xen is this:

    HYPERVISOR_shared_info->vcpu_data[0].evtchn_upcall_mask = 1;

That disables all interrupts? I'm confused on that: how does this relate
to the evtchn_mask?

For cpu 0 do I need to clear BOTH of these for interrupts to happen, and
then in the domain itself only mess with the one for vcpu 0? I'm looking
at linux U kernel code but want to make sure I get this right. Is this
stuff really firmly tested and laid out, or still somewhat tentative due to
the fact that it's not really tested with vcpu > 0?

thanks

ron
One last note:

I am (weirdly) getting further, a proc is running and asking for input.

But I still get to this weird state:
global irupt mask is 0xfffffff8, global pending is 7, pending_sel is 0,
vcpu_data[0].mask is 0, and vcpu_data[0].evtchn_upcall_pending is 0.

Seems to me I should be taking some async upcalls.

ron
> One last note:
>
> I am (weirdly) getting further, a proc is running and asking for input.
>
> But I still get to this weird state:
> global irupt mask is 0xfffffff8, global pending is 7, pending_sel is 0,
> vcpu_data[0].mask is 0, and vcpu_data[0].evtchn_upcall_pending is 0.
>
> Seems to me I should be taking some async upcalls.

What's 'global irupt mask' and 'global pending'?

You won't get an async upcall until evtchn_upcall_pending becomes
non-zero. That won't occur until one of the bits in pending_sel
becomes set. Which, in turn, won't occur until an event channel that
does not have its bit set in the evtchn_mask[] array gets its bit set
in the evtchn_pend[] array.

 -- Keir
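The state quoted above can be checked mechanically. The sketch below assumes that "global irupt mask" and "global pending" mean word 0 of evtchn_mask[] and evtchn_pending[] (the thread does not confirm this), and the function names are hypothetical. A channel that is pending and unmasked while its selector bit is clear matches the "lost edge" pattern, provided the snapshot was taken outside the event-processing loop, where the selector is legitimately zero for a moment.

```c
#include <stdint.h>

/* Channels in one 32-bit word that are pending and not masked. */
static uint32_t unmasked_pending(uint32_t pending_word, uint32_t mask_word)
{
    return pending_word & ~mask_word;
}

/*
 * Non-zero if word `word_index` has a pending, unmasked channel but the
 * matching bit in pending_sel is clear, i.e. nothing will trigger an upcall
 * for it unless the guest notices by itself.
 */
static int edge_looks_lost(uint32_t pending_word, uint32_t mask_word,
                           uint32_t pending_sel, unsigned int word_index)
{
    uint32_t sel_bit = (uint32_t)1 << word_index;
    return unmasked_pending(pending_word, mask_word) != 0 &&
           (pending_sel & sel_bit) == 0;
}
```

For the reported values (pending 7, mask 0xfffffff8, pending_sel 0), channels 0, 1 and 2 are all pending and unmasked, yet the selector is empty.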
> I understand my confusion better.
>
> cli on linux on xen is this:
> HYPERVISOR_shared_info->vcpu_data[0].evtchn_upcall_mask = 1;
>
> That disables all interrupts? I'm confused on that, how does this relate
> to the evtchn_mask?

The purpose of the evtchn_mask[] array is to disallow callbacks at
per-channel granularity, e.g., for scheduling purposes.

The evtchn_upcall_mask is intended to disallow callbacks in general,
where your OS is in a state that it cannot handle them, i.e., it's to
allow easy reentrancy control in your OS.

> For cpu 0 do I need to clear BOTH of these for interrupts to happen, and
> then in the domain itself only mess with the one for vcpu 0? I'm looking
> at linux U kernel code but want to make sure I get this right. Is this
> stuff really firmly tested and laid out or still somewhat tentative due to
> the fact that it's not really tested with vcpu > 0?

For a particular event channel @e to fire you an async callback, you
need:
 1. The original value of bit @e in evtchn_pending[] must be zero.
 2. The value of bit @e in evtchn_mask[] must be zero.
 3. The original value of bit (@e>>5) in evtchn_pending_sel must be zero.
 4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
 5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.

If these 5 requirements are satisfied then you _will_ receive a
callback.

 -- Keir
I've got a question about this code.

static inline void evtchn_set_pending(struct domain *d, int port)
{
    shared_info_t *s = d->shared_info;
    if ( !test_and_set_bit(port,    &s->evtchn_pending[0]) &&
         !test_bit        (port,    &s->evtchn_mask[0])    &&
         !test_and_set_bit(port>>5, &s->evtchn_pending_sel) )
    {
        /* The VCPU pending flag must be set /after/ update to evtchn-pend. */
        s->vcpu_data[0].evtchn_upcall_pending = 1;
        guest_async_callback(d);
    }
}

So you'll get an upcall IFF:
 - the bit 'port' in evtchn_pending WAS 0,
 - the bit 'port' in the mask IS 0, and
 - the bit 'port >> 5' in the evtchn_pending_sel WAS 0.

OK, here's my question: suppose the first test_and_set_bit fails because
the bit in evtchn_pending[0] was already set? You'll never get called,
that's what, as far as I can tell.

And this is exactly what I'm seeing. I've got bits 0,1,2 set in
evtchn_pending, but the guest_async_callback is never happening, since the
test_and_set_bit returns 1. I'm missing an interrupt, due to a plethora of
debug prints in my kernel, and I'm not seeing another one. To me, it looks
like I'm exercising a race condition in the function shown above.

Here is my question: why isn't this code something like:

static inline void evtchn_set_pending(struct domain *d, int port)
{
    shared_info_t *s = d->shared_info;
    set_bit(port, &s->evtchn_pending[0]);
    if ( !test_bit        (port,    &s->evtchn_mask[0])    &&
         !test_and_set_bit(port>>5, &s->evtchn_pending_sel) )
    {
        /* The VCPU pending flag must be set /after/ update to evtchn-pend. */
        s->vcpu_data[0].evtchn_upcall_pending = 1;
        guest_async_callback(d);
    }
}

In other words, I don't see the reason for the first test_and_set_bit,
given that the bit may have been set by an earlier call to
evtchn_set_pending and masked by the mask, so that the next time you call,
the first test_and_set_bit will fail.

So, what's the reason for that first TAS?

thanks

ron
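The scenario in this message can be replayed in a small single-threaded simulation. This is only a sketch: `struct shared_info_sketch`, the bit helpers and the `callbacks` counter are hypothetical stand-ins (plain, non-atomic), `set_pending_edge` mirrors the posted function and `set_pending_level` mirrors the proposed alternative.

```c
#include <stdint.h>

#define NR_EVTCHN_WORDS 32

/* Hypothetical stand-in for the fields of the shared-info page used here. */
struct shared_info_sketch {
    uint32_t evtchn_pending[NR_EVTCHN_WORDS];
    uint32_t evtchn_mask[NR_EVTCHN_WORDS];
    uint32_t evtchn_pending_sel;
    uint8_t  evtchn_upcall_pending;
};

/* Counts how often the guest would have been notified. */
static int callbacks;

static int test_bit32(unsigned int nr, const uint32_t *words)
{
    return (words[nr / 32] >> (nr % 32)) & 1u;
}

/* Sets bit nr and returns its previous value (non-atomic: simulation only). */
static int test_and_set_bit32(unsigned int nr, uint32_t *words)
{
    uint32_t bit = (uint32_t)1 << (nr % 32);
    int old = (words[nr / 32] & bit) != 0;
    words[nr / 32] |= bit;
    return old;
}

static void notify(struct shared_info_sketch *s)
{
    s->evtchn_upcall_pending = 1;
    callbacks++;                 /* stands in for guest_async_callback(d) */
}

/* As posted: only a 0->1 transition of the pending bit can notify. */
static void set_pending_edge(struct shared_info_sketch *s, unsigned int port)
{
    if (!test_and_set_bit32(port, s->evtchn_pending) &&
        !test_bit32(port, s->evtchn_mask) &&
        !test_and_set_bit32(port >> 5, &s->evtchn_pending_sel))
        notify(s);
}

/* The proposed variant: an already-set pending bit does not block notify. */
static void set_pending_level(struct shared_info_sketch *s, unsigned int port)
{
    (void)test_and_set_bit32(port, s->evtchn_pending);
    if (!test_bit32(port, s->evtchn_mask) &&
        !test_and_set_bit32(port >> 5, &s->evtchn_pending_sel))
        notify(s);
}
```

Replaying "event arrives while masked, guest clears the mask bit without looking at pending, event arrives again" gives no notification at all with the first function and one notification with the second, which is the behaviour being asked about.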
On Tue, 10 Aug 2004, Keir Fraser wrote:
> For a particular event channel @e to fire you an async callback, you
> need:
>  1. The original value of bit @e in evtchn_pending[] must be zero.
>  2. The value of bit @e in evtchn_mask[] must be zero.

Is there a race condition here? Let's pretend this is the 10ms interrupt:
 - @e gets set in evtchn_pending
 - @e in evtchn_mask is set (not zero) because it is still masked,
   as Plan 9 is still in the interrupt printing lots of info for me.

From my reading of this, if your interrupt handler takes more than 10 ms,
then you'll never get another timer interrupt @e, since the evtchn_pending
is now 1. Is there any way in which a callback for timer ints will occur
if @e in evtchn_pending was set and the mask is now zero? I can't see it.

>  3. The original value of bit (@e>>5) in evtchn_pending_sel must be zero.

OK, this I can see.

>  4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
>  5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.

OK, this I don't totally see. From the code I posted before, it seems to
me only the first three conditions matter.

Thanks!

ron
Further thinking about the whole interrupt thing.

It seems to me that all the interrupts are edge-triggered, see:

 * 2. MASK -- if this bit is clear then a 0->1 transition of PENDING
 *    will cause an asynchronous upcall to be scheduled. This bit is only
 *    updated by the guest. It is read-only within Xen. If a channel
 *    becomes pending while the channel is masked then the 'edge' is lost
 *    (i.e., when the channel is unmasked, the guest must manually handle
 *    pending notifications as no upcall will be scheduled by Xen).

But what we want in some cases (timer in particular) are level interrupts.

So this code:

static inline void evtchn_set_pending(struct domain *d, int port)
{
    shared_info_t *s = d->shared_info;
    if ( !test_and_set_bit(port,    &s->evtchn_pending[0]) &&
         !test_bit        (port,    &s->evtchn_mask[0])    &&
         !test_and_set_bit(port>>5, &s->evtchn_pending_sel) )
etc.

is really testing for edges (which is fine), but in some cases we really do
want a level.

Sadly this does complicate life, but at the same time I'd argue that
VIRQ_TIMER should be a level interrupt. I can't see any way out of this
race condition otherwise.

Does this make sense or am I totally off base? I do think the comment
above (from hypervisor-if.h) very clearly explains the potential race
condition. I've fallen into it in a big way, but I think it is a problem
others may fall into as well.

I'm going to add a trivial function evtchn_set_pending_level and call it
out of send_guest_virq and see if it helps my problem. My guess is it
will.

Thanks for your patience on this one, Keir.

ron
> Sadly this does complicate life but at the same time I'd argue that
> VIRQ_TIMER should be a level interrupt. I can't see any way out of this
> race condition otherwise.
>
> Does this make sense or am I totally off base? I do think the comment
> above (from hypervisor-if.h) very clearly explains the potential race
> condition. I've fallen into it in a big way, but I think it is a problem
> others may fall into as well.
>
> I'm going to add a trivial function evtchn_set_pending_level and call it
> out of send_guest_virq and see if it helps my problem. My guess is it
> will.

If you use this function (from evtchn.h) to unmask individual event
channels then you will not experience the race. Changes to Xen are not
required:

[NB. synch_*_bit forces use of SMP-safe atomic bit operations (i.e., on
x86 they will use the LOCK prefix). I need to deliberately specify
this because the guest OS is UP, and so its usual *_bit operations
are not SMP-safe! You may want to watch out for this one yourself.]

static inline void unmask_evtchn(int port)
{
    shared_info_t *s = HYPERVISOR_shared_info;

    synch_clear_bit(port, &s->evtchn_mask[0]);

    /*
     * The following is basically the equivalent of 'hw_resend_irq'. Just
     * like a real IO-APIC we 'lose the interrupt edge' if the channel is
     * masked.
     */
    if (  synch_test_bit        (port,    &s->evtchn_pending[0]) &&
         !synch_test_and_set_bit(port>>5, &s->evtchn_pending_sel) )
    {
        s->vcpu_data[0].evtchn_upcall_pending = 1;
        if ( !s->vcpu_data[0].evtchn_upcall_mask )
            force_evtchn_callback();
    }
}
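The effect of the re-check in unmask_evtchn() can be seen in a standalone sketch. Everything suffixed `_sketch` below is a hypothetical stand-in: the GCC/Clang `__atomic` builtins play the role of the LOCK-prefixed synch_*_bit operations, and a counter replaces force_evtchn_callback().

```c
#include <stdint.h>

#define NR_EVTCHN_WORDS 32

/* Hypothetical stand-in for the fields of the shared-info page used here. */
struct shared_info_sketch {
    uint32_t evtchn_pending[NR_EVTCHN_WORDS];
    uint32_t evtchn_mask[NR_EVTCHN_WORDS];
    uint32_t evtchn_pending_sel;
    uint8_t  evtchn_upcall_pending;
    uint8_t  evtchn_upcall_mask;
};

/* Counts forced callbacks instead of issuing a real one. */
static int forced_callbacks;
static void force_evtchn_callback_sketch(void)
{
    forced_callbacks++;
}

/* Atomic bit helpers: safe even if another CPU updates the same word. */
static void synch_clear_bit_sketch(unsigned int nr, uint32_t *words)
{
    __atomic_fetch_and(&words[nr / 32], ~((uint32_t)1 << (nr % 32)),
                       __ATOMIC_SEQ_CST);
}

static int synch_test_bit_sketch(unsigned int nr, uint32_t *words)
{
    return (__atomic_load_n(&words[nr / 32], __ATOMIC_SEQ_CST)
            >> (nr % 32)) & 1u;
}

static int synch_test_and_set_bit_sketch(unsigned int nr, uint32_t *words)
{
    uint32_t bit = (uint32_t)1 << (nr % 32);
    return (__atomic_fetch_or(&words[nr / 32], bit, __ATOMIC_SEQ_CST) & bit) != 0;
}

/* Unmask a channel and re-raise an event whose edge was lost while masked. */
static void unmask_evtchn_sketch(struct shared_info_sketch *s, unsigned int port)
{
    synch_clear_bit_sketch(port, s->evtchn_mask);

    if (synch_test_bit_sketch(port, s->evtchn_pending) &&
        !synch_test_and_set_bit_sketch(port >> 5, &s->evtchn_pending_sel)) {
        s->evtchn_upcall_pending = 1;
        if (!s->evtchn_upcall_mask)
            force_evtchn_callback_sketch();
    }
}
```

If an event was marked pending while its channel was masked, unmasking through this path sets the selector bit and the upcall-pending flag itself, so the event is picked up either immediately or as soon as upcalls are re-enabled.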
On Mon, Aug 09, 2004 at 09:24:44PM -0600, ron minnich wrote:
> > For a particular event channel @e to fire you an async callback, you
> > need:
> >  1. The original value of bit @e in evtchn_pending[] must be zero.
> >  2. The value of bit @e in evtchn_mask[] must be zero.
>
> Is there a race condition here? Let's pretend this is the 10ms interrupt
> @e gets set in evtchn_pending
> @e in evtchn_mask is set (not zero) because it is still masked
> as Plan is still in the interrupt printing lots of info
> for me.

You have to check for a pending interrupt when you unmask an
interrupt. This can be done atomically by disabling interrupts with
evtchn_upcall_mask.

> >  4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
> >  5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.
>
> OK, this I don't totally see. From the code I posted before, it seems to
> me only the first three conditions matter.

4 is not true: we don't test it, and we set it unconditionally.

    christian
> >  4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
> >  5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.
>
> OK, this I don't totally see. From the code I posted before, it seems to
> me only the first three conditions matter.

The first three conditions cause us to decide whether or not to
schedule the target domain, sending a cross-cpu interrupt if
necessary. The final two are checks made just before calling back to
the guest OS, to see whether it is in a position to receive async
callbacks.

The final two are only ever accessed on the CPU that is running the
guest, which is why we can access/update them using non-atomic
operations and compiler barriers (rather than atomic ops and CPU
barriers).

 -- Keir
On Tue, 10 Aug 2004, Keir Fraser wrote:
> > >  4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
> > >  5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.
> >
> > OK, this I don't totally see. From the code I posted before, it seems to
> > me only the first three conditions matter.
>
> The first three conditions cause us to decide whether or not to
> schedule the target domain, sending a cross-cpu interrupt if
> necessary. The final two are checks just before calling back to the
> guest OS, just to check whether it is in a position to receive async
> callbacks.

Keir, I don't see that in the code and Christian sent a note that left me
thinking it does not work that way.

as Christian said, (4) doesn't do anything conditional, it does this:

    /* The VCPU pending flag must be set /after/ update to evtchn-pend. */
    s->vcpu_data[0].evtchn_upcall_pending = 1;
    guest_async_callback(d);

which looks pretty unconditional to me. Is there something else I'm
missing?

thanks

ron
> On Tue, 10 Aug 2004, Keir Fraser wrote:
>
> > > >  4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
> > > >  5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.
> > >
> > > OK, this I don't totally see. From the code I posted before, it seems to
> > > me only the first three conditions matter.
> >
> > The first three conditions cause us to decide whether or not to
> > schedule the target domain, sending a cross-cpu interrupt if
> > necessary. The final two are checks just before calling back to the
> > guest OS, just to check whether it is in a position to receive async
> > callbacks.
>
> Keir, I don't see that in the code and Christian sent a note that left me
> thinking it does not work that way.
>
> as Christian said, (4) doesn't do anything conditional, it does this:
>     /* The VCPU pending flag must be set /after/ update to
>        evtchn-pend. */
>     s->vcpu_data[0].evtchn_upcall_pending = 1;
>     guest_async_callback(d);
>
> which looks pretty unconditional to me. Is there something else I'm
> missing?

No, I'd forgotten how the code worked -- Christian is correct.
evtchn_upcall_pending is set unconditionally on the CPU that is
transmitting the event. evtchn_upcall_mask is only checked immediately
before return to your guest OS to determine whether or not to create
an async callback frame.

 -- Keir
On Tue, Aug 10, 2004 at 02:35:08PM +0100, Keir Fraser wrote:

> > On Tue, 10 Aug 2004, Keir Fraser wrote:
> >
> > > > > 4. The original value of vcpu_data[0].evtchn_upcall_pending must be zero.
> > > > > 5. The value of vcpu_data[0].evtchn_upcall_mask must be zero.
> > > >
> > > > OK, this I don't totally see. From the code I posted before, it seems to
> > > > me only the first three conditions matter.
> > >
> > > The first three conditions cause us to decide whether or not to
> > > schedule the target domain, sending a cross-cpu interrupt if
> > > necessary. The final two are checks just before calling back to the
> > > guest OS, just to check whether it is in a position to receive async
> > > callbacks.
> >
> > Keir, I don't see that in the code and Christian sent a note that left me
> > thinking it does not work that way.
> >
> > as Christian said, (4) doesn't do anything conditional, it does this:
> >     /* The VCPU pending flag must be set /after/ update to evtchn-pend. */
> >     s->vcpu_data[0].evtchn_upcall_pending = 1;
> >     guest_async_callback(d);
> >
> > which looks pretty unconditional to me. Is there something else I'm
> > missing?
>
> No, I'd forgotten how the code worked -- Christian is correct.
> evtchn_upcall_pending is set unconditionally on the CPU that is
> transmitting the event. evtchn_upcall_mask is only checked immediately
> before return to your guest OS to determine whether or not to create
> an async callback frame.

For completeness' sake: evtchn_upcall_pending is also checked immediately
before return to your guest OS to determine whether or not to create an
async callback frame. Unlike evtchn_upcall_mask, which needs to be clear,
evtchn_upcall_pending needs to be set.

See /*test_guest_events:*/ in xen/arch/x86/x86_32/entry.S

    christian
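Christian's point can be written as a single predicate. This is only a C sketch under stated assumptions: the real check is assembly in entry.S and is not reproduced here, and the struct and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU flags; field names from the thread. */
struct vcpu_flags {
    uint8_t evtchn_upcall_pending;
    uint8_t evtchn_upcall_mask;
};

/* Checked just before returning to the guest: build an async callback
 * frame only if an upcall is pending AND upcalls are not masked. */
static bool should_build_callback_frame(const struct vcpu_flags *v)
{
    return v->evtchn_upcall_pending != 0 && v->evtchn_upcall_mask == 0;
}
```

So a guest that leaves evtchn_upcall_mask set never sees a callback, however many events are pending.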
It looks like I'm falling into the gap between how Linux does interrupts
and how Plan 9 does them. I'm still seeing the race condition, even with
the suggestions you posted in place in my interrupt handler.

I'll try to work on this a bit more, but it does seem to me that if you
take too long in your interrupt handler you will get bitten by the race:
Xen will, I assume, pre-empt dom1 in the event of a clock interrupt, and
at that point it is game over.

I think that long term a "level interrupt" construct may be needed, but
that is conjecture on my part. Trying to latch levels on a UP system is a
lot different from latching them in hardware, since the hardware is more
or less running all the time, whereas Xen/Dom0/Dom1 are all time-shared.

Looking at the race sequence I think I'm seeing, I'm still not convinced
it is completely avoidable, but I hope I am wrong.

Thanks for the good suggestions!

ron
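One way a guest might narrow the window Ron describes is to re-check the pending flag by hand right after unmasking, since an event that arrived while masked raises no new callback. The C sketch below is an assumption for illustration, not Plan 9's or Linux's actual code: unmask_upcalls, count_and_clear and the struct are hypothetical names, and the barrier macro uses GCC/Clang syntax.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU flags; field names from the thread. */
struct upcall_flags {
    volatile uint8_t evtchn_upcall_pending;
    volatile uint8_t evtchn_upcall_mask;
};

/* Compiler-only barrier (GCC/Clang): stops the compiler reordering
 * the unmask and the re-check; it is not a CPU fence. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

typedef void (*upcall_handler)(struct upcall_flags *v);

/* Unmask, then pick up anything that was posted while we were masked. */
static void unmask_upcalls(struct upcall_flags *v, upcall_handler handle)
{
    v->evtchn_upcall_mask = 0;
    compiler_barrier();
    if (v->evtchn_upcall_pending)
        handle(v);   /* a real handler must also guard against re-entry */
}

/* Hypothetical handler used only to exercise the sketch. */
static int handled_count;

static void count_and_clear(struct upcall_flags *v)
{
    v->evtchn_upcall_pending = 0;
    handled_count++;
}
```

This does not make the interrupt level-triggered; it only ensures that an edge lost while masked is noticed at the moment of unmasking.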