PowerPC's timer interrupt (called the decrementer) is a one-shot timer, not periodic. When it goes off and we enter the hypervisor, we first set it very high so it won't interrupt hypervisor code, then call raise_softirq(TIMER_SOFTIRQ). We know that timer_softirq_action() will then call reprogram_timer(), which will reprogram the decrementer to the appropriate value.

We recently uncovered a bug on PowerPC where, if a timer tick arrives just inside schedule() while interrupts are still enabled, the decrementer is never reprogrammed to that appropriate value. This is because once inside schedule(), we never handle any subsequent softirqs: we call context_switch() and resume the guest.

I believe the tick problem affects periodic timers (i.e. x86) as well, though less drastically. With a CPU-bound guest, it would result in dropped ticks: TIMER_SOFTIRQ is set and not handled, and when the timer expires again it is re-set. In other cases, it would result in some timer ticks being delivered very late. I don't know what effect this might have on guests, perhaps with sensitive time-slewing code.

In addition, when SCHEDULE_SOFTIRQ is set, all "greater" softirqs (including NMI) will not be handled until the next hypervisor invocation. This is pretty anti-social behavior for a softirq handler.

One solution would be to have schedule() *not* call context_switch() directly, but rather set a flag (or a "next vcpu" pointer) and return. That would allow other softirqs to be processed normally. Once do_softirq() returns to assembly, we can check the "next vcpu" pointer and call context_switch().

(This solution would enable a PowerPC optimization as well: we would like to lazily save non-volatile registers. We can't do this unless the exception handler regains control from do_softirq() before context_switch() is called.)

Thoughts?

--
Hollis Blanchard
IBM Linux Technology Center
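For concreteness, the "set a next-vcpu pointer and return" scheme might look roughly like the sketch below. next_vcpu[], pick_next_vcpu() and exception_exit() are illustrative names only, not existing Xen symbols; smp_processor_id(), do_softirq() and context_switch() are the existing calls.

    /* Rough sketch only; next_vcpu[], pick_next_vcpu() and exception_exit()
     * are made-up names for illustration. */
    static struct vcpu *next_vcpu[NR_CPUS];  /* set by schedule(), read on exit */

    static void schedule_softirq_action(void)
    {
        struct vcpu *next = pick_next_vcpu();   /* scheduler decision, as today */

        /* Instead of calling context_switch() here, just record the choice ... */
        next_vcpu[smp_processor_id()] = next;
        /* ... and return, so do_softirq() can run the remaining handlers. */
    }

    /* Run from the exception exit path, after do_softirq() has returned. */
    void exception_exit(void)
    {
        struct vcpu *next = next_vcpu[smp_processor_id()];

        if ( next != NULL && next != current )
        {
            next_vcpu[smp_processor_id()] = NULL;
            /* Non-volatile registers can be saved lazily, just before this call. */
            context_switch(current, next);
        }
    }

The point is that the schedule softirq handler becomes an ordinary handler that returns, so TIMER_SOFTIRQ and the rest still get processed in the same do_softirq() pass, and the actual switch (and any lazy register save) happens once control is back in the exception exit path.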
On 15/12/06 17:27, "Hollis Blanchard" <hollisb@us.ibm.com> wrote:

> We recently uncovered a bug on PowerPC where if a timer tick arrives
> just inside schedule() while interrupts are still enabled, the
> decrementer is never reprogrammed to that appropriate value. This is
> because once inside schedule(), we never handle any subsequent softirqs:
> we call context_switch() and resume the guest.

Easily fixed. You need to handle softirqs in the exit path to guest context. You need to do this final check with interrupts disabled to avoid races.

-- Keir
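Roughly, that final check amounts to a loop like the one below. This is a sketch in C assuming the existing softirq_pending(), do_softirq() and local_irq_disable()/local_irq_enable() primitives; on x86 the equivalent check lives in the assembly exit path, and exit_to_guest_check() is just an illustrative name.

    void exit_to_guest_check(void)
    {
        for ( ; ; )
        {
            local_irq_disable();                        /* close the race window */
            if ( !softirq_pending(smp_processor_id()) )
                break;                                  /* nothing left: resume guest */
            local_irq_enable();
            do_softirq();                               /* may raise further softirqs */
        }
        /* Interrupts stay disabled from here until guest state is fully restored. */
    }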
On Fri, 2006-12-15 at 17:36 +0000, Keir Fraser wrote:

> On 15/12/06 17:27, "Hollis Blanchard" <hollisb@us.ibm.com> wrote:
>
> > We recently uncovered a bug on PowerPC where if a timer tick arrives
> > just inside schedule() while interrupts are still enabled, the
> > decrementer is never reprogrammed to that appropriate value. This is
> > because once inside schedule(), we never handle any subsequent softirqs:
> > we call context_switch() and resume the guest.
>
> Easily fixed. You need to handle softirqs in the exit path to guest context.
> You need to do this final check with interrupts disabled to avoid races.

Ah OK, I see now how x86 is doing that. I don't think that code flow really makes sense: why would you jump out of do_softirq() into assembly just to call do_softirq() again? Also, that doesn't solve the lazy register saving problem.

However, I think I see how we can implement our desired context_switch() scheme in arch-specific code. The context_switch() call in schedule() will return, so please don't add a BUG() after that. :)

--
Hollis Blanchard
IBM Linux Technology Center
On 15/12/06 19:09, "Hollis Blanchard" <hollisb@us.ibm.com> wrote:

> Ah OK, I see now how x86 is doing that. I don't think that code flow
> really makes sense: why would you jump out of do_softirq() into assembly
> just to call do_softirq() again?

Well, that's the way it works out on x86. It is a bit odd, but it works and is unlikely to affect performance. I think returning from schedule() would have its own problems (e.g., a context switch from the idle domain to a guest would return to the idle loop, which we'd need explicit code to bail from, etc.).

> Also, that doesn't solve the lazy register saving problem.

I assume this is a PPC-specific issue?

> However, I think I see how we can implement our desired context_switch()
> scheme in arch-specific code. The context_switch() call in schedule()
> will return, so please don't add a BUG() after that. :)

We already support this mode of operation for IA64, which always returns from schedule().

-- Keir
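To illustrate the idle-loop point: the x86 idle loop is roughly the shape sketched below (simplified), and if schedule() merely returned after picking a guest vcpu, the loop would need an explicit bail-out of the kind shown. The is_idle_vcpu() test is illustrative of that extra code, not something the current loop contains.

    void idle_loop(void)
    {
        for ( ; ; )
        {
            safe_halt();       /* sti; hlt -- wait for an interrupt */
            do_softirq();      /* on x86 today, a switch away from idle does not
                                * come back here until idle is rescheduled */

            /* Needed only if schedule() returned after picking a guest: */
            if ( !is_idle_vcpu(current) )
                break;         /* fall through to the exit-to-guest path */
        }
    }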
On Fri, 2006-12-15 at 20:00 +0000, Keir Fraser wrote:

> > Also, that doesn't solve the lazy register saving problem.
>
> I assume this is a PPC-specific issue?

It's an issue with any architecture with a large number of registers which aren't automatically saved by hardware (and a C ABI that makes some of them non-volatile).

x86 has a small number of registers. ia64 automatically saves them (from what I understand). So of the currently-supported architectures, yes, that leaves PowerPC.

--
Hollis Blanchard
IBM Linux Technology Center
On 15/12/06 20:41, "Hollis Blanchard" <hollisb@us.ibm.com> wrote:

> It's an issue with any architecture with a large number of registers
> which aren't automatically saved by hardware (and a C ABI that makes
> some of them non-volatile).
>
> x86 has a small number of registers. ia64 automatically saves them (from
> what I understand). So of the currently-supported architectures, yes,
> that leaves PowerPC.

I see. It sounds like returning from context_switch() is perhaps the right thing for powerpc. That would be easier if you have per-cpu stacks (like ia64). If not, there are issues in saving register state later (and hence delaying your call to context_saved()), as there are calls to do_softirq() outside your asm code (well, not many, but there is one in domain.c for example) where you won't end up executing your do_softirq() wrapper. In general we'd like to reserve the right to include voluntary yield points, and that won't mix well with lazy register saves and per-physical-cpu stacks.

-- Keir
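The wrapper being referred to might look something like the sketch below (powerpc_do_softirq() and save_nonvolatile_regs() are illustrative names, and next_vcpu[] is the hypothetical per-cpu pointer from the earlier sketch). The problem is that a plain do_softirq() call from C, such as the one in domain.c, never goes through this wrapper, so the lazily-skipped non-volatile registers would not get saved before context_switch() ran.

    /* Illustrative only: entered from the exception path instead of calling
     * do_softirq() directly, so non-volatile registers can be saved lazily. */
    void powerpc_do_softirq(struct cpu_user_regs *regs)
    {
        struct vcpu *next;

        /* At this point only the volatile registers have been saved into regs. */
        do_softirq();

        next = next_vcpu[smp_processor_id()];
        if ( next != NULL )
        {
            next_vcpu[smp_processor_id()] = NULL;
            save_nonvolatile_regs(regs);   /* lazy save, only when actually switching */
            context_switch(current, next);
        }
    }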
On Fri, 2006-12-15 at 21:39 +0000, Keir Fraser wrote:

> On 15/12/06 20:41, "Hollis Blanchard" <hollisb@us.ibm.com> wrote:
>
> > It's an issue with any architecture with a large number of registers
> > which aren't automatically saved by hardware (and a C ABI that makes
> > some of them non-volatile).
> >
> > x86 has a small number of registers. ia64 automatically saves them (from
> > what I understand). So of the currently-supported architectures, yes,
> > that leaves PowerPC.
>
> I see. It sounds like returning from context_switch() is perhaps the right
> thing for powerpc. That would be easier if you have per-cpu stacks (like
> ia64).

Yup, we have per-cpu stacks.

> If not there are issues in saving register state later (and hence
> delaying your call to context_saved()) as there are calls to do_softirq()
> outside your asm code (well, not many, but there is one in domain.c for
> example) where you won't end up executing your do_softirq() wrapper. In
> general we'd like to reserve the right to include voluntary yield points,
> and that won't mix well with lazy register saves and per-physical-cpu
> stacks.

Oh, we have per-physical-cpu stacks. We can do that because there's no such thing as a "hypervisor thread" which could block in hypervisor space and need to be restored later. Are you saying that in the future you want to have hypervisor threads, and so we'll need per-VIRTUAL-cpu stacks?

--
Hollis Blanchard
IBM Linux Technology Center