I recently sent off a fix for lazy vmalloc faults which can happen under paravirt when lazy mode is enabled. Unfortunately, I jumped the gun a bit on fixing this. I neglected to notice that since the new call to flush the MMU update queue is called from the page fault handler, it can be pre-empted. Both VMI and Xen use per-cpu variables to track lazy mode state, as all previous calls to set, disable, or flush lazy mode happened from a non-preemptable state. I have no idea how to convincingly produce the problem, as generating a kernel pre-emption at the required point is, um, difficult, but it is most certainly a real possibility, and potentially more likely than the bug I fixed originally. Rusty, you may have to modify lguest code if you use lazy mode and rely on per-cpu variables during the callout for paravirt_ops.set_lazy_mode. I have tested as best as I can, and am trying to write a suite destined for LTP which will help catch and debug these issues. Zach -------------- next part -------------- A non-text attachment was scrubbed... Name: i386-paravirt-preempt-fix.patch Type: text/x-patch Size: 2256 bytes Desc: not available Url : http://lists.linux-foundation.org/pipermail/virtualization/attachments/20070823/991afcb7/i386-paravirt-preempt-fix.bin
Zachary Amsden wrote:> I recently sent off a fix for lazy vmalloc faults which can happen > under paravirt when lazy mode is enabled. Unfortunately, I jumped the > gun a bit on fixing this. I neglected to notice that since the new > call to flush the MMU update queue is called from the page fault > handler, it can be pre-empted. Both VMI and Xen use per-cpu variables > to track lazy mode state, as all previous calls to set, disable, or > flush lazy mode happened from a non-preemptable state.Hm. Doing any kind of lazy-state operation with preemption enabled is fundamentally meaningless. How does it get into a preemptable state with a lazy mode enabled now? If a sequence of code with preempt disabled touches a missing vmalloc mapping, it gets a fault to fix up the mapping, and the fault handler can end up preempting the thread? That sounds like a larger bug than just paravirt lazy mode problems. J
Jeremy Fitzhardinge wrote:> Hm. Doing any kind of lazy-state operation with preemption enabled is > fundamentally meaningless. How does it get into a preemptable state >Agree 100%. It is the lazy mode flush that might happen when preempt is enabled, but lazy mode is disabled. In that case, the code relies on per-cpu variables, which is a bad thing to do in preemtible code. This can happen in the current code path. Thinking slightly deeper about it, it might be the case that there is no bug, because the local lazy mode variables are only _modified_ in the preemptible state, and guaranteed to be zero in the non-preemtible state; but it was not clear to me that this is always the case, and I got very nervous about reading per-cpu variables with preempt enabled. It would, in any case, fire a BUG_ON in the Xen code, which I did fix. Do you agree it is better to be safe than sorry in this case? The kind of bugs introduced by getting this wrong are really hard to find, and I would rather err on the side of an extra increment and decrement of preempt_count that causing a regression. Zach
Zachary Amsden wrote:> Do you agree it is better to be safe than sorry in this case? The > kind of bugs introduced by getting this wrong are really hard to find, > and I would rather err on the side of an extra increment and decrement > of preempt_count that causing a regression.I think this patch is the direction we should go. I this this would work equally well for the other pv implementations; it would probably go into the common lazy mode logic when we get around to doing it. J diff -r b3fcc228c531 arch/i386/xen/enlighten.c --- a/arch/i386/xen/enlighten.c Mon Aug 20 14:20:15 2007 -0700 +++ b/arch/i386/xen/enlighten.c Mon Aug 27 13:40:24 2007 -0700 @@ -250,6 +250,9 @@ static void xen_halt(void) static void xen_set_lazy_mode(enum paravirt_lazy_mode mode) { + if (preemptible() && mode == PARAVIRT_LAZY_FLUSH) + return; /* nothing to flush with preempt on */ + BUG_ON(preemptible()); switch (mode) {
On Thu, 2007-08-23 at 22:46 -0700, Zachary Amsden wrote:> I recently sent off a fix for lazy vmalloc faults which can happen under > paravirt when lazy mode is enabled. Unfortunately, I jumped the gun a > bit on fixing this. I neglected to notice that since the new call to > flush the MMU update queue is called from the page fault handler, it can > be pre-empted. Both VMI and Xen use per-cpu variables to track lazy > mode state, as all previous calls to set, disable, or flush lazy mode > happened from a non-preemptable state.Hi Zach, I don't think this patch does anything. The flush is because we want the just-completed "set_pte" to have immediate effect, so if preempt is enabled we're already screwed because we can be moved between set_pte and the arch_flush_lazy_mmu_mode() call. Now, where's the problem caller? By my reading or rc4, vmalloc faults are fixed up before enabling interrupts. Confused, Rusty.
Andi Kleen wrote:>> ; it does >> seem perfectly acceptable though for the mm code to use kmap or vmap >> (not kmap_atomic) internally somewhere in the pagetable code. >> > > i386 does it all the time for highmem pagetables in fact. >Yes, it uses kmap_atomic(_pte) all the time. Is that what you meant? J