Jan Beulich
2008-Jun-16 10:32 UTC
[Xen-devel] x86_32: spurious page faults in guest GDT area
Under long-running stress I can reproduce this issue back to at least c/s 16084. In older change sets it was apparently so rare that during normal work/testing I either never noticed it or had to ignore it because it could not be reproduced. However, on recent change sets (tested with our 2.6.25-based kernels only so far) it happens much more frequently (and occasionally even while the machine boots).

I inserted selector validation code in the context switch path to verify that a vcpu's selectors are okay (or better, that the guest-provided part of the GDT is accessible). These checks have never indicated a failure so far.

The faults may happen in various places (hypervisor exit path as well as guest code), and always involve loading a selector register with a guest-defined value (i.e. one in the first page of the GDT). A page walk in the (hypervisor) fault handler shows that all levels of the translation exist (and are valid/consistent), and instrumentation of the selector manipulation functions shows that none of them get called spuriously.

Hence I can only suspect some asynchronous page table manipulation (but I'm not aware of anything like that) lacking proper TLB flushing, or some very rare issue with the CR3 reloading code.

The same 32-bit kernel used with a 64-bit hypervisor has not shown similar problems so far. While I first thought this would help narrow the problem, I'm pretty clueless at this point, because none of the candidate areas where the 32-bit code differs from the 64-bit code look troublesome to me (most notably, TLB flushing is identical between the two).

Any ideas on how to narrow the problem would be appreciated.

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
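For readers following along, the statement that a faulting selector refers to "the first page of the GDT" can be checked mechanically from the selector value. The following is only a minimal sketch based on the architectural x86 selector layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) and 8-byte descriptors. The names `selector_info`, `decode_selector` and `selector_in_first_gdt_page` are made up for illustration and are not taken from the Xen sources or from the instrumentation described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE        4096u
#define DESCRIPTOR_SIZE  8u      /* each GDT/LDT entry is 8 bytes on x86 */

/* Hypothetical helper type, not a Xen structure. */
struct selector_info {
    unsigned int index;   /* descriptor index, selector bits 15..3 */
    bool         is_ldt;  /* table indicator (bit 2): 0 = GDT, 1 = LDT */
    unsigned int rpl;     /* requested privilege level, bits 1..0 */
    uint32_t     offset;  /* byte offset of the descriptor in its table */
    unsigned int page;    /* 4 KiB page of the table holding the descriptor */
};

/* Split a 16-bit selector into its architectural fields. */
static struct selector_info decode_selector(uint16_t sel)
{
    struct selector_info si;

    si.index  = sel >> 3;
    si.is_ldt = (sel & 0x4) != 0;
    si.rpl    = sel & 0x3;
    si.offset = si.index * DESCRIPTOR_SIZE;
    si.page   = si.offset / PAGE_SIZE;
    return si;
}

/* True if the selector names a descriptor in the first page of the GDT. */
static bool selector_in_first_gdt_page(uint16_t sel)
{
    struct selector_info si = decode_selector(sel);

    return !si.is_ldt && si.page == 0;
}
```

With this layout the first page covers descriptor indices 0 through 511, so any GDT selector below 0x1000 refers to it.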
Keir Fraser
2008-Jun-16 10:41 UTC
Re: [Xen-devel] x86_32: spurious page faults in guest GDT area
What's the #PF error code -- is it a not-present or an access-violation fault; read/write access; etc?

Do these faults happen under a stable workload (by which I mean no domains being created/destroyed -- all VMs are booted and just running normal kinds of stuff)?

 -- Keir
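As an aside on the first question, the distinctions asked about map directly onto the low bits of the x86 page-fault error code: bit 0 distinguishes a not-present page from a protection violation, bit 1 a read from a write, bit 2 supervisor from user mode, bit 3 flags a reserved bit set in a paging entry, and bit 4 an instruction fetch. The sketch below decodes those bits. The `PF_*` macro names and the `pf_info` structure are hypothetical and chosen for illustration, they are not the identifiers used in the hypervisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural #PF error code bits (macro names are illustrative). */
#define PF_PRESENT   (1u << 0)  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE     (1u << 1)  /* 0 = read access, 1 = write access */
#define PF_USER      (1u << 2)  /* 0 = supervisor-mode access, 1 = user-mode access */
#define PF_RESERVED  (1u << 3)  /* reserved bit set in a paging-structure entry */
#define PF_INSN      (1u << 4)  /* fault was caused by an instruction fetch */

/* Hypothetical decoded view of an error code. */
struct pf_info {
    bool protection_violation;  /* false means the page was not present */
    bool write;
    bool user_mode;
    bool reserved_bit;
    bool insn_fetch;
};

/* Decode a page-fault error code into its individual flags. */
static struct pf_info decode_pf_error_code(uint32_t ec)
{
    struct pf_info pi;

    pi.protection_violation = (ec & PF_PRESENT) != 0;
    pi.write                = (ec & PF_WRITE) != 0;
    pi.user_mode            = (ec & PF_USER) != 0;
    pi.reserved_bit         = (ec & PF_RESERVED) != 0;
    pi.insn_fetch           = (ec & PF_INSN) != 0;
    return pi;
}
```

An error code of 0, for instance, would decode as a supervisor-mode read of a not-present page, while 3 would be a supervisor-mode write that hit a protection violation.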