In debugging the SLES9 port on 64-bit MP machines, I am seeing a problem where the hypervisor takes a fault in loading fs in the context switch code (load_segments()). The selector is one of the TLS selectors. It appears that the CPU in question has updated this selector with a value of 0 just prior to the problem I am seeing.

Looking at the Linux context switch code, we first update the TLS selector values of the incoming context before we load the segment registers. So, if we preempt the CPU after it has modified the GDT but before it loads the segment registers, we could get into a situation where, when the hypervisor resumes the preempted domain on this CPU, we fault on the segment register load. I am curious to understand why this is not an issue. How are such windows closed?

Regards,
K. Y

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Looking more at the generic Linux CS code, saving the selector values of the outgoing context and setting the segment registers to zero in prepare_arch_switch() we think deals with the problem I listed below (thanks to Jan for pointing this out). While this expensive trick may solve the problem, a simpler solution perhaps might be to have an efficient mechanism for the guest to manage hypervisor preemptions. We could build this mechanism in a way that does not compromise the hypervisor's ability to deal with buggy guests while still supporting efficient guest implementations. This preemption management framework would also be useful in dealing with bad-preemption problems in SMP guests. Would there be interest in implementing such a framework?

Regards,
K. Y

>>> "Ky Srinivasan" <ksrinivasan@novell.com> 03/28/06 11:31 am >>>
On 28 Mar 2006, at 18:33, Ky Srinivasan wrote:

> Looking more at the generic Linux CS code, saving the selector values
> of the outgoing context and setting the segment registers values to zero
> in prepare_arch_switch() we think deals with the problem I have listed
> below (thanks to Jan for pointing this out).

That's what our own trees do already.

> While this expensive trick
> may solve this problem, a simpler solution perhaps might be to have an
> efficient mechanism for the guest to manage hypervisor preemptions.

Why is it expensive? The updates to zero only happen if the previous selector value was non-zero, which is usually not the case for 64-bit apps.

Things should work okay even without the zeroing, by the way. It just avoids an unnecessary failsafe callback into the guest kernel. I fixed the failsafe handler for x86/64 earlier today.

 -- Keir
On 28 Mar 2006, at 18:40, Keir Fraser wrote:

> Why is it expensive? The updates to zero only happen if the previous
> selector value was non-zero, which is usually not the case for 64-bit
> apps.
>
> Things should work okay even without the zeroing, by the way. It just
> avoids an unnecessary failsafe callback into the guest kernel. I fixed
> the failsafe handler for x86/64 earlier today.

We should probably just rely on the failsafe_handler actually (assuming it now works :-) ).

That 'slow path' will be taken so infrequently it's not worth having a special prepare_arch_switch() for Xen. It's really a hangover from the initial port from i386.

 -- Keir
On 28 Mar 2006, at 20:35, Keir Fraser wrote:

> We should probably just rely on the failsafe_handler actually
> (assuming it now works :-) ).
>
> That 'slow path' will be taken so infrequently it's not worth having a
> special prepare_arch_switch() for Xen. It's really a hangover from the
> initial port from i386.

Actually, we do still need to be sure to save the segment values before switching TLS/LDT, so I guess we do need most of prepare_arch_switch() even on x86/64.

We can't do the work in switch_mm(), since lazy TLB logic may cause it to not be executed. And switch_to() is too late.

 -- Keir
>>> On Tue, Mar 28, 2006 at 12:40 pm, in message <af304b148f305da9c9b21ff4622322a3@cl.cam.ac.uk>, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:

> On 28 Mar 2006, at 18:33, Ky Srinivasan wrote:
>
>> Looking more at the generic Linux CS code, saving the selector values
>> of the outgoing context and setting the segment registers values to zero
>> in prepare_arch_switch() we think deals with the problem I have listed
>> below (thanks to Jan for pointing this out).
>
> That's what our own trees do already.
>
>> While this expensive trick
>> may solve this problem, a simpler solution perhaps might be to have an
>> efficient mechanism for the guest to manage hypervisor preemptions.
>
> Why is it expensive? The updates to zero only happen if the previous
> selector value was non-zero, which is usually not the case for 64-bit
> apps.

The expense I was referring to is the selector loads (to zero them out). prepare_arch_switch() is also used on the 32-bit side (for fs and gs).

> Things should work okay even without the zeroing, by the way. It just
> avoids an unnecessary failsafe callback into the guest kernel. I fixed
> the failsafe handler for x86/64 earlier today.
>
> -- Keir

K. Y
Keir,

What are your thoughts on having a mechanism to manage hypervisor preemption from guest kernels?

Regards,
K. Y

>>> On Tue, Mar 28, 2006 at 3:56 pm, in message <8faa95f64ad61af657bdda6b115b5fd4@cl.cam.ac.uk>, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:

> On 28 Mar 2006, at 20:35, Keir Fraser wrote:
>
>> We should probably just rely on the failsafe_handler actually
>> (assuming it now works :-) ).
>>
>> That 'slow path' will be taken so infrequently it's not worth having a
>> special prepare_arch_switch() for Xen. It's really a hangover from the
>> initial port from i386.
>
> Actually, we do still need to be sure to save the segment values before
> switching TLS/LDT, so I guess we do need most of prepare_arch_switch()
> even on x86/64.
>
> We can't do the work in switch_mm(), since lazy TLB logic may cause it
> to not be executed. And switch_to() is too late.
>
> -- Keir
On 29 Mar 2006, at 01:49, Ky Srinivasan wrote:

> What are your thoughts on having a mechanism to manage hypervisor
> preemption from guest kernels?

Potentially a scheduler activations interface would be nice for SMP guests, if it's not too hard to modify the OS to support it.

I don't see how a preemption-aware interface would improve prepare_arch_switch() though -- I already said that failsafe_handler() should pick up the slack if you remove selector zeroing from prepare_arch_switch(). You may get some noise from Xen on a debug build, but things should carry on working just fine.

If I'm missing your point, you need to give some more details about what you're proposing. :-)

 -- Keir