Recently I ran some experiments on newer hardware and realized that booting
any kernel newer than or equal to v3.5 (Xen version 4.2.1) in 64bit mode would
fail to bring up any APs (message about CPU Stuck). I was able to normally bisect
into a range of realmode changes and then manually drill down to the following
commit:

commit cda846f101fb1396b6924f1d9b68ac3d42de5403
Author: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Date:   Tue May 8 21:22:46 2012 +0300

    x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline

    This patch changes 64-bit trampoline so that CR4 and
    EFER are provided by the kernel instead of using fixed
    values.

From the Xen debugging console it was possible to gather a bit more data which
pointed to a failure very close to setting CR4 in startup_32. On this particular
hardware the saved CR4 (about to be set) was 0x1407f0.

This would set two flags that somehow feel dangerous: PGE (page global enable)
and SMEP (supervisor mode execution protection). SMEP turns out to be the main
offender and the following change allows the APs to start:

--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -93,7 +93,9 @@ ENTRY(startup_32)
 	movl	%edx, %fs
 	movl	%edx, %gs
 
-	movl	pa_tr_cr4, %eax
+	movl	$X86_CR4_SMEP, %eax
+	notl	%eax
+	andl	pa_tr_cr4, %eax
 	movl	%eax, %cr4		# Enable PAE mode
 
 	# Setup trampoline 4 level pagetables

Now I am not completely convinced that this is really the way to go. Likely the
Xen hypervisor should not start up the guest with CR4 on the BP containing those
flags. But maybe it still makes sense to mask some dangerous ones off in the
realmode code (btw, it seemed that masking the assignments in arch_setup or
setup_realmode did not work).

And finally I am wondering why the SMEP flag in CR4 is set anyway. My
understanding would be that this should only be done if cpuid[7].ebx has bit7
set. And this does not seem to be the case at least on the one box I was doing
the bisection on.

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
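
For reference, the saved CR4 value from the report can be decoded bit by bit. The following is a minimal Python sketch and not part of the original report. The bit positions follow the architectural CR4 layout; the names `CR4_BITS`, `decode_cr4` and `mask_smep` are made up for illustration. `mask_smep` mirrors what the notl/andl pair in the proposed patch computes.

```python
# Architectural CR4 bit positions (named flags only, reserved bits omitted).
CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
    6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
    13: "VMXE", 14: "SMXE", 16: "FSGSBASE", 17: "PCIDE",
    18: "OSXSAVE", 20: "SMEP", 21: "SMAP",
}

X86_CR4_SMEP = 1 << 20


def decode_cr4(value):
    """Return the names of all known CR4 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR4_BITS.items()) if value & (1 << bit)]


def mask_smep(value):
    """Clear SMEP in a 32-bit CR4 image, as the notl/andl sequence does."""
    return value & ~X86_CR4_SMEP & 0xFFFFFFFF


if __name__ == "__main__":
    saved_cr4 = 0x1407F0  # value quoted in the report
    print(decode_cr4(saved_cr4))
    print(hex(mask_smep(saved_cr4)))
```

Decoded this way, 0x1407f0 has PSE, PAE, MCE, PGE, PCE, OSFXSR, OSXMMEXCPT, OSXSAVE and SMEP set, and masking SMEP leaves 0x407f0.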
On 27.03.2013 16:26, Stefan Bader wrote:
> And finally I am wondering why the SMEP flag in CR4 is set anyway. My
> understanding would be that this should only be done if cpuid[7].ebx has bit7
> set. And this does not seem to be the case at least on the one box I was doing
> the bisection on.

Seems that I was relying on the wrong source of information when checking SMEP
support. The cpuid command seems to be at fault. But /proc/cpuinfo reports it.
So that at least explains where that comes from... sorry for that.
On Wed, Mar 27, 2013 at 04:53:16PM +0100, Stefan Bader wrote:
> Seems that I was relying on the wrong source of information when checking SMEP
> support. The cpuid command seems to be at fault. But /proc/cpuinfo reports it.
> So that at least explains where that comes from... sorry for that.

OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuitive
flag) that would work fine?

CC-ing the Intel folks who added this in.
On 03/27/2013 09:04 AM, Konrad Rzeszutek Wilk wrote:
>> Seems that I was relying on the wrong source of information when checking SMEP
>> support. The cpuid command seems to be at fault. But /proc/cpuinfo reports it.
>> So that at least explains where that comes from... sorry for that.
>
> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuitive
> flag) that would work fine?
>
> CC-ing the Intel folks who added this in.

If it is present in /proc/cpuinfo and not in cpuid it means the kernel
thinks it has SMEP but the CPU doesn't... an obvious case of fail.
However, *where the hell* does the bit come from in the first place?

That is what we need to track down.

When you say Xen HVM, am I correct in assuming that neither CPUID nor
CR4 operations in the main kernel are run through paravirt_ops?

	-hpa
On 03/27/2013 08:53 AM, Stefan Bader wrote:
> Seems that I was relying on the wrong source of information when
> checking SMEP support. The cpuid command seems to be at fault. But
> /proc/cpuinfo reports it. So that at least explains where that
> comes from... sorry for that.

What does /proc/cpuinfo and cpuid (or x86info) have for the BSP and APs,
respectively? Any instance here of inconsistencies?

	-hpa
On 27.03.2013 17:09, H. Peter Anvin wrote:
> If it is present in /proc/cpuinfo and not in cpuid it means the kernel
> thinks it has SMEP but the CPU doesn't... an obvious case of fail.
> However, *where the hell* does the bit come from in the first place?

I did not yet have time to track down all sources but I thought that
/proc/cpuinfo is in some way assembled from whatever cpuid info the kernel has.
I am more suspicious of the cpuid command I was using. Let me check for x86info.

> That is what we need to track down.
>
> When you say Xen HVM, am I correct in assuming that neither CPUID nor
> CR4 operations in the main kernel are run through paravirt_ops?

Not paravirt ops likely but the hypervisor traps access. At least cpuid from
within a hvm guest I expect to be filtered. So when checking things I went to
bare-metal.

Will fetch more info and get back.

-Stefan
On 03/27/2013 09:24 AM, Stefan Bader wrote:
>> When you say Xen HVM, am I correct in assuming that neither CPUID
>> nor CR4 operations in the main kernel are run through
>> paravirt_ops?
>
> Not paravirt ops likely but the hypervisor traps access. At least
> cpuid from within a hvm guest I expect to be filtered. So when
> checking things I went to bare-metal.
>
> Will fetch more info and get back.

Hypervisor traps is one thing... they should be consistent no matter
where in the code they happen... unless they are broken.

Try this CPUID program. This uses the kernel /dev interface which may be
somewhat suboptimal in case CPUID in userspace actually differs, but it
would be interesting to know what it outputs.

	-hpa
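
The program referred to above is not reproduced in this archive. As a rough illustration of the kernel /dev interface it mentions, here is a minimal Python sketch, assuming the Linux cpuid character device at /dev/cpu/N/cpuid (file offset: requested EAX in the low 32 bits and ECX in the high 32 bits; each record is 16 bytes holding eax, ebx, ecx, edx). The function names are made up for illustration, and reading the device normally needs root and the cpuid module loaded.

```python
import os
import struct


def cpuid_offset(leaf, subleaf=0):
    """File offset for the cpuid device: EAX in the low 32 bits, ECX in the high 32."""
    return (subleaf << 32) | leaf


def unpack_regs(raw):
    """Split one 16-byte record into (eax, ebx, ecx, edx)."""
    return struct.unpack("<4I", raw)


def read_cpuid(cpu, leaf, subleaf=0):
    """Query one CPUID leaf on one logical CPU through /dev/cpu/<cpu>/cpuid."""
    fd = os.open(f"/dev/cpu/{cpu}/cpuid", os.O_RDONLY)
    try:
        return unpack_regs(os.pread(fd, 16, cpuid_offset(leaf, subleaf)))
    finally:
        os.close(fd)


def has_smep(cpu):
    """SMEP is advertised in CPUID.(EAX=7,ECX=0):EBX bit 7."""
    _, ebx, _, _ = read_cpuid(cpu, 7, 0)
    return bool(ebx & (1 << 7))
```

Calling `has_smep(n)` for each logical CPU, on bare metal and inside the HVM guest, would be one way to compare what each CPU reports with the flags line in /proc/cpuinfo.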
On Wed, 27 Mar 2013, Stefan Bader wrote:
> On 27.03.2013 17:09, H. Peter Anvin wrote:
> > When you say Xen HVM, am I correct in assuming that neither CPUID nor
> > CR4 operations in the main kernel are run through paravirt_ops?
>
> Not paravirt ops likely but the hypervisor traps access. At least cpuid from
> within a hvm guest I expect to be filtered. So when checking things I went to
> bare-metal.

That's right. Both cr4 and cpuid are trapped, so from the Linux POV they
look like native ops, no paravirt_ops involved.
On 27.03.2013 17:04, Konrad Rzeszutek Wilk wrote:
> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuitive
> flag) that would work fine?

Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
since I just quickly did this without checking whether Xen 4.2.1 understands the
flag already.

Second, using x86info --all on bare metal does show bits set for cpuid[7] and
/proc/cpuinfo values are consistent across BP and APs. So I am a tool for using
the wrong tool there.

So I would say the main issue to look at is why reading cr4 as a HVM guest
produces the flags on boot. Surely the hypervisor itself has set certain things
up but likely there are some expectations about the initial setup on boot.
On 03/27/2013 09:45 AM, Stefan Bader wrote:
> Rebooting with smep=1 as a hv argument does not fix it. But I
> would be careful since I just quickly did this without checking
> whether Xen 4.2.1 understands the flag already.
>
> Second, using x86info --all on bare metal does show bits set for
> cpuid[7] and /proc/cpuinfo values are consistent across BP and
> APs. So I am a tool for using the wrong tool there.
>
> So I would say the main issue to look at is why reading cr4 as a
> HVM guest produces the flags on boot. Surely the hypervisor itself
> has set certain things up but likely there are some expectations
> about the initial setup on boot.

What does x86info and /proc/cpuinfo show in HVM?

The inbound %cr4 shouldn't matter at all, we try to not rely on it.

If the hypervisor presents SMEP to the guest then the guest is pretty
obviously going to try to use it.

	-hpa
On 27.03.2013 17:52, H. Peter Anvin wrote:
> What does x86info and /proc/cpuinfo show in HVM?

x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep set.

> The inbound %cr4 shouldn't matter at all, we try to not rely on it.
>
> If the hypervisor presents SMEP to the guest then the guest is pretty
> obviously going to try to use it.

To me it looks like when bootstrapping the APs things are not yet ready to use
it. If I did not miss something, the only place that the saved contents of cr4
are used is in startup_32 when the cpus are brought up. And then just stop dead.
Would need to read more code but a bit weird why the BP is not affected.
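
The cpuid[7].ebx value reported above can be checked against the feature bits of CPUID leaf 7, subleaf 0. This is a minimal Python sketch, not taken from the thread; the table covers only bits 0 to 11 and leaves out the ones not relevant here, and the names `LEAF7_EBX_BITS` and `decode_leaf7_ebx` are made up for illustration.

```python
# Subset of the feature flags in CPUID.(EAX=7,ECX=0):EBX, bits 0-11.
LEAF7_EBX_BITS = {
    0: "fsgsbase", 1: "tsc_adjust", 3: "bmi1", 4: "hle", 5: "avx2",
    7: "smep", 8: "bmi2", 9: "erms", 10: "invpcid", 11: "rtm",
}


def decode_leaf7_ebx(ebx):
    """Return the names of the known leaf-7 EBX features set in ebx."""
    return [name for bit, name in sorted(LEAF7_EBX_BITS.items()) if ebx & (1 << bit)]


if __name__ == "__main__":
    print(decode_leaf7_ebx(0xBBB))  # value reported by x86info in the guest
```

With ebx = 0xbbb, bit 7 is set, which matches /proc/cpuinfo listing smep in the guest.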
On 03/27/2013 10:17 AM, Stefan Bader wrote:
>> What does x86info and /proc/cpuinfo show in HVM?
>
> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
> set.

On all CPUs?

>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>> it.
>>
>> If the hypervisor presents SMEP to the guest then the guest is
>> pretty obviously going to try to use it.
>
> To me it looks like when bootstrapping the APs things are not yet
> ready to use it. If I did not miss something, the only place that
> the saved contents of cr4 are used is in startup_32 when the cpus
> are brought up. And then just stop dead. Would need to read more
> code but a bit weird why the BP is not affected.

This feels like a bug in Xen, but I don't know for sure yet. Either
which way, it is odd. That write to cr4 should be entirely legitimate.

	-hpa
On 27.03.2013 17:45, Stefan Bader wrote:
> Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
> since I just quickly did this without checking whether Xen 4.2.1 understands the
> flag already.

I will need more time to look into this (and unlikely today) but it feels like
at least the cpuid flags passed on to the HVM guest may not be influenced by the
smep boot argument. Probably rather something I could do by masking in the config
of the guest (which could be another pain as I normally configure those via
libvirt).
On 03/27/2013 10:28 AM, Stefan Bader wrote:
> I will need more time to look into this (and unlikely today) but it
> feels like at least the cpuid flags passed on to the HVM guest may
> not be influenced by the smep boot argument. Probably rather something
> I could do by masking in the config of the guest (which could be
> another pain as I normally configure those via libvirt).

There is a "nosmep" kernel command line option.

	-hpa
On 27.03.2013 18:23, H. Peter Anvin wrote:
> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>> What does x86info and /proc/cpuinfo show in HVM?
>>
>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>> set.
>
> On all CPUs?

x86info thinks it's one core with ht so only one cpuid line for that.

> This feels like a bug in Xen, but I don't know for sure yet. Either
> which way, it is odd. That write to cr4 should be entirely legitimate.

Could likely be. Unfortunately one where a change in the kernel triggers it. Not
exactly your problem but a pita nonetheless.

-Stefan
On 27.03.2013 18:30, H. Peter Anvin wrote:
> There is a "nosmep" kernel command line option.

Ignoring it on that side does help.
On 03/27/2013 10:40 AM, Stefan Bader wrote:
> On 27.03.2013 18:30, H. Peter Anvin wrote:
>> There is a "nosmep" kernel command line option.
>
> Ignoring it on that side does help.

As one would expect. Are CPUID and /proc/cpuinfo still consistent
across all CPUs inside the HVM?

	-hpa
On 27/03/2013 16:45, "Stefan Bader" <stefan.bader@canonical.com> wrote:
>> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of
>> counterintuitive flag) that would work fine?
>
> Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
> since I just quickly did this without checking whether Xen 4.2.1 understands
> the flag already.

Yes, the flag is understood by all Xen 4.2 releases. However it is not
inverted as you believe: it really is smep=0 or smep=off or even no-smep to
disable SMEP. smep=1 will enable SMEP (which is the default anyway).

I also checked how CPUID.SMEP gets set for an HVM guest, and it is very
obviously masked off if SMEP support has been disabled or is unavailable. So
I do not think we can be erroneously passing the CPUID flag to the guest.

 -- Keir

>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>> To me it looks like when bootstrapping the APs things are not yet
>> ready to use it. If I did not miss something, the only place that
>> the saved contents of cr4 are used is in startup_32 when the cpus
>> are brought up. And then just stop dead. Would need to read more
>> code but a bit weird why the BP is not affected.
>
> This feels like a bug in Xen, but I don't know for sure yet. Either
> which way, it is odd. That write to cr4 should be entirely legitimate.

And I would guess one that got fixed already.

Stefan, please try 4.2.2-rc1, or (separately)
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
(which I think requires the immediately preceding
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
too).

Jan
On 28.03.2013 14:34, Jan Beulich wrote:
>>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
>> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>>> What does x86info and /proc/cpuinfo show in HVM?
>>>
>>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>>> set.
>>
>> On all CPUs?
>>
>>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>>> it.
>>>>
>>>> If the hypervisor presents SMEP to the guest then the guest is
>>>> pretty obviously going to try to use it.
>>>
>>> To me it looks like when bootstrapping the APs things are not yet
>>> ready to use it. If I did not miss something, the only place that
>>> the saved contents of cr4 are used is in startup_32 when the cpus
>>> are brought up. And then just stop dead. Would need to read more
>>> code but a bit weird why the BP is not affected.
>>
>> This feels like a bug in Xen, but I don't know for sure yet. Either
>> which way, it is odd. That write to cr4 should be entirely legitimate.
>
> And I would guess one that got fixed already.
>
> Stefan, please try 4.2.2-rc1, or (separately)
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
> (which I think requires the immediately preceding
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
> too).

The backing explanation does make a lot of sense in reasoning what is going
wrong. Unfortunately the two patches above on their own do not fix the problem
(I will try to make another go with 4.2.2-rc1).

For a bit more info I am running a kernel inside the HVM guest which shows the
contents of the cr4 shadow used in the trampoline. Out of interest I compared
those values to the ones used on a bare metal boot and both are identical
(0x1407F0).

That somehow gives some explanation for the patch above failing. Looking at the
code for cr4 updates in vmx_update_guest_cr() a few lines above the new SMEP
handling, there already was code which would clear the PAE flag when
paging_mode_hap(v->domain) was true. And that would need to be true if the SMEP
flag should get cleared. And the PAE flag was (and has to be) set before.

Will be looking into this further.

-Stefan

> Jan

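For reference, the 0x1407F0 value mentioned above can be decoded against the architectural CR4 bit positions, which shows both PAE (bit 5) and SMEP (bit 20) set. The following is a minimal sketch in C. The `cr4_flags` table, `cr4_decode` and `check_trampoline_cr4` are illustrative names that do not exist in the kernel or in Xen, and the table lists only a subset of the defined CR4 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct cr4_flag {
    unsigned int bit;
    const char *name;
};

/* Subset of the architectural CR4 bits, in ascending bit order. */
static const struct cr4_flag cr4_flags[] = {
    {  0, "VME" },    {  1, "PVI" },        {  2, "TSD" },
    {  3, "DE" },     {  4, "PSE" },        {  5, "PAE" },
    {  6, "MCE" },    {  7, "PGE" },        {  8, "PCE" },
    {  9, "OSFXSR" }, { 10, "OSXMMEXCPT" }, { 13, "VMXE" },
    { 14, "SMXE" },   { 16, "FSGSBASE" },   { 17, "PCIDE" },
    { 18, "OSXSAVE" },{ 20, "SMEP" },       { 21, "SMAP" },
};

/* Write the space-separated names of all set bits into buf; returns buf. */
static char *cr4_decode(uint32_t cr4, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(cr4_flags) / sizeof(cr4_flags[0]); i++) {
        int n;

        if (!(cr4 & (UINT32_C(1) << cr4_flags[i].bit)))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", cr4_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop a partially written name */
            break;
        }
        used += (size_t)n;
    }
    return buf;
}

/* Self-check against the trampoline CR4 value reported in this thread. */
void check_trampoline_cr4(void)
{
    char buf[128];

    assert(strcmp(cr4_decode(UINT32_C(0x1407F0), buf, sizeof(buf)),
                  "PSE PAE MCE PGE PCE OSFXSR OSXMMEXCPT OSXSAVE SMEP") == 0);
}
```

The decode is the same on bare metal and in the guest, consistent with the observation that both boots use an identical trampoline value; the difference has to be in how the hypervisor handles the write.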
On 27.03.2013 21:24, Keir Fraser wrote:
> On 27/03/2013 16:45, "Stefan Bader" <stefan.bader@canonical.com> wrote:
>
>>>> Seems that I was relying on the wrong source of information when checking
>>>> SMEP support. The cpuid command seems to fail. But /proc/cpuinfo reports
>>>> it. So that at least explains where that comes from... sorry for that.
>>>
>>> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of
>>> counterintuitive flag) that would work fine?
>>
>> Rebooting with smep=1 as a hv argument does not fix it. But I would be
>> careful since I just quickly did this without checking whether Xen 4.2.1
>> understands the flag already.
>
> Yes, the flag is understood by all Xen 4.2 releases. However it is not
> inverted as you believe: it really is smep=0 or smep=off or even no-smep to
> disable SMEP. smep=1 will enable SMEP (which is the default anyway).
>
> I also checked how CPUID.SMEP gets set for an HVM guest, and it is very
> obviously masked off if SMEP support has been disabled or is unavailable. So
> I do not think we can be erroneously passing the CPUID flag to the guest.

No, you are completely right. The inverse boolean got me for good.
So to summarize:

- smep=0 as hypervisor argument avoids the problem for all guests
- nosmep as hvm guest argument avoids the problem for that guest
- /proc/cpuinfo correctly reflects whether smep has been masked off
  or not

> -- Keir

On 03/28/2013 08:06 AM, Stefan Bader wrote:
>
> No, you are completely right. The inverse boolean got me for good.
> So to summarize:
>
> - smep=0 as hypervisor argument avoids the problem for all guests
> - nosmep as hvm guest argument avoids the problem for that guest
> - /proc/cpuinfo correctly reflects whether smep has been masked off
>   or not
>

Please try the patch Jan pointed to.

	-hpa

On 28.03.2013 16:42, H. Peter Anvin wrote:
> On 03/28/2013 08:06 AM, Stefan Bader wrote:
>>
>> No, you are completely right. The inverse boolean got me for good.
>> So to summarize:
>>
>> - smep=0 as hypervisor argument avoids the problem for all guests
>> - nosmep as hvm guest argument avoids the problem for that guest
>> - /proc/cpuinfo correctly reflects whether smep has been masked off
>>   or not
>>
>
> Please try the patch Jan pointed to.

I did, but it did not work. I elaborated on it a bit more in my reply to
his mail. In short, I think the code that would clear smep is not reached.

-Stefan

> -hpa

On 28.03.2013 16:02, Stefan Bader wrote:
> On 28.03.2013 14:34, Jan Beulich wrote:
>>>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>>>> What does x86info and /proc/cpuinfo show in HVM?
>>>>
>>>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>>>> set.
>>>
>>> On all CPUs?
>>>
>>>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>>>> it.
>>>>>
>>>>> If the hypervisor presents SMEP to the guest then the guest is
>>>>> pretty obviously going to try to use it.
>>>>
>>>> To me it looks like when bootstrapping the APs things are not yet
>>>> ready to use it. If I did not miss something, the only place that
>>>> the saved contents of cr4 are used is in startup_32 when the cpus
>>>> are brought up. And then just stop dead. Would need to read more
>>>> code but a bit weird why the BP is not affected.
>>>
>>> This feels like a bug in Xen, but I don't know for sure yet. Either
>>> which way, it is odd. That write to cr4 should be entirely legitimate.
>>
>> And I would guess one that got fixed already.
>>
>> Stefan, please try 4.2.2-rc1, or (separately)
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
>> (which I think requires the immediately preceding
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
>> too).
>
> The backing explanation does make a lot of sense in reasoning what is going
> wrong. Unfortunately the two patches above on their own do not fix the problem
> (I will try to make another go with 4.2.2-rc1).

The whole of 4.2.2-rc1 has the same (smep still present in
trampoline_cr4_features) outcome.

> For a bit more info I am running a kernel inside the HVM guest which shows the
> contents of the cr4 shadow used in the trampoline. Out of interest I compared
> those values to the ones used on a bare metal boot and both are identical
> (0x1407F0).
>
> That somehow gives some explanation for the patch above failing. Looking at the
> code for cr4 updates in vmx_update_guest_cr() a few lines above the new SMEP
> handling, there already was code which would clear the PAE flag when
> paging_mode_hap(v->domain) was true. And that would need to be true if the SMEP
> flag should get cleared. And the PAE flag was (and has to be) set before.
>
> Will be looking into this further.

Going back to gather more info and to find some fix.

-Stefan

On 28.03.2013 17:39, Stefan Bader wrote:
> On 28.03.2013 16:02, Stefan Bader wrote:
>> On 28.03.2013 14:34, Jan Beulich wrote:
>>>>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>>> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>>>>> What does x86info and /proc/cpuinfo show in HVM?
>>>>>
>>>>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>>>>> set.
>>>>
>>>> On all CPUs?
>>>>
>>>>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>>>>> it.
>>>>>>
>>>>>> If the hypervisor presents SMEP to the guest then the guest is
>>>>>> pretty obviously going to try to use it.
>>>>>
>>>>> To me it looks like when bootstrapping the APs things are not yet
>>>>> ready to use it. If I did not miss something, the only place that
>>>>> the saved contents of cr4 are used is in startup_32 when the cpus
>>>>> are brought up. And then just stop dead. Would need to read more
>>>>> code but a bit weird why the BP is not affected.
>>>>
>>>> This feels like a bug in Xen, but I don't know for sure yet. Either
>>>> which way, it is odd. That write to cr4 should be entirely legitimate.
>>>
>>> And I would guess one that got fixed already.
>>>
>>> Stefan, please try 4.2.2-rc1, or (separately)
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
>>> (which I think requires the immediately preceding
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
>>> too).
>>
>> The backing explanation does make a lot of sense in reasoning what is going
>> wrong. Unfortunately the two patches above on their own do not fix the problem
>> (I will try to make another go with 4.2.2-rc1).
>
> The whole of 4.2.2-rc1 has the same (smep still present in
> trampoline_cr4_features) outcome.
>
>> For a bit more info I am running a kernel inside the HVM guest which shows the
>> contents of the cr4 shadow used in the trampoline. Out of interest I compared
>> those values to the ones used on a bare metal boot and both are identical
>> (0x1407F0).
>>
>> That somehow gives some explanation for the patch above failing. Looking at the
>> code for cr4 updates in vmx_update_guest_cr() a few lines above the new SMEP
>> handling, there already was code which would clear the PAE flag when
>> paging_mode_hap(v->domain) was true. And that would need to be true if the SMEP
>> flag should get cleared. And the PAE flag was (and has to be) set before.
>>
>> Will be looking into this further.
>
> Going back to gather more info and to find some fix.

I added some more debugging output to the hypervisor to verify the state of HAP.
This showed that while HAP is available on the system, it is not used for the
HVM guests. It looks like this would require some flags to be set when creating
the guest domains and I assume this is not happening because I have to stay with
the xm stack for the libvirt setup for now (requires some repackaging which
hasn't been done, yet).

So the guest isn't using HAP but does seem to use some form of paging even if
the guest VCPU is not using paging. So I changed the vmx_update_guest_cr()
function in that way and that seems to prevent the hangs. Does this look like a
reasonable upstream Xen change?

From eccbc4cf0916c6d4388f658965c79770bd0ba10f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Wed, 3 Apr 2013 12:06:24 +0200
Subject: [PATCH] VMX: Always disable SMEP when guest is in non-paging mode

commit e7dda8ec9fc9020e4f53345cdbb18a2e82e54a65
  VMX: disable SMEP feature when guest is in non-paging mode

disabled the SMEP bit if a guest VCPU was using HAP and was not
in paging mode. However I could observe VCPUs getting stuck in
the trampoline after the following patch in the Linux kernel
changed the way CR4 gets set up:
  x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline

The change will set CR4 from already set flags which includes the
SMEP bit. On bare metal this does not matter as the CPU is in non-
paging mode at that time. But Xen seems to use the emulated non-
paging mode regardless of HAP (I verified that on the guests I was
seeing the issue, HAP was not used).

Therefor it seems right to unset the SMEP bit for a VCPU that is
not in paging-mode, regardless of its HAP usage.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dbefb..a869ed4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1161,13 +1161,16 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         if ( paging_mode_hap(v->domain) && !hvm_paging_enabled(v) )
         {
             v->arch.hvm_vcpu.hw_cr[4] |= X86_CR4_PSE;
             v->arch.hvm_vcpu.hw_cr[4] &= ~X86_CR4_PAE;
+        }
+        if ( !hvm_paging_enabled(v) )
+        {
             /*
              * SMEP is disabled if CPU is in non-paging mode in hardware.
              * However Xen always uses paging mode to emulate guest non-paging
-             * mode with HAP. To emulate this behavior, SMEP needs to be
-             * manually disabled when guest switches to non-paging mode.
+             * mode. To emulate this behavior, SMEP needs to be manually
+             * disabled when guest VCPU is in non-paging mode.
              */
             v->arch.hvm_vcpu.hw_cr[4] &= ~X86_CR4_SMEP;
         }
         __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);

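To make the effect of the hunk above easier to see outside of Xen, here is a stand-alone model of just the branch logic, before and after the change. This is a sketch under assumptions, not hypervisor code: the function names are invented, the `hap` and `paging_enabled` parameters stand in for paging_mode_hap(v->domain) and hvm_paging_enabled(v), and the rest of how vmx_update_guest_cr() builds hw_cr[4] is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PSE  (UINT32_C(1) << 4)
#define X86_CR4_PAE  (UINT32_C(1) << 5)
#define X86_CR4_SMEP (UINT32_C(1) << 20)

/* Model of the original branch: SMEP is only cleared on the HAP path. */
static uint32_t hw_cr4_before_patch(uint32_t cr4, bool hap, bool paging_enabled)
{
    if (hap && !paging_enabled) {
        cr4 |= X86_CR4_PSE;
        cr4 &= ~X86_CR4_PAE;
        cr4 &= ~X86_CR4_SMEP;
    }
    return cr4;
}

/* Model of the patched branch: SMEP is cleared whenever the guest VCPU
 * is not in paging mode, with or without HAP. */
static uint32_t hw_cr4_after_patch(uint32_t cr4, bool hap, bool paging_enabled)
{
    if (hap && !paging_enabled) {
        cr4 |= X86_CR4_PSE;
        cr4 &= ~X86_CR4_PAE;
    }
    if (!paging_enabled)
        cr4 &= ~X86_CR4_SMEP;
    return cr4;
}

/* The failing case from this thread: no HAP, guest paging off, CR4 0x1407F0. */
void check_non_hap_case(void)
{
    const uint32_t cr4 = UINT32_C(0x1407F0);

    assert(hw_cr4_before_patch(cr4, false, false) & X86_CR4_SMEP);
    assert(!(hw_cr4_after_patch(cr4, false, false) & X86_CR4_SMEP));
}
```

In the non-HAP, non-paging case the old logic leaves SMEP set in the value written to GUEST_CR4, which matches the stuck APs; the new logic clears it while leaving PAE alone. With guest paging enabled, both versions pass the value through unchanged.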
>>> On 03.04.13 at 13:56, Stefan Bader <stefan.bader@canonical.com> wrote:
> I added some more debugging output to the hypervisor to verify the state of HAP.
> This showed that while HAP is available on the system, it is not used for the
> HVM guests. It looks like this would require some flags to be set when creating
> the guest domains and I assume this is not happening because I have to stay with
> the xm stack for the libvirt setup for now (requires some repackaging which
> hasn't been done, yet).
>
> So the guest isn't using HAP but does seem to use some form of paging even if
> the guest VCPU is not using paging. So I changed the vmx_update_guest_cr()
> function in that way and that seems to prevent the hangs. Does this look like a
> reasonable upstream Xen change?

Yes, it looks appropriate. But I'd like this to be confirmed by the
authors of the original change and/or the VMX maintainers (added
to Cc).

Nevertheless it's very odd to not use HAP on a machine capable
of it...

Jan

On 03/04/2013 13:43, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 03.04.13 at 13:56, Stefan Bader <stefan.bader@canonical.com> wrote:
>> I added some more debugging output to the hypervisor to verify the state of
>> HAP. This showed that while HAP is available on the system, it is not used
>> for the HVM guests. It looks like this would require some flags to be set
>> when creating the guest domains and I assume this is not happening because I
>> have to stay with the xm stack for the libvirt setup for now (requires some
>> repackaging which hasn't been done, yet).
>>
>> So the guest isn't using HAP but does seem to use some form of paging even if
>> the guest VCPU is not using paging. So I changed the vmx_update_guest_cr()
>> function in that way and that seems to prevent the hangs. Does this look like
>> a reasonable upstream Xen change?
>
> Yes, it looks appropriate. But I'd like this to be confirmed by the
> authors of the original change and/or the VMX maintainers (added
> to Cc).

It can have my ack straight away.

Acked-by: Keir Fraser <keir@xen.org>

Nonetheless it would be nice to get a VMX maintainer ack too, though I'm
pretty sure this patch is correct.

 -- Keir

> Nevertheless it's very odd to not use HAP on a machine capable
> of it...
>
> Jan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Wednesday, April 03, 2013 10:28 PM
> To: Jan Beulich; Stefan Bader
> Cc: xen-devel@lists.xensource.com; Konrad Rzeszutek Wilk; Dong, Eddie;
> wei.y.yang@intel.com; Shan, Haitao; Xu, Dongxiao; xin.li@intel.com; Nakajima,
> Jun; H. Peter Anvin; Zhang, Xiantao
> Subject: Re: [Xen-devel] Xen HVM regression on certain Intel CPUs
>
> On 03/04/2013 13:43, "Jan Beulich" <JBeulich@suse.com> wrote:
>
> >>>> On 03.04.13 at 13:56, Stefan Bader <stefan.bader@canonical.com> wrote:
> >> I added some more debugging output to the hypervisor to verify the state of
> >> HAP. This showed that while HAP is available on the system, it is not used
> >> for the HVM guests. It looks like this would require some flags to be set
> >> when creating the guest domains and I assume this is not happening because I
> >> have to stay with the xm stack for the libvirt setup for now (requires some
> >> repackaging which hasn't been done, yet).
> >>
> >> So the guest isn't using HAP but does seem to use some form of paging even if
> >> the guest VCPU is not using paging. So I changed the vmx_update_guest_cr()
> >> function in that way and that seems to prevent the hangs. Does this look like
> >> a reasonable upstream Xen change?
> >
> > Yes, it looks appropriate. But I'd like this to be confirmed by the
> > authors of the original change and/or the VMX maintainers (added
> > to Cc).
>
> It can have my ack straight away.
>
> Acked-by: Keir Fraser <keir@xen.org>
>
> Nonetheless it would be nice to get a VMX maintainer ack too, though I'm
> pretty sure this patch is correct.

Yes, it is a good fix. Thank you!
I didn't test non-HAP case when I made the patch to fix this SMEP issue.

Acked-by: Dongxiao Xu <dongxiao.xu@intel.com>

Thanks,
Dongxiao

>
>  -- Keir
>
> > Nevertheless it's very odd to not use HAP on a machine capable
> > of it...
> >
> > Jan

On 04/03/2013 08:00 AM, Xu, Dongxiao wrote:
>
> Yes, it is a good fix. Thank you!
> I didn't test non-HAP case when I made the patch to fix this SMEP issue.
>

Now, won't SMAP have exactly the same issue (and so need to be added to
the same mask?)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

>>> On 03.04.13 at 17:48, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 04/03/2013 08:00 AM, Xu, Dongxiao wrote:
>>
>> Yes, it is a good fix. Thank you!
>> I didn't test non-HAP case when I made the patch to fix this SMEP issue.
>>
>
> Now, won't SMAP have exactly the same issue (and so need to be added to
> the same mask?)

Whenever the hypervisor starts supporting SMAP, yes.

Jan
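
Following hpa's suggestion, one way the mask might eventually be extended is sketched below. This is purely hypothetical: as Jan notes, the hypervisor does not support SMAP at this point, and the `CR4_NONPAGING_CLEAR` macro and `mask_nonpaging_cr4` function are invented names for illustration. Only the bit positions (SMEP is CR4 bit 20, SMAP is CR4 bit 21) are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_SMEP (UINT32_C(1) << 20)
#define X86_CR4_SMAP (UINT32_C(1) << 21)

/* Hypothetical mask: bits to hide from hardware while the guest VCPU
 * has paging disabled but is run with paging enabled underneath. */
#define CR4_NONPAGING_CLEAR (X86_CR4_SMEP | X86_CR4_SMAP)

/* Hypothetical helper: apply the mask only when guest paging is off. */
static uint32_t mask_nonpaging_cr4(uint32_t cr4, bool paging_enabled)
{
    if (!paging_enabled)
        cr4 &= ~CR4_NONPAGING_CLEAR;
    return cr4;
}

/* Self-check with the CR4 value from this thread plus the SMAP bit. */
void check_smap_mask(void)
{
    const uint32_t cr4 = UINT32_C(0x1407F0) | X86_CR4_SMAP;

    assert((mask_nonpaging_cr4(cr4, false) & CR4_NONPAGING_CLEAR) == 0);
    assert(mask_nonpaging_cr4(cr4, true) == cr4);
}
```

Keeping both bits in one named mask would mean that any later feature with the same non-paging semantics only needs to be added in one place.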