Andre Przywara
2012-May-30 13:10 UTC
[PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
Because we are behind a family check before tweaking the topology bit, we can use the standard rd/wrmsr variants for the CPUID feature register. This fixes a crash when using the kernel as a Xen Dom0 on affected Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized yet (this will be fixed in another patch). Signed-off-by: Andre Przywara <andre.przywara@amd.com> Cc: stable@vger.kernel.org # 3.4+ --- arch/x86/kernel/cpu/amd.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 146bb62..80ccd99 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) !cpu_has(c, X86_FEATURE_TOPOEXT)) { u64 val; - if (!rdmsrl_amd_safe(0xc0011005, &val)) { + if (!rdmsrl_safe(0xc0011005, &val)) { val |= 1ULL << 54; - wrmsrl_amd_safe(0xc0011005, val); + checking_wrmsrl(0xc0011005, val); rdmsrl(0xc0011005, val); if (val & (1ULL << 54)) { set_cpu_cap(c, X86_FEATURE_TOPOEXT); -- 1.7.4.4
Jan Beulich
2012-May-30 13:33 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 15:10, Andre Przywara <andre.przywara@amd.com> wrote: > Because we are behind a family check before tweaking the topology > bit, we can use the standard rd/wrmsr variants for the CPUID feature > register. > This fixes a crash when using the kernel as a Xen Dom0 on affected > Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > yet (this will be fixed in another patch).I''m not following: If the AMD variants (putting a special value into %edi) can be freely replaced by the non-AMD variants, why did the AMD special ones get used in the first place? Further, I can''t see how checking_wrmsrl() is being paravirtualized any better than wrmsrl_amd_safe() - both have nothing but an exception handling fixup attached to the wrmsr invocation. Care to point out what actual crash it is that was seen? Finally, I would question whether re-enabling the topology extensions under Xen shouldn''t be skipped altogether, perhaps even on Dom0 (as the hypervisor is controlling this MSR, but in any case on DomU - the hypervisor won''t allow (read: ignore, not fault on) the write anyway (and will log a message for each (v)CPU that attempts this). Jan> Signed-off-by: Andre Przywara <andre.przywara@amd.com> > Cc: stable@vger.kernel.org # 3.4+ > --- > arch/x86/kernel/cpu/amd.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > index 146bb62..80ccd99 100644 > --- a/arch/x86/kernel/cpu/amd.c > +++ b/arch/x86/kernel/cpu/amd.c > @@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) > !cpu_has(c, X86_FEATURE_TOPOEXT)) { > u64 val; > > - if (!rdmsrl_amd_safe(0xc0011005, &val)) { > + if (!rdmsrl_safe(0xc0011005, &val)) { > val |= 1ULL << 54; > - wrmsrl_amd_safe(0xc0011005, val); > + checking_wrmsrl(0xc0011005, val); > rdmsrl(0xc0011005, val); > if (val & (1ULL << 54)) { > set_cpu_cap(c, X86_FEATURE_TOPOEXT);
Andre Przywara
2012-May-30 14:02 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 03:33 PM, Jan Beulich wrote:>>>> On 30.05.12 at 15:10, Andre Przywara<andre.przywara@amd.com> wrote: >> Because we are behind a family check before tweaking the topology >> bit, we can use the standard rd/wrmsr variants for the CPUID feature >> register. >> This fixes a crash when using the kernel as a Xen Dom0 on affected >> Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized >> yet (this will be fixed in another patch). > > I''m not following: If the AMD variants (putting a special value into > %edi) can be freely replaced by the non-AMD variants, why did > the AMD special ones get used in the first place?Older CPUs (K8) needed the AMD variants, starting with family 10h we can use the normal versions.> Further, I can''t see how checking_wrmsrl() is being paravirtualized > any better than wrmsrl_amd_safe() - both have nothing but an > exception handling fixup attached to the wrmsr invocation. Care > to point out what actual crash it is that was seen?AFAIK, the difference is between the "l" and the regs version for rd/wrmsr. We have a patch already here to fix this. Will send it out soon. Jacob, can you comment on this? The crash dump: [ 1.601021] ------------[ cut here ]------------ [ 1.601025] kernel BUG at /buildrepos/linux-2.6-stable/arch/x86/include/asm/paravirt.h:133! [ 1.601031] invalid opcode: 0000 [#1] SMP [ 1.601038] CPU 0 [ 1.601041] Modules linked in: [ 1.601047] [ 1.601050] Pid: 0, comm: swapper/0 Not tainted 3.4.0 #1 AMD Virgo Platform/Annapurna [ 1.601061] RIP: e030:[<ffffffff8169b4fe>] [<ffffffff8169b4fe>] init_amd+0x21d/0x603 [ 1.601072] RSP: e02b:ffffffff81c01df8 EFLAGS: 00010246 [ 1.601076] RAX: 0000000000000000 RBX: ffffffff81ca7500 RCX: 0000000000000000 [ 1.601081] RDX: 0000000000000020 RSI: 000000000000c000 RDI: ffffffff81c01e78 [ 1.601086] RBP: ffffffff81c01ea8 R08: 0000000000004003 R09: 00000000ffffffff [ 1.601090] R10: 0000000000000000 R11: 00000000ffffffff R12: ffffffff81ca7574 [ 1.601095] R13: ffffffff81ca7514 R14: ffffffff81ca754c R15: ffffffff81ca756c [ 1.601103] FS: 0000000000000000(0000) GS:ffff8801ce600000(0000) knlGS:0000000000000000 [ 1.601108] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1.601112] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 0000000000040660 [ 1.601117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.601122] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1.601127] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c14020) [ 1.601131] Stack: [ 1.601134] ffffffff81c01ea8 ffffffff8169a7d8 0000001600000017 0000000000000000 [ 1.601146] ffff8801c1c3ac00 ffff8801c1c002d0 ffffffff81c01e48 ffffffff81140118 [ 1.601157] ffffffff811415cc ffff8801c1c04220 ffff8801c1c02480 00000000000000d0 [ 1.601169] Call Trace: [ 1.601175] [<ffffffff8169a7d8>] ? get_cpu_cap+0x1f2/0x201 [ 1.601183] [<ffffffff81140118>] ? init_kmem_cache_cpus+0x3e/0x4d [ 1.601189] [<ffffffff811415cc>] ? sysfs_slab_alias+0x60/0x8d [ 1.601195] [<ffffffff8169aa3d>] identify_cpu+0x256/0x34d [ 1.601201] [<ffffffff81141649>] ? kmem_cache_alloc+0x50/0xe4 [ 1.601209] [<ffffffff81cd1c51>] identify_boot_cpu+0x10/0x3c [ 1.601216] [<ffffffff81cd2052>] check_bugs+0x9/0x2d [ 1.601222] [<ffffffff81cc7e08>] start_kernel+0x445/0x461 [ 1.601227] [<ffffffff81cc77d6>] ? kernel_init+0x1d8/0x1d8 [ 1.601233] [<ffffffff81cc72cf>] x86_64_start_reservations+0xba/0xc1 [ 1.601240] [<ffffffff81041797>] ? xen_setup_runstate_info+0x2c/0x36 [ 1.601247] [<ffffffff81ccbdc4>] xen_start_kernel+0x58b/0x592 [ 1.601251] Code: 0f 85 d3 00 00 00 31 c0 48 83 3d cd 5c 58 00 00 48 8d 7d b0 b9 08 00 00 00 f3 ab c7 45 b4 05 10 01 c0 c7 45 cc 3a 20 5a 9c 75 04 <0f> 0b eb fe 4c 8d 75 b0 4c 89 f7 ff 14 25 b0 11 c2 81 85 c0 8b [ 1.601374] RIP [<ffffffff8169b4fe>] init_amd+0x21d/0x603 [ 1.601381] RSP <ffffffff81c01df8> [ 1.601391] ---[ end trace a7919e7f17c0a725 ]--- [ 1.601397] Kernel panic - not syncing: Attempted to kill the idle task!> Finally, I would question whether re-enabling the topology > extensions under Xen shouldn''t be skipped altogether, perhaps > even on Dom0 (as the hypervisor is controlling this MSR, but in > any case on DomU - the hypervisor won''t allow (read: ignore, > not fault on) the write anyway (and will log a message for each > (v)CPU that attempts this).This is probably right. Let me think about this. Thanks for picking this up. Regards, Andre.> >> Signed-off-by: Andre Przywara<andre.przywara@amd.com> >> Cc: stable@vger.kernel.org # 3.4+ >> --- >> arch/x86/kernel/cpu/amd.c | 4 ++-- >> 1 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c >> index 146bb62..80ccd99 100644 >> --- a/arch/x86/kernel/cpu/amd.c >> +++ b/arch/x86/kernel/cpu/amd.c >> @@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) >> !cpu_has(c, X86_FEATURE_TOPOEXT)) { >> u64 val; >> >> - if (!rdmsrl_amd_safe(0xc0011005,&val)) { >> + if (!rdmsrl_safe(0xc0011005,&val)) { >> val |= 1ULL<< 54; >> - wrmsrl_amd_safe(0xc0011005, val); >> + checking_wrmsrl(0xc0011005, val); >> rdmsrl(0xc0011005, val); >> if (val& (1ULL<< 54)) { >> set_cpu_cap(c, X86_FEATURE_TOPOEXT); > > >-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany
Jan Beulich
2012-May-30 14:23 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 16:02, Andre Przywara <andre.przywara@amd.com> wrote: > On 05/30/2012 03:33 PM, Jan Beulich wrote: >> Further, I can''t see how checking_wrmsrl() is being paravirtualized >> any better than wrmsrl_amd_safe() - both have nothing but an >> exception handling fixup attached to the wrmsr invocation. Care >> to point out what actual crash it is that was seen? > > AFAIK, the difference is between the "l" and the regs version for > rd/wrmsr. We have a patch already here to fix this. Will send it out > soon. Jacob, can you comment on this?I see - the Xen code blindly overwrites pv_cpu_ops, despite not having initialized all members. That''s an obvious oversight of the patch that introduced the _regs variants. Plus having secondary instances of things like rdmsrl_amd_safe() in asm/paravirt.h seems pretty strange an approach (which was why initially I didn''t spot how a crash could happen there) - only the lowest level functions should get re-implemented here.>> Finally, I would question whether re-enabling the topology >> extensions under Xen shouldn''t be skipped altogether, perhaps >> even on Dom0 (as the hypervisor is controlling this MSR, but in >> any case on DomU - the hypervisor won''t allow (read: ignore, >> not fault on) the write anyway (and will log a message for each >> (v)CPU that attempts this). > > This is probably right. Let me think about this.I''ll submit a respective hypervisor side patch soonish. Jan
Konrad Rzeszutek Wilk
2012-May-30 14:39 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 03:10:02PM +0200, Andre Przywara wrote:> Because we are behind a family check before tweaking the topology > bit, we can use the standard rd/wrmsr variants for the CPUID feature > register. > This fixes a crash when using the kernel as a Xen Dom0 on affected > Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > yet (this will be fixed in another patch).So with a rdmsrl_amd_safe and wrmsrl_amd_safe being implemented in the pv_cpu_ops - would this patch even be neccessary?> > Signed-off-by: Andre Przywara <andre.przywara@amd.com> > Cc: stable@vger.kernel.org # 3.4+ > --- > arch/x86/kernel/cpu/amd.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > index 146bb62..80ccd99 100644 > --- a/arch/x86/kernel/cpu/amd.c > +++ b/arch/x86/kernel/cpu/amd.c > @@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) > !cpu_has(c, X86_FEATURE_TOPOEXT)) { > u64 val; > > - if (!rdmsrl_amd_safe(0xc0011005, &val)) { > + if (!rdmsrl_safe(0xc0011005, &val)) { > val |= 1ULL << 54; > - wrmsrl_amd_safe(0xc0011005, val); > + checking_wrmsrl(0xc0011005, val); > rdmsrl(0xc0011005, val); > if (val & (1ULL << 54)) { > set_cpu_cap(c, X86_FEATURE_TOPOEXT); > -- > 1.7.4.4 > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
H. Peter Anvin
2012-May-30 14:42 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 06:10 AM, Andre Przywara wrote:> Because we are behind a family check before tweaking the topology > bit, we can use the standard rd/wrmsr variants for the CPUID feature > register.That is not what the *msr*_amd*() functions do. NAK. This is a totally bogus patch. -hpa
H. Peter Anvin
2012-May-30 14:42 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 07:23 AM, Jan Beulich wrote:> > I see - the Xen code blindly overwrites pv_cpu_ops, despite not > having initialized all members. That''s an obvious oversight of the > patch that introduced the _regs variants. > > Plus having secondary instances of things like rdmsrl_amd_safe() > in asm/paravirt.h seems pretty strange an approach (which was > why initially I didn''t spot how a crash could happen there) - only > the lowest level functions should get re-implemented here. >This kinds of things are part of why Xen makes me want to cry regularly. -hpa
Jacob Shin
2012-May-30 14:48 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 04:02:48PM +0200, Andre Przywara wrote:> On 05/30/2012 03:33 PM, Jan Beulich wrote: > >>>>On 30.05.12 at 15:10, Andre Przywara<andre.przywara@amd.com> wrote: > >>Because we are behind a family check before tweaking the topology > >>bit, we can use the standard rd/wrmsr variants for the CPUID feature > >>register. > >>This fixes a crash when using the kernel as a Xen Dom0 on affected > >>Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > >>yet (this will be fixed in another patch). > > > >I''m not following: If the AMD variants (putting a special value into > >%edi) can be freely replaced by the non-AMD variants, why did > >the AMD special ones get used in the first place? > > Older CPUs (K8) needed the AMD variants, starting with family 10h we > can use the normal versions. > > >Further, I can''t see how checking_wrmsrl() is being paravirtualized > >any better than wrmsrl_amd_safe() - both have nothing but an > >exception handling fixup attached to the wrmsr invocation. Care > >to point out what actual crash it is that was seen? > > AFAIK, the difference is between the "l" and the regs version for > rd/wrmsr. We have a patch already here to fix this. Will send it out > soon. Jacob, can you comment on this?Right, the checking_wrmsrl turns into wrmsr_safe which is paravirtualized but the rdmsrl_amd_safe which turns into rdmsr_regs is not paravirtualized by enlighten.> > The crash dump: > > [ 1.601021] ------------[ cut here ]------------ > [ 1.601025] kernel BUG at > /buildrepos/linux-2.6-stable/arch/x86/include/asm/paravirt.h:133! > [ 1.601031] invalid opcode: 0000 [#1] SMP > [ 1.601038] CPU 0 > [ 1.601041] Modules linked in: > [ 1.601047] > [ 1.601050] Pid: 0, comm: swapper/0 Not tainted 3.4.0 #1 AMD > Virgo Platform/Annapurna > [ 1.601061] RIP: e030:[<ffffffff8169b4fe>] [<ffffffff8169b4fe>] > init_amd+0x21d/0x603 > [ 1.601072] RSP: e02b:ffffffff81c01df8 EFLAGS: 00010246 > [ 1.601076] RAX: 0000000000000000 RBX: ffffffff81ca7500 RCX: > 0000000000000000 > [ 1.601081] RDX: 0000000000000020 RSI: 000000000000c000 RDI: > ffffffff81c01e78 > [ 1.601086] RBP: ffffffff81c01ea8 R08: 0000000000004003 R09: > 00000000ffffffff > [ 1.601090] R10: 0000000000000000 R11: 00000000ffffffff R12: > ffffffff81ca7574 > [ 1.601095] R13: ffffffff81ca7514 R14: ffffffff81ca754c R15: > ffffffff81ca756c > [ 1.601103] FS: 0000000000000000(0000) GS:ffff8801ce600000(0000) > knlGS:0000000000000000 > [ 1.601108] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 1.601112] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: > 0000000000040660 > [ 1.601117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 1.601122] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [ 1.601127] Process swapper/0 (pid: 0, threadinfo > ffffffff81c00000, task ffffffff81c14020) > [ 1.601131] Stack: > [ 1.601134] ffffffff81c01ea8 ffffffff8169a7d8 0000001600000017 > 0000000000000000 > [ 1.601146] ffff8801c1c3ac00 ffff8801c1c002d0 ffffffff81c01e48 > ffffffff81140118 > [ 1.601157] ffffffff811415cc ffff8801c1c04220 ffff8801c1c02480 > 00000000000000d0 > [ 1.601169] Call Trace: > [ 1.601175] [<ffffffff8169a7d8>] ? get_cpu_cap+0x1f2/0x201 > [ 1.601183] [<ffffffff81140118>] ? init_kmem_cache_cpus+0x3e/0x4d > [ 1.601189] [<ffffffff811415cc>] ? sysfs_slab_alias+0x60/0x8d > [ 1.601195] [<ffffffff8169aa3d>] identify_cpu+0x256/0x34d > [ 1.601201] [<ffffffff81141649>] ? kmem_cache_alloc+0x50/0xe4 > [ 1.601209] [<ffffffff81cd1c51>] identify_boot_cpu+0x10/0x3c > [ 1.601216] [<ffffffff81cd2052>] check_bugs+0x9/0x2d > [ 1.601222] [<ffffffff81cc7e08>] start_kernel+0x445/0x461 > [ 1.601227] [<ffffffff81cc77d6>] ? kernel_init+0x1d8/0x1d8 > [ 1.601233] [<ffffffff81cc72cf>] x86_64_start_reservations+0xba/0xc1 > [ 1.601240] [<ffffffff81041797>] ? xen_setup_runstate_info+0x2c/0x36 > [ 1.601247] [<ffffffff81ccbdc4>] xen_start_kernel+0x58b/0x592 > [ 1.601251] Code: 0f 85 d3 00 00 00 31 c0 48 83 3d cd 5c 58 00 00 > 48 8d 7d b0 b9 08 00 00 00 f3 > ab c7 45 b4 05 10 01 c0 c7 45 cc 3a 20 5a 9c 75 04 <0f> 0b eb fe 4c > 8d 75 b0 4c 89 f7 ff 14 25 b0 11 > c2 81 85 c0 8b > [ 1.601374] RIP [<ffffffff8169b4fe>] init_amd+0x21d/0x603 > [ 1.601381] RSP <ffffffff81c01df8> > [ 1.601391] ---[ end trace a7919e7f17c0a725 ]--- > [ 1.601397] Kernel panic - not syncing: Attempted to kill the idle task! > > >Finally, I would question whether re-enabling the topology > >extensions under Xen shouldn''t be skipped altogether, perhaps > >even on Dom0 (as the hypervisor is controlling this MSR, but in > >any case on DomU - the hypervisor won''t allow (read: ignore, > >not fault on) the write anyway (and will log a message for each > >(v)CPU that attempts this). > > This is probably right. Let me think about this. > > Thanks for picking this up. > > Regards, > Andre. > > > > >>Signed-off-by: Andre Przywara<andre.przywara@amd.com> > >>Cc: stable@vger.kernel.org # 3.4+ > >>--- > >> arch/x86/kernel/cpu/amd.c | 4 ++-- > >> 1 files changed, 2 insertions(+), 2 deletions(-) > >> > >>diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > >>index 146bb62..80ccd99 100644 > >>--- a/arch/x86/kernel/cpu/amd.c > >>+++ b/arch/x86/kernel/cpu/amd.c > >>@@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) > >> !cpu_has(c, X86_FEATURE_TOPOEXT)) { > >> u64 val; > >> > >>- if (!rdmsrl_amd_safe(0xc0011005,&val)) { > >>+ if (!rdmsrl_safe(0xc0011005,&val)) { > >> val |= 1ULL<< 54; > >>- wrmsrl_amd_safe(0xc0011005, val); > >>+ checking_wrmsrl(0xc0011005, val); > >> rdmsrl(0xc0011005, val); > >> if (val& (1ULL<< 54)) { > >> set_cpu_cap(c, X86_FEATURE_TOPOEXT); > > > > > > > > > -- > Andre Przywara > AMD-Operating System Research Center (OSRC), Dresden, Germany
Konrad Rzeszutek Wilk
2012-May-30 14:49 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 07:42:56AM -0700, H. Peter Anvin wrote:> On 05/30/2012 07:23 AM, Jan Beulich wrote: > > > > I see - the Xen code blindly overwrites pv_cpu_ops, despite not > > having initialized all members. That''s an obvious oversight of the > > patch that introduced the _regs variants. > > > > Plus having secondary instances of things like rdmsrl_amd_safe() > > in asm/paravirt.h seems pretty strange an approach (which was > > why initially I didn''t spot how a crash could happen there) - only > > the lowest level functions should get re-implemented here. > > > > This kinds of things are part of why Xen makes me want to cry regularly.It looks like an oversight by Borislav (177fed1ee8d727c39601ce9fc2299b4cb25a718e and 132ec92f) and Yinghai (b05f78f5c713eda2c34e495d92495ee4f1c3b5e1) where they added these wrappers way back in 2009!
Konrad Rzeszutek Wilk
2012-May-30 14:50 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 09:48:51AM -0500, Jacob Shin wrote:> On Wed, May 30, 2012 at 04:02:48PM +0200, Andre Przywara wrote: > > On 05/30/2012 03:33 PM, Jan Beulich wrote: > > >>>>On 30.05.12 at 15:10, Andre Przywara<andre.przywara@amd.com> wrote: > > >>Because we are behind a family check before tweaking the topology > > >>bit, we can use the standard rd/wrmsr variants for the CPUID feature > > >>register. > > >>This fixes a crash when using the kernel as a Xen Dom0 on affected > > >>Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > > >>yet (this will be fixed in another patch). > > > > > >I''m not following: If the AMD variants (putting a special value into > > >%edi) can be freely replaced by the non-AMD variants, why did > > >the AMD special ones get used in the first place? > > > > Older CPUs (K8) needed the AMD variants, starting with family 10h we > > can use the normal versions. > > > > >Further, I can''t see how checking_wrmsrl() is being paravirtualized > > >any better than wrmsrl_amd_safe() - both have nothing but an > > >exception handling fixup attached to the wrmsr invocation. Care > > >to point out what actual crash it is that was seen? > > > > AFAIK, the difference is between the "l" and the regs version for > > rd/wrmsr. We have a patch already here to fix this. Will send it out > > soon. Jacob, can you comment on this? > > Right, the checking_wrmsrl turns into wrmsr_safe which is paravirtualized > but the rdmsrl_amd_safe which turns into rdmsr_regs is not paravirtualized > by enlighten.So would a patch to implements the rdmsr_regs fix this crash?
H. Peter Anvin
2012-May-30 14:50 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 07:39 AM, Konrad Rzeszutek Wilk wrote:> On Wed, May 30, 2012 at 03:10:02PM +0200, Andre Przywara wrote: >> Because we are behind a family check before tweaking the topology >> bit, we can use the standard rd/wrmsr variants for the CPUID feature >> register. >> This fixes a crash when using the kernel as a Xen Dom0 on affected >> Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized >> yet (this will be fixed in another patch). > > So with a rdmsrl_amd_safe and wrmsrl_amd_safe being implemented in > the pv_cpu_ops - would this patch even be neccessary? >That is still bogus; a better thing would be to implement the _regs interface. Even better would be to trap and emulate rdmsr/wrmsr! -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf.
Konrad Rzeszutek Wilk
2012-May-30 14:51 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 07:50:15AM -0700, H. Peter Anvin wrote:> On 05/30/2012 07:39 AM, Konrad Rzeszutek Wilk wrote: > > On Wed, May 30, 2012 at 03:10:02PM +0200, Andre Przywara wrote: > >> Because we are behind a family check before tweaking the topology > >> bit, we can use the standard rd/wrmsr variants for the CPUID feature > >> register. > >> This fixes a crash when using the kernel as a Xen Dom0 on affected > >> Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > >> yet (this will be fixed in another patch). > > > > So with a rdmsrl_amd_safe and wrmsrl_amd_safe being implemented in > > the pv_cpu_ops - would this patch even be neccessary? > > > > That is still bogus; a better thing would be to implement the _regs > interface. Even better would be to trap and emulate rdmsr/wrmsr!That is what I meant - implement these two: rdmsr_regs = native_rdmsr_safe_regs, .wrmsr_regs = native_wrmsr_safe_regs, Xen already traps the rdmsr/wrms - I believe it just didn''t do anything for this specific MSR.
Borislav Petkov
2012-May-30 14:55 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 07:42:27AM -0700, H. Peter Anvin wrote:> On 05/30/2012 06:10 AM, Andre Przywara wrote: > > Because we are behind a family check before tweaking the topology > > bit, we can use the standard rd/wrmsr variants for the CPUID feature > > register. > > That is not what the *msr*_amd*() functions do. > > NAK. This is a totally bogus patch.The *msr*_amd*() variants were used instead of the normal *msrl_safe variants although the AMD variants weren''t needed there at all. This has no issue on baremetal but breaks xen and this is how we caught this. So the patch corrects the original patch so that xen is happy too. -- Regards/Gruss, Boris.
H. Peter Anvin
2012-May-30 14:58 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 07:55 AM, Borislav Petkov wrote:> On Wed, May 30, 2012 at 07:42:27AM -0700, H. Peter Anvin wrote: >> On 05/30/2012 06:10 AM, Andre Przywara wrote: >>> Because we are behind a family check before tweaking the topology >>> bit, we can use the standard rd/wrmsr variants for the CPUID feature >>> register. >> >> That is not what the *msr*_amd*() functions do. >> >> NAK. This is a totally bogus patch. > > The *msr*_amd*() variants were used instead of the normal *msrl_safe > variants although the AMD variants weren''t needed there at all. > > This has no issue on baremetal but breaks xen and this is how we caught > this. > > So the patch corrects the original patch so that xen is happy too. >If so, then fix the description to match reality and we can take the patch. -hpa
Borislav Petkov
2012-May-30 15:00 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 07:58:23AM -0700, H. Peter Anvin wrote:> If so, then fix the description to match reality and we can take the patch.Andre, would you please do that? Thanks. -- Regards/Gruss, Boris.
H. Peter Anvin
2012-May-30 15:01 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 08:00 AM, Borislav Petkov wrote:> On Wed, May 30, 2012 at 07:58:23AM -0700, H. Peter Anvin wrote: >> If so, then fix the description to match reality and we can take the patch. > > Andre, would you please do that? >Ah, but now we get:>> I''m not following: If the AMD variants (putting a special value into >> %edi) can be freely replaced by the non-AMD variants, why did >> the AMD special ones get used in the first place?> Older CPUs (K8) needed the AMD variants, starting with family 10h we > can use the normal versions.So no, not correct. -hpa
Jacob Shin
2012-May-30 15:03 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 10:50:05AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, May 30, 2012 at 09:48:51AM -0500, Jacob Shin wrote: > > On Wed, May 30, 2012 at 04:02:48PM +0200, Andre Przywara wrote: > > > On 05/30/2012 03:33 PM, Jan Beulich wrote: > > > >>>>On 30.05.12 at 15:10, Andre Przywara<andre.przywara@amd.com> wrote: > > > >>Because we are behind a family check before tweaking the topology > > > >>bit, we can use the standard rd/wrmsr variants for the CPUID feature > > > >>register. > > > >>This fixes a crash when using the kernel as a Xen Dom0 on affected > > > >>Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > > > >>yet (this will be fixed in another patch). > > > > > > > >I''m not following: If the AMD variants (putting a special value into > > > >%edi) can be freely replaced by the non-AMD variants, why did > > > >the AMD special ones get used in the first place? > > > > > > Older CPUs (K8) needed the AMD variants, starting with family 10h we > > > can use the normal versions. > > > > > > >Further, I can''t see how checking_wrmsrl() is being paravirtualized > > > >any better than wrmsrl_amd_safe() - both have nothing but an > > > >exception handling fixup attached to the wrmsr invocation. Care > > > >to point out what actual crash it is that was seen? > > > > > > AFAIK, the difference is between the "l" and the regs version for > > > rd/wrmsr. We have a patch already here to fix this. Will send it out > > > soon. Jacob, can you comment on this? > > > > Right, the checking_wrmsrl turns into wrmsr_safe which is paravirtualized > > but the rdmsrl_amd_safe which turns into rdmsr_regs is not paravirtualized > > by enlighten. > > So would a patch to implements the rdmsr_regs fix this crash?Yes, the following patch also fixed the crash for us: Implement rdmsr_regs and wrmsr_regs for Xen pvops. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Cc: stable@vger.kernel.org # 3.4+ diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 75f33b2..f3ae5ec 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -945,9 +945,12 @@ static void xen_write_cr4(unsigned long cr4) native_write_cr4(cr4); } -static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) +static int xen_wrmsr_safe_regs(u32 regs[8]) { int ret; + unsigned int msr = regs[1]; + unsigned low = regs[0]; + unsigned high = regs[2]; ret = 0; @@ -985,12 +988,23 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) break; default: - ret = native_write_msr_safe(msr, low, high); + ret = native_wrmsr_safe_regs(regs); } return ret; } +static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) +{ + u32 regs[8] = { 0 }; + + regs[0] = low; + regs[1] = msr; + regs[2] = high; + + return xen_wrmsr_safe_regs(regs); +} + void xen_setup_shared_info(void) { if (!xen_feature(XENFEAT_auto_translated_physmap)) { @@ -1116,7 +1130,9 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { .wbinvd = native_wbinvd, .read_msr = native_read_msr_safe, + .rdmsr_regs = native_rdmsr_safe_regs, .write_msr = xen_write_msr_safe, + .wrmsr_regs = xen_wrmsr_safe_regs, .read_tsc = native_read_tsc, .read_pmc = native_read_pmc,
Borislav Petkov
2012-May-30 15:05 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 08:01:19AM -0700, H. Peter Anvin wrote:> >> I''m not following: If the AMD variants (putting a special value into > >> %edi) can be freely replaced by the non-AMD variants, why did > >> the AMD special ones get used in the first place? > > > Older CPUs (K8) needed the AMD variants, starting with family 10h we > > can use the normal versions. > > So no, not correct.I don''t see what the problem is: the amd* variants shouldn''t have been used at all in the first place. Fullstop. The patch should''ve been doing *msrl_safe from the get-go. Regardless of family. So what''s the issue? -- Regards/Gruss, Boris.
Jan Beulich
2012-May-30 15:08 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 16:50, "H. Peter Anvin" <hpa@zytor.com> wrote: > On 05/30/2012 07:39 AM, Konrad Rzeszutek Wilk wrote: >> On Wed, May 30, 2012 at 03:10:02PM +0200, Andre Przywara wrote: >>> Because we are behind a family check before tweaking the topology >>> bit, we can use the standard rd/wrmsr variants for the CPUID feature >>> register. >>> This fixes a crash when using the kernel as a Xen Dom0 on affected >>> Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized >>> yet (this will be fixed in another patch). >> >> So with a rdmsrl_amd_safe and wrmsrl_amd_safe being implemented in >> the pv_cpu_ops - would this patch even be neccessary? >> > > That is still bogus; a better thing would be to implement the _regs > interface. Even better would be to trap and emulate rdmsr/wrmsr!The crash is not on the wrmsr instruction, but on the paravirt layer finding a NULL pointer in one of the methods. Xen does trap and emulate (possibly just ignore) both instructions. Jan
Borislav Petkov
2012-May-30 15:12 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 10:49:29AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, May 30, 2012 at 07:42:56AM -0700, H. Peter Anvin wrote: > > On 05/30/2012 07:23 AM, Jan Beulich wrote: > > > > > > I see - the Xen code blindly overwrites pv_cpu_ops, despite not > > > having initialized all members. That''s an obvious oversight of the > > > patch that introduced the _regs variants. > > > > > > Plus having secondary instances of things like rdmsrl_amd_safe() > > > in asm/paravirt.h seems pretty strange an approach (which was > > > why initially I didn''t spot how a crash could happen there) - only > > > the lowest level functions should get re-implemented here. > > > > > > > This kinds of things are part of why Xen makes me want to cry regularly. > > It looks like an oversight by Borislav (177fed1ee8d727c39601ce9fc2299b4cb25a718e > and 132ec92f) and Yinghai (b05f78f5c713eda2c34e495d92495ee4f1c3b5e1) where > they added these wrappers way back in 2009!This is exactly why xen has nothing to do in arch/x86/. It is not an oversight - I simply didn''t test it on xen because I don''t care about it. Remember our last discussion about mcelog? This current case should be a perfect example for why xen shouldn''t be sprinkling code all over the place. -- Regards/Gruss, Boris.
H. Peter Anvin
2012-May-30 15:15 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 08:08 AM, Jan Beulich wrote:> > Xen does trap and emulate (possibly just ignore) both instructions. >Then why the fsck is there paravirt_crap on this? -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf.
Jan Beulich
2012-May-30 15:35 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 17:15, "H. Peter Anvin" <hpa@zytor.com> wrote: > On 05/30/2012 08:08 AM, Jan Beulich wrote: >> >> Xen does trap and emulate (possibly just ignore) both instructions. >> > > Then why the fsck is there paravirt_crap on this?I have no clue. Jeremy, Konrad? Jan
Jan Beulich
2012-May-30 15:40 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 17:12, Borislav Petkov <bp@alien8.de> wrote: > This current case should be a perfect example for why xen shouldn''t be > sprinkling code all over the place.Which means you''re denying the benefits of para-virtualization (at the base system level; perhaps it''s less of a problem for you when it comes to pv device drivers, which are generally standalone entities) as that''s what distinguishes Xen from all other virtualization solutions Linux supports. Jan
H. Peter Anvin
2012-May-30 15:45 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 08:40 AM, Jan Beulich wrote:>>>> On 30.05.12 at 17:12, Borislav Petkov <bp@alien8.de> wrote: >> This current case should be a perfect example for why xen shouldn''t be >> sprinkling code all over the place. > > Which means you''re denying the benefits of para-virtualization > (at the base system level; perhaps it''s less of a problem for you > when it comes to pv device drivers, which are generally > standalone entities) as that''s what distinguishes Xen from all > other virtualization solutions Linux supports. >And it also denies the just atrocious cost of Xen grabbing random kernel internals and turning them into ABIs that we have to work around from that point on. This is particularly obnoxious because the cost is largely not borne by the Xen community but by the general Linux development. -hpa
Borislav Petkov
2012-May-30 15:58 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 04:40:37PM +0100, Jan Beulich wrote:> >>> On 30.05.12 at 17:12, Borislav Petkov <bp@alien8.de> wrote: > > This current case should be a perfect example for why xen shouldn''t be > > sprinkling code all over the place. > > Which means you''re denying the benefits of para-virtualizationDoes it really mean that?> (at the base system level; perhaps it''s less of a problem for > you when it comes to pv device drivers, which are generally > standalone entities) as that''s what distinguishes Xen from all other > virtualization solutions Linux supports.All I''m saying is, xen should solve the whole deal of what it wants to do (whatever that is) _without_ and _agnostic_ from arch/x86/. Otherwise x86 changes break xen. I couldn''t care less about what distinguishes xen from all other solutions. -- Regards/Gruss, Boris.
Konrad Rzeszutek Wilk
2012-May-30 16:48 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 04:35:41PM +0100, Jan Beulich wrote:> >>> On 30.05.12 at 17:15, "H. Peter Anvin" <hpa@zytor.com> wrote: > > On 05/30/2012 08:08 AM, Jan Beulich wrote: > >> > >> Xen does trap and emulate (possibly just ignore) both instructions. > >> > > > > Then why the fsck is there paravirt_crap on this? > > I have no clue. Jeremy, Konrad?No idea. The git 132ec92f says: Borislav Petkov <petkovbb@googlemail.com> Date: Mon Aug 31 09:50:09 2009 +0200 x86, msr: Add rd/wrmsr interfaces with preset registers native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow presetting of a subset of eight x86 GPRs before executing the rd/wrmsr instructions. This is needed at least on AMD K8 for accessing an erratum workaround MSR. Originally based on an idea by H. Peter Anvin.
Konrad Rzeszutek Wilk
2012-May-30 17:17 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
> Yes, the following patch also fixed the crash for us: > > Implement rdmsr_regs and wrmsr_regs for Xen pvops.That needs more data. Such as the reason for it, the crash tombstone, and an analysis of the bug. But at this point I am not sure what we are going to do. I think Peter leans towards ripping the .rdmsr_regs/wdmsr_regs function out altogether (so altering the amd_rdmsr... to use the .rdmsr/.wrdmsr). At which point I think this would all work just fine? I am tempted to write a patch that checks all the pv-cpu-ops to see if there are any that are NULL and throw a warning so that this does not hit us in the future - to be at least more proactive about this sort of thing.> > Signed-off-by: Jacob Shin <jacob.shin@amd.com> > Cc: stable@vger.kernel.org # 3.4+ > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index 75f33b2..f3ae5ec 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -945,9 +945,12 @@ static void xen_write_cr4(unsigned long cr4) > native_write_cr4(cr4); > } > > -static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) > +static int xen_wrmsr_safe_regs(u32 regs[8]) > { > int ret; > + unsigned int msr = regs[1]; > + unsigned low = regs[0]; > + unsigned high = regs[2]; > > ret = 0; > > @@ -985,12 +988,23 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) > break; > > default: > - ret = native_write_msr_safe(msr, low, high); > + ret = native_wrmsr_safe_regs(regs); > } > > return ret; > } > > +static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) > +{ > + u32 regs[8] = { 0 }; > + > + regs[0] = low; > + regs[1] = msr; > + regs[2] = high; > + > + return xen_wrmsr_safe_regs(regs); > +} > + > void xen_setup_shared_info(void) > { > if (!xen_feature(XENFEAT_auto_translated_physmap)) { > @@ -1116,7 +1130,9 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > .wbinvd = native_wbinvd, > > .read_msr = native_read_msr_safe, > + .rdmsr_regs = native_rdmsr_safe_regs, > .write_msr = xen_write_msr_safe, > + .wrmsr_regs = xen_wrmsr_safe_regs, > .read_tsc = native_read_tsc, > .read_pmc = native_read_pmc, > > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
H. Peter Anvin
2012-May-30 17:31 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 10:17 AM, Konrad Rzeszutek Wilk wrote:>> Yes, the following patch also fixed the crash for us: >> >> Implement rdmsr_regs and wrmsr_regs for Xen pvops. > > That needs more data. Such as the reason for it, the crash > tombstone, and an analysis of the bug. > > But at this point I am not sure what we are going to do. > > I think Peter leans towards ripping the .rdmsr_regs/wdmsr_regs > function out altogether (so altering the amd_rdmsr... to use the > .rdmsr/.wrdmsr). At which point I think this would all work > just fine? >No, you can''t just do that. Rip them out as in trap and emulate. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf.
Borislav Petkov
2012-May-30 17:32 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 01:17:54PM -0400, Konrad Rzeszutek Wilk wrote:> > Yes, the following patch also fixed the crash for us: > > > > Implement rdmsr_regs and wrmsr_regs for Xen pvops. > > That needs more data. Such as the reason for it, the crash > tombstone, and an analysis of the bug. > > But at this point I am not sure what we are going to do. > > I think Peter leans towards ripping the .rdmsr_regs/wdmsr_regs > function out altogether (so altering the amd_rdmsr... to use the > .rdmsr/.wrdmsr). At which point I think this would all work > just fine?I wouldn''t do that. Andre''s patch switches to the rdmsrl_safe/checking_wrmsrl functions so you don''t need to implement the _regs functions for pvops. I''ll send it now with corrected commit message.> I am tempted to write a patch that checks all the pv-cpu-ops to see if > there are any that are NULL and throw a warning so that this does not > hit us in the future - to be at least more proactive about this sort > of thing.This could be a prudent thing to do. -- Regards/Gruss, Boris.
Borislav Petkov
2012-May-30 17:47 UTC
[PATCH] x86, AMD: Fix crash as Xen Dom0 on AMD Trinity systems
From: Andre Przywara <andre.przywara@amd.com> Switch to the standard {rd,wr}msr*_safe* variants which should''ve been used in the first place anyway and avoided unneeded excitation with xen. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Cc: stable@vger.kernel.org # 3.4+ Link: <http://lkml.kernel.org/r/1338383402-3838-1-git-send-email-andre.przywara@amd.com> [Boris: correct commit message] Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> --- arch/x86/kernel/cpu/amd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 146bb6218eec..80ccd99542e6 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -586,9 +586,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) !cpu_has(c, X86_FEATURE_TOPOEXT)) { u64 val; - if (!rdmsrl_amd_safe(0xc0011005, &val)) { + if (!rdmsrl_safe(0xc0011005, &val)) { val |= 1ULL << 54; - wrmsrl_amd_safe(0xc0011005, val); + checking_wrmsrl(0xc0011005, val); rdmsrl(0xc0011005, val); if (val & (1ULL << 54)) { set_cpu_cap(c, X86_FEATURE_TOPOEXT); -- 1.7.9.3.362.g71319
H. Peter Anvin
2012-May-30 17:47 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 10:32 AM, Borislav Petkov wrote:> > I wouldn''t do that. Andre''s patch switches to the > rdmsrl_safe/checking_wrmsrl functions so you don''t need to implement the > _regs functions for pvops. >But we just stated - it is wrong for K8? -hpa
Borislav Petkov
2012-May-30 17:51 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 10:47:32AM -0700, H. Peter Anvin wrote:> > I wouldn''t do that. Andre''s patch switches to the > > rdmsrl_safe/checking_wrmsrl functions so you don''t need to implement the > > _regs functions for pvops. > > > But we just stated - it is wrong for K8?Not executed on K8 - only on F15h, models 0x10 - 0x1f. -- Regards/Gruss, Boris.
H. Peter Anvin
2012-May-30 18:00 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 10:51 AM, Borislav Petkov wrote:> On Wed, May 30, 2012 at 10:47:32AM -0700, H. Peter Anvin wrote: >>> I wouldn''t do that. Andre''s patch switches to the >>> rdmsrl_safe/checking_wrmsrl functions so you don''t need to implement the >>> _regs functions for pvops. >>> >> But we just stated - it is wrong for K8? > > Not executed on K8 - only on F15h, models 0x10 - 0x1f. >OK. But there is still the general problem, no? -hpa
Borislav Petkov
2012-May-30 18:17 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 11:00:23AM -0700, H. Peter Anvin wrote:> OK. But there is still the general problem, no?With this patch xen crashes go away because they paravirt native_{read,write}_msr_safe. The other place where we use the amd_safe variants is an obscure K8, revC and earlier fix for _some_ BIOSen and this hasn''t bitten us yet so I''m assuming people haven''t run xen on such boxes yet. Does it need fixing? Probably, if we really really have to. Now, someone probably needs to paravirt the *safe_regs variants in case something else decides to use them. I don''t know what to do here, do I want more paravirt code in there? No. I guess if this is done carefully and cleanly, then it should be ok but it can''t be done like that - it needs to adhere to the current pv_cpu_ops thing which is already there. Hmm... -- Regards/Gruss, Boris.
Borislav Petkov
2012-May-30 18:19 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 08:17:23PM +0200, Borislav Petkov wrote:> The other place where we use the amd_safe variants is an obscure K8, > revC and earlier fix for _some_ BIOSen and this hasn''t bitten us yet > so I''m assuming people haven''t run xen on such boxes yet. Does it need > fixing? Probably, if we really really have to.I''m being told we''re safe here. Those boxes didn''t have virtualization support yet (SVM) so this one doesn''t need fixing. -- Regards/Gruss, Boris.
H. Peter Anvin
2012-May-30 18:20 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 11:17 AM, Borislav Petkov wrote:> On Wed, May 30, 2012 at 11:00:23AM -0700, H. Peter Anvin wrote: >> OK. But there is still the general problem, no? > > With this patch xen crashes go away because they paravirt > native_{read,write}_msr_safe. > > The other place where we use the amd_safe variants is an obscure K8, > revC and earlier fix for _some_ BIOSen and this hasn''t bitten us yet > so I''m assuming people haven''t run xen on such boxes yet. Does it need > fixing? Probably, if we really really have to. > > Now, someone probably needs to paravirt the *safe_regs variants in case > something else decides to use them. I don''t know what to do here, do I > want more paravirt code in there? No. I guess if this is done carefully > and cleanly, then it should be ok but it can''t be done like that - it > needs to adhere to the current pv_cpu_ops thing which is already there. >I thought I was being told that Xen would trap and emulate the rdmsr/wrmsr instructions. I guess they don''t want to do that for the handful of performance-sensitive MSRs there are, but those don''t use the *_regs variants. -hpa
H. Peter Anvin
2012-May-30 18:21 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 11:19 AM, Borislav Petkov wrote:> On Wed, May 30, 2012 at 08:17:23PM +0200, Borislav Petkov wrote: >> The other place where we use the amd_safe variants is an obscure K8, >> revC and earlier fix for _some_ BIOSen and this hasn''t bitten us yet >> so I''m assuming people haven''t run xen on such boxes yet. Does it need >> fixing? Probably, if we really really have to. > > I''m being told we''re safe here. Those boxes didn''t have virtualization > support yet (SVM) so this one doesn''t need fixing. >This is for PV, not for HVM (VT-x/SVM). HVM doesn''t have this kind of drain bamage. -hpa
Borislav Petkov
2012-May-30 18:29 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 11:21:00AM -0700, H. Peter Anvin wrote:> This is for PV, not for HVM (VT-x/SVM). HVM doesn''t have this kind of > drain bamage.Frankly, I wouldn''t want to fix this for K8 only - they''re going away anyway. And besides, this is a BIOS fix for a very small subset of very early K8s. It probably shouldnt''ve been there in the first place but at the time I did it I wanted to fix the whole world (I''m much more reasonable now :-)). IOW, I''d like to leave sleeping dogs lie if you don''t mind. -- Regards/Gruss, Boris.
Konrad Rzeszutek Wilk
2012-May-30 22:23 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, May 30, 2012 at 10:31:41AM -0700, H. Peter Anvin wrote:> On 05/30/2012 10:17 AM, Konrad Rzeszutek Wilk wrote: > >> Yes, the following patch also fixed the crash for us: > >> > >> Implement rdmsr_regs and wrmsr_regs for Xen pvops. > > > > That needs more data. Such as the reason for it, the crash > > tombstone, and an analysis of the bug. > > > > But at this point I am not sure what we are going to do. > > > > I think Peter leans towards ripping the .rdmsr_regs/wdmsr_regs > > function out altogether (so altering the amd_rdmsr... to use the > > .rdmsr/.wrdmsr). At which point I think this would all work > > just fine? > > > > No, you can''t just do that. Rip them out as in trap and emulate.Then this should do it: diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 75f33b2..e74df95 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { .wbinvd = native_wbinvd, .read_msr = native_read_msr_safe, + .rdmsr_regs = native_rdmsr_safe_regs, .write_msr = xen_write_msr_safe, + .wrmsr_regs = native_wrmsr_safe_regs, + .read_tsc = native_read_tsc, .read_pmc = native_read_pmc,
Konrad Rzeszutek Wilk
2012-May-30 22:33 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
> > Now, someone probably needs to paravirt the *safe_regs variants in case > > something else decides to use them. I don''t know what to do here, do I > > want more paravirt code in there? No. I guess if this is done carefully > > and cleanly, then it should be ok but it can''t be done like that - it > > needs to adhere to the current pv_cpu_ops thing which is already there.Using the native variant seems the right thing to do.> > > > I thought I was being told that Xen would trap and emulate the > rdmsr/wrmsr instructions. I guess they don''t want to do that for theIt does.> handful of performance-sensitive MSRs there are, but those don''t use the > *_regs variants.The underlaying issue (as I understand) was that .rdmsr_regs (and the corresponding write) was NULL and that caused the crash. This tiny patch should do it: diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 75f33b2..e74df95 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { .wbinvd = native_wbinvd, .read_msr = native_read_msr_safe, + .rdmsr_regs = native_rdmsr_safe_regs, .write_msr = xen_write_msr_safe, + .wrmsr_regs = native_wrmsr_safe_regs, + .read_tsc = native_read_tsc, .read_pmc = native_read_pmc,
H. Peter Anvin
2012-May-30 23:09 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 03:33 PM, Konrad Rzeszutek Wilk wrote:> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index 75f33b2..e74df95 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > .wbinvd = native_wbinvd, > > .read_msr = native_read_msr_safe, > + .rdmsr_regs = native_rdmsr_safe_regs, > .write_msr = xen_write_msr_safe, > + .wrmsr_regs = native_wrmsr_safe_regs, > + > .read_tsc = native_read_tsc, > .read_pmc = native_read_pmc, >Okay, tell me again: why do we have these hooks again? Can we please weed out paravirt methods that have no users? -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf.
H. Peter Anvin
2012-May-30 23:31 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/30/2012 06:10 AM, Andre Przywara wrote:> Because we are behind a family check before tweaking the topology > bit, we can use the standard rd/wrmsr variants for the CPUID feature > register. > This fixes a crash when using the kernel as a Xen Dom0 on affected > Trinity systems. The wrmsrl_amd_safe is not properly paravirtualized > yet (this will be fixed in another patch). > > Signed-off-by: Andre Przywara <andre.przywara@amd.com> > Cc: stable@vger.kernel.org # 3.4+With Konrad''s patch, this is no longer relevant, correct? -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf.
Jan Beulich
2012-May-31 07:17 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 19:17, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > I am tempted to write a patch that checks all the pv-cpu-ops > to see if there are any that are NULL and throw a warning so > that this does not hit us in the future - to be at least more > proactive about this sort of thing.Perhaps rather than using C99 initializers, using old-style ones would be an alternative (assuming that the signatures of the respective entries [or at least immediately neighboring ones] are different), with a sentinel that is required to remain last (i.e. adding at the very end would be prohibited)? Or rather than doing a full structure assignment, assign individual members directly to pv_cpu_ops (thus leaving everything that''s not explicitly overridden at its "native" default)? After all, this is being done on __init code, so the few extra code bytes shouldn''t matter much? (All this of course in the context of hpa''s valid request that there be no unused paravirt hooks in the first place.) Jan
Jan Beulich
2012-May-31 07:39 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
>>> On 30.05.12 at 20:17, Borislav Petkov <bp@alien8.de> wrote: > On Wed, May 30, 2012 at 11:00:23AM -0700, H. Peter Anvin wrote: > The other place where we use the amd_safe variants is an obscure K8, > revC and earlier fix for _some_ BIOSen and this hasn''t bitten us yet > so I''m assuming people haven''t run xen on such boxes yet. Does it need > fixing? Probably, if we really really have to.This again is something that shouldn''t even be attempted under Xen. The hypervisor, unless really old, does this (and wouldn''t allow the write by any domain - privileged or not - anyway). There''s one more user though - the code triggered by the "show_msr=" command line option. This one indeed requires rdmsr_safe_regs to be functional (albeit under Xen, once again, this won''t work currently anyway for those MRS on old CPUs where the special key in %edi is required, which the emulation code in Xen doesn''t support). Jan
Andre Przywara
2012-May-31 12:24 UTC
Re: [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/31/2012 12:33 AM, Konrad Rzeszutek Wilk wrote:>>> Now, someone probably needs to paravirt the *safe_regs variants in case >>> something else decides to use them. I don''t know what to do here, do I >>> want more paravirt code in there? No. I guess if this is done carefully >>> and cleanly, then it should be ok but it can''t be done like that - it >>> needs to adhere to the current pv_cpu_ops thing which is already there. > > Using the native variant seems the right thing to do. >>> >> >> I thought I was being told that Xen would trap and emulate the >> rdmsr/wrmsr instructions. I guess they don''t want to do that for the > > It does. >> handful of performance-sensitive MSRs there are, but those don''t use the >> *_regs variants. > > The underlaying issue (as I understand) was that .rdmsr_regs > (and the corresponding write) was NULL and that caused the crash. > This tiny patch should do it: > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > index 75f33b2..e74df95 100644 > --- a/arch/x86/xen/enlighten.c > +++ b/arch/x86/xen/enlighten.c > @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > .wbinvd = native_wbinvd, > > .read_msr = native_read_msr_safe, > + .rdmsr_regs = native_rdmsr_safe_regs, > .write_msr = xen_write_msr_safe, > + .wrmsr_regs = native_wrmsr_safe_regs, > + > .read_tsc = native_read_tsc, > .read_pmc = native_read_pmc, > >Acked-by: Andre Przywara <andre.przywara@amd.com> This works on the test machine. Though I''d still like to have my original patch applied, because it makes the thing a bit cleaner. And I made a patch to remove the {rd,wr}msr_regs hooks from paravirt_ops completely. Shall I send it out or do you want to make this part of larger patch series to clean up pvops? Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany
Konrad Rzeszutek Wilk
2012-May-31 15:27 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
> >diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > >index 75f33b2..e74df95 100644 > >--- a/arch/x86/xen/enlighten.c > >+++ b/arch/x86/xen/enlighten.c > >@@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > > .wbinvd = native_wbinvd, > > > > .read_msr = native_read_msr_safe, > >+ .rdmsr_regs = native_rdmsr_safe_regs, > > .write_msr = xen_write_msr_safe, > >+ .wrmsr_regs = native_wrmsr_safe_regs, > >+ > > .read_tsc = native_read_tsc, > > .read_pmc = native_read_pmc, > > > > > > Acked-by: Andre Przywara <andre.przywara@amd.com> > > This works on the test machine.Great! Thanks for doing a quick test for this.> > Though I''d still like to have my original patch applied, because it > makes the thing a bit cleaner.OK. Please re-send with an up-to-date git commit as suggested by Peter.> > And I made a patch to remove the {rd,wr}msr_regs hooks from > paravirt_ops completely. Shall I send it out or do you want to make > this part of larger patch series to clean up pvops?Please do send it out. Thanks again!
H. Peter Anvin
2012-May-31 15:59 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On 05/31/2012 12:17 AM, Jan Beulich wrote:>>>> On 30.05.12 at 19:17, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: >> I am tempted to write a patch that checks all the pv-cpu-ops >> to see if there are any that are NULL and throw a warning so >> that this does not hit us in the future - to be at least more >> proactive about this sort of thing. > > Perhaps rather than using C99 initializers, using old-style ones > would be an alternative (assuming that the signatures of the > respective entries [or at least immediately neighboring ones] > are different), with a sentinel that is required to remain last > (i.e. adding at the very end would be prohibited)? > > Or rather than doing a full structure assignment, assign > individual members directly to pv_cpu_ops (thus leaving > everything that''s not explicitly overridden at its "native" > default)? After all, this is being done on __init code, so the > few extra code bytes shouldn''t matter much? (All this of > course in the context of hpa''s valid request that there be > no unused paravirt hooks in the first place.) >Actually there is a really easy way to do this with C99 initializers: create a macro with all the default assignments, and put that one first. This is because it is legal to have more than one C99 initializer for the same member, the last one is the one that takes effect. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf.
Borislav Petkov
2012-May-31 16:55 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Thu, May 31, 2012 at 08:39:15AM +0100, Jan Beulich wrote:> >>> On 30.05.12 at 20:17, Borislav Petkov <bp@alien8.de> wrote: > > On Wed, May 30, 2012 at 11:00:23AM -0700, H. Peter Anvin wrote: > > The other place where we use the amd_safe variants is an obscure K8, > > revC and earlier fix for _some_ BIOSen and this hasn''t bitten us yet > > so I''m assuming people haven''t run xen on such boxes yet. Does it need > > fixing? Probably, if we really really have to. > > This again is something that shouldn''t even be attempted under > Xen. The hypervisor, unless really old, does this (and wouldn''t > allow the write by any domain - privileged or not - anyway). > > There''s one more user though - the code triggered by the > "show_msr=" command line option. This one indeed requires > rdmsr_safe_regs to be functional (albeit under Xen, once > again, this won''t work currently anyway for those MRS on > old CPUs where the special key in %edi is required, which the > emulation code in Xen doesn''t support).This doesn''t look right. Yinghai, why does generic x86 code use rdmsrl_amd_safe - it should simply use rdmsrl_safe. -- Regards/Gruss, Boris.
Ingo Molnar
2012-Jun-06 09:27 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
* H. Peter Anvin <hpa@zytor.com> wrote:> On 05/30/2012 03:33 PM, Konrad Rzeszutek Wilk wrote: > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > > index 75f33b2..e74df95 100644 > > --- a/arch/x86/xen/enlighten.c > > +++ b/arch/x86/xen/enlighten.c > > @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > > .wbinvd = native_wbinvd, > > > > .read_msr = native_read_msr_safe, > > + .rdmsr_regs = native_rdmsr_safe_regs, > > .write_msr = xen_write_msr_safe, > > + .wrmsr_regs = native_wrmsr_safe_regs, > > + > > .read_tsc = native_read_tsc, > > .read_pmc = native_read_pmc, > > > > Okay, tell me again: > > why do we have these hooks again? Can we please weed out > paravirt methods that have no users?ping? Ingo
Borislav Petkov
2012-Jun-06 09:42 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
On Wed, Jun 06, 2012 at 11:27:13AM +0200, Ingo Molnar wrote:> > * H. Peter Anvin <hpa@zytor.com> wrote: > > > On 05/30/2012 03:33 PM, Konrad Rzeszutek Wilk wrote: > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > > > index 75f33b2..e74df95 100644 > > > --- a/arch/x86/xen/enlighten.c > > > +++ b/arch/x86/xen/enlighten.c > > > @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > > > .wbinvd = native_wbinvd, > > > > > > .read_msr = native_read_msr_safe, > > > + .rdmsr_regs = native_rdmsr_safe_regs, > > > .write_msr = xen_write_msr_safe, > > > + .wrmsr_regs = native_wrmsr_safe_regs, > > > + > > > .read_tsc = native_read_tsc, > > > .read_pmc = native_read_pmc, > > > > > > > Okay, tell me again: > > > > why do we have these hooks again? Can we please weed out > > paravirt methods that have no users? > > ping?I think we solved all that and hpa wanted to queue them up shortly: http://marc.info/?l=linux-kernel&m=133856342618027&w=2 If it''s easier, I can send you a pull request later. Thanks. -- Regards/Gruss, Boris.
Ingo Molnar
2012-Jun-06 09:45 UTC
Re: [Xen-devel] [PATCH] x86/amd: fix crash as Xen Dom0 on AMD Trinity systems
* Borislav Petkov <bp@alien8.de> wrote:> On Wed, Jun 06, 2012 at 11:27:13AM +0200, Ingo Molnar wrote: > > > > * H. Peter Anvin <hpa@zytor.com> wrote: > > > > > On 05/30/2012 03:33 PM, Konrad Rzeszutek Wilk wrote: > > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c > > > > index 75f33b2..e74df95 100644 > > > > --- a/arch/x86/xen/enlighten.c > > > > +++ b/arch/x86/xen/enlighten.c > > > > @@ -1116,7 +1116,10 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { > > > > .wbinvd = native_wbinvd, > > > > > > > > .read_msr = native_read_msr_safe, > > > > + .rdmsr_regs = native_rdmsr_safe_regs, > > > > .write_msr = xen_write_msr_safe, > > > > + .wrmsr_regs = native_wrmsr_safe_regs, > > > > + > > > > .read_tsc = native_read_tsc, > > > > .read_pmc = native_read_pmc, > > > > > > > > > > Okay, tell me again: > > > > > > why do we have these hooks again? Can we please weed out > > > paravirt methods that have no users? > > > > ping? > > I think we solved all that and hpa wanted to queue them up shortly: > > http://marc.info/?l=linux-kernel&m=133856342618027&w=2Great, will wait for hpa then. Thanks, Ingo