Andrew Cooper
2013-Dec-03 20:34 UTC
[PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
IBM System x3530 M4 BIOSes (including the latest available at the time of this patch) will corrupt a byte at physical address 0x105ff1 to the value of 0x86 if %esp has the value 0x00080000 when issuing an `int $0x15 (ax=0xec00)` to inform the system about our intended operating mode. Xen gets unhappy when the bootloader has placed it''s .text section in over this specific region of RAM. After dropping into 16bit mode, initialise as much state as we possibly can to sane values. This includes 0 for all the GPRs and %cs for %fs and %gs which would otherwise be unreal segment selectors. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: George Dunlap <george.dunlap@eu.citrix.com> --- George: * This fixes a memory corruption issue, so counts towards both #1 and #2 as far as a freeze exception goes. --- xen/arch/x86/boot/trampoline.S | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S index 827f412..f4dfb94 100644 --- a/xen/arch/x86/boot/trampoline.S +++ b/xen/arch/x86/boot/trampoline.S @@ -140,10 +140,12 @@ trampoline_boot_cpu_entry: 1: mov %cs,%ax mov %ax,%ds mov %ax,%es + mov %ax,%fs + mov %ax,%gs mov %ax,%ss /* Initialise stack pointer and IDT, and enable irqs. */ - xor %sp,%sp + xor %esp,%esp lidt bootsym(rm_idt) sti @@ -151,6 +153,11 @@ trampoline_boot_cpu_entry: * Declare that our target operating mode is long mode. * Initialise 32-bit registers since some buggy BIOSes depend on it. */ + xor %ecx,%ecx + xor %edx,%edx + xor %esi,%esi + xor %edi,%edi + xor %ebp,%ebp movl $0xec00,%eax # declare target operating mode movl $0x0002,%ebx # long mode int $0x15 -- 1.7.10.4
Keir Fraser
2013-Dec-04 07:17 UTC
Re: [PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
On 03/12/2013 20:34, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:> IBM System x3530 M4 BIOSes (including the latest available at the time of this > patch) will corrupt a byte at physical address 0x105ff1 to the value of 0x86 > if %esp has the value 0x00080000 when issuing an `int $0x15 (ax=0xec00)` to > inform the system about our intended operating mode. > > Xen gets unhappy when the bootloader has placed it''s .text section in over > this specific region of RAM. > > After dropping into 16bit mode, initialise as much state as we possibly can to > sane values. This includes 0 for all the GPRs and %cs for %fs and %gs which > would otherwise be unreal segment selectors. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Keir Fraser <keir@xen.org> > CC: Jan Beulich <JBeulich@suse.com> > CC: George Dunlap <george.dunlap@eu.citrix.com>Acked-by: Keir Fraser <keir@xen.org>> --- > > George: > * This fixes a memory corruption issue, so counts towards both #1 and #2 as > far as a freeze exception goes. > --- > xen/arch/x86/boot/trampoline.S | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S > index 827f412..f4dfb94 100644 > --- a/xen/arch/x86/boot/trampoline.S > +++ b/xen/arch/x86/boot/trampoline.S > @@ -140,10 +140,12 @@ trampoline_boot_cpu_entry: > 1: mov %cs,%ax > mov %ax,%ds > mov %ax,%es > + mov %ax,%fs > + mov %ax,%gs > mov %ax,%ss > > /* Initialise stack pointer and IDT, and enable irqs. */ > - xor %sp,%sp > + xor %esp,%esp > lidt bootsym(rm_idt) > sti > > @@ -151,6 +153,11 @@ trampoline_boot_cpu_entry: > * Declare that our target operating mode is long mode. > * Initialise 32-bit registers since some buggy BIOSes depend on it. > */ > + xor %ecx,%ecx > + xor %edx,%edx > + xor %esi,%esi > + xor %edi,%edi > + xor %ebp,%ebp > movl $0xec00,%eax # declare target operating mode > movl $0x0002,%ebx # long mode > int $0x15
Jan Beulich
2013-Dec-04 10:03 UTC
Re: [PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
>>> On 03.12.13 at 21:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > --- a/xen/arch/x86/boot/trampoline.S > +++ b/xen/arch/x86/boot/trampoline.S > @@ -140,10 +140,12 @@ trampoline_boot_cpu_entry: > 1: mov %cs,%ax > mov %ax,%ds > mov %ax,%es > + mov %ax,%fs > + mov %ax,%gs > mov %ax,%ss > > /* Initialise stack pointer and IDT, and enable irqs. */ > - xor %sp,%sp > + xor %esp,%espAccording to your findings this one line change is really all that''s needed. While I may be willing to accept the setting of %fs and %gs, despite them being set to BOOT_PSEUDORM_DS right before leaving protected mode (albeit I think it would be better to clear them than to make them match %cs), ...> @@ -151,6 +153,11 @@ trampoline_boot_cpu_entry: > * Declare that our target operating mode is long mode. > * Initialise 32-bit registers since some buggy BIOSes depend on it. > */ > + xor %ecx,%ecx > + xor %edx,%edx > + xor %esi,%esi > + xor %edi,%edi > + xor %ebp,%ebp > movl $0xec00,%eax # declare target operating mode > movl $0x0002,%ebx # long mode > int $0x15... I can''t really see the value of the change here: If we''re to work around theoretical BIOS bugs, we''d need to do this prior to each BIOS call. That''s surely overkill. Therefore let''s focus on what is needed to work around _known_ BIOS bugs. Jan
George Dunlap
2013-Dec-04 10:09 UTC
Re: [PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
On 12/03/2013 08:34 PM, Andrew Cooper wrote:> IBM System x3530 M4 BIOSes (including the latest available at the time of this > patch) will corrupt a byte at physical address 0x105ff1 to the value of 0x86 > if %esp has the value 0x00080000 when issuing an `int $0x15 (ax=0xec00)` to > inform the system about our intended operating mode. > > Xen gets unhappy when the bootloader has placed it''s .text section in over > this specific region of RAM. > > After dropping into 16bit mode, initialise as much state as we possibly can to > sane values. This includes 0 for all the GPRs and %cs for %fs and %gs which > would otherwise be unreal segment selectors. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Keir Fraser <keir@xen.org> > CC: Jan Beulich <JBeulich@suse.com> > CC: George Dunlap <george.dunlap@eu.citrix.com> > > --- > > George: > * This fixes a memory corruption issue, so counts towards both #1 and #2 as > far as a freeze exception goes.The general rule during the code freeze is that bug fixes are exempt from needing a freeze exception, unless the maintainers think that it''s particularly risky. I suppose one could quibble about whether this is a "bug fix" or a "work-around for broken BIOS" though, so just in case: Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Andrew Cooper
2013-Dec-04 10:35 UTC
Re: [PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
On 04/12/13 10:03, Jan Beulich wrote:>>>> On 03.12.13 at 21:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> --- a/xen/arch/x86/boot/trampoline.S >> +++ b/xen/arch/x86/boot/trampoline.S >> @@ -140,10 +140,12 @@ trampoline_boot_cpu_entry: >> 1: mov %cs,%ax >> mov %ax,%ds >> mov %ax,%es >> + mov %ax,%fs >> + mov %ax,%gs >> mov %ax,%ss >> >> /* Initialise stack pointer and IDT, and enable irqs. */ >> - xor %sp,%sp >> + xor %esp,%esp > According to your findings this one line change is really all that''s > needed.I believe this to be the case, yes.> While I may be willing to accept the setting of %fs and > %gs, despite them being set to BOOT_PSEUDORM_DS right > before leaving protected mode (albeit I think it would be better > to clear them than to make them match %cs), ...The set to BOOT_PSEUDORM_DS in 32bit mode is quite pointless, as they are never used and reloaded moments later in 16bit mode. I have already queued it up in my Xen-4.5 improvements series to the early boot code which I have been collecting while debugging this issue.> >> @@ -151,6 +153,11 @@ trampoline_boot_cpu_entry: >> * Declare that our target operating mode is long mode. >> * Initialise 32-bit registers since some buggy BIOSes depend on it. >> */ >> + xor %ecx,%ecx >> + xor %edx,%edx >> + xor %esi,%esi >> + xor %edi,%edi >> + xor %ebp,%ebp >> movl $0xec00,%eax # declare target operating mode >> movl $0x0002,%ebx # long mode >> int $0x15 > ... I can''t really see the value of the change here: If we''re to > work around theoretical BIOS bugs, we''d need to do this prior to > each BIOS call. That''s surely overkill. Therefore let''s focus on > what is needed to work around _known_ BIOS bugs. > > Jan >I admit that I was leaning on the cautious side with these changes. I can take them out if you think that would be better, but given this int was already flagged as buggy in some BIOSes, and we have found another case, I think covering all GPRs is the safer option. ~Andrew
Jan Beulich
2013-Dec-04 10:47 UTC
Re: [PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
>>> On 04.12.13 at 11:35, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 04/12/13 10:03, Jan Beulich wrote: >>>>> On 03.12.13 at 21:34, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> --- a/xen/arch/x86/boot/trampoline.S >>> +++ b/xen/arch/x86/boot/trampoline.S >>> @@ -140,10 +140,12 @@ trampoline_boot_cpu_entry: >>> 1: mov %cs,%ax >>> mov %ax,%ds >>> mov %ax,%es >>> + mov %ax,%fs >>> + mov %ax,%gs >>> mov %ax,%ss >>> >>> /* Initialise stack pointer and IDT, and enable irqs. */ >>> - xor %sp,%sp >>> + xor %esp,%esp >> According to your findings this one line change is really all that''s >> needed. > > I believe this to be the case, yes. > >> While I may be willing to accept the setting of %fs and >> %gs, despite them being set to BOOT_PSEUDORM_DS right >> before leaving protected mode (albeit I think it would be better >> to clear them than to make them match %cs), ... > > The set to BOOT_PSEUDORM_DS in 32bit mode is quite pointless, as they > are never used and reloaded moments later in 16bit mode. I have already > queued it up in my Xen-4.5 improvements series to the early boot code > which I have been collecting while debugging this issue.This isn''t pointless at all - it sets the segment attributes to real mode compatible values (which the reloading in real mode does _not_ do - this changes only the segment base). Since we don''t care about the particular values (read: base addresses) of %fs and %gs (but we do care about those in the other four selectors), not reloading them after entering real mode is actually quite sensible.>>> @@ -151,6 +153,11 @@ trampoline_boot_cpu_entry: >>> * Declare that our target operating mode is long mode. >>> * Initialise 32-bit registers since some buggy BIOSes depend on it. >>> */ >>> + xor %ecx,%ecx >>> + xor %edx,%edx >>> + xor %esi,%esi >>> + xor %edi,%edi >>> + xor %ebp,%ebp >>> movl $0xec00,%eax # declare target operating mode >>> movl $0x0002,%ebx # long mode >>> int $0x15 >> ... I can''t really see the value of the change here: If we''re to >> work around theoretical BIOS bugs, we''d need to do this prior to >> each BIOS call. That''s surely overkill. Therefore let''s focus on >> what is needed to work around _known_ BIOS bugs. > > I admit that I was leaning on the cautious side with these changes. > > I can take them out if you think that would be better, but given this > int was already flagged as buggy in some BIOSes, and we have found > another case, I think covering all GPRs is the safer option.As said - I doubt this would help much. I''d really prefer at least this part of the patch to be taken out again. Unless Keir is specifically of the opposite opinion... Jan
Keir Fraser
2013-Dec-04 18:59 UTC
Re: [PATCH] x86/boot: Fix BIOS memory corruption on certain IBM systems
On 04/12/2013 10:47, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> * Declare that our target operating mode is long mode. >>>> * Initialise 32-bit registers since some buggy BIOSes depend on >>>> it. >>>> */ >>>> + xor %ecx,%ecx >>>> + xor %edx,%edx >>>> + xor %esi,%esi >>>> + xor %edi,%edi >>>> + xor %ebp,%ebp >>>> movl $0xec00,%eax # declare target operating mode >>>> movl $0x0002,%ebx # long mode >>>> int $0x15 >>> ... I can''t really see the value of the change here: If we''re to >>> work around theoretical BIOS bugs, we''d need to do this prior to >>> each BIOS call. That''s surely overkill. Therefore let''s focus on >>> what is needed to work around _known_ BIOS bugs. >> >> I admit that I was leaning on the cautious side with these changes. >> >> I can take them out if you think that would be better, but given this >> int was already flagged as buggy in some BIOSes, and we have found >> another case, I think covering all GPRs is the safer option. > > As said - I doubt this would help much. I''d really prefer at least this > part of the patch to be taken out again. Unless Keir is specifically of > the opposite opinion...The change does kind of fit with the comment immediately above though. Overall I don''t really care that much either way. -- Keir