I am trying to boot xen-unstable 16157 x86_64 on an HP dc7800 with the E6750 CPUs, and Xen simply reboots with no output of any kind after the GRUB loading info:

  Booting 'XEN'

  root (hd0,5)
   Filesystem type is ext2fs, partition type 0x83
  kernel /boot/xen.gz console=com1 com1=115200,8n1 loglvl=all
     [Multiboot-elf, <0x100000:0x101620:0x649e0>, shtab=0x266078, entry=0x100000]
  module /boot/vmlinuz-xen root=/dev/disk/by-id/scsi-SATA_Hitachi_HDS7216_PVD301Z 9R3DSTK-part6 resume=/dev/sda5 splash=silent showopts
     [Multiboot-module @ 0x267000, 0x508170 bytes]
  module /boot/initrd-xen
     [Multiboot-module @ 0x770000, 0x125e200 bytes]

SLES 10 SP1 Xen works fine on the box. I've tried disabling the VT/VT-d features of the box, using a serial console, and building a debug version of Xen, hoping it would output something before it died: no joy.

Does anyone know what is wrong or can tell me how to go about debugging this? The dmesg output from booting the Ubuntu 2.6.22-14-server kernel is attached in the hope that it will provide some useful information.

Thanks,

John Byrne

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2007-Oct-23 18:07 UTC
Re: [Xen-devel] xen-unstable on HP dc7800 simply reboots
On 23/10/07 17:49, "John Byrne" <john.l.byrne@hp.com> wrote:

> Does anyone know what is wrong or can tell me how to go about debugging
> this? The dmesg output from booting the Ubuntu 2.6.22-14-server kernel
> is attached in the hope that it will provide some useful information.

I suggest binary-chopping to find the offending changeset. Some early boot-time stuff went in over the past few days, so for example try reverting to 16130 and see if that has the same problem. Go back another 30, or forward 15, changesets depending on whether that works or not. I suspect this issue is quite new, so it shouldn't take too long to narrow down.

 -- Keir
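The binary chop suggested here can be sketched as follows. This is only an illustration, not a Xen or Mercurial tool: `first_bad_changeset` and the `boots_ok` callback are hypothetical names, the callback stands in for building and boot-testing a given changeset by hand, the 16100 starting point and the simulated breakage at 16142 are made up for the demo, and the search assumes a single good-to-bad transition inside the range.

```python
from typing import Callable


def first_bad_changeset(good: int, bad: int,
                        boots_ok: Callable[[int], bool]) -> int:
    """Return the lowest changeset in (good, bad] that fails to boot.

    Assumes `good` boots, `bad` does not, and that there is exactly one
    transition from working to broken between them.
    """
    if good >= bad:
        raise ValueError("'good' must be an earlier changeset than 'bad'")
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(mid):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage was introduced at or before mid
    return bad


if __name__ == "__main__":
    # Hypothetical stand-in for "build this changeset and try to boot it":
    # pretend that everything from 16142 onwards reboots immediately.
    def simulated_boot(changeset: int) -> bool:
        return changeset < 16142

    print(first_bad_changeset(16100, 16157, simulated_boot))
```

Each probe halves the remaining range, so the number of build-and-boot cycles grows only with the logarithm of the distance between the known-good and known-bad changesets.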
Keir Fraser wrote:
> On 23/10/07 17:49, "John Byrne" <john.l.byrne@hp.com> wrote:
>
>> Does anyone know what is wrong or can tell me how to go about debugging
>> this? The dmesg output from booting the Ubuntu 2.6.22-14-server kernel
>> is attached in the hope that it will provide some useful information.
>
> I suggest binary-chopping to find the offending changeset. Some early
> boot-time stuff went in over the past few days, so for example try reverting
> to 16130 and see if that has the same problem. Go back another 30, or
> forward 15, changesets depending on whether that works or not. I suspect
> this issue is quite new, so it shouldn't take too long to narrow down.

Actually, it is a few months old. xen-unstable 15236 is the last changeset at which Xen doesn't immediately reboot.

John
John Byrne wrote:
> Keir Fraser wrote:
>> On 23/10/07 17:49, "John Byrne" <john.l.byrne@hp.com> wrote:
>>
>>> Does anyone know what is wrong or can tell me how to go about debugging
>>> this? The dmesg output from booting the Ubuntu 2.6.22-14-server kernel
>>> is attached in the hope that it will provide some useful information.
>>
>> I suggest binary-chopping to find the offending changeset. Some early
>> boot-time stuff went in over the past few days, so for example try
>> reverting to 16130 and see if that has the same problem. Go back another
>> 30, or forward 15, changesets depending on whether that works or not. I
>> suspect this issue is quite new, so it shouldn't take too long to narrow
>> down.
>
> Actually, it is a few months old. xen-unstable 15236 is the last changeset
> at which Xen doesn't immediately reboot.

By littering the early boot code with putc() calls and comparing the code's behavior with that on a machine that actually boots, I've found that the call to get_memory_map from trampoline.S is the root of all evil. If that is called, the trampoline fails to make it back to protected mode from real mode. I am still working to identify the specific problem.

John
Keir Fraser
2007-Oct-24 07:12 UTC
Re: [Xen-devel] xen-unstable on HP dc7800 simply reboots
On 24/10/07 00:03, "John Byrne" <john.l.byrne@hp.com> wrote:

>> Actually, it is a few months old. xen-unstable 15236 is the last changeset
>> at which Xen doesn't immediately reboot.
>
> Littering the early boot code with putc() and comparing the code's
> behavior on a machine that actually boots, I've found that the call to
> get_memory_map from trampoline.S is the root of all evil. If that is
> called, then the trampoline fails to make it back to protected mode from
> real mode. I am still working to identify the specific problem.

So if you remove that call, so that Xen falls back to using the GRUB-supplied memory map, then Xen boots okay? Is that true on the current tip of xen-unstable too?

 -- Keir
Keir Fraser
2007-Oct-24 08:03 UTC
Re: [Xen-devel] xen-unstable on HP dc7800 simply reboots
On 24/10/07 08:12, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:

>> Littering the early boot code with putc() and comparing the code's
>> behavior on a machine that actually boots, I've found that the call to
>> get_memory_map from trampoline.S is the root of all evil. If that is
>> called, then the trampoline fails to make it back to protected mode from
>> real mode. I am still working to identify the specific problem.
>
> So if you remove that call, so that Xen falls back to using the
> GRUB-supplied memory map, then Xen boots okay? Is that true on the current
> tip of xen-unstable too?

Thinking some more, my guess is that any BIOS call is causing you to crash, so removing just the call to get_memory_map will be insufficient on tip -- you'll have to remove the calls to set the video mode and get EDD information too. Perhaps the IDT is either corrupted or not actually at address 0x0, like it's supposed to be? What happens if you remove the 'lidt' instruction from arch/x86/boot/trampoline.S?

 -- Keir
Keir Fraser wrote:
> On 24/10/07 08:12, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:
>
>>> Littering the early boot code with putc() and comparing the code's
>>> behavior on a machine that actually boots, I've found that the call to
>>> get_memory_map from trampoline.S is the root of all evil. If that is
>>> called, then the trampoline fails to make it back to protected mode from
>>> real mode. I am still working to identify the specific problem.
>> So if you remove that call, so that Xen falls back to using the
>> GRUB-supplied memory map, then Xen boots okay? Is that true on the current
>> tip of xen-unstable too?
>
> Thinking some more, my guess is that any BIOS call is causing you to crash,
> so removing just the call to get_memory_map will be insufficient on tip --
> you'll have to remove the calls to set the video mode and get EDD
> information too. Perhaps the IDT is either corrupted or not actually at
> address 0x0, like it's supposed to be? What happens if you remove the
> 'lidt' instruction from arch/x86/boot/trampoline.S?

Once I identified the problem revision, I went back to the tip to try to debug it. (Sorry for any ambiguity.) Taking out get_memory_map is sufficient; get_edd and video don't seem to cause any problems. Removing the lidt changes nothing. Adding a "ret" after the .Lmem88 in mem.S confirms that the problem is in the e820 call. I'm currently trying to see if the descriptor table is getting corrupted during the BIOS calls.

John
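For readers unfamiliar with the e820 call mentioned above: the BIOS INT 15h, AX=E820h service hands back the memory map one record at a time, each record holding a 64-bit base address, a 64-bit length and a 32-bit type (1 is usable RAM, 2 is reserved). The sketch below decodes a buffer of such 20-byte records. It is not the code in Xen's mem.S; the function names and the sample map are made up purely for illustration.

```python
import struct
from typing import List, NamedTuple

# One basic E820 record: u64 base, u64 length, u32 type, little-endian.
E820_ENTRY = struct.Struct("<QQI")
E820_RAM = 1        # usable memory
E820_RESERVED = 2   # reserved, not usable by the OS


class E820Entry(NamedTuple):
    base: int
    length: int
    type: int


def parse_e820(buf: bytes) -> List[E820Entry]:
    """Decode a packed array of 20-byte E820 records."""
    if len(buf) % E820_ENTRY.size:
        raise ValueError("buffer is not a whole number of E820 records")
    return [E820Entry(*fields) for fields in E820_ENTRY.iter_unpack(buf)]


def usable_bytes(entries: List[E820Entry]) -> int:
    """Total size of all regions reported as usable RAM."""
    return sum(e.length for e in entries if e.type == E820_RAM)


if __name__ == "__main__":
    # Hypothetical three-entry map, not taken from the dc7800.
    sample = b"".join([
        E820_ENTRY.pack(0x0, 0x9FC00, E820_RAM),
        E820_ENTRY.pack(0x9FC00, 0x400, E820_RESERVED),
        E820_ENTRY.pack(0x100000, 0x7FF00000, E820_RAM),
    ])
    for entry in parse_e820(sample):
        print(f"{entry.base:#012x} +{entry.length:#x} type {entry.type}")
    print(f"usable: {usable_bytes(parse_e820(sample)):#x} bytes")
```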
John Byrne
2007-Oct-25 02:13 UTC
Re: [Xen-devel] [PATCH] xen-unstable on HP dc7800 simply reboots
John Byrne wrote:
> Keir Fraser wrote:
>> On 24/10/07 08:12, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:
>>
>>>> Littering the early boot code with putc() and comparing the code's
>>>> behavior on a machine that actually boots, I've found that the call to
>>>> get_memory_map from trampoline.S is the root of all evil. If that is
>>>> called, then the trampoline fails to make it back to protected mode
>>>> from real mode. I am still working to identify the specific problem.
>>> So if you remove that call, so that Xen falls back to using the
>>> GRUB-supplied memory map, then Xen boots okay? Is that true on the
>>> current tip of xen-unstable too?
>>
>> Thinking some more, my guess is that any BIOS call is causing you to
>> crash, so removing just the call to get_memory_map will be insufficient
>> on tip -- you'll have to remove the calls to set the video mode and get
>> EDD information too. Perhaps the IDT is either corrupted or not actually
>> at address 0x0, like it's supposed to be? What happens if you remove the
>> 'lidt' instruction from arch/x86/boot/trampoline.S?
>
> Once I identified the problem revision, I went back to the tip to try to
> debug it. (Sorry for any ambiguity.) Taking out get_memory_map is
> sufficient; get_edd and video don't seem to cause any problems.
> Removing the lidt changes nothing. Adding a "ret" after the .Lmem88 in
> mem.S confirms that the problem is in the e820 call. I'm currently
> trying to see if the descriptor table is getting corrupted during the
> BIOS calls.

A lot of work to find a one-line fix. There is no sign of any corruption in the GDT, but you do need to reload the GDT before transitioning back to real mode. I am asking a BIOS person why the BIOS might require this, but, in the meantime, I cannot see how this patch will cause trouble on any other system and it seems to fix mine. I am just a little uncomfortable because I don't really understand why it is required.

Signed-off-by: john.l.byrne@hp.com
Keir Fraser
2007-Oct-25 07:49 UTC
Re: [Xen-devel] [PATCH] xen-unstable on HP dc7800 simply reboots
On 25/10/07 03:13, "John Byrne" <john.l.byrne@hp.com> wrote:

>> Once I identified the problem revision, I went back to the tip to try to
>> debug it. (Sorry for any ambiguity.) Taking out get_memory_map is
>> sufficient; get_edd and video don't seem to cause any problems.
>> Removing the lidt changes nothing. Adding a "ret" after the .Lmem88 in
>> mem.S confirms that the problem is in the e820 call. I'm currently
>> trying to see if the descriptor table is getting corrupted during the
>> BIOS calls.
>
> A lot of work to find a one-line fix. There is no sign of any corruption
> in the GDT, but you do need to reload the GDT before transitioning back
> to real mode. I am asking a BIOS person why the BIOS might require this,
> but, in the meantime, I cannot see how this patch will cause trouble on
> any other system and it seems to fix mine. I am just a little
> uncomfortable because I don't really understand why it is required.

This is a very reasonable thing to do actually, if for no other reason than that all bootloaders appear to do it every time they enter protected mode. There's safety in numbers!

We should probably frob the A20 gate on entry to and exit from real mode too, for the truly authentic real-mode experience. Nothing has broken yet due to the lack of it, though.

 -- Keir
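As background on the A20 remark: with the A20 gate disabled, address line 20 is forced low, so real-mode addresses just above 1 MiB wrap around to the bottom of memory as they did on the original 8086. A minimal model of that behavior is sketched below. The function names are invented for the example, and this only models the address arithmetic; it does not touch the gate itself.

```python
A20_BIT = 1 << 20  # address line 20


def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address formed from a real-mode segment:offset pair."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Address that reaches the bus, with or without the A20 line."""
    linear = real_mode_linear(segment, offset)
    # With A20 masked, bit 20 is held at zero and the access wraps.
    return linear if a20_enabled else linear & ~A20_BIT


if __name__ == "__main__":
    for enabled in (True, False):
        addr = physical_address(0xFFFF, 0x0010, enabled)
        print(f"A20 {'on ' if enabled else 'off'}: FFFF:0010 -> {addr:#08x}")
```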