Muli Ben-Yehuda
2006-Sep-07 13:28 UTC
[Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
I'm seeing this boot crash reliably on x86-64 with the current tip. Is anyone else seeing this? Any suggestions for debugging, other than bisection?

muli@undeg:~/vanilla$ hg tip
changeset: 11433:1de184deaa9c
tag: tip
user: ssmith@weybridge.uk.xensource.com
date: Wed Sep 6 13:16:02 2006 +0100
summary: [XEN] gnttab: Initialise maptrack->flags

(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.0-unstable x86_64 debug=n Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff80495000>]
(XEN) RFLAGS: 0000000000000202 CONTEXT: guest
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: ffffffff81037000 rdi: ffffffff81037000
(XEN) rbp: 0000000000000000 rsp: ffffffff80493fb0 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 0000000007038000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff80493fb0:
(XEN) 0000000000000000 0000000000000000 0000000000000002 ffffffff80495000
(XEN) 000000010000e030 0000000000010002 ffffffff80493ff8 000000000000e02b
(XEN) 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Full boot log:

Xen version 3.0-unstable (muli@haifa.ibm.com) (gcc version 3.3.3 (SuSE Linux)) Thu Sep 7 18:24:38 IDT 2006
Latest ChangeSet: Wed Sep 6 13:16:02 2006 +0100 11433:1de184deaa9c
(XEN) Console output is synchronous.
(XEN) Command line: (hd0,1)/boot/xen.gz dom0_mem=6000000 console=com2,vga com2=1 9200 sync_console noreboot (XEN) Physical RAM map: (XEN) 0000000000000000 - 0000000000099000 (usable) (XEN) 0000000000099000 - 00000000000a0000 (reserved) (XEN) 00000000000e0000 - 0000000000100000 (reserved) (XEN) 0000000000100000 - 00000000e7f9c640 (usable) (XEN) 00000000e7f9c640 - 00000000e7fa6a40 (ACPI data) (XEN) 00000000e7fa6a40 - 00000000e8000000 (reserved) (XEN) 00000000fec00000 - 0000000100000000 (reserved) (XEN) 0000000100000000 - 0000000198000000 (usable) (XEN) System RAM: 6143MB (6290644kB) (XEN) Xen heap: 13MB (14304kB) (XEN) found SMP MP-table at 00099140 (XEN) DMI 2.3 present. (XEN) Using APIC driver default (XEN) ACPI: RSDP (v000 IBM ) @ 0x00000000000fd cf0 (XEN) ACPI: RSDT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x00000000e 7fa69c0 (XEN) ACPI: FADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x00000000e 7fa6940 (XEN) ACPI: MADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x00000000e 7fa6880 (XEN) ACPI: SRAT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @ 0x00000000e 7fa67c0 (XEN) ACPI: DSDT (v001 IBM SER01ZEU 0x00001000 INTL 0x20030122) @ 0x000000000 0000000 (XEN) ACPI: Local APIC address 0xfee00000 (XEN) Switched to APIC driver `summit''. 
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) (XEN) Processor #0 15:4 APIC version 20 (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) (XEN) Processor #1 15:4 APIC version 20 (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled) (XEN) Processor #6 15:4 APIC version 20 (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled) (XEN) Processor #7 15:4 APIC version 20 (XEN) ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1]) (XEN) version 17, address 0xfec00000, GSI 0-35 (XEN) ACPI: IOAPIC (id[0x0e] address[0xfec01000] gsi_base[35]) (XEN) IOAPIC[1]: apic_id 14, version 17, address 0xfec01000, GSI 35-70 (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge) (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 low edge) (XEN) ACPI: IRQ0 used by override. (XEN) ACPI: IRQ2 used by override. (XEN) ACPI: IRQ8 used by override. (XEN) ACPI: IRQ14 used by override. (XEN) Enabling APIC mode: Phys. Using 2 I/O APICs (XEN) Using ACPI (MADT) for SMP configuration information (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Initializing CPU#0 (XEN) Detected 3169.511 MHz processor. (XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K (XEN) CPU: L2 cache: 1024K (XEN) CPU: Physical Processor ID: 0 (XEN) Intel machine check architecture supported. (XEN) Intel machine check reporting enabled on CPU#0. (XEN) CPU0: Intel P4/Xeon Extended MCE MSRs (24) available (XEN) CPU0: Thermal monitoring enabled (XEN) CPU0: Intel(R) Xeon(TM) MP CPU 3.16GHz stepping 01 (XEN) Booting processor 1/1 eip 90000 (XEN) Initializing CPU#1 (XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K (XEN) CPU: L2 cache: 1024K (XEN) CPU: Physical Processor ID: 0 (XEN) Intel machine check architecture supported. (XEN) Intel machine check reporting enabled on CPU#1. (XEN) CPU1: Intel P4EN) CPU: L2 cache: 1024K (XEN) CPU: Physical Processor ID: 3 (XEN) Intel machine check architecture supported. 
(XEN) Intel machine check reporting enabled on CPU#2. (XEN) CPU2: Intel P4/Xeon Extended MCE MSRs (24) available (XEN) CPU2: Thermal monitoring enabled (XEN)heck architecture supported. (XEN) Intel machine check reporting enabled on CPU#3. (XEN) CPU3: Intel P4/Xeon Extended MCE MSRs (24) available (XEN) CPU3: Thermal monitoring enabled (XEN) CPU3: Intel(R) Pentium(R) 4 CPU 3.16GHz stepping 09 (XEN) Total of 4 processors activated. (XEN) ENABLING IO-APIC IRQs (XEN) -> Using new ACK method (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1 (XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC (XEN) ...trying to set up timer (IRQ0) through the 8259A ... failed. (XEN) ...trying to set up timer as Virtual Wire IRQ... works. (XEN) checking TSC synchronization across 4 CPUs: passed. (XEN) Cyclone: Could not find valid CBAR value. (XEN) Platform timer is 1.193MHz PIT (XEN) Brought up 4 CPUs (XEN) Machine check exception polling timer started. (XEN) *** LOADING DOMAIN 0 *** (XEN) Domain 0 kernel supports features = { 0000001f }. (XEN) Domain 0 kernel requires features = { 00000000 }. (XEN) PHYSICAL MEMORY ARRANGEMENT: (XEN) Dom0 alloc.: 0000000006000000->0000000008000000 (1491808 pages to be al located) (XEN) VIRTUAL MEMORY ARRANGEMENT: (XEN) Loaded kernel: ffffffff80100000->ffffffff804c4f68 (XEN) Init. ramdisk: ffffffff804c5000->ffffffff804c5000 (XEN) Phys-Mach map: ffffffff804c5000->ffffffff81036b00 (XEN) Start info: ffffffff81037000->ffffffff810374a8 (XEN) Page tables: ffffffff81038000->ffffffff81045000 (XEN) Boot stack: ffffffff81045000->ffffffff81046000 (XEN) TOTAL: ffffffff80000000->ffffffff81400000 (XEN) ENTRY ADDRESS: ffffffff80100000 (XEN) Dom0 has maximum 4 VCPUs (XEN) Scrubbing Free RAM: ...................................................... ............done. 
(XEN) Xen trace buffers: disabled
(XEN) **********************************************
(XEN) ******* WARNING: CONSOLE OUTPUT IS SYCHRONOUS
(XEN) ******* This option is intended to aid debugging of Xen by ensuring
(XEN) ******* that all output is synchronously delivered on the serial line.
(XEN) ******* However it can introduce SIGNIFICANT latencies and affect
(XEN) ******* timekeeping. It is NOT recommended for production use!
(XEN) **********************************************
(XEN) 3... 2... 1...
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.0-unstable x86_64 debug=n Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff80495000>]
(XEN) RFLAGS: 0000000000000202 CONTEXT: guest
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: ffffffff81037000 rdi: ffffffff81037000
(XEN) rbp: 0000000000000000 rsp: ffffffff80493fb0 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 0000000007038000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff80493fb0:
(XEN) 0000000000000000 0000000000000000 0000000000000002 ffffffff80495000
(XEN) 000000010000e030 0000000000010002 ffffffff80493ff8 000000000000e02b
(XEN) 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Sep-07 13:36 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On 7/9/06 14:28, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote:

> I'm seeing this boot crash reliably on x86-64 with the current tip. Is
> anyone else seeing this? suggestions for debugging, other than
> bisection?

Is the kernel from the same build as Xen (oughtn't to matter, but worth checking)?

Have you tried disassembling the kernel image to look at the address in the backtrace to see where the kernel is crashing?

 -- Keir
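To illustrate the address lookup suggested above: once a sorted symbol listing for the kernel image is available (for example in System.map or `nm -n` style), finding which symbol contains the faulting RIP is a nearest-lower-address search. The following is a minimal Python sketch of that idea. The symbol names and addresses in `SAMPLE_MAP` are entirely hypothetical and are not taken from the kernel in this thread; only the RIP value comes from the crash dump.

```python
import bisect

def parse_system_map(text):
    """Parse 'address type name' lines (System.map / nm -n style) into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def lookup(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base

# HYPOTHETICAL symbol table: names and addresses are made up for illustration.
SAMPLE_MAP = """\
ffffffff80100000 T hypothetical_startup
ffffffff80494000 D hypothetical_table
ffffffff80496000 B hypothetical_bss_start
"""

symbols = parse_system_map(SAMPLE_MAP)
name, offset = lookup(symbols, 0xffffffff80495000)  # RIP from the crash dump
print(f"{name}+{offset:#x}")
```

With a real symbol listing in place of `SAMPLE_MAP`, the same lookup shows whether the faulting address lies in code or in a data section, which is often the first useful clue.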
Muli Ben-Yehuda
2006-Sep-07 14:32 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Thu, Sep 07, 2006 at 02:36:44PM +0100, Keir Fraser wrote:

> On 7/9/06 14:28, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote:
>
> > I'm seeing this boot crash reliably on x86-64 with the current tip. Is
> > anyone else seeing this? suggestions for debugging, other than
> > bisection?
>
> Is the kernel from the same build as Xen (oughtn't to matter, but worth
> checking)?

Yes.

> Have you tried disassembling the kernel image to look at the address
> in the backtrace to see where the kernel is crashing?

Working on it now.

FWIW, it's an 8G Intel-based machine, but playing with the amount of memory dedicated to Xen/dom0 doesn't make a difference. Also, it doesn't happen on a dual Opteron x86-64 machine.

Cheers,
Muli
Keir Fraser
2006-Sep-07 17:06 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On 7/9/06 15:32, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote:

>> Have you tried disassembling the kernel image to look at the address
>> in the backtrace to see where the kernel is crashing?
>
> Working on it now.
>
> FWIW, it's an 8G Intel-based machine, but playing with the amount of
> memory dedicated to Xen/dom0 doesn't make a difference. Also, it
> doesn't happen on a dual Opteron x86-64 machine.

Binary chop on changeset revisions isn't a bad idea. I would think that such a blatant failure mode couldn't have crept in all that long ago.

 -- Keir
Travis Betak
2006-Sep-07 18:02 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Thu, 7 Sep 2006, Keir Fraser wrote:

> On 7/9/06 15:32, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote:
>
>>> Have you tried disassembling the kernel image to look at the address
>>> in the backtrace to see where the kernel is crashing?
>>
>> Working on it now.
>>
>> FWIW, it's an 8G Intel-based machine, but playing with the amount of
>> memory dedicated to Xen/dom0 doesn't make a difference. Also, it
>> doesn't happen on a dual Opteron x86-64 machine.
>
> Binary chop on changeset revisions isn't a bad idea. I would think
> that such a blatant failure mode couldn't have crept in all that long
> ago.

For what it's worth, a couple of us here got bitten by this every now and then a couple of weeks back, and it was on Opterons. From what we saw, it might have been dependent on the toolchain or something of the like (although I hope it isn't). I was building using Debian AMD64 stable and it would crash. I upgraded to Debian testing and it went away. It started happening after c/s 11131, within about 100 changesets or so. It seemed really obscure and not widespread, so we just moved on.

I hope this information helps a little.

--travis
Muli Ben-Yehuda
2006-Sep-07 20:01 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Thu, Sep 07, 2006 at 06:06:51PM +0100, Keir Fraser wrote:

> On 7/9/06 15:32, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote:
>
> >> Have you tried disassembling the kernel image to look at the address
> >> in the backtrace to see where the kernel is crashing?
> >
> > Working on it now.
> >
> > FWIW, it's an 8G Intel-based machine, but playing with the amount of
> > memory dedicated to Xen/dom0 doesn't make a difference. Also, it
> > doesn't happen on a dual Opteron x86-64 machine.
>
> Binary chop on changeset revisions isn't a bad idea. I would think that such
> a blatant failure mode couldn't have crept in all that long ago.

Yep, I'm working on it. Might take a while to narrow it down though; between me and the machine is a very slow VPN connection and a weekend... I'll keep you updated.

Cheers,
Muli
Muli Ben-Yehuda
2006-Sep-09 07:58 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Thu, Sep 07, 2006 at 11:01:46PM +0300, Muli Ben-Yehuda wrote:

> > Binary chop on changeset revisions isn't a bad idea. I would think that such
> > a blatant failure mode couldn't have crept in all that long ago.
>
> Yep, I'm working on it. Might take a while to narrow it down though,
> between me and the machine is a very slow VPN connection and a
> weekend... I'll keep you updated.

Ok, this changeset:

changeset: 11223:a4550b7488400c44a9f27c92115c8e364493837a
user: Ian Campbell <ian.campbell@xensource.com>
date: Tue Aug 22 14:20:43 2006 +0100
files: linux-2.6-xen-sparse/arch/i386/kernel/vmlinux.lds.S
       patches/linux-2.6.16.13/series
       patches/linux-2.6.16.13/x86-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
       patches/linux-2.6.16.13/x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
description:
[LINUX] Support creating ELF note segments in the kernel ELF image.

dies, while the previous one (11222:cd4e7ace4e58d9e35c08ccaa4677c6b6d0cf137b) works.

Looking at the patch, I thought it might be a toolchain issue, so I tried with two toolchains, both of which displayed the same results.

(cross compiling i386 to x86-64)
gcc version 3.4.4
ld version: 2.15

(native compilation on x86-64)
gcc version 3.3.3 (SuSE Linux)
GNU ld version 2.15.90.0.1.1 20040303 (SuSE Linux)

Cheers,
Muli
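The binary chop that narrows a failure down to a single changeset like this can be sketched in a few lines of Python. This is only an illustration of the search, not the procedure actually used in the thread: the `is_bad` callback stands in for "build this changeset and try to boot it", the starting known-good revision 11131 is an assumption made for the example, and the sketch assumes a linear history in which every revision after the first failing one also fails.

```python
def first_bad(good, bad, is_bad):
    """Return the first failing revision in the range (good, bad].

    Assumes `good` is known to work, `bad` is known to fail, and that all
    revisions after the first failing one also fail (linear history).
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # failure is at mid or earlier
        else:
            good = mid   # failure was introduced after mid
    return bad

def simulated_boot_fails(rev):
    # Stand-in for a real build-and-boot test; it encodes the result
    # reported above (11222 works, 11223 dies).
    return rev >= 11223

# 11131 as the known-good start is an assumption; 11433 is the failing tip.
print(first_bad(11131, 11433, simulated_boot_fails))  # prints 11223
```

Each iteration halves the candidate range, so a few hundred changesets need fewer than ten build-and-boot cycles, which matters when every test run goes over a slow link.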
Muli Ben-Yehuda
2006-Sep-09 15:25 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Sat, Sep 09, 2006 at 10:58:40AM +0300, Muli Ben-Yehuda wrote:

> On Thu, Sep 07, 2006 at 11:01:46PM +0300, Muli Ben-Yehuda wrote:
>
> > > Binary chop on changeset revisions isn't a bad idea. I would think that such
> > > a blatant failure mode couldn't have crept in all that long ago.
> >
> > Yep, I'm working on it. Might take a while to narrow it down though,
> > between me and the machine is a very slow VPN connection and a
> > weekend... I'll keep you updated.
>
> Ok, this changeset:
>
> changeset: 11223:a4550b7488400c44a9f27c92115c8e364493837a
> user: Ian Campbell <ian.campbell@xensource.com>
> date: Tue Aug 22 14:20:43 2006 +0100
> files: linux-2.6-xen-sparse/arch/i386/kernel/vmlinux.lds.S
> patches/linux-2.6.16.13/series
> patches/linux-2.6.16.13/x86-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
> patches/linux-2.6.16.13/x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
> description:
> [LINUX] Support creating ELF note segments in the kernel ELF image.
>
> dies, while the previous one
> (11222:cd4e7ace4e58d9e35c08ccaa4677c6b6d0cf137b) works.

Reverting the changes to x86-64's vmlinux.lds.S first introduced in this patch is enough to get my box booting again.

- Is this change necessary, or can it be reverted for the time being?
- How to debug what's causing this? Linker scripts aren't exactly my cup of tea.
- I'm going to try -mm now to see if this affects mainline as well.

Cheers,
Muli

diff -r 1de184deaa9c patches/linux-2.6.16.13/series
--- a/patches/linux-2.6.16.13/series Wed Sep 6 12:16:02 2006
+++ b/patches/linux-2.6.16.13/series Sat Sep 9 18:19:08 2006
@@ -19,5 +19,4 @@
 xen-hotplug.patch
 xenoprof-generic.patch
 x86-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
-x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
 x86-elfnote-as-preprocessor-macro.patch
diff -r 1de184deaa9c patches/linux-2.6.16.13/x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
--- a/patches/linux-2.6.16.13/x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch Wed Sep 6 12:16:02 2006
+++ /dev/null Sat Sep 9 18:19:08 2006
@@ -1,60 +0,0 @@
-diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
-index 7c4de31..ef418b3 100644
---- a/arch/x86_64/kernel/vmlinux.lds.S
-+++ b/arch/x86_64/kernel/vmlinux.lds.S
-@@ -13,6 +13,12 @@ OUTPUT_FORMAT("elf64-x86-64", "elf64-x86
- OUTPUT_ARCH(i386:x86-64)
- ENTRY(phys_startup_64)
- jiffies_64 = jiffies;
-+PHDRS {
-+ text PT_LOAD FLAGS(5); /* R_E */
-+ data PT_LOAD FLAGS(7); /* RWE */
-+ user PT_LOAD FLAGS(7); /* RWE */
-+ note PT_NOTE FLAGS(4); /* R__ */
-+}
- SECTIONS
- {
- . = __START_KERNEL;
-@@ -31,7 +37,7 @@ SECTIONS
- KPROBES_TEXT
- *(.fixup)
- *(.gnu.warning)
-- } = 0x9090
-+ } :text = 0x9090
- /* out-of-line lock text */
- .text.lock : AT(ADDR(.text.lock) - LOAD_OFFSET) { *(.text.lock) }
- 
-@@ -57,7 +63,7 @@ #endif
- .data : AT(ADDR(.data) - LOAD_OFFSET) {
- *(.data)
- CONSTRUCTORS
-- }
-+ } :data
- 
- _edata = .; /* End of data section */
- 
-@@ -89,7 +95,7 @@ #define VVIRT_OFFSET (VSYSCALL_ADDR - VS
- #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET)
- 
- . = VSYSCALL_ADDR;
-- .vsyscall_0 : AT(VSYSCALL_PHYS_ADDR) { *(.vsyscall_0) }
-+ .vsyscall_0 : AT(VSYSCALL_PHYS_ADDR) { *(.vsyscall_0) } :user
- __vsyscall_0 = VSYSCALL_VIRT_ADDR;
- 
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
-@@ -132,7 +138,7 @@ #undef VVIRT
- . = ALIGN(8192); /* init_task */
- .data.init_task : AT(ADDR(.data.init_task) - LOAD_OFFSET) {
- *(.data.init_task)
-- }
-+ } :data
- 
- . = ALIGN(4096);
- .data.page_aligned : AT(ADDR(.data.page_aligned) - LOAD_OFFSET) {
-@@ -235,4 +241,6 @@ #endif
- STABS_DEBUG
- 
- DWARF_DEBUG
-+
-+ NOTES
- }
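For readers unfamiliar with the `FLAGS(n)` values in the `PHDRS` block of the patch above: they are ELF program header permission bits, where 1 is execute, 2 is write, and 4 is read. The short Python sketch below decodes them and reproduces the `R_E` / `RWE` / `R__` comments from the linker script. The helper name `decode_pflags` is made up for this illustration.

```python
# ELF program header permission bits (p_flags)
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def decode_pflags(flags):
    """Render p_flags as a three-character string such as 'R_E'."""
    return "".join(
        letter if flags & bit else "_"
        for letter, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X))
    )

# Segment names, types, and flag values as declared in the PHDRS block above
for name, seg_type, flags in [
    ("text", "PT_LOAD", 5),
    ("data", "PT_LOAD", 7),
    ("user", "PT_LOAD", 7),
    ("note", "PT_NOTE", 4),
]:
    print(f"{name:5s} {seg_type:8s} FLAGS({flags}) = {decode_pflags(flags)}")
```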
Muli Ben-Yehuda
2006-Sep-09 17:28 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Thu, Sep 07, 2006 at 01:02:59PM -0500, Travis Betak wrote:

> For what it's worth, a couple of us here got bitten by this every now
> and then a couple of weeks back and it was on Opterons. From what we
> saw, it might have been dependent on the toolchain or something of the
> like (although I hope it isn't). I was building using Debian AMD64
> stable and it would crash. I upgraded to Debian testing and it went
> away. It started happening after c/s 11131 within about 100 changesets
> or so. It seemed really obscure and not widespread so we just moved
> on.

Do you recall, or can you check, the gcc and ld versions on both Debian releases? Also, if you still have access to the build machine with Debian stable, could you please see if applying the patch I posted earlier in the thread to back out the vmlinux.lds changes solves the problem?

Thanks,
Muli
Ian Campbell
2006-Sep-09 17:46 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
Hi Muli,

On Sat, 2006-09-09 at 18:25 +0300, Muli Ben-Yehuda wrote:

> On Sat, Sep 09, 2006 at 10:58:40AM +0300, Muli Ben-Yehuda wrote:
> > On Thu, Sep 07, 2006 at 11:01:46PM +0300, Muli Ben-Yehuda wrote:
> >
> > > > Binary chop on changeset revisions isn't a bad idea. I would think that such
> > > > a blatant failure mode couldn't have crept in all that long ago.
> > >
> > > Yep, I'm working on it. Might take a while to narrow it down though,
> > > between me and the machine is a very slow VPN connection and a
> > > weekend... I'll keep you updated.
> >
> > Ok, this changeset:
> >
> > changeset: 11223:a4550b7488400c44a9f27c92115c8e364493837a
> > user: Ian Campbell <ian.campbell@xensource.com>
> > date: Tue Aug 22 14:20:43 2006 +0100
> > files: linux-2.6-xen-sparse/arch/i386/kernel/vmlinux.lds.S
> > patches/linux-2.6.16.13/series
> > patches/linux-2.6.16.13/x86-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
> > patches/linux-2.6.16.13/x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch
> > description:
> > [LINUX] Support creating ELF note segments in the kernel ELF image.
> >
> > dies, while the previous one
> > (11222:cd4e7ace4e58d9e35c08ccaa4677c6b6d0cf137b) works.
>
> Reverting the changes to x86-64's vmlinux.lds.S first introduced in
> this patch is enough to get my box booting again.
> - is this change necessary or can it be reverted for the time being?

It is necessary in order to have the ELF notes declared in head-xen.S appear in the final image; we use them at boot time in place of the older __xen_guest section. In theory we could remove the patch and rely on the code falling back to __xen_guest, but I'd rather not.

I was told about this independently yesterday: it is a problem with older (pre-2.16) binutils. I was able to reproduce it on Debian stable, which has 2.15. It turns out that Jan Beulich posted a patch to xen-devel for the problem a while back and it got forgotten about. I'm on a crappy connection or I'd find a precise archive link for you. It was posted on 28 August with message-id 44F3088B.76E4.0078.0@novell.com and subject "update for x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch". It basically moves the .bss declaration after all the .data ones.

I haven't committed this to Xen unstable because of http://marc.theaimsgroup.com/?l=linux-kernel&m=115629369729911&w=2 (the patch in question in that thread was from someone else, but it is the same as Jan's). Andi Kleen has removed it from his queue for upstream because of this report (I think).

Right now I think we should apply Jan's patch anyway: the above problem looks a bit spurious, and applying the patch would help us flush it out. I'll apply it when I find a decent net connection (probably Monday when I get to the office). If you could confirm the fix for me, that would be a very useful data point.

Thanks,
Ian.
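Since the failure is tied to binutils older than 2.16, a build machine can be screened by comparing the linker's reported version against that threshold. The following Python sketch parses a version string of the kind quoted earlier in this thread. The helper names are made up for this illustration, and the parsing rule (take the first dotted number in the text) is an assumption that happens to fit the strings seen here, not a general parser for every `ld --version` format.

```python
import re

def ld_version(version_text):
    """Extract the first dotted version number from ld version text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def predates_2_16(version_text):
    """True if the reported binutils version sorts before 2.16."""
    return ld_version(version_text) < (2, 16)

# Version strings as reported earlier in this thread
for text in [
    "ld version: 2.15",
    "GNU ld version 2.15.90.0.1.1 20040303 (SuSE Linux)",
]:
    print(text, "->", ld_version(text), "pre-2.16:", predates_2_16(text))
```

Both toolchains reported earlier in the thread sort before 2.16 under this comparison, which is consistent with both of them showing the crash.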
Ian Campbell
2006-Sep-09 20:01 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Sat, 2006-09-09 at 18:46 +0100, Ian Campbell wrote:

> Right now I think we should apply Jan's patch anyway

Which I have now done. It should come through as c/set 11439:6f36370e373a once regression test has passed.

Ian.
Muli Ben-Yehuda
2006-Sep-10 07:29 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Sat, Sep 09, 2006 at 06:46:37PM +0100, Ian Campbell wrote:

> I was told this independently yesterday, it is a problem with older (pre
> 2.16) binutils, I was able to reproduce it on Debian stable which has
> 2.15. It turns out that Jan Beulich posted a patch to xen-devel for the
> problem a while back and it got forgotten about. I'm on a crappy
> connection or I'd find a precise archive link for you -- It was on 28
> August with message-id 44F3088B.76E4.0078.0@novell.com and subject
> "update for
> x86_64-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch". It
> basically moves the .bss declaration after all the .data ones.

http://lists.xensource.com/archives/html/xen-devel/2006-08/msg01416.html

> I haven't committed this to Xen unstable because of
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115629369729911&w=2 (the
> patch in question in this thread was from someone else, but it is the
> same as Jan's). Andi Kleen has removed it from his queue for upstream
> because of this report (I think).

It's still in his tree but disabled.

> Right now I think we should apply Jan's patch anyway, the above problem
> looks a bit spurious and applying the patch would help us flush it out
> anyway. I'll apply when I find a decent net connection (probably Monday
> when I get to the office). If you could confirm the fix for me that
> would be a very useful data point.

Yes, it works for me. Thanks.

Cheers,
Muli
Travis Betak
2006-Sep-11 11:55 UTC
Re: [Xen-devel] unstable tip not booting on x86-64 with 'domain_crash_sync'
On Sat, 9 Sep 2006, Muli Ben-Yehuda wrote:

> On Thu, Sep 07, 2006 at 01:02:59PM -0500, Travis Betak wrote:
>
>> For what it's worth, a couple of us here got bitten by this every now
>> and then a couple of weeks back and it was on Opterons. From what we
>> saw, it might have been dependent on the toolchain or something of the
>> like (although I hope it isn't). I was building using Debian AMD64
>> stable and it would crash. I upgraded to Debian testing and it went
>> away. It started happening after c/s 11131 within about 100 changesets
>> or so. It seemed really obscure and not widespread so we just moved
>> on.
>
> Do you recall / can you check the gcc and ld versions on both
> versions? also, if you still have access to the build machine with
> Debian stable, could you please see if applying the patch I posted
> earlier in the thread to back out the vmlinux.lds changes solves the
> problem?

gcc 3.3.5-13 and 3.4.3-13sarge1
ld 2.15-6

Unfortunately, I upgraded my build machine when trying to get past the problem. I see from the rest of the thread that the problem has been more or less fixed.

--travis