I am still having problems getting my dom0 machines to boot - I am not sure it is a single problem but there do still seem to be ata issues. This is with the recent apic patch applied. The full boot log (compressed) is attached up to the point it stopped responding.

    Michael Young

ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
xen_set_ioapic_routing: irq 18 gsi 18 vector 160 ioapic 0 pin 18 triggering 1 polarity 1
ata_piix 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to IDENTIFY (SPINUP failed, err_mask=0x4)
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to IDENTIFY (SPINUP failed, err_mask=0x4)
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to IDENTIFY (SPINUP failed, err_mask=0x4)
ata2.00: ATAPI: CD-952E/AKV, R7AR, max UDMA/33
ata2.00: configured for UDMA/33
ata2.00: qc timeout (cmd 0xa0)
ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
ata2.00: configured for UDMA/33
ata2.00: qc timeout (cmd 0xa0)
ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
ata2.00: limiting speed to UDMA/33:PIO3
ata2.00: configured for UDMA/33
ata2.00: qc timeout (cmd 0xa0)
ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
ata2.00: disabled
ata2: soft resetting link
ata2: EH complete
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
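For anyone reading the libata lines in the log above, the err_mask values can be split into individual flags. The following is only a rough sketch: the bit names are quoted from memory of the AC_ERR_* enum in include/linux/libata.h and should be checked against the kernel tree in use, and decode_err_mask is a made-up helper name, not a kernel function.

```python
# Bit positions as recalled from the AC_ERR_* enum in include/linux/libata.h.
# This table is an assumption; verify it against your own kernel source.
AC_ERR_BITS = {
    0: "AC_ERR_DEV",       # device reported an error
    1: "AC_ERR_HSM",       # host state machine violation
    2: "AC_ERR_TIMEOUT",   # command timed out
    3: "AC_ERR_MEDIA",     # media error
    4: "AC_ERR_ATA_BUS",   # ATA bus error
    5: "AC_ERR_HOST_BUS",  # host bus error
    6: "AC_ERR_SYSTEM",    # system error
    7: "AC_ERR_INVALID",   # invalid argument
    8: "AC_ERR_OTHER",     # unknown
}


def decode_err_mask(mask):
    """Return the names of all flags set in a libata err_mask value."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & (1 << bit)]


# Values taken from the boot log: 0x4 for ata1.00, 0x5 for ata2.00.
for value in (0x4, 0x5):
    print(hex(value), decode_err_mask(value))
```

Under that table, both devices show the timeout flag, which would fit the "qc timeout" lines that precede each failure.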
This log extract is from the same hardware booting off the USB stick. This contains a lot of traceback, starting

=====================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
2.6.29-0.135.rc5.git3.fc10.i686.PAE #1
------------------------------------------------------
khubd/245 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (&retval->lock){......}, at: [<c04a5b9f>] dma_pool_alloc+0x1d/0x247

and this task is already holding:
 (&ehci->lock){-.....}, at: [<c0626797>] ehci_urb_enqueue+0xac/0xc6b
which would create a new lock dependency:
 (&ehci->lock){-.....} -> (&retval->lock){......}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&ehci->lock){-.....}
... which became HARDIRQ-irq-safe at:
  [<c0456ef2>] __lock_acquire+0x241/0xb1c
  [<c0457820>] lock_acquire+0x53/0x75
  [<c06fc2e3>] _spin_lock+0x1e/0x4e
  [<c0627fd6>] ehci_irq+0x21/0x193
  [<c0615829>] usb_hcd_irq+0x38/0x93
  [<c0477674>] handle_IRQ_event+0x1a/0x4b
  [<c0478890>] handle_level_irq+0x64/0xac
  [<ffffffff>] 0xffffffff

to a HARDIRQ-irq-unsafe lock:
 (purge_lock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [<c0456f6f>] __lock_acquire+0x2be/0xb1c
  [<c0457820>] lock_acquire+0x53/0x75
  [<c06fc2e3>] _spin_lock+0x1e/0x4e
  [<c04a106d>] __purge_vmap_area_lazy+0x39/0x145
  [<c04a2362>] vm_unmap_aliases+0x150/0x159
  [<c04061e0>] xen_create_contiguous_region+0x4c/0xd8
  [<c056103d>] xen_swiotlb_fixup+0x6e/0x99
  [<c08f1c2d>] swiotlb_alloc_boot+0x2e/0x35
  [<c08fc3b2>] swiotlb_init_with_default_size+0x2f/0xdb
  [<c08fc46b>] swiotlb_init+0xd/0xf
  [<c08f1bf2>] pci_swiotlb_init+0x41/0x4e
  [<c08e53b8>] pci_iommu_alloc+0x8/0xa
  [<c08f283c>] mem_init+0xe/0x2b3
  [<c08db7ff>] start_kernel+0x26b/0x31a
  [<c08db096>] i386_start_kernel+0x85/0x8d
  [<c08e0e7a>] xen_start_kernel+0x4bc/0x4c4
  [<ffffffff>] 0xffffffff

The full crash is in the attached log (I didn't catch the entire log, so there is an earlier gap). Later there are further errors which eventually cause xen to give up.

BUG: unable to handle kernel NULL pointer dereference at 000000a8
IP: [<c0680000>] rtnetlink_net_exit+0x11/0x1e
*pdpt = 0000000004c33001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/vtconsole/vtcon0/uevent
Modules linked in: pata_acpi ata_generic i915 drm i2c_algo_bit i2c_core
BUG: unable to handle kernel NULL pointer dereference at 00000005
IP: [<c065b8e5>] dmi_get_system_info+0x0/0xc
*pdpt = 0000000004c33001 *pde = 0000000000000000
Oops: 0002 [#2] SMP
last sysfs file: /sys/devices/virtual/vtconsole/vtcon0/uevent
Modules linked in: pata_acpi ata_generic i915 drm i2c_algo_bit i2c_core
BUG: unable to handle kernel NULL pointer dereference at 00000005
IP: [<c065b8e5>] dmi_get_system_info+0x0/0xc
*pdpt = 0000000004c33001 *pde = 0000000000000000
Oops: 0002 [#3] SMP
last sysfs file: /sys/devices/virtual/vtconsole/vtcon0/uevent
Modules linked in: pata_acpi ata_generic i915 drm i2c_algo_bit i2c_core
BUG: unable to handle kernel NULL pointer dereference at 00000005
IP: [<c065b8e5>] dmi_get_system_info+0x0/0xc
*pdpt = 0000000004c33001 *pde = 0000000000000000
Oops: 0002 [#4] SMP
last sysfs file: /sys/devices/virtual/vtconsole/vtcon0/uevent
Modules linked in: pata_acpi ata_generic i915 drm i2c_algo_bit i2c_core
BUG: unable to handle kernel NULL pointer dereference at 00000005
IP:
(XEN) domain_crash_sync called from entry.S (ff1a2c9e)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1 x86_32p debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) EIP: 0061:[<c0545ef5>]
(XEN) EFLAGS: 00010206 EM: 1 CONTEXT: pv guest
(XEN) eax: da434171 ebx: da43415e ecx: c07e154f edx: 7fffffff
(XEN) esi: da434108 edi: c07cf380 ebp: da4340ec esp: da433fcc
(XEN) cr0: 8005003b cr4: 000006f0 cr3: 04c37000 cr2: da433fdc
(XEN) ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0069 cs: 0061
(XEN) Guest stack trace from esp=da433fcc:
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

    Michael Young
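The "Oops: 0002" value in the log above is the x86 page-fault error code printed in hex. A small sketch of how the low bits are usually read follows; decode_pf_error is a hypothetical helper written for illustration, and the bit meanings are the standard architectural ones rather than anything specific to this kernel build.

```python
def decode_pf_error(code):
    """Split an x86 page-fault error code into its documented low bits."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault taken in kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
        # bit 3: a reserved bit was set in a paging-structure entry
        "reserved_bit": bool(code & 0x8),
        # bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


# "Oops: 0002" from the log, parsed as hexadecimal.
print(decode_pf_error(int("0002", 16)))
```

Read this way, each of the oopses is a kernel-mode write to a page that is not mapped, which is consistent with the small fault addresses (000000a8, 00000005) reported as NULL pointer dereferences.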
And here are a couple more. First I get this traceback with a dom0 enabled kernel not running under xen at the start of the boot log

BUG: spinlock bad magic on CPU#0, swapper/0 (Not tainted)
 lock: ffffffff81a39c90, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
Pid: 0, comm: swapper Not tainted 2.6.29-0.135.rc5.git3.fc10.x86_64 #1
Call Trace:
 [<ffffffff811f0af7>] spin_bug+0xb9/0xd8
 [<ffffffff811f0b46>] _raw_spin_unlock+0x30/0xb9
 [<ffffffff8143f17c>] _spin_unlock+0x35/0x50
 [<ffffffff8102ebd5>] ? flat_send_IPI_mask+0x1f/0x35
 [<ffffffff810402ef>] native_flush_tlb_others+0xf6/0x119
 [<ffffffff810403a6>] flush_tlb_all+0x2a/0x60
 [<ffffffff810f1f07>] __purge_vmap_area_lazy+0x142/0x1bc
 [<ffffffff810f1e1f>] ? __purge_vmap_area_lazy+0x5a/0x1bc
 [<ffffffff811ee6d4>] ? __bitmap_weight+0x4d/0xac
 [<ffffffff810f25dc>] free_unmap_vmap_area_noflush+0x80/0x9b
 [<ffffffff810f1941>] ? find_vmap_area+0x5b/0x7b
 [<ffffffff810f262b>] remove_vm_area+0x34/0x97
 [<ffffffff810f27ad>] __vunmap+0x50/0x103
 [<ffffffff810857ff>] ? trace_hardirqs_on_caller+0x140/0x17a
 [<ffffffff81392be0>] ? neigh_proxy_process+0xad/0x124
 [<ffffffff810f2899>] vunmap+0x39/0x4f
 [<ffffffff81440dde>] text_poke+0x13c/0x186
 [<ffffffff8116df16>] ? __sysfs_put+0x1c/0x41
 [<ffffffff81446641>] ? _etext+0x0/0x3
 [<ffffffff8101a765>] alternatives_smp_unlock+0x59/0x85
 [<ffffffff8101aa31>] alternatives_smp_switch+0x16a/0x1bd
 [<ffffffff816ca9ef>] alternative_instructions+0x110/0x166
 [<ffffffff816cb241>] ? identify_boot_cpu+0x23/0x5b
 [<ffffffff816cb3c8>] check_bugs+0x21/0x54
 [<ffffffff816beffe>] start_kernel+0x410/0x43b
 [<ffffffff816be140>] ? early_idt_handler+0x0/0x71
 [<ffffffff816be2ce>] x86_64_start_reservations+0xb9/0xd4
 [<ffffffff816be000>] ? _sinittext+0x0/0x140
 [<ffffffff816be3d6>] x86_64_start_kernel+0xed/0x110

Secondly I get this crash when trying to start xen under qemu-kvm. Something similar is happening when I try to start xen directly, but I can't do serial logging on this computer so I can't be sure.

 \ \/ /___ _ __   |___ / |___ / / |
  \  // _ \ '_ \    |_ \   |_ \ | |
  /  \  __/ | | |  ___) | ___) || |
 /_/\_\___|_| |_| |____(_)____(_)_|

(XEN) Xen version 3.3.1 (michael@home) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) Tue Feb 3 23:13:03 GMT 2009
(XEN) Latest ChangeSet: unavailable
(XEN) Command line: console=com1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN) Found 0 MBR signatures
(XEN) Found 0 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009fc00 (usable)
(XEN) 000000000009fc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e8000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 000000003fff0000 (usable)
(XEN) 000000003fff0000 - 0000000040000000 (ACPI data)
(XEN) 00000000fffbd000 - 0000000100000000 (reserved)
(XEN) System RAM: 1023MB (1048124kB)
(XEN) ACPI: RSDP 000FB9D0, 0014 (r0 QEMU )
(XEN) ACPI: RSDT 3FFF0000, 002C (r1 QEMU QEMURSDT 1 QEMU 1)
(XEN) ACPI: FACP 3FFF002C, 0074 (r1 QEMU QEMUFACP 1 QEMU 1)
(XEN) ACPI: DSDT 3FFF0100, 253C (r1 BXPC BXDSDT 1 INTL 20061109)
(XEN) ACPI: FACS 3FFF00C0, 0040
(XEN) ACPI: APIC 3FFF2640, 00E0 (r1 QEMU QEMUAPIC 1 QEMU 1)
(XEN) Xen heap: 14MB (14632kB)
(XEN) Domain heap initialised
(XEN) Processor #0 6:2 APIC version 20
(XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2394.081 MHz processor.
(XEN) CPU0: Intel QEMU Virtual CPU version 0.9.1 stepping 03
(XEN) Total of 1 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) Platform timer is 3.579MHz ACPI PM Timer
(XEN) Brought up 1 CPUs
(XEN) I/O virtualisation disabled
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x239bbc0
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000038000000->000000003c000000 (221906 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff8239bbc0
(XEN) Init. ramdisk: ffffffff8239c000->ffffffff82f5f000
(XEN) Phys-Mach map: ffffffff82f5f000->ffffffff83130690
(XEN) Start info: ffffffff83131000->ffffffff831314a4
(XEN) Page tables: ffffffff83132000->ffffffff8314f000
(XEN) Boot stack: ffffffff8314f000->ffffffff83150000
(XEN) TOTAL: ffffffff80000000->ffffffff83400000
(XEN) ENTRY ADDRESS: ffffffff816be200
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM: done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 120kB init memory.
(XEN) d0:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from 0000000000000028:
(XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff816c5315>]
(XEN) RFLAGS: 0000000000000296 EM: 1 CONTEXT: pv guest
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: ffffffff83131000 rdi: ffffffff83131000
(XEN) rbp: ffffffff81695ff8 rsp: ffffffff81695f90 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006b0
(XEN) cr3: 000000003b132000 cr2: 0000000000000028
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81695f90:
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffffffff816c5315
(XEN) 000000010000e030 0000000000010096 ffffffff81695fd8 000000000000e02b
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) ffffffff816b6000 ffffffff816b6000 ffffffff816b6000 ffffffff816b6000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

    Michael Young
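The "Pagetable walk from 0000000000000028" line in the crash above stops at L4[0x000]. As a sketch of why that index is 0, the following splits a 64-bit virtual address into its page-table indices, assuming ordinary x86-64 4-level paging with 4 KiB pages; pt_indices is a hypothetical helper, not part of Xen or Linux.

```python
def pt_indices(va):
    """Split a virtual address into 4-level page-table indices (9 bits each)."""
    return {
        "L4": (va >> 39) & 0x1FF,   # bits 47:39
        "L3": (va >> 30) & 0x1FF,   # bits 38:30
        "L2": (va >> 21) & 0x1FF,   # bits 29:21
        "L1": (va >> 12) & 0x1FF,   # bits 20:12
        "offset": va & 0xFFF,       # bits 11:0, byte offset within the page
    }


# The faulting address from the log (also reported in cr2).
print(pt_indices(0x28))

# For comparison, the dom0 entry address from the same log lives at the top
# of the address space, far from L4 slot 0.
print(pt_indices(0xFFFFFFFF816BE200))
```

The faulting address falls in L4 slot 0, which the walk shows as empty, so the access is to an unmapped low address rather than to anything in the kernel's own mapping near the top of the address space.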
M A Young wrote:
> And here are a couple more. First I get this traceback with a dom0
> enabled kernel not running under xen at the start of the boot log
> BUG: spinlock bad magic on CPU#0, swapper/0 (Not tainted)
> lock: ffffffff81a39c90, .magic: 00000000, .owner: swapper/0,
> .owner_cpu: 0
> Pid: 0, comm: swapper Not tainted 2.6.29-0.135.rc5.git3.fc10.x86_64 #1

Oh, I thought I'd fixed that (hm, must have lost the change somewhere). Does it keep going OK anyway?

> Call Trace:
> [<ffffffff811f0af7>] spin_bug+0xb9/0xd8
> ...
>
> Secondly I get this crash when trying to start xen under qemu-kvm.
> Something similar is happening when I try to start xen directly, but I
> can't do serial logging on this computer so I can't be sure.

Interesting. Xen has certainly revealed bugs in kvm's pagetable management before, so it wouldn't surprise me if they've broken something again (apparently they're not in the habit of testing with Xen). Report it to kvm-devel <kvm@vger.kernel.org>

    J
On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:

> M A Young wrote:
>> And here are a couple more. First I get this traceback with a dom0 enabled
>> kernel not running under xen at the start of the boot log
>> BUG: spinlock bad magic on CPU#0, swapper/0 (Not tainted)
>> lock: ffffffff81a39c90, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
>> Pid: 0, comm: swapper Not tainted 2.6.29-0.135.rc5.git3.fc10.x86_64 #1
>
> Oh, I thought I'd fixed that (hm, must have lost the change somewhere). Does
> it keep going OK anyway?

Yes, that one doesn't cause any obvious problems.

>> Secondly I get this crash when trying to start xen under qemu-kvm.
>> Something similar is happening when I try to start xen directly, but I
>> can't do serial logging on this computer so I can't be sure.
>
> Interesting. Xen has certainly revealed bugs in kvm's pagetable management
> before, so it wouldn't surprise me if they've broken something again
> (apparently they're not in the habit of testing with Xen). Report it to
> kvm-devel <kvm@vger.kernel.org>

I was hoping that wasn't kvm related because something is crashing my x86_64 system when I try to boot it directly into xen at about the same point (though I don't have any good way of catching the logging information so I can't be sure).

    Michael Young
On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:

> Interesting. Xen has certainly revealed bugs in kvm's pagetable management
> before, so it wouldn't surprise me if they've broken something again
> (apparently they're not in the habit of testing with Xen). Report it to
> kvm-devel <kvm@vger.kernel.org>

Further testing (and playing with the xen settings so I get to see the logging) reveals that it isn't a kvm problem, because I get the same crash trying to boot xen directly on the computer.

    Michael Young
Ok, Keir's problem.

But try doing a very clean rebuild; I spent a good chunk of last night bisecting something that turned out to be a misbuild...

    J

M A Young <m.a.young@durham.ac.uk> wrote:

>On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:
>
>> Interesting. Xen has certainly revealed bugs in kvm's pagetable management
>> before, so it wouldn't surprise me if they've broken something again
>> (apparently they're not in the habit of testing with Xen). Report it to
>> kvm-devel <kvm@vger.kernel.org>
>
>Further testing (and playing with the xen settings so I get to see the
>logging) reveals that it isn't a kvm problem, because I get the same crash
>trying to boot xen directly on the computer.
>
> Michael Young

-- 
Sent from my Android phone with K-9. Please excuse my brevity.
On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:

> Ok, Keir's problem.
>
> But try doing a very clean rebuild; I spent a good chunk of last night
> bisecting something that turned out to be a misbuild...

In this case I think a misbuild is unlikely because I have seen the behaviour with two or three kernels, and the xen package is a straight rpmbuild --rebuild of xen-3.3.1-3.fc11.src.rpm from Fedora rawhide.

    Michael Young
All the problems occur after your dom0 pv_ops kernel has started execution, Jeremy. ;-)

 -- Keir

On 21/02/2009 14:59, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> Ok, Keir's problem.
>
> But try doing a very clean rebuild; I spent a good chunk of last night
> bisecting something that turned out to be a misbuild...
>
> J
>
> M A Young <m.a.young@durham.ac.uk> wrote:
>
>> On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:
>>
>>> Interesting. Xen has certainly revealed bugs in kvm's pagetable management
>>> before, so it wouldn't surprise me if they've broken something again
>>> (apparently they're not in the habit of testing with Xen). Report it to
>>> kvm-devel <kvm@vger.kernel.org>
>>
>> Further testing (and playing with the xen settings so I get to see the
>> logging) reveals that it isn't a kvm problem, because I get the same crash
>> trying to boot xen directly on the computer.
>>
>> Michael Young
>
> -- 
> Sent from my Android phone with K-9. Please excuse my brevity.
M A Young wrote:
> ...
> (XEN) d0:v0: unhandled page fault (ec=0000)
> (XEN) Pagetable walk from 0000000000000028:
> (XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.3.1 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e033:[<ffffffff816c5315>]

What does this correspond to in the kernel?

$ gdb vmlinux
(gdb) x/i 0xffffffff816c5315

    J
On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:

>> ...
>> (XEN) d0:v0: unhandled page fault (ec=0000)
>> (XEN) Pagetable walk from 0000000000000028:
>> (XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
>> (XEN) domain_crash_sync called from entry.S
>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>> (XEN) ----[ Xen-3.3.1 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e033:[<ffffffff816c5315>]
>
> What does this correspond to in the kernel?
>
> $ gdb vmlinux
> (gdb) x/i 0xffffffff816c5315

0xffffffff816c5315 <xen_start_kernel+16>: mov %gs:0x28,%rax

    Michael Young
On Sun, 22 Feb 2009, M A Young wrote:
> On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:
>
>>> ...
>>> (XEN) d0:v0: unhandled page fault (ec=0000)
>>> (XEN) Pagetable walk from 0000000000000028:
>>> (XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
>>> (XEN) domain_crash_sync called from entry.S
>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-3.3.1 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e033:[<ffffffff816c5315>]
>>
>> What does this correspond to in the kernel?
>>
>> $ gdb vmlinux
>> (gdb) x/i 0xffffffff816c5315
>
> 0xffffffff816c5315 <xen_start_kernel+16>: mov %gs:0x28,%rax

This is from

0xffffffff816c5305 <xen_start_kernel>: push %rbp
0xffffffff816c5306 <xen_start_kernel+1>: mov %rsp,%rbp
0xffffffff816c5309 <xen_start_kernel+4>: push %rbx
0xffffffff816c530a <xen_start_kernel+5>: sub $0x18,%rsp
0xffffffff816c530e <xen_start_kernel+9>: mov 0x333e23(%rip),%rdi # 0xffffffff819f9138 <xen_start_info>
0xffffffff816c5315 <xen_start_kernel+16>: mov %gs:0x28,%rax
0xffffffff816c531e <xen_start_kernel+25>: mov %rax,-0x18(%rbp)
0xffffffff816c5322 <xen_start_kernel+29>: xor %eax,%eax
0xffffffff816c5324 <xen_start_kernel+31>: test %rdi,%rdi
0xffffffff816c5327 <xen_start_kernel+34>: je 0xffffffff816c5827 <xen_start_kernel+1314>
0xffffffff816c532d <xen_start_kernel+40>: movl $0x1,0x333df9(%rip) # 0xffffffff819f9130 <xen_domain_type>
...

which is generated if CONFIG_CC_STACKPROTECTOR=y (also CONFIG_CC_OPTIMIZE_FOR_SIZE=y, though I don't know if the latter is important). If these aren't set, the compiler produces different code, and the boot process gets a bit further before crashing.

Michael Young
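The faulting address reported by Xen, 0x28, equals the displacement in `mov %gs:0x28,%rax`. That is what one would expect if the GS base were still zero when the instruction ran, but this is an inference and not something the log states. The Python sketch below shows how an x86-64 virtual address splits into 4-level page-table indices, which is why the walk for 0x28 begins (and ends) at L4[0x000]. The helper name `pt_indices` and the `gs_base` value are assumptions for illustration.

```python
def pt_indices(va):
    """Split an x86-64 virtual address into 4-level page-table indices."""
    return {
        "L4": (va >> 39) & 0x1ff,   # 9 bits per level, 512 entries per table
        "L3": (va >> 30) & 0x1ff,
        "L2": (va >> 21) & 0x1ff,
        "L1": (va >> 12) & 0x1ff,
        "offset": va & 0xfff,       # byte offset within the 4 KiB page
    }

gs_base = 0x0          # assumption: GS base not yet set up this early in boot
canary_offset = 0x28   # displacement taken from `mov %gs:0x28,%rax`
fault_va = gs_base + canary_offset

print(hex(fault_va), pt_indices(fault_va))
```

For comparison, `pt_indices(0xffffffff816c5315)["L4"]` is 511, so kernel text lives at the opposite end of the L4 table from the empty slot 0 that the failed walk hit.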
M A Young wrote:
> On Sun, 22 Feb 2009, M A Young wrote:
>
>> On Sat, 21 Feb 2009, Jeremy Fitzhardinge wrote:
>>
>>>> ...
>>>> (XEN) d0:v0: unhandled page fault (ec=0000)
>>>> (XEN) Pagetable walk from 0000000000000028:
>>>> (XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
>>>> (XEN) domain_crash_sync called from entry.S
>>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>>> (XEN) ----[ Xen-3.3.1 x86_64 debug=n Not tainted ]----
>>>> (XEN) CPU: 0
>>>> (XEN) RIP: e033:[<ffffffff816c5315>]
>>>
>>> What does this correspond to in the kernel?
>>>
>>> $ gdb vmlinux
>>> (gdb) x/i 0xffffffff816c5315
>>
>> 0xffffffff816c5315 <xen_start_kernel+16>: mov %gs:0x28,%rax
>
> This is from
> 0xffffffff816c5305 <xen_start_kernel>: push %rbp
> 0xffffffff816c5306 <xen_start_kernel+1>: mov %rsp,%rbp
> 0xffffffff816c5309 <xen_start_kernel+4>: push %rbx
> 0xffffffff816c530a <xen_start_kernel+5>: sub $0x18,%rsp
> 0xffffffff816c530e <xen_start_kernel+9>: mov 0x333e23(%rip),%rdi # 0xffffffff819f9138 <xen_start_info>
> 0xffffffff816c5315 <xen_start_kernel+16>: mov %gs:0x28,%rax
> 0xffffffff816c531e <xen_start_kernel+25>: mov %rax,-0x18(%rbp)
> 0xffffffff816c5322 <xen_start_kernel+29>: xor %eax,%eax
> 0xffffffff816c5324 <xen_start_kernel+31>: test %rdi,%rdi
> 0xffffffff816c5327 <xen_start_kernel+34>: je 0xffffffff816c5827 <xen_start_kernel+1314>
> 0xffffffff816c532d <xen_start_kernel+40>: movl $0x1,0x333df9(%rip) # 0xffffffff819f9130 <xen_domain_type>
> ...
>
> which is generated if CONFIG_CC_STACKPROTECTOR=y (also
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y, though I don't know if the latter is
> important). If these aren't set, the compiler produces different code,
> and the boot process gets a bit further before crashing.

Hm, yes, I guess there's something to stop stack-protector from adding stuff to xen_start_kernel().

But I'm more interested in the crash you see when you have stack protector off. What are the symptoms?

J
On Sun, 22 Feb 2009, Jeremy Fitzhardinge wrote:
> Hm, yes, I guess there's something to stop stack-protector from adding stuff
> to xen_start_kernel().
>
> But I'm more interested in the crash you see when you have stack protector
> off. What are the symptoms?

That seems to have been PCI-related problems due to not setting pci=nomsi. Beyond that I still get ata timeouts and IRQ problems. The end of one traceback (I couldn't see the whole thing) looked similar to the one I mentioned for i686 at
http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00832.html
and another started with print_irq_inversion_bug.

Michael Young
M A Young wrote:
> On Sun, 22 Feb 2009, Jeremy Fitzhardinge wrote:
>
>> Hm, yes, I guess there's something to stop stack-protector from
>> adding stuff to xen_start_kernel().
>>
>> But I'm more interested in the crash you see when you have stack
>> protector off. What are the symptoms?
>
> That seems to have been pci related problems due to not setting
> pci=nomsi.

Ah, yes. I should probably find a way to code that rather than relying on the user to put it on the command line.

> Beyond that I still get ata timeouts, and IRQ problems.

That's a pity; I was hoping those problems would be behind us... Do you have any log messages relating to them?

> The end of one traceback (I couldn't see the whole thing) looked
> similar to the one I mentioned for i686 at
> http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00832.html
> and another started with print_irq_inversion_bug

The USB lockdep error isn't terribly worrying if the machine doesn't actually lock up. I'm still working on a good way to fix that one.

The NULL pointer dereferences are much more of a worry, and a bit random. Looks like the stack pointer has got trashed or something, so it's not giving any useful information. Do you know what was going on before then?

J
On Sun, 22 Feb 2009, Jeremy Fitzhardinge wrote:
> That's a pity; I was hoping those problems would be behind us... Do you have
> any log messages relating to them?

ata is still timing out, and the devices don't work. Some errors from a kvm boot are

ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
ata2.00: ATAPI: QEMU DVD-ROM, 0.9.1, max UDMA/100
ata2.00: configured for MWDMA2
Clocksource tsc unstable (delta = -2149391501 ns)
ata2.00: qc timeout (cmd 0xa0)
ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
ata2.00: configured for MWDMA2
ata2.00: qc timeout (cmd 0xa0)
ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
ata2.00: limiting speed to MWDMA2:PIO3
ata2.00: configured for MWDMA2
ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
ata2.00: disabled
ata2: soft resetting link
ata2: EH complete

> The NULL pointer dereferences are much more of a worry, and a bit random.
> Looks like the stack pointer has got trashed or something, so its not giving
> any useful information. Do you know what was going on before then?

I haven't yet seen the NULL pointer dereferences on x86_64, but for the i686 boot they would just be part of the ordinary system startup, which I think in that case might have got far enough to try to fire off an X boot screen (the kernel was from before the last couple of days of drm-related fixes were applied).

Michael Young
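The `err_mask` values in these logs are bit masks. A small Python sketch for decoding them follows; the bit assignments are the three lowest AC_ERR_* flags as I recall them from include/linux/libata.h, so treat them as an assumption to check against the kernel tree in use. Higher bits are deliberately left out, and `decode_err_mask` is a made-up helper name.

```python
# Lowest three libata error bits (assumed from include/linux/libata.h).
AC_ERR_BITS = {
    0: "AC_ERR_DEV",      # device reported an error
    1: "AC_ERR_HSM",      # host state machine violation
    2: "AC_ERR_TIMEOUT",  # command timed out
}

def decode_err_mask(mask):
    """Return the names of the known error bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & (1 << bit)]

print(decode_err_mask(0x4))  # the kvm log above
print(decode_err_mask(0x5))  # the TEST_UNIT_READY failures on real hardware earlier in the thread
```

Under that assumption, 0x4 is a pure timeout, which would fit the suspicion that the completion interrupt never arrives, while 0x5 adds a device error on top of the timeout.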
M A Young wrote:
> On Sun, 22 Feb 2009, Jeremy Fitzhardinge wrote:
>
>> That's a pity; I was hoping those problems would be behind us... Do
>> you have any log messages relating to them?
>
> ata is still timing out, and the devices don't work. Some errors from
> a kvm boot are
> ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
> ata2.00: ATAPI: QEMU DVD-ROM, 0.9.1, max UDMA/100
> ata2.00: configured for MWDMA2
> Clocksource tsc unstable (delta = -2149391501 ns)
> ata2.00: qc timeout (cmd 0xa0)
> ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
> ata2.00: configured for MWDMA2
> ata2.00: qc timeout (cmd 0xa0)
> ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
> ata2.00: limiting speed to MWDMA2:PIO3
> ata2.00: configured for MWDMA2
>
> ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
> ata2.00: disabled
> ata2: soft resetting link
> ata2: EH complete

This is under qemu/kvm?

Can you include the complete boot output, and compare a native boot of the dom0 kernel too?

> I haven't yet seen the NULL pointer dereferences in x86_64, but for
> i686 boot they would just be part of the ordinary system startup,
> which I think in that case might have got far enough to try to fire
> off an X boot screen (the kernel was from before the last couple of
> days of drm related fixes were applied).

Hm, I don't see what would be causing them, regardless.

J
On Sun, 22 Feb 2009, Jeremy Fitzhardinge wrote:
> M A Young wrote:
>> ...
>> ata2.00: TEST_UNIT_READY failed (err_mask=0x4)
>> ata2.00: disabled
>> ata2: soft resetting link
>> ata2: EH complete
>
> This is under qemu/kvm?
>
> Can you include the complete boot output, and compare a native boot of the
> dom0 kernel too?

My x86_64 system has started working with the latest set of patches, I suspect as a result of the mtrr-related smp changes. If QEMU is still broken, I will submit the boot log once I get a chance to test it.

Michael Young
On Wed, 25 Feb 2009, M A Young wrote:
> My x86_64 system has started working with the latest set of patches, I
> suspect as a result of the mtrr related smp changes. If QEMU is still broken,
> I will submit the boot log once I get a chance to test it.

The QEMU boot still fails (pvops patch up-to-date as of 2 days ago), probably because it is emulating an IDE-style cdrom drive, which I believe xen still has problems with (I have similar problems booting a different system with IDE disks). The boot log (bzipped) is attached.

Michael Young
M A Young wrote:
> On Wed, 25 Feb 2009, M A Young wrote:
>
>> My x86_64 system has started working with the latest set of patches,
>> I suspect as a result of the mtrr related smp changes. If QEMU is
>> still broken, I will submit the boot log once I get a chance to test it.
>
> The QEMU boot still fails (pvops patch up-to-date as of 2 days ago),
> probably because it is emulating an ide style cdrom drive which I
> believe xen still has problems with (I have similar problems booting a
> different system with ide disks). The boot log (bzipped) is attached.

(bzip's a bit of an overkill, it's only 7k.)

Yes, I think the legacy interrupts are not being set up completely, but I'm not quite sure how they should be set up. Will look into it.

J
Jeremy Fitzhardinge wrote:
> Yes, I think the legacy interrupts are not being set up completely, but
> I'm not quite sure how they should be set up. Will look into it.

FYI: Recently my dated, apic-less i386 laptop started to successfully boot the pv_ops/dom0 kernel, all the way up to userspace.

cheers,
Gerd
On Mon, 2 Mar 2009, Gerd Hoffmann wrote:
> Jeremy Fitzhardinge wrote:
>> Yes, I think the legacy interrupts are not being set up completely, but
>> I'm not quite sure how they should be set up. Will look into it.
>
> FYI: Recently my dated, apic-less i386 laptop started to successfully
> boot the pv_ops/dom0 kernel, all the way up to userspace.

How recently? I know there was a fix in the first half of last week relating to smp mtrr that got one of my machines working, but the problem I am having wasn't fixed by that.

Michael Young
M A Young wrote:
> On Mon, 2 Mar 2009, Gerd Hoffmann wrote:
>
>> Jeremy Fitzhardinge wrote:
>>> Yes, I think the legacy interrupts are not being set up completely, but
>>> I'm not quite sure how they should be set up. Will look into it.
>>
>> FYI: Recently my dated, apic-less i386 laptop started to successfully
>> boot the pv_ops/dom0 kernel, all the way up to userspace.
>
> How recently? I know there was a fix in the first half of last week
> relating to smp mtrr that got one of my machines working, but the
> problem I am having wasn't fixed by that.

Somewhen last week. Didn't try for a while before that (early Feb IIRC), so there are quite a few candidates which could have fixed it ...

cheers,
Gerd
M A Young wrote:
> On Wed, 25 Feb 2009, M A Young wrote:
>
>> My x86_64 system has started working with the latest set of patches,
>> I suspect as a result of the mtrr related smp changes. If QEMU is
>> still broken, I will submit the boot log once I get a chance to test it.
>
> The QEMU boot still fails (pvops patch up-to-date as of 2 days ago),
> probably because it is emulating an ide style cdrom drive which I
> believe xen still has problems with (I have similar problems booting a
> different system with ide disks). The boot log (bzipped) is attached.

I committed a change to properly initialize legacy irqs, which might help with IDE devices.

J
Gerd Hoffmann wrote:
> Jeremy Fitzhardinge wrote:
>
>> Yes, I think the legacy interrupts are not being set up completely, but
>> I'm not quite sure how they should be set up. Will look into it.
>>
>
> FYI: Recently my dated, apic-less i386 laptop started to successfully
> boot the pv_ops/dom0 kernel, all the way up to userspace.
>

Do you get a vga console? Can you start domains?

x86-32 is booting to usermode dom0 for me, but only with a serial console, and domain creation fails (SIGBUS in the domain builder, so I'm hoping it's related to the hvm qemu crash).

J
Jeremy Fitzhardinge wrote:
> Gerd Hoffmann wrote:
>> Jeremy Fitzhardinge wrote:
>>
>>> Yes, I think the legacy interrupts are not being set up completely, but
>>> I'm not quite sure how they should be set up. Will look into it.
>>>
>>
>> FYI: Recently my dated, apic-less i386 laptop started to successfully
>> boot the pv_ops/dom0 kernel, all the way up to userspace.
>>
>
> Do you get a vga console? Can you start domains?

gfx console works (i.e. kernel /xen-3.3.gz vga=gfx-1024x768x16). vga text console didn't last time I tried.

> x86-32 booting to usermode dom0, but only with serial console and domain
> creation fails (SIGBUS in the domain builder, so I'm hoping its related
> to the hvm qemu crash).

Didn't try yet; the machine is heavily underpowered for serious virtualization work, it has only 192 MB RAM. And hvm doesn't work anyway because the box is way too old for that (Pentium III).

cheers,
Gerd
Gerd Hoffmann wrote:
> Jeremy Fitzhardinge wrote:
>> Gerd Hoffmann wrote:
>>> Jeremy Fitzhardinge wrote:
>>>
>>>> Yes, I think the legacy interrupts are not being set up completely, but
>>>> I'm not quite sure how they should be set up. Will look into it.
>>>>
>>> FYI: Recently my dated, apic-less i386 laptop started to successfully
>>> boot the pv_ops/dom0 kernel, all the way up to userspace.
>>>
>> Do you get a vga console? Can you start domains?
>
> gfx console works (i.e. kernel /xen-3.3.gz vga=gfx-1024x768x16).
> vga text console didn't last time I tried.

Update: latest kernel (rc7 based) crashes. rc6 from somewhen last week is the working one. rc7 messages:

unhandled page fault (ec=0003)
page table walk from c1c55000
l1[0x055] = 9c55061 1c55

-> rw access to r/o page?

EIP c140f4c3

c140f2d2 <alloc_bootmem_core>:
[ ... ]
c140f4c3: f3 ab rep stos %eax,%es:(%edi)

-> memset(page,0,PAGE_SIZE) ?

Dom0 domain builder says page tables are at c1c55000 -> c1c6a000

/me guesses the initial page tables are released to the page allocator, but they are still mapped r/o => boom as soon as one happens to get allocated. Which probably happens very soon on memory-constrained machines like mine, while others might stay up longer and show strange bugs later on ;)

BTW: The trick to see the messages on the laptop screen is:

kernel /xen-3.3.gz vga=text-80x50,keep
module /vmlinuz-2.6.29-rc7-tip-kraxel ro root=/dev/zen/rawhide \
console=hvc0

>> x86-32 booting to usermode dom0, but only with serial console and domain
>> creation fails (SIGBUS in the domain builder, so I'm hoping its related
>> to the hvm qemu crash).
>
> Didn't try yet, the machine is heavily underpowered for serious
> virtualization work, it has 192 MB RAM only. And hvm doesn't work
> anyway because the box is way too old for that (Pentium III).

Doesn't work. I get messages about failed multicalls with remap_page_range and privcmd_ioctl in the stack trace. Most likely mapping the guest pages in the domain builder doesn't work. No surprise this leads to SIGBUS.

HTH,
Gerd
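The numbers in the rc7 crash can be checked against the "rw access to r/o page" reading. The Python sketch below decodes the x86 page-fault error code and the low flag bits of the reported entry. It assumes that `ec` is the hardware error code, that the value after `l1[0x055] =` is the raw PTE, and that the 32-bit guest uses PAE-style tables with 512 entries each. The helper names are made up for illustration.

```python
def decode_pf_error(ec):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(ec & 0x1),  # False means page not present
        "write": bool(ec & 0x2),                 # False means read access
        "user_mode": bool(ec & 0x4),             # False means supervisor access
    }

def decode_pte_flags(pte):
    """Decode selected flag bits of an x86 page-table entry."""
    return {
        "present": bool(pte & 0x001),
        "writable": bool(pte & 0x002),
        "user": bool(pte & 0x004),
        "accessed": bool(pte & 0x020),
        "dirty": bool(pte & 0x040),
    }

fault_va = 0xc1c55000
pte = 0x9c55061
l1_index = (fault_va >> 12) & 0x1ff  # assumes 512-entry (PAE-style) tables

print(hex(l1_index))
print(decode_pf_error(0x3))
print(decode_pte_flags(pte))
```

With these assumptions the L1 index comes out as 0x055, matching the log; the error code decodes to a supervisor-mode write that hit a protection violation; and the entry is present but not writable. All three agree with the guess above that the allocator handed out a page that is still mapped read-only.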
On Thu, Mar 05, 2009 at 08:35:29AM +0100, Gerd Hoffmann wrote:
> Jeremy Fitzhardinge wrote:
> > Gerd Hoffmann wrote:
> >> Jeremy Fitzhardinge wrote:
> >>
> >>> Yes, I think the legacy interrupts are not being set up completely, but
> >>> I'm not quite sure how they should be set up. Will look into it.
> >>>
> >>
> >> FYI: Recently my dated, apic-less i386 laptop started to successfully
> >> boot the pv_ops/dom0 kernel, all the way up to userspace.
> >>
> >
> > Do you get a vga console? Can you start domains?
>
> gfx console works (i.e. kernel /xen-3.3.gz vga=gfx-1024x768x16).
> vga text console didn't last time I tried.
>

VGA text console doesn't work for me either. I've only managed to get the serial console working.. although I haven't tried any graphics modes yet.

Maybe I should play with these again..

-- Pasi
Gerd Hoffmann wrote:
> Update: latest kernel (rc7 based) crashes. rc6 from somewhen last week
> is the working one. rc7 messages:
>
> unhandled page fault (ec=0003)
> page table walk from c1c55000
> l1[0x055] = 9c55061 1c55
>
> -> rw access to r/o page?
>
> EIP c140f4c3
>
> c140f2d2 <alloc_bootmem_core>:
> [ ... ]
> c140f4c3: f3 ab rep stos %eax,%es:(%edi)
>
> -> memset(page,0,PAGE_SIZE) ?
>
> Dom0 domain builder says page tables are at c1c55000 -> c1c6a000
>
> /me guesses the initial page tables are released to the page allocator,
> but still they are mapped r/o => boom as soon as one happens to get
> allocated. Which probably happens very soon on memory-constrained
> machines like mine, while other might stay up longer and show strange
> bugs later on ;)
>

Hm. You should see "XEN PAGETABLES" in the early reservations, which should protect them from then on. Oh, look, it's only doing it in the 64-bit setup.

> BTW: The trick to see the messages on the laptop screen is:
>
> kernel /xen-3.3.gz vga=text-80x50,keep
> module /vmlinuz-2.6.29-rc7-tip-kraxel ro root=/dev/zen/rawhide \
> console=hvc0
>

Hm, OK. But normal vga console should work. Works fine on 64-bit; can't think of why they might differ...

J
On Wed, 4 Mar 2009, Jeremy Fitzhardinge wrote:
> M A Young wrote:
>> The QEMU boot still fails (pvops patch up-to-date as of 2 days ago),
>> probably because it is emulating an ide style cdrom drive which I believe
>> xen still has problems with (I have similar problems booting a different
>> system with ide disks). The boot log (bzipped) is attached.
>
> I committed a change to properly initialize legacy irqs, which might help
> with IDE devices.

Yes, a recent update allows my qemu test environment to see its disk. It crashes later, but in what looks to be a non-xen related way. However booting the kernel non-xen crashes much faster with the traceback

RAMDISK: gzip decompressor not configured!
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
PGD 0
Oops: 0010 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.29-0.114.2.6.rc7.fc10.x86_64 #1
RIP: 0010:[<0000000000000000>] [<(null)>] (null)
RSP: 0018:ffff88003f76de38 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88003b22dff0 RCX: ffffffff8162e4dc
RDX: ffffffff8162e49c RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88003f76ded0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000046 R11: ffff88003f76dde0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81625fa8
FS: 0000000000000000(0000) GS:ffff880003000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88003f76c000, task ffff88003f770000)
Stack:
ffffffff8162e7a8 ffffffff8162e471 ffffffff811a5fca ffff880000000010
ffffffff814c6f0d 000000013f76de70 ffffffff00000000 ffff88003f76dea0
ffffffff814f0922 0000000000000000 ffffffff814c6d54 000000005c2d2f7c
Call Trace:
[<ffffffff8162e7a8>] ? rd_load_image+0x27b/0x4dd
[<ffffffff8162e471>] ? error+0x0/0x2b
[<ffffffff811a5fca>] ? sscanf+0x38/0x3a
[<ffffffff8162eaa8>] initrd_load+0x31/0x2ed
[<ffffffff8162e37f>] prepare_namespace+0xe2/0x19d
[<ffffffff8162d73f>] kernel_init+0x21a/0x22a
[<ffffffff81012e6a>] child_rip+0xa/0x20
[<ffffffff81012850>] ? restore_args+0x0/0x30
[<ffffffff8162d525>] ? kernel_init+0x0/0x22a
[<ffffffff81012e60>] ? child_rip+0x0/0x20
Code: Bad RIP value.
RIP [<(null)>] (null)
RSP <ffff88003f76de38>
CR2: 0000000000000000
---[ end trace a678a5d887494ac4 ]---
swapper used greatest stack depth: 4272 bytes left
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: swapper Tainted: G D 2.6.29-0.114.2.6.rc7.fc10.x86_64 #1
Call Trace:
[<ffffffff813a6544>] panic+0x7a/0x13b
[<ffffffff81071368>] ? trace_hardirqs_on_caller+0x1f/0x151
[<ffffffff813a91b3>] ? _write_unlock_irq+0x30/0x3b
[<ffffffff8105092f>] ? do_exit+0x37c/0x8a9
[<ffffffff81050636>] do_exit+0x83/0x8a9
[<ffffffff813a6646>] ? printk+0x41/0x43
[<ffffffff813aa944>] oops_end+0xbf/0xc7
[<ffffffff81032e8d>] no_context+0x1f2/0x201
[<ffffffff81033046>] __bad_area_nosemaphore+0x1aa/0x1d0
[<ffffffff8102e046>] ? pvclock_clocksource_read+0x47/0x83
[<ffffffff811ab221>] ? debug_check_no_obj_freed+0x152/0x1c8
[<ffffffff813abd09>] ? do_page_fault+0x11a/0x27f
[<ffffffff8103307f>] bad_area_nosemaphore+0x13/0x15
[<ffffffff813abd71>] do_page_fault+0x182/0x27f
[<ffffffff813a9d65>] page_fault+0x25/0x30
[<ffffffff8162e4dc>] ? compr_flush+0x0/0x51
[<ffffffff8162e49c>] ? compr_fill+0x0/0x40
[<ffffffff8162e7a8>] ? rd_load_image+0x27b/0x4dd
[<ffffffff8162e471>] ? error+0x0/0x2b
[<ffffffff811a5fca>] ? sscanf+0x38/0x3a
[<ffffffff8162eaa8>] initrd_load+0x31/0x2ed
[<ffffffff8162e37f>] prepare_namespace+0xe2/0x19d
[<ffffffff8162d73f>] kernel_init+0x21a/0x22a
[<ffffffff81012e6a>] child_rip+0xa/0x20
[<ffffffff81012850>] ? restore_args+0x0/0x30
[<ffffffff8162d525>] ? kernel_init+0x0/0x22a
[<ffffffff81012e60>] ? child_rip+0x0/0x20

so that might indicate that some recent change breaks a non-dom0 boot (if I haven't done something to break it myself).

Michael Young
M A Young wrote:
> On Wed, 4 Mar 2009, Jeremy Fitzhardinge wrote:
>
>> M A Young wrote:
>>> The QEMU boot still fails (pvops patch up-to-date as of 2 days ago),
>>> probably because it is emulating an ide style cdrom drive which I
>>> believe xen still has problems with (I have similar problems booting
>>> a different system with ide disks). The boot log (bzipped) is attached.
>>
>> I committed a change to properly initialize legacy irqs, which might
>> help with IDE devices.
>
> Yes, a recent update allows my qemu test environment to see its disk.
> It crashes later, but in what looks to be a non-xen related way.
> However booting the kernel non-xen crashes much faster with the traceback
>
> RAMDISK: gzip decompressor not configured!

It looks like you need to configure gzip compression for your initrd. Also, make sure you don't use any of the other compression algorithms for the kernel itself, or Xen won't be able to parse them.

> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<(null)>] (null)

This looks like a bug. HPA, will it fall down a NULL function pointer if you leave compression out?
J
On Thu, 5 Mar 2009, Jeremy Fitzhardinge wrote:
>> Yes, a recent update allows my qemu test environment to see its disk. It
>> crashes later, but in what looks to be a non-xen related way. However
>> booting the kernel non-xen crashes much faster with the traceback
>>
>> RAMDISK: gzip decompressor not configured!
> It looks like you need to configure gzip compression for your initrd. Also,
> make sure you don't use any of the other compression algorithms for the
> kernel itself, or Xen won't be able to parse them.
>
>> BUG: unable to handle kernel NULL pointer dereference at (null)
>> IP: [<(null)>] (null)
>
> This looks like a bug. HPA, will it fall down a NULL function pointer if you
> leave compression out?

It now boots (xen and non-xen) if I build with CONFIG_RD_GZIP=y (and work around the problems of my current livecd generating situation). I get the impression that the boot is expected to fail if neither this nor an equivalent CONFIG_RD_BZIP2 or CONFIG_RD_LZMA setting is enabled, but it should do so more gracefully, so I agree this is a bug.

Michael Young
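A quick way to catch this before booting is to check the kernel .config for an initrd decompressor. The Python sketch below looks only for the three options named in this thread; the `enabled_rd_decompressors` helper and the sample config text are made up for illustration, and a real check would read the actual .config file.

```python
# The three initrd decompressor options mentioned in this thread.
RD_OPTIONS = ("CONFIG_RD_GZIP", "CONFIG_RD_BZIP2", "CONFIG_RD_LZMA")

def enabled_rd_decompressors(config_text):
    """Return the initrd decompressor options set to 'y' in a kernel .config."""
    enabled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in RD_OPTIONS and value == "y":
            enabled.append(key)
    return enabled

# Made-up sample resembling the failing configuration described above.
sample = """\
CONFIG_BLK_DEV_INITRD=y
# CONFIG_RD_GZIP is not set
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
"""

found = enabled_rd_decompressors(sample)
print(found if found else "no initrd decompressor enabled")
```

An empty result for a kernel that is handed a compressed initrd corresponds to the "gzip decompressor not configured" situation reported above.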
Jeremy Fitzhardinge wrote:
> Gerd Hoffmann wrote:
> Hm. You should see "XEN PAGETABLES" in the early reservations, which
> should protect them from then on. Oh, look, its only doing it in the
> 64-bit setup.

I see it is fixed in latest git.

Now the i386 machine shows other issues: device drivers fail to register, errno 38 (ENOSYS). Huh? Also Shift-PgUp doesn't work, which makes me think it is an interrupt issue. APCI-less box. The x86_64 machine (with IO-APIC) is doing fine.

>> BTW: The trick to see the messages on the laptop screen is:
>>
>> kernel /xen-3.3.gz vga=text-80x50,keep
>> module /vmlinuz-2.6.29-rc7-tip-kraxel ro root=/dev/zen/rawhide \
>> console=hvc0
>>
>
> Hm, OK. But normal vga console should work.

I can watch the cursor moving, just no characters appear on the screen. Maybe some I/O port access issue? So the color palette is foobar and it prints black on black?

> Works fine on 64-bit;
> can't think of why they might differ...

My x86_64 machine is headless, so I can't comment on that ;)

cheers,
Gerd
> I can watch the cursor moving, just no characters appear on the screen.
> Maybe some I/O port access issue? So the color palette is foobar and
> it prints black on black?

Hmm, funny thing is, earlyprintk=vga _does_ print something. Looks incomplete though (scrolls by quickly) and of course the early console stops anyway as soon as the vga console takes over.

cheers,
Gerd
Gerd Hoffmann wrote:
> Jeremy Fitzhardinge wrote:
>
>> Gerd Hoffmann wrote:
>> Hm. You should see "XEN PAGETABLES" in the early reservations, which
>> should protect them from then on. Oh, look, it's only doing it in the
>> 64-bit setup.
>>
>
> I see it is fixed in latest git.
>
> Now the i386 machine shows other issues, device drivers fail to
> register, errno 38 (ENOSYS). Huh? Also Shift-PgUp doesn't work, which
> makes me think it is an interrupt issue. APCI-less box. The x86_64
> machine (with IO-APIC) is doing fine.
>

Hm. I was not really planning on supporting no-ACPI; I've only hooked acpi_register_gsi, which is called via acpi_pci_irq_enable. I guess you'd need to do something in pirq_enable_irq as well, and I'm not sure if all the stuff gets set up properly for IO_APIC_get_PCI_irq_vector to work.

> I can watch the cursor moving, just no characters appear on the screen.
> Maybe some I/O port access issue? So the color palette is foobar and
> it prints black on black?
>

Yes, that's what I see too. I spent some time staring at the vga framebuffer mapping and I can't see anything wrong with it at all - and I think the vga code can see its own framebuffer because it actually tests to see if it can write and read back from it. The fact that the cursor moves around suggests that IO ports are the only thing that *is* working, but perhaps the io bitmap is coming into play (hm, there's the ring 1 vs ring 3 difference between 32 and 64 bit). Aside from that, there's the palette, as you suggested, and the character generator might be all empty too, I guess.

Also, when I try to start X it just spins there allocating memory until everything falls over (used to crash, before the pagetable reservation fix). I'm guessing it's related, but I haven't looked into it yet.

    J

Gerd Hoffmann wrote:
>> I can watch the cursor moving, just no characters appear on the screen.
>> Maybe some I/O port access issue? So the color palette is foobar and
>> it prints black on black?
>>
>
> Hmm, funny thing is, earlyprintk=vga _does_ print something. Looks
> incomplete though (scrolls by quickly) and of course the early console
> stops anyway as soon as the vga console takes over.

Ah, yes. I'd noticed that and forgotten about it. Well. What the hell does that mean?

    J

  Hi,

>> Now the i386 machine shows other issues, device drivers fail to
>> register, errno 38 (ENOSYS). Huh? Also Shift-PgUp doesn't work, which
>> makes me think it is an interrupt issue. APCI-less box. The x86_64
>> machine (with IO-APIC) is doing fine.
>
> Hm. I was not really planning on supporting no-ACPI; I've only hooked
> acpi_register_gsi, which is called via acpi_pci_irq_enable. I guess
> you'd need to do something in pirq_enable_irq as well, and I'm not sure
> if all the stuff gets set up properly for IO_APIC_get_PCI_irq_vector to
> work.

I have an -rc6 kernel which *does* boot. Maybe that was by accident ;)

The kernel initializes legacy interrupts anyway. I think you don't need to do more to handle apic-less machines, no?

cheers,
  Gerd

Gerd Hoffmann wrote:
>> Hm. I was not really planning on supporting no-ACPI; I've only hooked
>> acpi_register_gsi, which is called via acpi_pci_irq_enable. I guess
>> you'd need to do something in pirq_enable_irq as well, and I'm not sure
>> if all the stuff gets set up properly for IO_APIC_get_PCI_irq_vector to
>> work.
>>
>
> I have an -rc6 kernel which *does* boot. Maybe that was by accident ;)
>

Hm, well I don't remember adding any new ENOSYSes in there, so perhaps the core kernel has changed in some way under us. I'm guessing it's this:

    static int
    __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
    {
            struct irqaction *old, **old_ptr;
            const char *old_name = NULL;
            unsigned long flags;
            int shared = 0;
            int ret;

            if (!desc)
                    return -EINVAL;

            if (desc->chip == &no_irq_chip)
                    return -ENOSYS;
            ...

which gets called from request_irq. So that means that the desc is getting allocated but the chip hasn't been set up. Are you using sparse irqs?

> The kernel initializes legacy interrupts anyway. I think you don't need
> to do more to handle apic-less machines, no?
>

I guess not, if it's only using irqs < 16. How old is this machine anyway; do you really mean it's a literal i386?

But the info I'm using to set up the legacy interrupts comes from acpi tables, I think, so perhaps it's misprogramming the legacy interrupts, whereas before they just happened to work in their default config (???).

    J

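The two early checks quoted from __setup_irq can be modelled in user space to show why a driver would see ENOSYS. This is only a sketch under stated assumptions: the structures are cut down to one field each, and `demo_descs`, `demo_init_descs`, `demo_set_chip` and `demo_request_irq` are hypothetical names, not kernel functions. The only behaviour taken from the thread is that a descriptor whose chip is still the placeholder `no_irq_chip` makes registration fail with -ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct irq_chip { const char *name; };
struct irq_desc { struct irq_chip *chip; };

static struct irq_chip no_irq_chip = { "none" };
static struct irq_chip demo_pic_chip = { "demo-pic" };

#define DEMO_NR_IRQS 16
static struct irq_desc demo_descs[DEMO_NR_IRQS];

/* Give every descriptor the placeholder chip, i.e. "allocated, not set up". */
void demo_init_descs(void)
{
    for (int i = 0; i < DEMO_NR_IRQS; i++)
        demo_descs[i].chip = &no_irq_chip;
}

/* Hypothetical helper: bind a real chip to an irq, as platform setup would. */
void demo_set_chip(unsigned int irq)
{
    assert(irq < DEMO_NR_IRQS);
    demo_descs[irq].chip = &demo_pic_chip;
}

/* Mirrors only the two early checks quoted from __setup_irq. */
int demo_request_irq(unsigned int irq)
{
    struct irq_desc *desc = irq < DEMO_NR_IRQS ? &demo_descs[irq] : NULL;

    if (!desc)
        return -EINVAL;     /* no descriptor at all */
    if (desc->chip == &no_irq_chip)
        return -ENOSYS;     /* descriptor exists, chip never installed */
    return 0;
}
```

In this model, calling `demo_request_irq(14)` before `demo_set_chip(14)` fails with -ENOSYS, and succeeds afterwards, which matches the reading above that the desc is allocated but the chip has not been set up.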
On Thu, 5 Mar 2009, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>>
>> This looks like a bug. HPA, will it fall down a NULL function pointer
>> if you leave compression out?
>>
>
> It's not supposed to, obviously, but this could be a bug. Could the OP
> please post his .config?

The config is attached. It might be specific to x86_64, as I tried to reproduce it on i686 PAE with a similar kernel but failed.

    Michael Young

Jeremy Fitzhardinge wrote:
> static int
> __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
> {
>         struct irqaction *old, **old_ptr;
>         const char *old_name = NULL;
>         unsigned long flags;
>         int shared = 0;
>         int ret;
>
>         if (!desc)
>                 return -EINVAL;
>
>         if (desc->chip == &no_irq_chip)
>                 return -ENOSYS;
>         ...

I can try to sprinkle in some printk's to figure ...

> which gets called from request_irq. So that means that the desc is
> getting allocated but the chip hasn't been set up. Are you using sparse
> irqs?

It's enabled, yes (config is derived from default fedora one ...)

>> The kernel initializes legacy interrupts anyway. I think you don't need
>> to do more to handle apic-less machines, no?
>
> I guess not, if it's only using irqs < 16. How old is this machine
> anyway; do you really mean it's a literal i386?

Pentium III (~2001), /proc/interrupts on bare metal looks like this:

           CPU0
  0:     617526    XT-PIC-XT    timer
  1:        286    XT-PIC-XT    i8042
  2:          0    XT-PIC-XT    cascade
  3:          3    XT-PIC-XT
  4:          1    XT-PIC-XT
  5:          1    XT-PIC-XT    Intel 440MX Modem, Intel 440MX
  6:          1    XT-PIC-XT
  7:          1    XT-PIC-XT
  8:          1    XT-PIC-XT    rtc0
  9:          5    XT-PIC-XT    acpi
 10:         50    XT-PIC-XT    yenta, firewire_ohci
 11:      13186    XT-PIC-XT    uhci_hcd:usb1, eth0
 12:       6131    XT-PIC-XT    i8042
 14:      56351    XT-PIC-XT    ata_piix
 15:          0    XT-PIC-XT    ata_piix
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

> But the info I'm using to set up the legacy interrupts comes from acpi
> tables, I think, so perhaps it's misprogramming the legacy interrupts,
> whereas before they just happened to work in their default config (???).

As the *registration* fails already I don't think it is misprogramming.

cheers,
  Gerd

  Hi,

> But the info I'm using to set up the legacy interrupts comes from acpi
> tables, I think, so perhaps it's misprogramming the legacy interrupts,
> whereas before they just happened to work in their default config (???).

Well, if there is no info in the acpi tables, you'll ignore the IRQ altogether ...

cheers,
  Gerd

Jeremy Fitzhardinge wrote:
> Gerd Hoffmann wrote:
>>> I can watch the cursor moving, just no characters appear on the screen.
>>> Maybe some I/O port access issue? So the color palette is foobar and
>>> it prints black on black?
>>>
>>
>> Hmm, funny thing is, earlyprintk=vga _does_ print something. Looks
>> incomplete though (scrolls by quickly) and of course the early console
>> stops anyway as soon as the vga console takes over.
>
> Ah, yes. I'd noticed that and forgotten about it. Well. What the hell
> does that mean?

Memory mapping issue. Seems to break during init_memory_mapping().

I can slowly print lines on /dev/tty0 using a shell loop with a sleep in there. Doesn't print anything on the screen. I can see the stuff printed by earlyvga scroll through the screen though. Thus vga register access (for panning) works just fine. Accessing the memory probably ends up somewhere else due to the mappings not being set up correctly.

Last line of earlyvga output is this:

    init_memory_mapping: 0000000000000000-000000001d001000

Note that init_memory_mapping() has this close to the end of the function:

    #ifdef CONFIG_X86_32
            early_ioremap_page_table_range_init();

            load_cr3(swapper_pg_dir);
    #endif

i.e. early iomap setup is different in 32bit and 64bit. Which would also explain why vgacon works just fine in 64bit mode.

cheers,
  Gerd

Gerd Hoffmann wrote:
> i.e. early iomap setup is different in 32bit and 64bit. Which would
> also explain why vgacon works just fine in 64bit mode.

I think it is something else.

arch/x86/mm/init_64.c, phys_pte_init():

    /*
     * We will re-use the existing mapping.
     * Xen for example has some special requirements, like mapping
     * pagetable pages as RO. So assume someone who pre-setup
     * these mappings are more intelligent.
     */
    if (pte_val(*pte)) {
            pages++;
            continue;
    }

I think that does also make sure vga mappings are not overwritten with something else. 32bit seems to have no equivalent for this though ...

cheers,
  Gerd

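The "re-use the existing mapping" check quoted from phys_pte_init() can be shown with a small user-space model. This is a sketch, not the kernel function: `demo_pte_t` and `demo_fill_ptes` are hypothetical names, a page table is reduced to a plain array, and an entry value of 0 stands for "not mapped". Unlike the quoted code, which counts a pre-existing entry in `pages`, the model returns how many entries it skipped so the effect is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_pte_t;    /* 0 means "not present" in this model */

/*
 * Fill table[0..n) with entries built from consecutive frame numbers
 * starting at first_pfn, OR-ed with base_flags, but leave any entry
 * that is already non-zero untouched.
 * Returns the number of entries that were left alone.
 */
size_t demo_fill_ptes(demo_pte_t *table, size_t n,
                      uint64_t first_pfn, uint64_t base_flags)
{
    size_t skipped = 0;

    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i] != 0) {
            skipped++;          /* someone pre-set this mapping; keep it */
            continue;
        }
        table[i] = ((first_pfn + i) << 12) | base_flags;
    }
    return skipped;
}
```

If an entry for the VGA window had been installed before the fill loop runs, this model keeps it as it was, which is the protective effect described above for the 64-bit path.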
Gerd Hoffmann wrote:
> Hi,
>
>
>> But the info I'm using to set up the legacy interrupts comes from acpi
>> tables, I think, so perhaps it's misprogramming the legacy interrupts,
>> whereas before they just happened to work in their default config (???).
>>
>
> Well, if there is no info in the acpi tables, you'll ignore the IRQ
> altogether ...
>

Yes, that's what I suspected. Can you write up a proper patch?

Thanks,
    J

Gerd Hoffmann wrote:
> Gerd Hoffmann wrote:
>
>> i.e. early iomap setup is different in 32bit and 64bit. Which would
>> also explain why vgacon works just fine in 64bit mode.
>>
>
> I think it is something else.
>
> arch/x86/mm/init_64.c, phys_pte_init():
>
>     /*
>      * We will re-use the existing mapping.
>      * Xen for example has some special requirements, like mapping
>      * pagetable pages as RO. So assume someone who pre-setup
>      * these mappings are more intelligent.
>      */
>     if (pte_val(*pte)) {
>             pages++;
>             continue;
>     }
>
> I think that does also make sure vga mappings are not overwritten with
> something else. 32bit seems to have no equivalent for this though ...
>

It does, in a fairly hacky and disgusting way. During boot, we use a special version of xen_set_pte which ignores attempts to convert a RO->RW mapping, in order to protect existing pagetable mappings. But it probably won't help if someone is trying to replace the ISA mappings with something else. In theory all those mappings should be created with _PAGE_IOMAP anyway, so we'd do the right thing; but I don't think that's happening.

In the meantime I could extend the hack to look for attempts to overwrite _PAGE_IOMAP mappings or something... Or force _PAGE_IOMAP on for pfns in the ISA window.

Fortunately there seems to be an active attempt to unify 32 and 64-bit mapping creation, which should help (so long as it converges on the 64-bit code, which is more sensible).

    J

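The boot-time guard described in this message, and its limitation, can be sketched as follows. This is an assumption about the general shape of such a guard, not the actual xen_set_pte code: `demo_mask_rw` and `demo_set_pte_early` are hypothetical names, and only two flag bits are modelled. The guard drops the writable bit when the existing entry is present and read-only; it does nothing to stop the frame address itself from being replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_PRESENT UINT64_C(0x1)
#define DEMO_PAGE_RW      UINT64_C(0x2)

/* If the old entry is present and read-only, strip RW from the new value. */
uint64_t demo_mask_rw(uint64_t old_pte, uint64_t new_pte)
{
    if ((old_pte & DEMO_PAGE_PRESENT) && !(old_pte & DEMO_PAGE_RW))
        new_pte &= ~DEMO_PAGE_RW;
    return new_pte;
}

/* Early-boot style setter: applies the guard, then stores the entry. */
void demo_set_pte_early(uint64_t *ptep, uint64_t new_pte)
{
    assert(ptep != NULL);
    *ptep = demo_mask_rw(*ptep, new_pte);
}
```

In this model a read-only entry stays read-only when something tries to rewrite it as writable, but an entry pointing at frame 0xb8 can still be redirected to another frame. That is consistent with the remark above that the hack probably won't help if someone replaces the ISA mappings with something else.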
Jeremy Fitzhardinge wrote:
>
> Yes, that's what I suspected. Can you write up a proper patch?

I've two patches for you. The first turns the silly "xen-pirq-pirq" in /proc/interrupts into something useful. The second does proper legacy irq setup on top of that.

enjoy,
  Gerd

Gerd Hoffmann wrote:
> Jeremy Fitzhardinge wrote:
>
>> Yes, that's what I suspected. Can you write up a proper patch?
>>
>
> I've two patches for you. The first turns the silly "xen-pirq-pirq" in
> /proc/interrupts into something useful. The second does proper legacy
> irq setup on top of that.
>

Could you s-o-b them too?

> +	if (0 == nr_ioapics) {
> +		for (irq=0; irq < NR_IRQS_LEGACY; irq++)
> +			xen_allocate_pirq(irq, "legacy");
> +		return;
> +	}
>

I guess the assumption here is that if there's no ioapics, we don't have acpi? Or I guess it doesn't matter because we can't program the triggering anyway.

> +
>  	/* Pre-allocate legacy irqs */
>  	for (irq=0; irq < NR_IRQS_LEGACY; irq++) {
> -		int trigger, polarity;
> -
> -		if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
> -			continue;
> +		int trigger= 1, polarity = 0;
>
> +		acpi_get_override_irq(irq, &trigger, &polarity);
>  		xen_register_gsi(irq,
>  			trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE,
>  			polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH);

I don't think this is correct, for two reasons. 1: I think the default ISA triggering is edge/active low, so this will result in screaming interrupts if we ever use the defaults, but 2: acpi_get_override_irq() returns the appropriate default for ISA anyway, and we shouldn't do anything if it fails (otherwise we might try to do things to magic-irq 2 which could upset things, though I suspect Xen will stop anything really bad from happening).

    J

Jeremy Fitzhardinge wrote:
> Could you s-o-b them too?

Sure.

>> +	if (0 == nr_ioapics) {
>> +		for (irq=0; irq < NR_IRQS_LEGACY; irq++)
>> +			xen_allocate_pirq(irq, "legacy");
>> +		return;
>> +	}
>>
>
> I guess the assumption here is that if there's no ioapics, we don't have
> acpi? Or I guess it doesn't matter because we can't program the
> triggering anyway.

We can't program the trigger anyway. My machine has acpi. Nevertheless acpi_get_override_irq fails due to no ioapic being present.

>>  	/* Pre-allocate legacy irqs */
>>  	for (irq=0; irq < NR_IRQS_LEGACY; irq++) {
>> -		int trigger, polarity;
>> -
>> -		if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
>> -			continue;
>> +		int trigger= 1, polarity = 0;
>>
>> +		acpi_get_override_irq(irq, &trigger, &polarity);
>>  		xen_register_gsi(irq,
>>  			trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE,
>>  			polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH);
>
> 2: acpi_get_override_irq()
> returns the appropriate default for ISA anyway, and we shouldn't do
> anything if it fails

Ok. So the old code should be fine and we just need the additional loop to handle the ioapic-less case. Will send updated patches tomorrow.

cheers,
  Gerd

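The control flow agreed on here (allocate every legacy irq when there is no ioapic, otherwise keep the old behaviour of skipping any irq whose override lookup fails) can be modelled in user space. This is a sketch under stated assumptions, not the patch itself: `struct demo_irq`, `demo_setup_legacy` and `demo_override_skip_irq2` are hypothetical, and the lookup is passed in as a function pointer so the model does not depend on any ACPI code. The stand-in lookup fails for irq 2 purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_IRQS_LEGACY 16

enum demo_state { DEMO_UNSET = 0, DEMO_ALLOCATED, DEMO_REGISTERED };

struct demo_irq {
    enum demo_state state;
    int trigger;    /* 1 = level, 0 = edge */
    int polarity;   /* 1 = active low, 0 = active high */
};

/* Hypothetical override lookup: returns -1 on failure, 0 on success. */
typedef int (*demo_override_fn)(int irq, int *trigger, int *polarity);

/* Stand-in lookup: fails for irq 2, reports edge/active-high otherwise. */
int demo_override_skip_irq2(int irq, int *trigger, int *polarity)
{
    if (irq == 2)
        return -1;
    *trigger = 0;
    *polarity = 0;
    return 0;
}

void demo_setup_legacy(struct demo_irq irqs[DEMO_NR_IRQS_LEGACY],
                       int nr_ioapics, demo_override_fn get_override)
{
    assert(irqs != NULL);

    /* No ioapic: triggering cannot be programmed, so just allocate. */
    if (nr_ioapics == 0) {
        for (int irq = 0; irq < DEMO_NR_IRQS_LEGACY; irq++)
            irqs[irq].state = DEMO_ALLOCATED;
        return;
    }

    /* With an ioapic: register only irqs whose lookup succeeds. */
    assert(get_override != NULL);
    for (int irq = 0; irq < DEMO_NR_IRQS_LEGACY; irq++) {
        int trigger, polarity;

        if (get_override(irq, &trigger, &polarity) == -1)
            continue;
        irqs[irq].state = DEMO_REGISTERED;
        irqs[irq].trigger = trigger;
        irqs[irq].polarity = polarity;
    }
}
```

Keeping the `continue` on lookup failure means no trigger or polarity defaults are ever invented in the ioapic path, which is the point of objection 2 quoted above.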
Gerd Hoffmann wrote:
> Ok. So the old code should be fine and we just need the additional loop
> to handle the ioapic-less case. Will send updated patches tomorrow.
>

OK. I already applied them as-is just to check nothing breaks. I'll replace them when you repost.

    J

Jeremy Fitzhardinge wrote:
> Gerd Hoffmann wrote:
>> Ok. So the old code should be fine and we just need the additional loop
>> to handle the ioapic-less case. Will send updated patches tomorrow.
>>
>
> OK. I already applied them as-is just to check nothing breaks. I'll
> replace them when you repost.

Here we go. Fixed ioapic loop as discussed, also updated names to be more descriptive, looks like this now:

[root@xeni ~]# grep pirq /proc/interrupts
  1:          2          0          0          0  xen-pirq-ioapic-edge   i8042
  3:          3          0          0          0  xen-pirq-ioapic-edge
  4:          3          0          0          0  xen-pirq-ioapic-edge
  7:          0          0          0          0  xen-pirq-ioapic-edge   parport0
  8:          1          0          0          0  xen-pirq-ioapic-edge   rtc0
  9:          0          0          0          0  xen-pirq-ioapic-level  acpi
 12:          4          0          0          0  xen-pirq-ioapic-edge   i8042
 16:          0          0          0          0  xen-pirq-ioapic-level  uhci_hcd:usb3, uhci_hcd:usb8
 18:          0          0          0          0  xen-pirq-ioapic-level  uhci_hcd:usb5
 19:       5288          0          0          0  xen-pirq-ioapic-level  ehci_hcd:usb1, uhci_hcd:usb7, ahci
 20:        524          0          0          0  xen-pirq-ioapic-level  eth0
 21:          0          0          0          0  xen-pirq-ioapic-level  uhci_hcd:usb4
 22:        242          0          0          0  xen-pirq-ioapic-level  HDA Intel
 23:          0          0          0          0  xen-pirq-ioapic-level  ehci_hcd:usb2, uhci_hcd:usb6

[root@zen ~]# grep pirq /proc/interrupts
  1:          8  xen-pirq-xt-pic  i8042
  3:          5  xen-pirq-xt-pic
  4:          1  xen-pirq-xt-pic
  5:          0  xen-pirq-xt-pic  Intel 440MX, Intel 440MX Modem
  6:          1  xen-pirq-xt-pic
  7:          1  xen-pirq-xt-pic
  8:          1  xen-pirq-xt-pic  rtc0
 10:     200002  xen-pirq-xt-pic  yenta, firewire_ohci
 11:        196  xen-pirq-xt-pic  uhci_hcd:usb1, eth0
 12:        107  xen-pirq-xt-pic  i8042
 14:       2840  xen-pirq-xt-pic  ata_piix
 15:          0  xen-pirq-xt-pic  ata_piix

cheers,
  Gerd