Andy Smith
2012-Aug-08 19:53 UTC
[Pkg-xen-devel] Bug#684334: xen-hypervisor-4.0-amd64: Does not complete boot of dom0 kernel, extremely slow boot from BIOS RAM map onwards
Package: xen-hypervisor-4.0-amd64
Version: 4.0.1-5.2
Severity: important
Tags: patch

I have a system based on a Supermicro X8DTH-i motherboard and a Xeon E5506 CPU. It was previously running Debian lenny with xen-hypervisor-3.2-1-amd64 / linux-image-2.6.26-2-xen-amd64 without incident.

I upgraded it to Debian squeeze, with xen-hypervisor-4.0-amd64 / linux-image-2.6.32-5-xen-amd64, and now it does not complete a boot of the dom0 kernel. I can boot it into a vanilla (or -xen) kernel directly on bare metal and it seems okay with that.

This was originally reported upstream:

http://lists.xen.org/archives/html/xen-devel/2012-07/msg01663.html

I have since bisected their hg and found that the following changeset fixes the problem for me:

http://xenbits.xen.org/hg/xen-4.0-testing.hg/rev/ab1fb1b8b569?revcount=480

I have attached Keir's changeset as fix_bind_irq_vector_destination.patch.

Here follows a portion of my dom0 serial console output up to the point where booting becomes incredibly slow, and then the later "increasing min_delta_ns" spam, just in case that helps someone searching for their own issue. The full logs are available in the xen-devel post linked above.

Thanks,
Andy

(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM: ........................................................................................................................................................................................................................................done.
(XEN) trace.c:89:d32767 calc_tinfo_first_offset: NR_CPUs 128, offset_in_bytes 258, t_info_first_offset 65
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-45) (dannf at xxxxxxxxxx) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Sun May 6 08:57:29 UTC 2012
[    0.000000] Command line: placeholder root=UUID=10aafd06-36d7-4181-a343-c542150957e3 ro console=tty0 console=hvc0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] released 0 pages of unused memory
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 0000000000099000 (usable)
[    0.000000]  Xen: 0000000000099000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 00000000bf78e000 - 00000000bf790000 type 9
[    0.000000]  Xen: 00000000bf790000 - 00000000bf79e000 (ACPI data)
[    0.000000]  Xen: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
[    0.000000]  Xen: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
[    0.000000]  Xen: 00000000bf7ec000 - 00000000c0000000 (reserved)
[    0.000000]  Xen: 00000000e0000000 - 00000000f0000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fec8a000 - 00000000fec8b000 (reserved)
[    0.000000]  Xen: 00000000fec9a000 - 00000000fec9b000 (reserved)
[

At this point output stalls and it appears to have hung, but if I wait a long time (minutes) some more output does appear:

[    0.000000]  Xen: 00000000fee00000 - 00000000fee01

I waited tens of minutes, and once it gets past the BIOS RAM map, booting proceeds at normal speed until:

[  113.636747] 3w-9xxx: scsi0: AEN: INFO (0x04:0x000C): Initialize started:unit=0, subunit=1.
[10065189777.645584] ------------[ cut here ]------------
[10065189777.645584] ------------[ cut here ]------------
[10065189777.645584] WARNING: at /build/buildd-linux-2.6_2.6.32-45-amd64-FcX7RM/linux-2.6-2.6.32/debian/build/source_amd64_xen/kernel/time/clockevents.c:112 clockevents_program_event+0x26/0x7e()
[10065189777.645584] Hardware name: X8DTH-i/6/iF/6F
[10065189777.645584] Modules linked in: psmouse serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc joydev pcspkr evdev i2c_i801 i2c_core ioatdma button processor acpi_processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd ehci_hcd 3w_9xxx usbcore nls_base scsi_mod igb dca thermal thermal_sys [last unloaded: scsi_wait_scan]
[10065189777.645584] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1
[10065189777.645584] Call Trace:
[10065189777.645584]  <IRQ>  [<ffffffff81070297>] ? clockevents_program_event+0x26/0x7e
[10065189777.645584]  [<ffffffff81070297>] ? clockevents_program_event+0x26/0x7e
[10065189777.645584]  [<ffffffff8104ef90>] ? warn_slowpath_common+0x77/0xa3
[10065189777.645584]  [<ffffffff81070297>] ? clockevents_program_event+0x26/0x7e
[10065189777.645584]  [<ffffffff8107131b>] ? tick_dev_program_event+0x2d/0x95
[10065189777.645584]  [<ffffffff81068942>] ? hrtimer_interrupt+0x15d/0x18d
[10065189777.645584]  [<ffffffff8100e9af>] ? xen_timer_interrupt+0x34/0x18d
[10065189777.645584]  [<ffffffffa00a5ae2>] ? twa_interrupt+0xed/0x5e8 [3w_9xxx]
[10065189777.645584]  [<ffffffff8100dc7b>] ? drop_other_mm_ref+0x42/0x62
[10065189777.645584]  [<ffffffff81095c44>] ? handle_IRQ_event+0x58/0x126
[10065189777.645584]  [<ffffffff810973e3>] ? handle_percpu_irq+0x39/0x6a
[10065189777.645584]  [<ffffffff811f2aea>] ? __xen_evtchn_do_upcall+0x1d0/0x28d
[10065189777.645584]  [<ffffffff811f334b>] ? xen_evtchn_do_upcall+0x2e/0x42
[10065189777.645584]  [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
[10065189777.645584]  <EOI>  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[10065189777.645584]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[10065189777.645584]  [<ffffffff8100e6b3>] ? xen_safe_halt+0xc/0x15
[10065189777.645584]  [<ffffffff8100bfc7>] ? xen_idle+0x37/0x40
[10065189777.645584]  [<ffffffff81010e97>] ? cpu_idle+0xa2/0xda
[10065189777.645584]  [<ffffffff81531cdd>] ? start_kernel+0x3dc/0x3e8
[10065189777.645584]  [<ffffffff81533c93>] ? xen_start_kernel+0x586/0x58a
[10065189777.645584] ---[ end trace 8a7858275426dc3b ]---
[10065189777.645584] CE: xen increasing min_delta_ns to 300000 nsec
[10065189777.645584] CE: xen increasing min_delta_ns to 450000 nsec
[10065189777.645584] CE: xen increasing min_delta_ns to 675000 nsec
[10065189777.645584] CE: xen increasing min_delta_ns to 1012500 nsec

I sat watching this for a few minutes and it showed no sign of stopping. The machine is also completely unresponsive aside from printing that to console.

-- System Information:
Debian Release: 6.0.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

xen-hypervisor-4.0-amd64 depends on no packages.

Versions of packages xen-hypervisor-4.0-amd64 recommends:
ii  xen-utils-4.0                 4.0.1-5.2  XEN administrative tools

Versions of packages xen-hypervisor-4.0-amd64 suggests:
pn  xen-docs-4.0                  <none>     (no description available)

-- no debconf information

-------------- next part --------------
# HG changeset patch
# User Keir Fraser <keir.fraser at citrix.com>
# Date 1283154997 -3600
# Node ID ab1fb1b8b569ef78ac352a5ff4524d0fae80945a
# Parent  9ff9b57ddddfd3693698ece28ae44542fba6ee0f
Fix bind_irq_vector() destination

The "mask" covered all online cpus in the "domain". It should be used as
destination later, instead of using "domain" directly.
Signed-off-by: Sheng Yang <sheng at linux.intel.com>
xen-unstable: changeset: 3eb5127e4636
xen-unstable: date: Thu Aug 26 11:16:14 2010 +0100

diff -r 9ff9b57ddddf -r ab1fb1b8b569 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Mon Aug 30 08:56:07 2010 +0100
+++ b/xen/arch/x86/irq.c	Mon Aug 30 08:56:37 2010 +0100
@@ -90,14 +90,14 @@ static int __bind_irq_vector(int irq, in
     cpus_and(mask, domain, cpu_online_map);
     if (cpus_empty(mask))
         return -EINVAL;
-    if ((cfg->vector == vector) && cpus_equal(cfg->domain, domain))
+    if ((cfg->vector == vector) && cpus_equal(cfg->domain, mask))
         return 0;
     if (cfg->vector != IRQ_VECTOR_UNASSIGNED)
         return -EBUSY;
     for_each_cpu_mask(cpu, mask)
         per_cpu(vector_irq, cpu)[vector] = irq;
     cfg->vector = vector;
-    cfg->domain = domain;
+    cfg->domain = mask;
     irq_status[irq] = IRQ_USED;
     if (IO_APIC_IRQ(irq))
         irq_vector[irq] = vector;