Current xen-unstable has appearently issues with HVM guest. Within the guest I run either a SLES11SP2 kernel or a 3.7 pvops kernel. dom0 is either SLES11SP2 or also 3.7 pvops. After a first start of pvops guest on a pvops dom0 I see the guest hanging with the backtrace below, after the boot scripts set the real time clock. Once I kill the guest and start it again booting proceeds. After I do a xl migrate domU localhost I can not access the guest again with xl console. xenctx shows the same backtrace as shown below. With a sles11sp2 dom0 + domU I see in guest dmesg ''hpet1: lost 2000 timer interrupts'' (or similar) printed every second. Any idea what could be the reason? The domU.cfg, started with xl create, looks like this: ... name="name" description="something" uuid="a062cabb-5981-4472-9d3b-da7bd8e2594e" memory=512 vcpus=2 serial="pty" builder="hvm" boot="dcn" disk=[ ''file:/some/loop-disk0,hda,w'', ] vif=[ ''bridge=br0,model=rtl8139,type=netfront'' ] vfb = [ ''type=vnc,vncunused=1,keymap=de'' ] on_crash="preserve" ... xenctx output: ... rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 flags: 00000086 s nz p rsp: ffff88001b003df8 rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b000000/0000000000000000 Code (instr addr ffffffff81034b97) 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da d0 00 79 c9 85 db 74 Stack: ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b Call Trace: [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b [<ffffffff8109ab8c>] handle_irq_event+0x3c [<ffffffff8109d79d>] handle_edge_irq+0x6d [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d rip: ffffffff810360d6 native_safe_halt+0x6 flags: 00000246 i z p rsp: ffff88001acbdef8 rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b040000/0000000000000000 Code (instr addr ffffffff810360d6) 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f 84 00 00 00 00 00 55 Stack: ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffff810360d6>] native_safe_halt+0x6 <-- [<ffffffff81014735>] default_idle+0x45 [<ffffffff81014a68>] cpu_idle+0x88 [<ffffffff8165824a>] start_secondary+0x188 rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 flags: 00000086 s nz p rsp: ffff88001b003df8 rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b000000/0000000000000000 Code (instr addr ffffffff81034b97) 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da d0 00 79 c9 85 db 74 Stack: ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b Call Trace: [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b [<ffffffff8109ab8c>] handle_irq_event+0x3c [<ffffffff8109d79d>] handle_edge_irq+0x6d [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d rip: ffffffff810360d6 native_safe_halt+0x6 flags: 00000246 i z p rsp: ffff88001acbdef8 rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b040000/0000000000000000 Code (instr addr ffffffff810360d6) 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f 84 00 00 00 00 00 55 Stack: ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffff810360d6>] native_safe_halt+0x6 <-- [<ffffffff81014735>] default_idle+0x45 [<ffffffff81014a68>] cpu_idle+0x88 [<ffffffff8165824a>] start_secondary+0x188 rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 flags: 00000086 s nz p rsp: ffff88001b003df8 rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b000000/0000000000000000 Code (instr addr ffffffff81034b97) 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da d0 00 79 c9 85 db 74 Stack: ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b Call Trace: [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b [<ffffffff8109ab8c>] handle_irq_event+0x3c [<ffffffff8109d79d>] handle_edge_irq+0x6d [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d rip: ffffffff810360d6 native_safe_halt+0x6 flags: 00000246 i z p rsp: ffff88001acbdef8 rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b040000/0000000000000000 Code (instr addr ffffffff810360d6) 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f 84 00 00 00 00 00 55 Stack: ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffff810360d6>] native_safe_halt+0x6 <-- [<ffffffff81014735>] default_idle+0x45 [<ffffffff81014a68>] cpu_idle+0x88 [<ffffffff8165824a>] start_secondary+0x188 rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 flags: 00000086 s nz p rsp: ffff88001b003df8 rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b000000/0000000000000000 Code (instr addr ffffffff81034b97) 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da d0 00 79 c9 85 db 74 Stack: ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b Call Trace: [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b [<ffffffff8109ab8c>] handle_irq_event+0x3c [<ffffffff8109d79d>] handle_edge_irq+0x6d [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d rip: ffffffff810360d6 native_safe_halt+0x6 flags: 00000246 i z p rsp: ffff88001acbdef8 rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b040000/0000000000000000 Code (instr addr ffffffff810360d6) 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f 84 00 00 00 00 00 55 Stack: ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffff810360d6>] native_safe_halt+0x6 <-- [<ffffffff81014735>] default_idle+0x45 [<ffffffff81014a68>] cpu_idle+0x88 [<ffffffff8165824a>] start_secondary+0x188 ... Olaf
On Thu, 2013-01-31 at 17:07 +0000, Olaf Hering wrote:> Current xen-unstable has appearently issues with HVM guest.staging or tested? which changeset? The staging tests are currently failing with a Windows install timeout, Jan wondered if it might be due to the recent RTC changes, the bisector is working on it... Ian.
On Thu, Jan 31, Ian Campbell wrote:> On Thu, 2013-01-31 at 17:07 +0000, Olaf Hering wrote: > > Current xen-unstable has appearently issues with HVM guest. > > staging or tested? which changeset?staging, 26493:6727070b4129 Its the first time since many weeks that I run xen-unstable again, so I cant tell when it started. Olaf
On Thu, Jan 31, Olaf Hering wrote:> On Thu, Jan 31, Ian Campbell wrote: > > > On Thu, 2013-01-31 at 17:07 +0000, Olaf Hering wrote: > > > Current xen-unstable has appearently issues with HVM guest. > > > > staging or tested? which changeset? > > staging, 26493:6727070b4129 > Its the first time since many weeks that I run xen-unstable again, so I > cant tell when it started.It works with xend for some reason, pvops dom0+domU and changeset 26502:d1bf3b21f783. What I see in guest dmesg is shown below. With xl the irq#8 message does not appear. Olaf ... [ 0.832275] udev: starting version 147 [ 1.312097] tsc: Refined TSC clocksource calibration: 2926.333 MHz [ 14.378118] udev: starting version 147 [ 14.469299] input: Xen Virtual Keyboard as /devices/virtual/input/input5 [ 14.469532] input: Xen Virtual Pointer as /devices/virtual/input/input6 [ 14.480053] microcode: CPU0 sig=0x206c2, pf=0x1, revision=0xc [ 14.480626] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr [ 14.546814] microcode: CPU1 sig=0x206c2, pf=0x1, revision=0xc [ 14.601594] NET: Registered protocol family 10 [ 14.725108] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba [ 15.084864] Adding 331772k swap on /dev/xvda2. Priority:-1 extents:1 across:331772k SS [ 18.810217] irq 8: nobody cared (try booting with the "irqpoll" option) [ 18.812058] Pid: 0, comm: swapper/0 Not tainted 3.7.5-9.home_olh_kernel_sles11sp1.1-kernel-linux-3_7 #1 [ 18.812058] Call Trace: [ 18.812058] <IRQ> [<ffffffff8109ce78>] __report_bad_irq+0x38/0xe0 [ 18.812058] [<ffffffff8109d0ef>] note_interrupt+0x1cf/0x210 [ 18.812058] [<ffffffff8109aa77>] handle_irq_event_percpu+0x97/0x170 [ 18.812058] [<ffffffff8109ab8c>] handle_irq_event+0x3c/0x60 [ 18.812058] [<ffffffff8109d79d>] handle_edge_irq+0x6d/0x120 [ 18.812058] [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4/0x280 [ 18.812058] [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a/0x40 [ 18.812058] [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d/0x80 [ 18.812058] <EOI> [<ffffffff810668f3>] ? hrtimer_start+0x13/0x20 [ 18.812058] [<ffffffff810360d6>] ? native_safe_halt+0x6/0x10 [ 18.812058] [<ffffffff81569e8f>] ? cpuidle_idle_call+0x1f/0xc0 [ 18.812058] [<ffffffff81014735>] default_idle+0x45/0x50 [ 18.812058] [<ffffffff81014a68>] cpu_idle+0x88/0xd0 [ 18.812058] [<ffffffff81642428>] rest_init+0x68/0x70 [ 18.812058] [<ffffffff81c7bc8f>] start_kernel+0x2f3/0x3b4 [ 18.812058] [<ffffffff81c7b795>] ? repair_env_string+0x58/0x58 [ 18.812058] [<ffffffff81cb8c36>] ? memblock_reserve+0x2e/0x4e [ 18.812058] [<ffffffff81c7b2d5>] x86_64_start_reservations+0xa5/0xc2 [ 18.812058] [<ffffffff81c7b401>] x86_64_start_kernel+0x10f/0x12a [ 18.812058] [<ffffffff81c7b120>] ? early_idt_handlers+0x120/0x120 [ 18.812058] handlers: [ 18.812058] [<ffffffff81550740>] cmos_interrupt [ 18.812058] Disabling IRQ #8 [ 19.420384] loop: module loaded [ 19.548950] kjournald starting. Commit interval 5 seconds [ 19.548970] EXT3-fs (xvda1): mounted filesystem with ordered data mode [ 59.305223] PM: freeze of devices complete after 2.291 msecs [ 59.305227] suspending xenstore... [ 59.305290] PM: late freeze of devices complete after 0.058 msecs [ 59.309521] PM: noirq freeze of devices complete after 4.227 msecs [ 59.312080] Xen HVM callback vector for event delivery is enabled [ 59.312080] Xen Platform PCI: I/O protocol version 1 [ 59.312080] Grant tables using version 1 layout. [ 59.312080] xen: --> irq=8, pirq=16 [ 59.312080] xen: --> irq=12, pirq=17 [ 59.312080] xen: --> irq=1, pirq=18 [ 59.312080] xen: --> irq=6, pirq=19 [ 59.312080] xen: --> irq=4, pirq=20 [ 59.312080] xen: --> irq=7, pirq=21 [ 59.312080] xen: --> irq=28, pirq=22 [ 59.313545] ata_piix 0000:00:01.1: restoring config space at offset 0x4 (was 0x2800001, writing 0x2800005) [ 59.313545] PM: noirq restore of devices complete after 7.838 msecs [ 59.313545] PM: early restore of devices complete after 0.046 msecs [ 59.319592] pci 0000:00:00.0: calling quirk_passive_release+0x0/0xa0 [ 59.479464] PM: restore of devices complete after 159.921 msecs [ 59.488309] Setting capacity to 2097152 [ 59.498494] Setting capacity to 2097152
>>> On 31.01.13 at 19:46, Olaf Hering <olaf@aepfle.de> wrote: > On Thu, Jan 31, Olaf Hering wrote: > >> On Thu, Jan 31, Ian Campbell wrote: >> >> > On Thu, 2013-01-31 at 17:07 +0000, Olaf Hering wrote: >> > > Current xen-unstable has appearently issues with HVM guest. >> > >> > staging or tested? which changeset? >> >> staging, 26493:6727070b4129 >> Its the first time since many weeks that I run xen-unstable again, so I >> cant tell when it started. > > It works with xend for some reason, pvops dom0+domU and changeset > 26502:d1bf3b21f783. What I see in guest dmesg is shown below. With xl the > irq#8 > message does not appear.Odd - I didn''t observe any problems specifically with SLE guests (which is what I used for testing). In fact I also, subsequent to the series that''s already in staging, figured the problem with the remaining changes left from the submission about half a year ago, and was hoping to post that one soon too (but of course won''t until what''s in is really sane). Looking over the changes again (and without having looked at the Linux side yet again), I think the most likely candidates for reverting are 26457:aa82638d58b0 (x86/HVM: consolidate toggling of RTC IRQ) 26461:78e91e9e4d61 (x86/HVM: generalize IRQ raising on RTC_REG_B writes) Especially the latter I can''t really see to be responsible for apparently superfluous IRQ8 instances, as my understanding is that this would imply repeated REG_B writes (which I specifically didn''t observe with extra debugging code added). Considering that I didn''t see any problems with the patches myself - any chance you could try the individual reverts above as a first step? (Obviously reverting 26461 should be strait forward, but reverting 26457 alone might not be as simple.) Thanks, Jan> ... > [ 0.832275] udev: starting version 147 > [ 1.312097] tsc: Refined TSC clocksource calibration: 2926.333 MHz > [ 14.378118] udev: starting version 147 > [ 14.469299] input: Xen Virtual Keyboard as /devices/virtual/input/input5 > [ 14.469532] input: Xen Virtual Pointer as /devices/virtual/input/input6 > [ 14.480053] microcode: CPU0 sig=0x206c2, pf=0x1, revision=0xc > [ 14.480626] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - > upgrade BIOS or use force_addr=0xaddr > [ 14.546814] microcode: CPU1 sig=0x206c2, pf=0x1, revision=0xc > [ 14.601594] NET: Registered protocol family 10 > [ 14.725108] microcode: Microcode Update Driver: v2.00 > <tigran@aivazian.fsnet.co.uk>, Peter Oruba > [ 15.084864] Adding 331772k swap on /dev/xvda2. Priority:-1 extents:1 > across:331772k SS > [ 18.810217] irq 8: nobody cared (try booting with the "irqpoll" option) > [ 18.812058] Pid: 0, comm: swapper/0 Not tainted > 3.7.5-9.home_olh_kernel_sles11sp1.1-kernel-linux-3_7 #1 > [ 18.812058] Call Trace: > [ 18.812058] <IRQ> [<ffffffff8109ce78>] __report_bad_irq+0x38/0xe0 > [ 18.812058] [<ffffffff8109d0ef>] note_interrupt+0x1cf/0x210 > [ 18.812058] [<ffffffff8109aa77>] handle_irq_event_percpu+0x97/0x170 > [ 18.812058] [<ffffffff8109ab8c>] handle_irq_event+0x3c/0x60 > [ 18.812058] [<ffffffff8109d79d>] handle_edge_irq+0x6d/0x120 > [ 18.812058] [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4/0x280 > [ 18.812058] [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a/0x40 > [ 18.812058] [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d/0x80 > [ 18.812058] <EOI> [<ffffffff810668f3>] ? hrtimer_start+0x13/0x20 > [ 18.812058] [<ffffffff810360d6>] ? native_safe_halt+0x6/0x10 > [ 18.812058] [<ffffffff81569e8f>] ? cpuidle_idle_call+0x1f/0xc0 > [ 18.812058] [<ffffffff81014735>] default_idle+0x45/0x50 > [ 18.812058] [<ffffffff81014a68>] cpu_idle+0x88/0xd0 > [ 18.812058] [<ffffffff81642428>] rest_init+0x68/0x70 > [ 18.812058] [<ffffffff81c7bc8f>] start_kernel+0x2f3/0x3b4 > [ 18.812058] [<ffffffff81c7b795>] ? repair_env_string+0x58/0x58 > [ 18.812058] [<ffffffff81cb8c36>] ? memblock_reserve+0x2e/0x4e > [ 18.812058] [<ffffffff81c7b2d5>] x86_64_start_reservations+0xa5/0xc2 > [ 18.812058] [<ffffffff81c7b401>] x86_64_start_kernel+0x10f/0x12a > [ 18.812058] [<ffffffff81c7b120>] ? early_idt_handlers+0x120/0x120 > [ 18.812058] handlers: > [ 18.812058] [<ffffffff81550740>] cmos_interrupt > [ 18.812058] Disabling IRQ #8 > [ 19.420384] loop: module loaded > [ 19.548950] kjournald starting. Commit interval 5 seconds > [ 19.548970] EXT3-fs (xvda1): mounted filesystem with ordered data mode > [ 59.305223] PM: freeze of devices complete after 2.291 msecs > [ 59.305227] suspending xenstore... > [ 59.305290] PM: late freeze of devices complete after 0.058 msecs > [ 59.309521] PM: noirq freeze of devices complete after 4.227 msecs > [ 59.312080] Xen HVM callback vector for event delivery is enabled > [ 59.312080] Xen Platform PCI: I/O protocol version 1 > [ 59.312080] Grant tables using version 1 layout. > [ 59.312080] xen: --> irq=8, pirq=16 > [ 59.312080] xen: --> irq=12, pirq=17 > [ 59.312080] xen: --> irq=1, pirq=18 > [ 59.312080] xen: --> irq=6, pirq=19 > [ 59.312080] xen: --> irq=4, pirq=20 > [ 59.312080] xen: --> irq=7, pirq=21 > [ 59.312080] xen: --> irq=28, pirq=22 > [ 59.313545] ata_piix 0000:00:01.1: restoring config space at offset 0x4 > (was 0x2800001, writing 0x2800005) > [ 59.313545] PM: noirq restore of devices complete after 7.838 msecs > [ 59.313545] PM: early restore of devices complete after 0.046 msecs > [ 59.319592] pci 0000:00:00.0: calling quirk_passive_release+0x0/0xa0 > [ 59.479464] PM: restore of devices complete after 159.921 msecs > [ 59.488309] Setting capacity to 2097152 > [ 59.498494] Setting capacity to 2097152 > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
>>> On 04.02.13 at 09:21, "Jan Beulich" <JBeulich@suse.com> wrote: > Looking over the changes again (and without having looked at > the Linux side yet again), I think the most likely candidates for > reverting are > > 26457:aa82638d58b0 (x86/HVM: consolidate toggling of RTC IRQ) > 26461:78e91e9e4d61 (x86/HVM: generalize IRQ raising on RTC_REG_B writes)Actually, one possibility occurred to me right after sending the earlier response - could you give the patch below a try? Jan --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -55,6 +55,8 @@ static void rtc_toggle_irq(RTCState *s) struct domain *d = vrtc_domain(s); ASSERT(spin_is_locked(&s->lock)); + if ( s->hw.cmos_data[RTC_REG_C] & RTC_IRQF ) + return; s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF; hvm_isa_irq_deassert(d, RTC_IRQ); hvm_isa_irq_assert(d, RTC_IRQ);
On Mon, Feb 04, Jan Beulich wrote:> >>> On 04.02.13 at 09:21, "Jan Beulich" <JBeulich@suse.com> wrote: > > Looking over the changes again (and without having looked at > > the Linux side yet again), I think the most likely candidates for > > reverting are > > > > 26457:aa82638d58b0 (x86/HVM: consolidate toggling of RTC IRQ) > > 26461:78e91e9e4d61 (x86/HVM: generalize IRQ raising on RTC_REG_B writes) > > Actually, one possibility occurred to me right after sending the > earlier response - could you give the patch below a try?This patch leads also to a hang. The initial boot of a HVM guest hangs in "Set System Time to the current Hardware Clock", with the same backtrace in hpet_rtc_interrupt. If the HVM guest is later migrated to localhost, the new guest hangs also in hpet_rtc_interrupt. I''m attaching the xen, dom0 and domU logs from a pvops kernel. sles kernel will follow. I will also try to revert the two changeset above. Olaf _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 31.01.13 at 18:07, Olaf Hering <olaf@aepfle.de> wrote:> xenctx output:I''m confused by this: For one, the mere fact that hpet_rtc_interrupt() is in use suggests that the kernel uses RTC emulation (and hence wouldn''t depend on any of the RTC registers we emulate to be set correctly). The only issue then could be that we''re raising way too many IRQ8-s. While that could explain hangs (due to the IRQ getting shut off by the kernel), has two other inconsistencies in itself: (a) hpet_rtc_interrupt() unconditionally returns IRQ_HANDLED, and hence the interrupt should never be regarded as spurious by the kernel (and hence never be shut off). (b) Once turned off (which we see happening in the kernel log you posted subsequently), hpet_rtc_interrupt() cannot possibly get onto the call stack at all anymore, hence a hang _here_ is impossible. So one question to you would be to clarify which hangs you see where under what conditions (I''m particularly unclear about the exact difference in behavior that you observe with xend vs xl). And then there is a problem with the xenctx output below, assuming they were taken subsequently on the same guest instance while in that hung state:> ... > rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 > flags: 00000086 s nz p > rsp: ffff88001b003df8 > rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a > rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 > rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 > r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b000000/0000000000000000 > Code (instr addr ffffffff81034b97) > 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da > d0 00 79 c9 85 db 74 > > > Stack: > ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 > ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc > ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 > 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b > > Call Trace: > [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- > [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc > [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b > [<ffffffff8109ab8c>] handle_irq_event+0x3c > [<ffffffff8109d79d>] handle_edge_irq+0x6d > [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 > [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a > [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d > rip: ffffffff810360d6 native_safe_halt+0x6 > flags: 00000246 i z p > rsp: ffff88001acbdef8 > rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 > rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 > rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 > r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b040000/0000000000000000 > Code (instr addr ffffffff810360d6) > 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f > 84 00 00 00 00 00 55 > > > Stack: > ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 > 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > Call Trace: > [<ffffffff810360d6>] native_safe_halt+0x6 <-- > [<ffffffff81014735>] default_idle+0x45 > [<ffffffff81014a68>] cpu_idle+0x88 > [<ffffffff8165824a>] start_secondary+0x188 > > rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 > flags: 00000086 s nz p > rsp: ffff88001b003df8 > rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a > rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 > rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 > r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b000000/0000000000000000 > Code (instr addr ffffffff81034b97) > 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da > d0 00 79 c9 85 db 74 > > > Stack: > ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 > ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc > ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 > 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b > > Call Trace: > [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- > [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc > [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b > [<ffffffff8109ab8c>] handle_irq_event+0x3c > [<ffffffff8109d79d>] handle_edge_irq+0x6d > [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 > [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a > [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d > rip: ffffffff810360d6 native_safe_halt+0x6 > flags: 00000246 i z p > rsp: ffff88001acbdef8 > rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 > rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 > rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 > r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b040000/0000000000000000 > Code (instr addr ffffffff810360d6) > 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f > 84 00 00 00 00 00 55 > > > Stack: > ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 > 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > Call Trace: > [<ffffffff810360d6>] native_safe_halt+0x6 <-- > [<ffffffff81014735>] default_idle+0x45 > [<ffffffff81014a68>] cpu_idle+0x88 > [<ffffffff8165824a>] start_secondary+0x188 > > rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 > flags: 00000086 s nz p > rsp: ffff88001b003df8 > rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a > rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 > rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 > r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b000000/0000000000000000 > Code (instr addr ffffffff81034b97) > 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da > d0 00 79 c9 85 db 74 > > > Stack: > ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 > ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc > ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 > 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b > > Call Trace: > [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- > [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc > [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b > [<ffffffff8109ab8c>] handle_irq_event+0x3c > [<ffffffff8109d79d>] handle_edge_irq+0x6d > [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 > [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a > [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d > rip: ffffffff810360d6 native_safe_halt+0x6 > flags: 00000246 i z p > rsp: ffff88001acbdef8 > rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 > rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 > rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 > r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b040000/0000000000000000 > Code (instr addr ffffffff810360d6) > 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f > 84 00 00 00 00 00 55 > > > Stack: > ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 > 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > Call Trace: > [<ffffffff810360d6>] native_safe_halt+0x6 <-- > [<ffffffff81014735>] default_idle+0x45 > [<ffffffff81014a68>] cpu_idle+0x88 > [<ffffffff8165824a>] start_secondary+0x188 > > rip: ffffffff81034b97 hpet_rtc_interrupt+0x97 > flags: 00000086 s nz p > rsp: ffff88001b003df8 > rax: 00000000bc004a66 rcx: 00000000000ee6b2 rdx: 00000000ba6b145a > rbx: 000000000000087d rsi: ffff88001aec5000 rdi: 0000000000000008 > rbp: ffff88001b003e68 r8: ffff88001ac06800 r9: ffff88001b003e38 > r10: 000000000000008c r11: 000000000000f800 r12: ffff88001ac06884 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000008 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b000000/0000000000000000 > Code (instr addr ffffffff81034b97) > 00 89 10 48 8b 05 74 da d0 00 83 c3 01 48 05 f0 00 00 00 8b 00 <3b> 05 d3 da > d0 00 79 c9 85 db 74 > > > Stack: > ffff88001b003e68 ffff88001aec5000 0000000000000000 0000000800000000 > ffff88001ad49c68 0000000000000000 00000008156570c6 ffffffff8165e9cc > ffff88001b0125c0 ffff880019278480 ffff88001ac06884 0000000000000000 > 0000000000000000 0000000000000008 ffff88001b003eb8 ffffffff8109aa2b > > Call Trace: > [<ffffffff81034b97>] hpet_rtc_interrupt+0x97 <-- > [<ffffffff8165e9cc>] _raw_spin_unlock_irqrestore+0xc > [<ffffffff8109aa2b>] handle_irq_event_percpu+0x4b > [<ffffffff8109ab8c>] handle_irq_event+0x3c > [<ffffffff8109d79d>] handle_edge_irq+0x6d > [<ffffffff8142b474>] __xen_evtchn_do_upcall+0x1a4 > [<ffffffff8142ca6a>] xen_evtchn_do_upcall+0x2a > [<ffffffff81660aad>] xen_hvm_callback_vector+0x6d > rip: ffffffff810360d6 native_safe_halt+0x6 > flags: 00000246 i z p > rsp: ffff88001acbdef8 > rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 > rbx: ffff88001acbc010 rsi: 0140000000000000 rdi: 0000000000000086 > rbp: ffff88001acbdef8 r8: 0000000000000000 r9: 0000000000000000 > r10: 0000000000000001 r11: 0000000000000000 r12: ffffffff81c62330 > r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 > cs: 0010 ss: 0018 ds: 0000 es: 0000 > fs: 0000 @ 0000000000000000 > gs: 0000 @ ffff88001b040000/0000000000000000 > Code (instr addr ffffffff810360d6) > 48 89 e5 fb c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <c9> c3 0f 1f > 84 00 00 00 00 00 55 > > > Stack: > ffff88001acbdf08 ffffffff81014735 ffff88001acbdf28 ffffffff81014a68 > 0000000000000001 0000000000000000 ffff88001acbdf48 ffffffff8165824a > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > Call Trace: > [<ffffffff810360d6>] native_safe_halt+0x6 <-- > [<ffffffff81014735>] default_idle+0x45 > [<ffffffff81014a68>] cpu_idle+0x88 > [<ffffffff8165824a>] start_secondary+0x188 > ...One of the vCPU-s (presumably vCPU0, but that doesn''t really matter) has an _invariant_ register state, yet the loop it''s in guarantees at least %ebx to change on each iteration (and one would expect %eax to change too). Which would imply the guest isn''t making forward progress at all anymore. With interrupts being disabled during that code sequence, this also can''t mean the guest is completely busy servicing interrupts - it must be getting no execution time anymore. Which in turn would mean either the vCPU is no longer in running state, or Xen itself is hung. As you didn''t indicate the latter, could you check the former? Finally, while I don''t see a connection (namely because I don''t think periodic interrupts would be in use by the guest here, which is what this change to a large part revolves around), you might nevertheless want to try to apply the last outstanding change (attached) as an alternative to reverting. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[ sorry, had this reply open in another window and forgot to send it ] On Mon, Feb 04, Olaf Hering wrote:> I''m attaching the xen, dom0 and domU logs from a pvops kernel. sles > kernel will follow. > > I will also try to revert the two changeset above.These are the logs from sles11sp2 kernel in dom0+domU. The guest prints "hpet1: lost 2201 rtc interrupts". Xen has changeset 26461 reverted, but this does at least not solve the "hpet1: lost 2201 rtc interrupts" interrupt messages. Olaf PS: xl migrate with sles11 kernel does not seem to work for other unrelated reasons: root@satriani:~/hpet # xl -v -v migrate -M 3 -m 2 -N sles11sp2_full_minimal_nfsroot_bug694863 localhost migration target: Ready to receive domain. Saving to migration stream new xl format (info 0x0/0x0/657) Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/657) Savefile contains xl domain config xc: error: Can''t create lock file for suspend event channel /var/lib/xen/suspend_evtchn_2_lock.d: Internal error libxl: warning: libxl_dom.c:1268:libxl__domain_suspend: Suspend event channel initialization failed xc: Reloading memory pages: 105472/1048575 10%^Axc: detail: type fail: page 0 mfn 000f2000 xc: detail: type fail: page 1 mfn 000f2001 xc: detail: type fail: page 2 mfn 000f2002 xc: detail: delta 5993ms, dom0 20%, target 6%, sent 722Mb/s, dirtied 94Mb/s 17369 pages xc: detail: type fail: page 861 mfn 000f2001 xc: detail: delta 545ms, dom0 37%, target 4%, sent 1036Mb/s, dirtied 8Mb/s 147 pages xc: detail: type fail: page 133 mfn 000f2000 xc: detail: type fail: page 134 mfn 000f2002 xc: error: Live migration aborted, as requested. (guest too busy?) total_sent 149477 iter 3, max_iters 3 max_factor 2: Internal error xc: detail: Save exit rc=1 libxl: error: libxl_dom.c:1306:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: No such device migration sender: libxl_domain_suspend failed (rc=-8) xc: error: 0-length read: Internal error xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error xc: error: Error when reading batch size (0 = Success): Internal error xc: error: Error when reading batch (0 = Success): Internal error libxl: error: libxl_create.c:787:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable libxl: error: libxl_create.c:869:domcreate_rebuild_done: cannot (re-)build domain: -3 libxl: error: libxl.c:1395:libxl__destroy_domid: non-existant domain 3 libxl: error: libxl.c:1359:domain_destroy_callback: unable to destroy guest with domid 3 libxl: error: libxl_create.c:1177:domcreate_destruction_cb: unable to destroy domain 3 following failed creation migration target: Domain creation failed (code -3). libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [5218] exited with error status 3 Migration failed, failed to suspend at sender. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 04.02.13 at 15:03, Olaf Hering <olaf@aepfle.de> wrote: > [ sorry, had this reply open in another window and forgot to send it ] > > On Mon, Feb 04, Olaf Hering wrote: > >> I''m attaching the xen, dom0 and domU logs from a pvops kernel. sles >> kernel will follow. >> >> I will also try to revert the two changeset above. > > These are the logs from sles11sp2 kernel in dom0+domU. > The guest prints "hpet1: lost 2201 rtc interrupts". > > Xen has changeset 26461 reverted, but this does at least not solve the > "hpet1: lost 2201 rtc interrupts" interrupt messages.Ugly. But I don''t see the "IRQ8: nobody cared" anymore, and so far I had assumed the two are connected. Are you going to try reverting 26457:aa82638d58b0 (alone, or perhaps the full range 26457-26461) next? That would at least tell us whether similar reverts in the staging tree would have a chance of getting us out of the currently stuck state again. Jan
On Mon, Feb 04, Jan Beulich wrote:> So one question to you would be to clarify which hangs you see > where under what conditions (I''m particularly unclear about the > exact difference in behavior that you observe with xend vs xl).This is a list of dom0+domU combinations with plain 26502:d1bf3b21f783: pvops dom0 + pvops domU - very first domU hangs in hpet_rtc_interrupt - xl migrate domU localhost migration works, but new domU hangs in hpet_rtc_interrupt sles11 dom0 + sles11 domU - guest prints "hpet1: lost 2201 rtc interrupts" - xm migrate works I will retest more combinations with the source state above. Now I tried x86-HVM-RTC.patch, and the result is that the very first pvops guest boots fine, and also two xl migrate to localhost work fine! I will do some more testing to see if its consistent. Olaf
On Mon, Feb 04, Olaf Hering wrote:> Now I tried x86-HVM-RTC.patch, and the result is that the very first > pvops guest boots fine, and also two xl migrate to localhost work fine! > I will do some more testing to see if its consistent.So two cycles of pvops dom0 show that the first guest starts fine. xl create/migrate works fine as well. Later I did shutdown the xl guest, started xend and ran the sles11 kernel in the guest, no more hpet1 messages. Now I''m going to boot into sles11 dom0 and see if the hpet1 messsages are really gone. So I think the x86-HVM-RTC.patch works well for me. Olaf
On Mon, Feb 04, Olaf Hering wrote:> Now I''m going to boot into sles11 dom0 and see if the hpet1 messsages > are really gone.With sles11 dom0+domU there are no more hpet1 messages when started from xend. With xl however I now get a hanging guest, which I did not see before this patch. But this happens only if it is booted with quiet, with ''debug ignore_loglevel'' booting works fine. Also xl list shows the domain in paused state?! I will try once more with sles11 kernel, xl and the RTC patch reverted.. Olaf xl dump-core shows just this in dmesg: <6>[ 0.000000] Initializing cgroup subsys cpu <5>[ 0.000000] Linux version 3.0.51-0.7.9-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Thu Nov 29 22:12:17 UTC 2012 (f3be9d0) <6>[ 0.000000] Command line: root=bax.arch.suse.de:/olaf_xenimages/bug694863/nfsroot console=ttyS0,115200 quiet log_buf_len=64M memblock=debug <6>[ 0.000000] BIOS-provided physical RAM map: <6>[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e000 (usable) <6>[ 0.000000] BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved) <6>[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) <6>[ 0.000000] BIOS-e820: 0000000000100000 - 000000001f800000 (usable) <6>[ 0.000000] BIOS-e820: 00000000fc000000 - 0000000100000000 (reserved) <6>[ 0.000000] NX (Execute Disable) protection: active <6>[ 0.000000] DMI 2.4 present. <7>[ 0.000000] DMI: Xen HVM domU, BIOS 4.3.26502-20130204 02/04/2013 <7>[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) <7>[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable) <6>[ 0.000000] No AGP bridge found <6>[ 0.000000] last_pfn = 0x1f800 max_arch_pfn = 0x400000000 <7>[ 0.000000] MTRR default type: write-back <7>[ 0.000000] MTRR fixed ranges enabled: <7>[ 0.000000] 00000-9FFFF write-back <7>[ 0.000000] A0000-BFFFF write-combining <7>[ 0.000000] C0000-FFFFF write-back <7>[ 0.000000] MTRR variable ranges enabled: <7>[ 0.000000] 0 base 00F0000000 mask FFF8000000 uncachable <7>[ 0.000000] 1 base 00F8000000 mask FFFC000000 uncachable <7>[ 0.000000] 2 disabled <7>[ 0.000000] 3 disabled <7>[ 0.000000] 4 disabled <7>[ 0.000000] 5 disabled <7>[ 0.000000] 6 disabled <7>[ 0.000000] 7 disabled <6>[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 <6>[ 0.000000] found SMP MP-table at [ffff8800000fbba0] fbba0 <6>[ 0.000000] memblock_x86_reserve_range: [0x000fbba0-0x000fbbaf] * MP-table mpf <6>[ 0.000000] memblock_x86_reserve_range: [0x000fbbb0-0x000fbc93] * MP-table mpc <6>[ 0.000000] memblock_x86_reserve_range: [0x01fd7000-0x01fd70d7] BRK <6>[ 0.000000] MEMBLOCK configuration: <6>[ 0.000000] memory size = 0x1f78e000 <6>[ 0.000000] memory.cnt = 0x2 <6>[ 0.000000] memory[0x0] [0x00000000010000-0x0000000009dfff], 0x8e000 bytes <6>[ 0.000000] memory[0x1] [0x00000000100000-0x0000001f7fffff], 0x1f700000 bytes <6>[ 0.000000] reserved.cnt = 0x3 <6>[ 0.000000] reserved[0x0] [0x0000000009e000-0x000000000fffff], 0x62000 bytes <6>[ 0.000000] reserved[0x1] [0x00000001000000-0x00000001fd70d7], 0xfd70d8 bytes <6>[ 0.000000] reserved[0x2] [0x0000001f2d0000-0x0000001f7effff], 0x520000 bytes <7>[ 0.000000] initial memory mapped : 0 - 20000000 <6>[ 0.000000] memblock_x86_reserve_range: [0x00099000-0x0009dfff] TRAMPOLINE <7>[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 20480 <6>[ 0.000000] init_memory_mapping: 0000000000000000-000000001f800000 <7>[ 0.000000] 0000000000 - 001f800000 page 2M <7>[ 0.000000] kernel direct mapping tables up to 0x1f7fffff @ [mem 0x1f7fe000-0x1f7fffff] root@satriani:~ # /usr/lib64/xen/bin/xenctx -C -s /boot/System.map-3.0.51-0.7.9-default `xl domid sles11sp2_full_minimal_nfsroot_bug694863 2>&1` rip: ffffffff812a9543 acpi_os_write_port+0xe flags: 00000246 i z p rsp: ffff880019629468 rax: 0000000000000091 rcx: 0000000000000001 rdx: 000000000000b044 rbx: 0000000000000000 rsi: 0000000000000091 rdi: 000000000000b044 rbp: 000000000000b044 r8: 0000000000000002 r9: ffffffff817b9a2f r10: ffff88001ac20300 r11: ffffffff00000001 r12: 0000000000000008 r13: ffff880019629678 r14: ffffffff812c3144 r15: 0000000000000000 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b000000/0000000000000000 Code (instr addr ffffffff812a9543) eb fe 59 31 c0 5e c3 83 fa 08 89 f0 77 09 40 0f b6 c6 89 fa ee <eb> 1c 83 fa 10 77 09 0f b7 c6 89 Stack: ffffffff812c31d3 ffff880000000008 ffff880000000001 0000000000000000 ffffffff0000b044 ffff8800196106a8 0000000000000000 0000000000000000 000000000000b044 ffff880019610f60 ffff880019622150 ffff880019610f18 ffffffff812babf1 ffff880019622150 ffffffff812c3144 0000000000000000 Call Trace: [<ffffffff812a9543>] acpi_os_write_port+0xe <-- [<ffffffff812c31d3>] acpi_ex_system_io_space_handler+0x8f [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812babf1>] acpi_ev_address_space_dispatch+0x1d5 [<ffffffff812c3144>] acpi_ex_system_io_space_handler [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<ffffffff812c0158>] acpi_ex_access_region+0x118 [<0000000000000001>] irq_stack_union+0x1 [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<ffffffff812c02b9>] acpi_ex_field_datum_io+0xf2 [<0000000000000000>] irq_stack_union [<0000000000000008>] irq_stack_union+0x8 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812c077f>] acpi_ex_write_with_update_rule+0x130 [<0000000000000091>] irq_stack_union+0x91 [<0000000000000091>] irq_stack_union+0x91 [<0000000000000011>] irq_stack_union+0x11 [<0000000000000008>] irq_stack_union+0x8 [<00000000000000ff>] irq_stack_union+0xff [<0000000000000000>] irq_stack_union [<ffffffff812c0979>] acpi_ex_insert_into_field+0x1ed [<0000000000000202>] irq_stack_union+0x202 [<0000000000000000>] irq_stack_union [<ffffffff8113b8f2>] kmem_cache_alloc+0xe2 [<0000000000000091>] irq_stack_union+0x91 [<0000000000000011>] irq_stack_union+0x11 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000008>] irq_stack_union+0x8 [<ffffffff812be963>] acpi_ex_write_data_to_field+0x277 [<0000000000000000>] irq_stack_union [<0000000000000011>] irq_stack_union+0x11 [<ffffffff812c350d>] acpi_ex_store_object_to_node+0x109 [<0000000000000002>] irq_stack_union+0x2 [<0000000000000000>] irq_stack_union [<0000000000000004>] irq_stack_union+0x4 [<0000000000000000>] irq_stack_union [<ffffffff812c11a9>] acpi_ex_opcode_1A_1T_1R+0x371 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000004>] irq_stack_union+0x4 [<ffffffff812b62de>] acpi_ds_exec_end_op+0xec [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812ce047>] acpi_ps_parse_loop+0x32b [<000000000000000a>] irq_stack_union+0xa [<0000000000000009>] irq_stack_union+0x9 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812cce4e>] acpi_ps_parse_aml+0x105 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<ffffffff812ce7e6>] acpi_ps_execute_method+0x213 [<ffffffff812c895d>] acpi_ns_evaluate+0x189 [<0000000000000004>] irq_stack_union+0x4 [<ffffffff812d1fed>] acpi_ut_evaluate_object+0x69 [<0000000000000060>] irq_stack_union+0x60 [<0000000000000001>] irq_stack_union+0x1 [<0000000000000060>] irq_stack_union+0x60 [<ffffffff812d2258>] acpi_ut_execute_STA+0x24 [<ffffffff812c9832>] acpi_get_object_info+0x167 [<0000000000000006>] irq_stack_union+0x6 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812c9858>] acpi_get_object_info+0x18d [<ffffffff812a9cdc>] acpi_os_wait_semaphore+0xce [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<ffffffff812acd13>] do_acpi_find_child+0x16 [<0000000000000000>] irq_stack_union [<ffffffff812cadc5>] acpi_ns_walk_namespace+0xc4 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812accfd>] do_acpi_find_child [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812accfd>] do_acpi_find_child [<0000000000000001>] irq_stack_union+0x1 [<0000000000000006>] irq_stack_union+0x6 [<ffffffff812c8079>] acpi_walk_namespace+0x80 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff812accf3>] acpi_get_child+0x42 [<0000000000000000>] irq_stack_union [<ffffffff81247f76>] kobject_add+0x46 [<ffffffff81284368>] acpi_pci_find_device+0x28 [<ffffffff812acbb0>] acpi_platform_notify+0xb9 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff8131aaa3>] device_add+0xe3 [<ffffffff8126657d>] pci_bus_add_device+0x1d [<ffffffff812665ec>] pci_bus_add_devices+0x3c [<ffffffff812b0dec>] acpi_pci_root_start+0x14 [<ffffffff812aceb7>] acpi_start_single_object+0x21 [<0000000000000000>] irq_stack_union [<ffffffff812ad041>] acpi_device_probe+0xab [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff8131d23a>] really_probe+0x7a [<0000000000000000>] irq_stack_union [<ffffffff8131d483>] driver_probe_device+0x63 [<ffffffff8131d573>] __driver_attach+0x93 [<0000000000000000>] irq_stack_union [<ffffffff8131d4e0>] __driver_attach [<ffffffff8131c8c8>] bus_for_each_dev+0x58 [<ffffffff8131c065>] bus_add_driver+0x155 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff8131dbc9>] driver_register+0x79 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff81c1c3ce>] acpi_pci_root_init [<0000000000000000>] irq_stack_union [<ffffffff81c1c3f1>] acpi_pci_root_init+0x23 [<0000000000000000>] irq_stack_union [<ffffffff810001cb>] do_one_initcall+0x3b [<000000000000000f>] irq_stack_union+0xf [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff81be3893>] kernel_init+0x241 [<0000000000000000>] irq_stack_union [<0000000000000001>] irq_stack_union+0x1 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff8144db44>] kernel_thread_helper+0x4 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<ffffffff81be3652>] kernel_init [<0000000000000000>] irq_stack_union [<ffffffff8144db40>] kernel_thread_helper [<0000000000000010>] irq_stack_union+0x10 [<0000000000000202>] irq_stack_union+0x202 [<0000000000000018>] irq_stack_union+0x18 rip: ffffffff8102a0b2 native_safe_halt+0x2 flags: 00000246 i z p rsp: ffff8800196a3f10 rax: 0000000000000000 rcx: 00000000ffffffff rdx: 0000000000000000 rbx: ffff8800196a2010 rsi: 0000000000000001 rdi: ffffffff81d29108 rbp: 0000000000000001 r8: 0000000000000000 r9: ffff88001974d600 r10: 0000000000000000 r11: ffffffff81047df0 r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000 cs: 0010 ss: 0018 ds: 0000 es: 0000 fs: 0000 @ 0000000000000000 gs: 0000 @ ffff88001b020000/0000000000000000 Code (instr addr ffffffff8102a0b2) 00 00 00 fb c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 fb f4 <c3> 66 66 66 66 2e 0f 1f 84 00 00 Stack: ffffffff8100ad45 0000000000000000 ffff8800196a2010 ffffffff81bc8d40 ffffffff810020e6 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffff8102a0b2>] native_safe_halt+0x2 <-- [<ffffffff8100ad45>] default_idle+0x145 [<0000000000000000>] irq_stack_union [<ffffffff810020e6>] cpu_idle+0x66 [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union [<0000000000000000>] irq_stack_union root@satriani:~ #
On Mon, 2013-02-04 at 15:27 +0000, Olaf Hering wrote:> On Mon, Feb 04, Olaf Hering wrote: > > > Now I''m going to boot into sles11 dom0 and see if the hpet1 messsages > > are really gone. > > With sles11 dom0+domU there are no more hpet1 messages when started from > xend. > > > With xl however I now get a hanging guest, which I did not see before this > patch. But this happens only if it is booted with quiet, with ''debug > ignore_loglevel'' booting works fine. > Also xl list shows the domain in paused state?![...]> [<ffffffff812a9543>] acpi_os_write_port+0xe <--I expect this will have trapped to qemu, which would explain why the vcpu is paused, but not why qemu appears to not be responding. Maybe something in the qemu logs will shed some light?> [<ffffffff812c31d3>] acpi_ex_system_io_space_handler+0x8f > [<0000000000000000>] irq_stack_union > [<0000000000000000>] irq_stack_union > [<0000000000000000>] irq_stack_union > [<ffffffff812babf1>] acpi_ev_address_space_dispatch+0x1d5 > [<ffffffff812c3144>] acpi_ex_system_io_space_handler[...]
On Mon, Feb 04, Ian Campbell wrote:> On Mon, 2013-02-04 at 15:27 +0000, Olaf Hering wrote: > > On Mon, Feb 04, Olaf Hering wrote: > > > > > Now I''m going to boot into sles11 dom0 and see if the hpet1 messsages > > > are really gone. > > > > With sles11 dom0+domU there are no more hpet1 messages when started from > > xend. > > > > > > With xl however I now get a hanging guest, which I did not see before this > > patch. But this happens only if it is booted with quiet, with ''debug > > ignore_loglevel'' booting works fine. > > Also xl list shows the domain in paused state?! > > [...] > > [<ffffffff812a9543>] acpi_os_write_port+0xe <-- > > I expect this will have trapped to qemu, which would explain why the > vcpu is paused, but not why qemu appears to not be responding. Maybe > something in the qemu logs will shed some light?I was running xenctx in another window, and it leads to this sort of hang. Without the xenctx monitoring booting works fine. watch -n 0,1 ''/usr/lib64/xen/bin/xenctx -C -s /boot/System.map-3.0.51-0.7.9-default \ `xl domid sles11sp2_full_minimal_nfsroot_bug694863 2>&1` 2>&1'' Olaf
On Mon, 2013-02-04 at 15:50 +0000, Olaf Hering wrote:> On Mon, Feb 04, Ian Campbell wrote: > > > On Mon, 2013-02-04 at 15:27 +0000, Olaf Hering wrote: > > > On Mon, Feb 04, Olaf Hering wrote: > > > > > > > Now I''m going to boot into sles11 dom0 and see if the hpet1 messsages > > > > are really gone. > > > > > > With sles11 dom0+domU there are no more hpet1 messages when started from > > > xend. > > > > > > > > > With xl however I now get a hanging guest, which I did not see before this > > > patch. But this happens only if it is booted with quiet, with ''debug > > > ignore_loglevel'' booting works fine. > > > Also xl list shows the domain in paused state?! > > > > [...] > > > [<ffffffff812a9543>] acpi_os_write_port+0xe <-- > > > > I expect this will have trapped to qemu, which would explain why the > > vcpu is paused, but not why qemu appears to not be responding. Maybe > > something in the qemu logs will shed some light? > > I was running xenctx in another window, and it leads to this sort of > hang.That is, erm, interesting! Ian.