Andreas Kinzler
2010-Sep-09 09:20 UTC
[Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
I have been talking with Jan for a while (via email) to track down the following problem, and he suggested that I report it on xen-devel:

Jul  9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI hang ?
Jul  9 01:49:05 virt kernel: aacraid: SCSI bus appears hung
Jul  9 01:49:10 virt kernel: Calling adapter init
Jul  9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not guaranteed on shared IRQs
Jul  9 01:49:49 virt kernel: Acquiring adapter information
Jul  9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s
Jul  9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous command timed out.
Jul  9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing problem;
Jul  9 01:53:13 virt kernel: update mother board BIOS or consider utilizing one of
Jul  9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic etc)

After the VMs have been running for a while, the aacraid driver reports a
non-responding RAID controller. Most of the time the NIC also stops working.

I have tried nearly every combination of dom0 kernel (pvops, xenified SUSE
2.6.31.x, 2.6.32.x, 2.6.34.x) with Xen hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1
and unstable. No success in two months: sooner or later every combination hit
the problem shown above. I did extensive tests to make sure that the hardware
is OK, and it is - I am sure this is a Xen/dom0 problem.

Jan suggested trying the fix in c/s 22051, but it did not help. My answer to him:

> In the meantime I tried xen-unstable c/s 22068 (which contains staging c/s
> 22051) and it did not fix the problem at all. I was able to fix a problem
> with the serial console, and so I got some debug info that is attached to
> this email. The following lines look suspicious to me (irr=1,
> delivery_status=1):
>
>   (XEN)  IRQ 16 Vec216:
>   (XEN)    Apic 0x00, Pin 16: vector=216, delivery_mode=1, dest_mode=logical,
>            delivery_status=1, polarity=1, irr=1, trigger=level, mask=0, dest_id:1
>
> IRQ 16 is the aacraid controller, which after some time seems to be unable
> to receive interrupts. Can you see from the debug info what is going on?

I also applied a small patch which disables HPET broadcast. The machine has
now been running for 110 hours without a crash, while normally it crashes
within a few minutes. Is there something wrong (a race or deadlock) with HPET
broadcasts in relation to the blocked interrupt reception (see above)?

Andreas
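(For reference: the flags quoted above are fields of the IO-APIC redirection
table entry for pin 16. The following minimal C sketch is illustrative only -
the decode helper and the sample value are constructed for this note, they are
not Xen code. For a level-triggered pin, remote IRR stays set until the
servicing CPU's EOI reaches the IO-APIC, and while it is set no further
interrupts are delivered on that pin, so seeing irr=1 together with
delivery_status=1 on an unmasked pin is consistent with the aacraid IRQ being
wedged.)

    /*
     * Illustrative decode of the low 32 bits of an IO-APIC redirection
     * table entry (RTE), matching the fields Xen prints above.
     * Bit layout: vector[7:0], delivery_mode[10:8], dest_mode[11],
     * delivery_status[12], polarity[13], remote_irr[14], trigger[15], mask[16].
     */
    #include <stdio.h>

    static void decode_rte_low(unsigned int lo)
    {
        printf("vector=%u delivery_mode=%u dest_mode=%s delivery_status=%u "
               "polarity=%u irr=%u trigger=%s mask=%u\n",
               lo & 0xffu,                                /* vector          */
               (lo >> 8) & 0x7u,                          /* delivery mode   */
               (lo >> 11) & 1u ? "logical" : "physical",  /* dest mode       */
               (lo >> 12) & 1u,                           /* delivery status */
               (lo >> 13) & 1u,                           /* polarity        */
               (lo >> 14) & 1u,                           /* remote IRR      */
               (lo >> 15) & 1u ? "level" : "edge",        /* trigger mode    */
               (lo >> 16) & 1u);                          /* mask            */
    }

    int main(void)
    {
        /* Hypothetical RTE value reproducing the IRQ 16 dump above. */
        decode_rte_low(0x0000f9d8u);
        return 0;
    }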
Pasi Kärkkäinen
2010-Sep-21 11:56 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On Thu, Sep 09, 2010 at 11:20:51AM +0200, Andreas Kinzler wrote:
> I have been talking with Jan for a while (via email) to track down the
> following problem, and he suggested that I report it on xen-devel:
>
> [...]
>
> After the VMs have been running for a while, the aacraid driver reports a
> non-responding RAID controller. Most of the time the NIC also stops working.
>
> [...]
>
> I also applied a small patch which disables HPET broadcast. The machine has
> now been running for 110 hours without a crash, while normally it crashes
> within a few minutes. Is there something wrong (a race or deadlock) with
> HPET broadcasts in relation to the blocked interrupt reception (see above)?

Hello,

What kind of hardware does this happen on?

Should this patch be merged?
-- 
Pasi

> Andreas

> diff -urN xx/xen/arch/x86/hpet.c xen-4.0.1/xen/arch/x86/hpet.c
> --- xx/xen/arch/x86/hpet.c	2010-08-25 12:22:11.000000000 +0200
> +++ xen-4.0.1/xen/arch/x86/hpet.c	2010-08-30 18:13:34.000000000 +0200
> @@ -405,7 +405,7 @@
>          /* Only consider HPET timer with MSI support */
>          if ( !(cfg & HPET_TN_FSB_CAP) )
>              continue;
> -
> +if (1) continue;
>          ch->flags = 0;
>          ch->idx = i;
>  
> @@ -703,8 +703,9 @@
>  
>  int hpet_broadcast_is_available(void)
>  {
> -    return (legacy_hpet_event.event_handler == handle_hpet_broadcast
> -            || num_hpets_used > 0);
> +    /*return (legacy_hpet_event.event_handler == handle_hpet_broadcast
> +            || num_hpets_used > 0);*/
> +    return 0;
>  }
>  
>  int hpet_legacy_irq_tick(void)

> (XEN) '*' pressed -> firing all diagnostic keyhandlers
>
> [... full serial-console dump snipped for readability: host and HVM-guest
>  register/stack dumps for CPUs 0-3, Dom0 vcpu state, heap info, MSI state,
>  PCI devices, timer queues, ACPI Cx data (all four CPUs idling in C3),
>  event channels, grant tables, interrupt bindings (IRQs 24-31 are the
>  HPET-MSI channels), memory and NMI statistics, domain info, scheduler run
>  queues, TSC and NUMA info; near the end the output is interleaved with
>  dom0 kernel log lines. The lines relevant to the stuck aacraid interrupt
>  are: ...]
>
> (XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:d8 type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 16(----),
> (XEN)  IRQ 16 Vec216:
> (XEN)    Apic 0x00, Pin 16: vector=216, delivery_mode=1, dest_mode=logical, delivery_status=1, polarity=1, irr=1, trigger=level, mask=0, dest_id:1
Andreas Kinzler
2010-Sep-29 18:08 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 21.09.2010 13:56, Pasi Kärkkäinen wrote:>> I am talking a while (via email) with Jan now to track the following >> problem and he suggested that I report the problem on xen-devel: >> >> Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI >> hang ? >> Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears hung >> Jul 9 01:49:10 virt kernel: Calling adapter init >> Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not >> guaranteed on shared IRQs >> Jul 9 01:49:49 virt kernel: Acquiring adapter information >> Jul 9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s >> Jul 9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous >> command timed out. >> Jul 9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing >> problem; >> Jul 9 01:53:13 virt kernel: update mother board BIOS or consider >> utilizing one of >> Jul 9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic etc) >> >> After the VMs have been running a while the aacraid driver reports a >> non-responding RAID controller. Most of the time the NIC is also no >> longer working. >> I nearly tried every combination of dom0 kernel (pvops0, xenfied suse >> 2.6.31.x, xenfied suse 2.6.32.x, xenfied suse 2.6.34.x) with Xen >> hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable. >> No success in two month. Every combination earlier or later had the >> problem shown above. I did extensive tests to make sure that the >> hardware is OK. And it is - I am sure it is a Xen/dom0 problem. >> >> Jan suggested to try the fix in c/s 22051 but it did not help. My answer >> to him: >> >>> In the meantime I did try xen-unstable c/s 22068 (contains staging c/s >> 22051) and >>> it did not fix the problem at all. I was able to fix a problem with >> the serial console >>> and so I got some debug info that is attached to this email. The >> following line looks >>> suspicious to me (irr=1, delivery_status=1): >> >>> (XEN) IRQ 16 Vec216: >>> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, >> dest_mode=logical, >>> delivery_status=1, polarity=1, irr=1, trigger=level, >> mask=0, dest_id:1 >> >>> IRQ 16 is the aacraid controller which after some while seems to be >> enable to receive >>> interrupts. Can you see from the debug info what is going on? >> >> I also applied a small patch which disables HPET broadcast. The machine >> is now running >> for 110 hours without a crash while normally it crashes within a few >> minutes. Is there >> something wrong (race, deadlock) with HPET broadcasts in relation to >> blocked interrupt >> reception (see above)? > What kind of hardware does this happen on?It is a Supermicro X8SIL-F, Intel Xeon 3450 system.> Should this patch be merged?Not easy to answer. I spend more than 10 weeks searching nearly full time for the reason of the stability issues. Finally I was able to track it down to the HPET broadcast code. We need to find the developer of the HPET broadcast code. Then, he should try to fix the code. I consider it a quite severe bug as it renders Xen nearly useless on affected systems. That is why I (and my boss who pays me) spend so much time (developing/fixing Xen is not really my core job) and money (buying a E5620 machine just for testing Xen). I think many people on affected systems are having problems. See http://lists.xensource.com/archives/html/xen-users/2010-09/msg00370.html Regards Andreas _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andrew Lyon
2010-Sep-29 19:34 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On Wed, Sep 29, 2010 at 7:08 PM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote:> On 21.09.2010 13:56, Pasi Kärkkäinen wrote: >>> >>> I am talking a while (via email) with Jan now to track the following >>> problem and he suggested that I report the problem on xen-devel: >>> >>> Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI >>> hang ? >>> Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears hung >>> Jul 9 01:49:10 virt kernel: Calling adapter init >>> Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not >>> guaranteed on shared IRQs >>> Jul 9 01:49:49 virt kernel: Acquiring adapter information >>> Jul 9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s >>> Jul 9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous >>> command timed out. >>> Jul 9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing >>> problem; >>> Jul 9 01:53:13 virt kernel: update mother board BIOS or consider >>> utilizing one of >>> Jul 9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic >>> etc) >>> >>> After the VMs have been running a while the aacraid driver reports a >>> non-responding RAID controller. Most of the time the NIC is also no >>> longer working. >>> I nearly tried every combination of dom0 kernel (pvops0, xenfied suse >>> 2.6.31.x, xenfied suse 2.6.32.x, xenfied suse 2.6.34.x) with Xen >>> hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable. >>> No success in two month. Every combination earlier or later had the >>> problem shown above. I did extensive tests to make sure that the >>> hardware is OK. And it is - I am sure it is a Xen/dom0 problem. >>> >>> Jan suggested to try the fix in c/s 22051 but it did not help. My answer >>> to him: >>> >>>> In the meantime I did try xen-unstable c/s 22068 (contains staging c/s >>> >>> 22051) and >>>> >>>> it did not fix the problem at all. I was able to fix a problem with >>> >>> the serial console >>>> >>>> and so I got some debug info that is attached to this email. The >>> >>> following line looks >>>> >>>> suspicious to me (irr=1, delivery_status=1): >>> >>>> (XEN) IRQ 16 Vec216: >>>> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, >>> >>> dest_mode=logical, >>>> >>>> delivery_status=1, polarity=1, irr=1, trigger=level, >>> >>> mask=0, dest_id:1 >>> >>>> IRQ 16 is the aacraid controller which after some while seems to be >>> >>> enable to receive >>>> >>>> interrupts. Can you see from the debug info what is going on? >>> >>> I also applied a small patch which disables HPET broadcast. The machine >>> is now running >>> for 110 hours without a crash while normally it crashes within a few >>> minutes. Is there >>> something wrong (race, deadlock) with HPET broadcasts in relation to >>> blocked interrupt >>> reception (see above)? >> >> What kind of hardware does this happen on? > > It is a Supermicro X8SIL-F, Intel Xeon 3450 system. > >> Should this patch be merged? > > Not easy to answer. I spend more than 10 weeks searching nearly full time > for the reason of the stability issues. Finally I was able to track it down > to the HPET broadcast code. > > We need to find the developer of the HPET broadcast code. Then, he should > try to fix the code. I consider it a quite severe bug as it renders Xen > nearly useless on affected systems. That is why I (and my boss who pays me) > spend so much time (developing/fixing Xen is not really my core job) and > money (buying a E5620 machine just for testing Xen). > > I think many people on affected systems are having problems. 
See > http://lists.xensource.com/archives/html/xen-users/2010-09/msg00370.html > > Regards Andreas > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >

I will test that patch on my Supermicro X7DWA-N based dual Xeon workstation. I always use a Xenified kernel rather than pv_ops as it supports some features that I need and is compatible with the nvidia binary drivers, but I've always had problems with very occasional hard/soft lockups :(.

I've ruled out the nvidia drivers: before going on holiday a few weeks ago I upgraded Xen to 4.0.1 (from 3.4.2) and the kernel to 2.6.34 patched with the latest suse xen patches, but I did not compile the nvidia module or run X using any other drivers, and the system still locked up after 11 days of moderate load. Unfortunately my serial-to-tcp/ip device was not working, so I could not check the serial console remotely and had to reboot the system.

This problem has happened with 2.6.29, 30, 31, 32 and 34 + Xen 3.4.1, 3.4.2 and 4.0.1. I've also tried using the full suse patch set rather than the minimal set of Xen patches that I usually use, with no change, so I think this is a Xen problem.

Usually a soft lockup is reported by the linux kernel, but it is impossible to diagnose further as no i/o is possible, so commands like xm do not work; more rarely the system locks hard with no response at all on the serial console and no errors logged. In perhaps 1 in 20 cases the lockup is temporary and the system returns to normal performance, but usually it is terminal. The machine is my main workstation and the problem is rare enough that I've tolerated it. We recently got another dual Xeon workstation with a Supermicro X8DAL-i, so it will be interesting to see if that has the same issue.

Some example soft lockup errors, this one the system recovered from: BUG: soft lockup - CPU#3 stuck for 2796s!
[swapper:0] Modules linked in: fuse cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco rfcomm bnep l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd sym53c8xx iTCO_wdt i2c_i801 iTCO_vendor_support igb i5k_amb snd_page_alloc btusb bluetooth [last unloaded: nvidia] CPU 3 Modules linked in: fuse cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco rfcomm bnep l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd sym53c8xx iTCO_wdt i2c_i801 iTCO_vendor_support igb i5k_amb snd_page_alloc btusb bluetooth [last unloaded: nvidia] Pid: 0, comm: swapper Tainted: P 2.6.34-xen-r4 #1 X7DWA/X7DWA RIP: e030:[<ffffffff802013aa>] [<ffffffff802013aa>] 0xffffffff802013aa RSP: e02b:ffff8803ec4cdf10 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffff8088a158 RCX: ffffffff802013aa RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff8803ec4cdfd8 R08: 0000000000000000 R09: ffffffff8088a158 R10: ffff880071aeecc0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f1a34310710(0000) GS:ffff880001049000(0000) knlGS:0000000000000000 CS: e033 DS: 002b ES: 002b CR0: 000000008005003b CR2: 00007f5b6b5fa000 CR3: 00000000363be000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff8803ec4cc000, task ffff8803ec4be000) Stack: 000000000000a280 0000000000000000 ffffffff802062e1 ffffffff80209761 <0> ffffffff8088a158 ffffffff8020361f ffffffff804b2e73 0000000000000000 <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffff802062e1>] ? xen_safe_halt+0xc/0xd [<ffffffff80209761>] ? xen_idle+0x4f/0x85 [<ffffffff8020361f>] ? cpu_idle+0x4b/0x80 [<ffffffff804b2e73>] ? force_evtchn_callback+0x9/0xa Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc Call Trace: [<ffffffff802062e1>] ? xen_safe_halt+0xc/0xd [<ffffffff80209761>] ? xen_idle+0x4f/0x85 [<ffffffff8020361f>] ? cpu_idle+0x4b/0x80 [<ffffffff804b2e73>] ? force_evtchn_callback+0x9/0xa Some older examples which were terminal: Sep 25 05:10:12 ubermicro kernel: BUG: soft lockup - CPU#6 stuck for 61s! 
[xenvnc.sh:12180] Sep 25 05:10:12 ubermicro kernel: Modules linked in: cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco bnep rfcomm l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer iTCO_wdt snd iTCO_vendor_support i2c_i801 snd_page_alloc igb sym53c8xx i5k_amb [last unloaded: microcode] Sep 25 05:10:12 ubermicro kernel: CPU 6 Sep 25 05:10:12 ubermicro kernel: Modules linked in: cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco bnep rfcomm l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer iTCO_wdt snd iTCO_vendor_support i2c_i801 snd_page_alloc igb sym53c8xx i5k_amb [last unloaded: microcode] Sep 25 05:10:12 ubermicro kernel: Sep 25 05:10:12 ubermicro kernel: Pid: 12180, comm: xenvnc.sh Tainted: P 2.6.34-xen-r4 #1 X7DWA/X7DWA Sep 25 05:10:12 ubermicro kernel: RIP: e030:[<ffffffff8025328d>] [<ffffffff8025328d>] smp_call_function_many+0x187/0x19c Sep 25 05:10:12 ubermicro kernel: RSP: e02b:ffff88004e0c7dd8 EFLAGS: 00000202 Sep 25 05:10:12 ubermicro kernel: RAX: ffff880001086ac0 RBX: ffff880001089b30 RCX: 00007f3124b39000 Sep 25 05:10:12 ubermicro kernel: RDX: ffff88000107f000 RSI: 0000000000000020 RDI: 0000000000000020 Sep 25 05:10:12 ubermicro kernel: RBP: ffff880001089b00 R08: 0000000000000000 R09: ffff880001089b30 Sep 25 05:10:12 ubermicro kernel: R10: 0000000000007ff0 R11: ffff8803a73c21c0 R12: ffff8803a73c21c0 Sep 25 05:10:12 ubermicro kernel: R13: ffffffff80216f7f R14: 0000000000000006 R15: ffffffff8088a158 Sep 25 05:10:12 ubermicro kernel: FS: 00007f3125469700(0000) GS:ffff88000107f000(0000) knlGS:0000000000000000 Sep 25 05:10:12 ubermicro kernel: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 25 05:10:12 ubermicro kernel: CR2: 00007f3124b39a90 CR3: 00000000008fd000 CR4: 0000000000002660 Sep 25 05:10:12 ubermicro kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 25 05:10:12 ubermicro kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Sep 25 05:10:12 ubermicro kernel: Process xenvnc.sh (pid: 12180, threadinfo ffff88004e0c6000, task ffff88004cdc3980) Sep 25 05:10:12 ubermicro kernel: Stack: Sep 25 05:10:12 ubermicro kernel: 0000000000000000 0100000000000010 ffff880005ab4588 ffff8803a73c21c0 Sep 25 05:10:12 ubermicro kernel: <0> ffff88004cdc3980 ffff88004cdc3e4c ffff8803a73c2220 0000000000000232 Sep 25 05:10:12 ubermicro kernel: <0> 0000000000000001 ffffffff80216f40 00007f3124b39a90 ffff8803a73c21c0 Sep 25 05:10:12 ubermicro kernel: Call Trace: Sep 25 05:10:12 ubermicro kernel: [<ffffffff80216f40>] ? arch_exit_mmap+0x44/0x83 Sep 25 05:10:12 ubermicro kernel: [<ffffffff80298322>] ? exit_mmap+0x49/0x16c Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022a2a9>] ? mmput+0x28/0xe5 Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022e03d>] ? exit_mm+0x108/0x113 Sep 25 05:10:12 ubermicro kernel: [<ffffffff802484d5>] ? hrtimer_try_to_cancel+0x92/0x9d Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022fc58>] ? do_exit+0x1f2/0x6e0 Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022f8dd>] ? sys_wait4+0xa5/0xb5 Sep 25 05:10:12 ubermicro kernel: [<ffffffff802301f4>] ? do_group_exit+0xae/0xd8 Sep 25 05:10:12 ubermicro kernel: [<ffffffff80230230>] ? sys_exit_group+0x12/0x17 Sep 25 05:10:12 ubermicro kernel: [<ffffffff80204248>] ? 
system_call_fastpath+0x16/0x1b Sep 25 05:10:12 ubermicro kernel: [<ffffffff802041e0>] ? system_call+0x0/0x52 Sep 25 05:10:12 ubermicro kernel: Code: 7e 80 48 89 2d 55 8f 59 00 48 89 c6 48 89 6a 08 e8 82 0c 3c 00 0f ae f0 48 89 df e8 91 c1 fb ff 80 7c 24 0f 00 75 04 eb 08 f3 90 <f6> 45 20 01 75 f8 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 Sep 25 05:10:12 ubermicro kernel: Call Trace: Sep 25 05:10:12 ubermicro kernel: [<ffffffff80216f40>] ? arch_exit_mmap+0x44/0x83 Sep 25 05:10:12 ubermicro kernel: [<ffffffff80298322>] ? exit_mmap+0x49/0x16c Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022a2a9>] ? mmput+0x28/0xe5 Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022e03d>] ? exit_mm+0x108/0x113 Sep 25 05:10:12 ubermicro kernel: [<ffffffff802484d5>] ? hrtimer_try_to_cancel+0x92/0x9d Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022fc58>] ? do_exit+0x1f2/0x6e0 Sep 25 05:10:12 ubermicro kernel: [<ffffffff8022f8dd>] ? sys_wait4+0xa5/0xb5 Sep 25 05:10:12 ubermicro kernel: [<ffffffff802301f4>] ? do_group_exit+0xae/0xd8 Sep 25 05:10:12 ubermicro kernel: [<ffffffff80230230>] ? sys_exit_group+0x12/0x17 Sep 25 05:10:12 ubermicro kernel: [<ffffffff80204248>] ? system_call_fastpath+0x16/0x1b Sep 25 05:10:12 ubermicro kernel: [<ffffffff802041e0>] ? system_call+0x0/0x52 Sep 29 02:54:47 ubermicro kernel: BUG: soft lockup - CPU#3 stuck for 2796s! [swapper:0] Sep 29 02:54:47 ubermicro kernel: Modules linked in: fuse cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco rfcomm bnep l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd sym53c8xx iTCO_wdt i2c_i801 iTCO_vendor_support igb i5k_amb snd_page_alloc btusb bluetooth [last unloaded: nvidia] Sep 29 02:54:47 ubermicro kernel: CPU 3 Sep 29 02:54:47 ubermicro kernel: Modules linked in: fuse cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco rfcomm bnep l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd sym53c8xx iTCO_wdt i2c_i801 iTCO_vendor_support igb i5k_amb snd_page_alloc btusb bluetooth [last unloaded: nvidia] Sep 29 02:54:47 ubermicro kernel: Sep 29 02:54:47 ubermicro kernel: Pid: 0, comm: swapper Tainted: P 2.6.34-xen-r4 #1 X7DWA/X7DWA Sep 29 02:54:47 ubermicro kernel: RIP: e030:[<ffffffff802013aa>] [<ffffffff802013aa>] 0xffffffff802013aa Sep 29 02:54:47 ubermicro kernel: RSP: e02b:ffff8803ec4cdf10 EFLAGS: 00000246 Sep 29 02:54:47 ubermicro kernel: RAX: 0000000000000000 RBX: ffffffff8088a158 RCX: ffffffff802013aa Sep 29 02:54:47 ubermicro kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 Sep 29 02:54:47 ubermicro kernel: RBP: ffff8803ec4cdfd8 R08: 0000000000000000 R09: ffffffff8088a158 Sep 29 02:54:47 ubermicro kernel: R10: ffff880071aeecc0 R11: 0000000000000246 R12: 0000000000000000 Sep 29 02:54:47 ubermicro kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Sep 29 02:54:47 ubermicro kernel: FS: 00007f1a34310710(0000) GS:ffff880001049000(0000) knlGS:0000000000000000 Sep 29 02:54:47 ubermicro kernel: CS: e033 DS: 002b ES: 002b CR0: 000000008005003b Sep 29 02:54:47 ubermicro kernel: CR2: 00007f5b6b5fa000 CR3: 00000000363be000 CR4: 0000000000002660 Sep 29 02:54:47 ubermicro kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 29 02:54:47 ubermicro kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Sep 29 02:54:47 ubermicro 
kernel: Process swapper (pid: 0, threadinfo ffff8803ec4cc000, task ffff8803ec4be000) Sep 29 02:54:47 ubermicro kernel: Stack: Sep 29 02:54:47 ubermicro kernel: 000000000000a280 0000000000000000 ffffffff802062e1 ffffffff80209761 Sep 29 02:54:47 ubermicro kernel: <0> ffffffff8088a158 ffffffff8020361f ffffffff804b2e73 0000000000000000 Sep 29 02:54:47 ubermicro kernel: <0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Sep 29 02:54:47 ubermicro kernel: Call Trace: Sep 29 02:54:47 ubermicro kernel: [<ffffffff802062e1>] ? xen_safe_halt+0xc/0xd Sep 29 02:54:47 ubermicro kernel: [<ffffffff80209761>] ? xen_idle+0x4f/0x85 Sep 29 02:54:47 ubermicro kernel: [<ffffffff8020361f>] ? cpu_idle+0x4b/0x80 Sep 29 02:54:47 ubermicro kernel: [<ffffffff804b2e73>] ? force_evtchn_callback+0x9/0xa Sep 29 02:54:47 ubermicro kernel: Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc Sep 29 02:54:47 ubermicro kernel: Call Trace: Sep 29 02:54:47 ubermicro kernel: [<ffffffff802062e1>] ? xen_safe_halt+0xc/0xd Sep 29 02:54:47 ubermicro kernel: [<ffffffff80209761>] ? xen_idle+0x4f/0x85 Sep 29 02:54:47 ubermicro kernel: [<ffffffff8020361f>] ? cpu_idle+0x4b/0x80 Sep 29 02:54:47 ubermicro kernel: [<ffffffff804b2e73>] ? force_evtchn_callback+0x9/0xa Sep 8 18:16:30 ubermicro kernel: BUG: soft lockup - CPU#2 stuck for 61s! [xenvnc.sh:29385] Sep 8 18:16:30 ubermicro kernel: Modules linked in: fuse cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco bnep rfcomm l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd i2c_i801 iTCO_wdt iTCO_vendor_support snd_page_alloc igb i5k_amb sym53c8xx [last unloaded: microcode] Sep 8 18:16:30 ubermicro kernel: CPU 2 Sep 8 18:16:30 ubermicro kernel: Modules linked in: fuse cifs nvidia(P) ipv6 coretemp w83627hf w83793 hwmon_vid sco bnep rfcomm l2cap crc16 xen_scsibk st ftdi_sio usbserial snd_usb_audio snd_hwdep snd_usb_lib snd_rawmidi btusb bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_timer snd i2c_i801 iTCO_wdt iTCO_vendor_support snd_page_alloc igb i5k_amb sym53c8xx [last unloaded: microcode] Sep 8 18:16:30 ubermicro kernel: Sep 8 18:16:30 ubermicro kernel: Pid: 29385, comm: xenvnc.sh Tainted: P 2.6.34-xen-r3 #1 X7DWA/X7DWA Sep 8 18:16:30 ubermicro kernel: RIP: e030:[<ffffffff802531ef>] [<ffffffff802531ef>] smp_call_function_many+0x185/0x19c Sep 8 18:16:30 ubermicro kernel: RSP: e02b:ffff88009b18fdd8 EFLAGS: 00000202 Sep 8 18:16:30 ubermicro kernel: RAX: ffff88000103eac0 RBX: ffff880001041b30 RCX: 00007f89e1853000 Sep 8 18:16:30 ubermicro kernel: RDX: ffff880001037000 RSI: 0000000000000020 RDI: 0000000000000020 Sep 8 18:16:30 ubermicro kernel: RBP: ffff880001041b00 R08: 0000000000000000 R09: ffff880001041b30 Sep 8 18:16:30 ubermicro kernel: R10: 0000000000007ff0 R11: ffff8803d7d12800 R12: ffff8803d7d12800 Sep 8 18:16:30 ubermicro kernel: R13: ffffffff80216f7f R14: 0000000000000002 R15: ffffffff8088a158 Sep 8 18:16:30 ubermicro kernel: FS: 00007f89e2183700(0000) GS:ffff880001037000(0000) knlGS:0000000000000000 Sep 8 18:16:30 ubermicro kernel: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 8 18:16:30 ubermicro kernel: CR2: 00007f89e1853a90 CR3: 00000000008fd000 CR4: 0000000000002660 Sep 8 18:16:30 ubermicro kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000 Sep 8 18:16:30 ubermicro kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Sep 8 18:16:30 ubermicro kernel: Process xenvnc.sh (pid: 29385, threadinfo ffff88009b18e000, task ffff8801004ad320) Sep 8 18:16:30 ubermicro kernel: Stack: Sep 8 18:16:30 ubermicro kernel: 0000000000000000 0100000000000010 ffff880006b3c398 ffff8803d7d12800 Sep 8 18:16:30 ubermicro kernel: <0> ffff8801004ad320 ffff8801004ad7ec ffff8803d7d12860 0000000000000403 Sep 8 18:16:30 ubermicro kernel: <0> 0000000000000001 ffffffff80216f40 00007f89e1853a90 ffff8803d7d12800 Sep 8 18:16:30 ubermicro kernel: Call Trace: Sep 8 18:16:30 ubermicro kernel: [<ffffffff80216f40>] ? arch_exit_mmap+0x44/0x83 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80298202>] ? exit_mmap+0x49/0x16c Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022a2a9>] ? mmput+0x28/0xe5 Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022e025>] ? exit_mm+0x108/0x113 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80248439>] ? hrtimer_try_to_cancel+0x92/0x9d Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022fc40>] ? do_exit+0x1f2/0x6e0 Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022f8c5>] ? sys_wait4+0xa5/0xb5 Sep 8 18:16:30 ubermicro kernel: [<ffffffff802301dc>] ? do_group_exit+0xae/0xd8 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80230218>] ? sys_exit_group+0x12/0x17 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80204248>] ? system_call_fastpath+0x16/0x1b Sep 8 18:16:30 ubermicro kernel: [<ffffffff802041e0>] ? system_call+0x0/0x52 Sep 8 18:16:30 ubermicro kernel: Code: d0 c1 7e 80 48 89 2d f1 8f 59 00 48 89 c6 48 89 6a 08 e8 0e 08 3c 00 0f ae f0 48 89 df e8 2d c2 fb ff 80 7c 24 0f 00 75 04 eb 08 <f3> 90 f6 45 20 01 75 f8 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 Sep 8 18:16:30 ubermicro kernel: Call Trace: Sep 8 18:16:30 ubermicro kernel: [<ffffffff80216f40>] ? arch_exit_mmap+0x44/0x83 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80298202>] ? exit_mmap+0x49/0x16c Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022a2a9>] ? mmput+0x28/0xe5 Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022e025>] ? exit_mm+0x108/0x113 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80248439>] ? hrtimer_try_to_cancel+0x92/0x9d Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022fc40>] ? do_exit+0x1f2/0x6e0 Sep 8 18:16:30 ubermicro kernel: [<ffffffff8022f8c5>] ? sys_wait4+0xa5/0xb5 Sep 8 18:16:30 ubermicro kernel: [<ffffffff802301dc>] ? do_group_exit+0xae/0xd8 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80230218>] ? sys_exit_group+0x12/0x17 Sep 8 18:16:30 ubermicro kernel: [<ffffffff80204248>] ? system_call_fastpath+0x16/0x1b Sep 8 18:16:30 ubermicro kernel: [<ffffffff802041e0>] ? system_call+0x0/0x52 I should be able to apply the patch tomorrow and will report back as soon as I have some results. Andy _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Sep-29 19:50 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 09/29/2010 11:08 AM, Andreas Kinzler wrote:> On 21.09.2010 13:56, Pasi Kärkkäinen wrote: >>> I am talking a while (via email) with Jan now to track the following >>> problem and he suggested that I report the problem on xen-devel: >>> >>> Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI >>> hang ? >>> Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears hung >>> Jul 9 01:49:10 virt kernel: Calling adapter init >>> Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not >>> guaranteed on shared IRQs >>> Jul 9 01:49:49 virt kernel: Acquiring adapter information >>> Jul 9 01:49:49 virt kernel: update_interval=30:00 >>> check_interval=86400s >>> Jul 9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous >>> command timed out. >>> Jul 9 01:53:13 virt kernel: Usually a result of a PCI interrupt >>> routing >>> problem; >>> Jul 9 01:53:13 virt kernel: update mother board BIOS or consider >>> utilizing one of >>> Jul 9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, >>> apic etc) >>> >>> After the VMs have been running a while the aacraid driver reports a >>> non-responding RAID controller. Most of the time the NIC is also no >>> longer working. >>> I nearly tried every combination of dom0 kernel (pvops0, xenfied suse >>> 2.6.31.x, xenfied suse 2.6.32.x, xenfied suse 2.6.34.x) with Xen >>> hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable. >>> No success in two month. Every combination earlier or later had the >>> problem shown above. I did extensive tests to make sure that the >>> hardware is OK. And it is - I am sure it is a Xen/dom0 problem. >>> >>> Jan suggested to try the fix in c/s 22051 but it did not help. My >>> answer >>> to him: >>> >>>> In the meantime I did try xen-unstable c/s 22068 (contains staging c/s >>> 22051) and >>>> it did not fix the problem at all. I was able to fix a problem with >>> the serial console >>>> and so I got some debug info that is attached to this email. The >>> following line looks >>>> suspicious to me (irr=1, delivery_status=1): >>> >>>> (XEN) IRQ 16 Vec216: >>>> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, >>> dest_mode=logical, >>>> delivery_status=1, polarity=1, irr=1, trigger=level, >>> mask=0, dest_id:1 >>> >>>> IRQ 16 is the aacraid controller which after some while seems to be >>> enable to receive >>>> interrupts. Can you see from the debug info what is going on? >>> >>> I also applied a small patch which disables HPET broadcast. The machine >>> is now running >>> for 110 hours without a crash while normally it crashes within a few >>> minutes. Is there >>> something wrong (race, deadlock) with HPET broadcasts in relation to >>> blocked interrupt >>> reception (see above)? >> What kind of hardware does this happen on? > > It is a Supermicro X8SIL-F, Intel Xeon 3450 system.That''s exactly what my main test/devel machine is. It has been very stable for me with xen-unstable. Is 4.0.1 different from xen-unstable with respect to HPET? The big problem I had initially was instability with the integrated ethernet until I disabled PCIe ASPM. The symptom was that the ethernet devices would disappear (ie, their PCI config space would start to read all 0xff...)>> Should this patch be merged? > > Not easy to answer. I spend more than 10 weeks searching nearly full > time for the reason of the stability issues. Finally I was able to > track it down to the HPET broadcast code. > > We need to find the developer of the HPET broadcast code. Then, he > should try to fix the code. 
I consider it a quite severe bug as it > renders Xen nearly useless on affected systems. That is why I (and my > boss who pays me) spend so much time (developing/fixing Xen is not > really my core job) and money (buying a E5620 machine just for testing > Xen). > > I think many people on affected systems are having problems. See > http://lists.xensource.com/archives/html/xen-users/2010-09/msg00370.html

Just out of interest, does disabling ASPM help? I had to disable it in the BIOS, and set pcie_aspm=off on the kernel command line. This is a total shot in the dark, but given that we're using identical systems it seems worth a try.

Thanks,
J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
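For reference, "pcie_aspm=off" is a Linux (dom0) kernel parameter, so it belongs on the dom0 kernel line of the GRUB entry, not on the xen.gz line. A minimal grub.conf sketch follows; the paths, kernel version and the remaining options are placeholders, not values taken from this thread:

    title Xen with ASPM disabled in dom0
    root (hd0,0)
    kernel /boot/xen.gz dom0_mem=2048M
    module /boot/vmlinuz-2.6.34-xen root=/dev/sda2 ro console=tty0 pcie_aspm=off
    module /boot/initrd-2.6.34-xen.img

Per Jeremy's report, the BIOS setting has to be changed as well; the kernel parameter alone was not sufficient on his board.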
Konrad Rzeszutek Wilk
2010-Sep-29 21:18 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On Wed, Sep 29, 2010 at 08:34:28PM +0100, Andrew Lyon wrote:> On Wed, Sep 29, 2010 at 7:08 PM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote: > > On 21.09.2010 13:56, Pasi Kärkkäinen wrote: > >>> > >>> I am talking a while (via email) with Jan now to track the following > >>> problem and he suggested that I report the problem on xen-devel: > >>> > >>> Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI > >>> hang ? > >>> Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears hung > >>> Jul 9 01:49:10 virt kernel: Calling adapter init > >>> Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not > >>> guaranteed on shared IRQs > >>> Jul 9 01:49:49 virt kernel: Acquiring adapter information > >>> Jul 9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s > >>> Jul 9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous > >>> command timed out. > >>> Jul 9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing > >>> problem; > >>> Jul 9 01:53:13 virt kernel: update mother board BIOS or consider > >>> utilizing one of > >>> Jul 9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic > >>> etc) > >>> > >>> After the VMs have been running a while the aacraid driver reports a > >>> non-responding RAID controller. Most of the time the NIC is also no > >>> longer working. > >>> I nearly tried every combination of dom0 kernel (pvops0, xenfied suse > >>> 2.6.31.x, xenfied suse 2.6.32.x, xenfied suse 2.6.34.x) with Xen > >>> hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable. > >>> No success in two month. Every combination earlier or later had the > >>> problem shown above. I did extensive tests to make sure that the > >>> hardware is OK. And it is - I am sure it is a Xen/dom0 problem. > >>> > >>> Jan suggested to try the fix in c/s 22051 but it did not help. My answer > >>> to him: > >>> > >>>> In the meantime I did try xen-unstable c/s 22068 (contains staging c/s > >>> > >>> 22051) and > >>>> > >>>> it did not fix the problem at all. I was able to fix a problem with > >>> > >>> the serial console > >>>> > >>>> and so I got some debug info that is attached to this email. The > >>> > >>> following line looks > >>>> > >>>> suspicious to me (irr=1, delivery_status=1): > >>> > >>>> (XEN) IRQ 16 Vec216: > >>>> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, > >>> > >>> dest_mode=logical, > >>>> > >>>> delivery_status=1, polarity=1, irr=1, trigger=level, > >>> > >>> mask=0, dest_id:1 > >>> > >>>> IRQ 16 is the aacraid controller which after some while seems to be > >>> > >>> enable to receive > >>>> > >>>> interrupts. Can you see from the debug info what is going on? > >>> > >>> I also applied a small patch which disables HPET broadcast. The machine > >>> is now running > >>> for 110 hours without a crash while normally it crashes within a few > >>> minutes. Is there > >>> something wrong (race, deadlock) with HPET broadcasts in relation to > >>> blocked interrupt > >>> reception (see above)? > >> > >> What kind of hardware does this happen on? > > > > It is a Supermicro X8SIL-F, Intel Xeon 3450 system. > > > >> Should this patch be merged? > > > > Not easy to answer. I spend more than 10 weeks searching nearly full time > > for the reason of the stability issues. Finally I was able to track it down > > to the HPET broadcast code. > > > > We need to find the developer of the HPET broadcast code. Then, he should > > try to fix the code. I consider it a quite severe bug as it renders Xen > > nearly useless on affected systems. 
That is why I (and my boss who pays me) > > spend so much time (developing/fixing Xen is not really my core job) and > > money (buying a E5620 machine just for testing Xen). > > > > I think many people on affected systems are having problems. See > > http://lists.xensource.com/archives/html/xen-users/2010-09/msg00370.html > > > > Regards Andreas > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel > > > > I will test that patch on my Supermicro X7DWA-N based dual Xeon > workstation, I always use a Xenified kernel rather than pv_ops as it > supports some features that I need and is compatible with nvidia > binary drivers, but I''ve always had problems with very occasional<hint> The PVOPS kernel works with the nouveau driver</hint> Look at http://wiki.xensource.com/xenwiki/XenPVOPSDRM for details. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Zhang, Xiantao
2010-Sep-30 05:00 UTC
RE: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
Maybe you can disable pirq_set_affinity to have a try with the following patch. It may trigger IRQ migration in hypervisor, and the IRQ migration logic about(especailly shared)level-triggered ioapic IRQ is not well tested because of no users before. After intoducing the pirq_set_affinity in #Cset21625, the logic is used frequently when vcpu migration occurs, so I doubt it maybe expose the issue you met. Besides, there is a bug in event driver which is fixed in latest pv_ops dom0, seems the dom0 you are using doesn''t include the fix. This bug may result in lost event in dom0 and invoke dom0 hang eventually. To workaround this bug, you can disable irqbalance in dom0. Good luck! Xiantao diff -r fc29e13f669d xen/arch/x86/irq.c --- a/xen/arch/x86/irq.c Mon Aug 09 16:36:07 2010 +0100 +++ b/xen/arch/x86/irq.c Thu Sep 30 20:33:11 2010 +0800 @@ -516,6 +516,7 @@ void irq_set_affinity(struct irq_desc *d void pirq_set_affinity(struct domain *d, int pirq, const cpumask_t *mask) { +#if 0 unsigned long flags; struct irq_desc *desc = domain_spin_lock_irq_desc(d, pirq, &flags); @@ -523,6 +524,7 @@ void pirq_set_affinity(struct domain *d, return; irq_set_affinity(desc, mask); spin_unlock_irqrestore(&desc->lock, flags); +#endif } DEFINE_PER_CPU(unsigned int, irq_count); Andreas Kinzler wrote:> On 21.09.2010 13:56, Pasi Kärkkäinen wrote: >>> I am talking a while (via email) with Jan now to track the >>> following problem and he suggested that I report the problem on >>> xen-devel: >>> >>> Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. >>> SCSI hang ? Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears >>> hung >>> Jul 9 01:49:10 virt kernel: Calling adapter init >>> Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not >>> guaranteed on shared IRQs Jul 9 01:49:49 virt kernel: Acquiring >>> adapter information >>> Jul 9 01:49:49 virt kernel: update_interval=30:00 >>> check_interval=86400s Jul 9 01:53:13 virt kernel: aacraid: >>> aac_fib_send: first asynchronous command timed out. Jul 9 01:53:13 >>> virt kernel: Usually a result of a PCI interrupt routing problem; >>> Jul 9 01:53:13 virt kernel: update mother board BIOS or consider >>> utilizing one of Jul 9 01:53:13 virt kernel: the SAFE mode kernel >>> options (acpi, apic etc) >>> >>> After the VMs have been running a while the aacraid driver reports a >>> non-responding RAID controller. Most of the time the NIC is also no >>> longer working. I nearly tried every combination of dom0 kernel >>> (pvops0, xenfied suse >>> 2.6.31.x, xenfied suse 2.6.32.x, xenfied suse 2.6.34.x) with Xen >>> hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable. >>> No success in two month. Every combination earlier or later had the >>> problem shown above. I did extensive tests to make sure that the >>> hardware is OK. And it is - I am sure it is a Xen/dom0 problem. >>> >>> Jan suggested to try the fix in c/s 22051 but it did not help. My >>> answer to him: >>> >>>> In the meantime I did try xen-unstable c/s 22068 (contains staging >>>> c/s 22051) and it did not fix the problem at all. I was able to >>>> fix a problem with the serial console and so I got some debug info >>>> that is attached to this email. 
The following line looks >>>> suspicious to me (irr=1, delivery_status=1): >>> >>>> (XEN) IRQ 16 Vec216: >>>> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, >>>> dest_mode=logical, delivery_status=1, polarity=1, >>>> irr=1, trigger=level, mask=0, dest_id:1 >>> >>>> IRQ 16 is the aacraid controller which after some while seems to >>>> be enable to receive interrupts. Can you see from the debug info >>>> what is going on? >>> >>> I also applied a small patch which disables HPET broadcast. The >>> machine is now running for 110 hours without a crash while normally >>> it crashes within a few minutes. Is there something wrong (race, >>> deadlock) with HPET broadcasts in relation to blocked interrupt >>> reception (see above)? >> What kind of hardware does this happen on? > > It is a Supermicro X8SIL-F, Intel Xeon 3450 system. > >> Should this patch be merged? > > Not easy to answer. I spend more than 10 weeks searching nearly full > time for the reason of the stability issues. Finally I was able to > track > it down to the HPET broadcast code. > > We need to find the developer of the HPET broadcast code. Then, he > should try to fix the code. I consider it a quite severe bug as it > renders Xen nearly useless on affected systems. That is why I (and my > boss who pays me) spend so much time (developing/fixing Xen is not > really my core job) and money (buying a E5620 machine just for > testing Xen). > > I think many people on affected systems are having problems. See > http://lists.xensource.com/archives/html/xen-users/2010-09/msg00370.html > > Regards Andreas > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
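A footnote on the irqbalance workaround mentioned above: irqbalance normally runs as a system service in dom0, so disabling it means stopping the daemon and removing it from the boot sequence. The exact commands depend on the distribution; the sysvinit-style sequence below is only an assumed example, not taken from this thread:

    # stop the running daemon
    /etc/init.d/irqbalance stop
    # keep it from starting again at boot (RHEL/CentOS-style)
    chkconfig irqbalance off

On distributions without chkconfig, the equivalent update-rc.d or rc-update invocation applies.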
Wei, Gang
2010-Sep-30 06:02 UTC
RE: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
I am the original developer of HPET broadcast code. First of all, to disable HPET broadcast, no additional patch is required. Please simply add option "cpuidle=off" or "max_cstate=1" at xen cmdline in /boot/grub/grub.conf. Second, I noticed that the issue just occur on pre-nehalem server processors. I will check whether I can reproduce it. Meanwhile, I am looking forward to see whether Jeremy & Xiantao''s suggestions have effects. So Andreas, could you help to have a try on their suggestions? Jimmy On , xen-devel-bounces@lists.xensource.com wrote:> Maybe you can disable pirq_set_affinity to have a try with the > following patch. It may trigger IRQ migration in hypervisor, > and the IRQ migration logic about(especailly > shared)level-triggered ioapic IRQ is not well tested because > of no users before. After intoducing the pirq_set_affinity in > #Cset21625, the logic is used frequently when vcpu migration > occurs, so I doubt it maybe expose the issue you met. > Besides, there is a bug in event driver which is fixed in > latest pv_ops dom0, seems the dom0 you are using doesn''t > include the fix. This bug may result in lost event in dom0 > and invoke dom0 hang eventually. To workaround this bug, you > can disable irqbalance in dom0. Good luck! > Xiantao > > diff -r fc29e13f669d xen/arch/x86/irq.c > --- a/xen/arch/x86/irq.c Mon Aug 09 16:36:07 2010 +0100 > +++ b/xen/arch/x86/irq.c Thu Sep 30 20:33:11 2010 +0800 > @@ -516,6 +516,7 @@ void irq_set_affinity(struct irq_desc *d > > void pirq_set_affinity(struct domain *d, int pirq, const cpumask_t > *mask) { > +#if 0 > unsigned long flags; > struct irq_desc *desc = domain_spin_lock_irq_desc(d, pirq, > &flags); > > @@ -523,6 +524,7 @@ void pirq_set_affinity(struct domain *d, > return; irq_set_affinity(desc, mask); > spin_unlock_irqrestore(&desc->lock, flags); > +#endif > } > > DEFINE_PER_CPU(unsigned int, irq_count); > > > Andreas Kinzler wrote: >> On 21.09.2010 13:56, Pasi Kärkkäinen wrote: >>>> I am talking a while (via email) with Jan now to track the >>>> following problem and he suggested that I report the problem on >>>> xen-devel: >>>> >>>> Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. >>>> SCSI hang ? Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears >>>> hung Jul 9 01:49:10 virt kernel: Calling adapter init >>>> Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not >>>> guaranteed on shared IRQs Jul 9 01:49:49 virt kernel: Acquiring >>>> adapter information Jul 9 01:49:49 virt kernel: >>>> update_interval=30:00 check_interval=86400s Jul 9 01:53:13 virt >>>> kernel: aacraid: aac_fib_send: first asynchronous command timed >>>> out. Jul 9 01:53:13 virt kernel: Usually a result of a PCI >>>> interrupt routing problem; Jul 9 01:53:13 virt kernel: update >>>> mother board BIOS or consider utilizing one of Jul 9 01:53:13 >>>> virt kernel: the SAFE mode kernel options (acpi, apic etc) >>>> >>>> After the VMs have been running a while the aacraid driver reports >>>> a non-responding RAID controller. Most of the time the NIC is also >>>> no longer working. I nearly tried every combination of dom0 kernel >>>> (pvops0, xenfied suse >>>> 2.6.31.x, xenfied suse 2.6.32.x, xenfied suse 2.6.34.x) with Xen >>>> hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1, unstable. >>>> No success in two month. Every combination earlier or later had the >>>> problem shown above. I did extensive tests to make sure that the >>>> hardware is OK. And it is - I am sure it is a Xen/dom0 problem. 
>>>> >>>> Jan suggested to try the fix in c/s 22051 but it did not help. My >>>> answer to him: >>>> >>>>> In the meantime I did try xen-unstable c/s 22068 (contains staging >>>>> c/s 22051) and it did not fix the problem at all. I was able to >>>>> fix a problem with the serial console and so I got some debug info >>>>> that is attached to this email. The following line looks >>>>> suspicious to me (irr=1, delivery_status=1): >>>> >>>>> (XEN) IRQ 16 Vec216: >>>>> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, >>>>> dest_mode=logical, delivery_status=1, polarity=1, >>>>> irr=1, trigger=level, mask=0, dest_id:1 >>>> >>>>> IRQ 16 is the aacraid controller which after some while seems to >>>>> be enable to receive interrupts. Can you see from the debug info >>>>> what is going on? >>>> >>>> I also applied a small patch which disables HPET broadcast. The >>>> machine is now running for 110 hours without a crash while normally >>>> it crashes within a few minutes. Is there something wrong (race, >>>> deadlock) with HPET broadcasts in relation to blocked interrupt >>>> reception (see above)? >>> What kind of hardware does this happen on? >> >> It is a Supermicro X8SIL-F, Intel Xeon 3450 system. >> >>> Should this patch be merged? >> >> Not easy to answer. I spend more than 10 weeks searching nearly full >> time for the reason of the stability issues. Finally I was able to >> track it down to the HPET broadcast code. >> >> We need to find the developer of the HPET broadcast code. Then, he >> should try to fix the code. I consider it a quite severe bug as it >> renders Xen nearly useless on affected systems. That is why I (and my >> boss who pays me) spend so much time (developing/fixing Xen is not >> really my core job) and money (buying a E5620 machine just for >> testing Xen). >> >> I think many people on affected systems are having problems. See >> > http://lists.xensource.com/archives/html/xen-users/2010-09/msg0 > 0370.html >> >> Regards Andreas >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
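To make the cmdline suggestion above concrete: "cpuidle=off" and "max_cstate=1" are Xen hypervisor options, so they go on the xen.gz line of /boot/grub/grub.conf, not on the dom0 kernel (module) line. A minimal GRUB entry sketch follows; the paths, dom0 kernel version and other options are placeholders, not taken from this thread:

    title Xen (deep C-states disabled)
    root (hd0,0)
    kernel /boot/xen.gz max_cstate=1
    module /boot/vmlinuz-2.6.32-xen root=/dev/sda2 ro
    module /boot/initrd-2.6.32-xen.img

Either option should have the same effect for this purpose: with cpuidle off or C-states capped at C1, the local APIC timers keep running, so the HPET broadcast path is never exercised.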
Andreas Kinzler
2010-Sep-30 09:42 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 30.09.2010 07:00, Zhang, Xiantao wrote:
> Maybe you can disable pirq_set_affinity to have a try with the following patch.
> It may trigger IRQ migration in hypervisor, and the IRQ migration logic about (especailly shared)
> level-triggered ioapic IRQ is not well tested because of no users before. After intoducing the
> pirq_set_affinity in #Cset21625, the logic is used frequently when vcpu migration occurs

I am using Xen 4.0.1, which is c/s 21324, so I should not be affected?

> Besides, there is a bug in event driver which is fixed in latest pv_ops dom0, seems the dom0
> you are using doesn't include the fix. This bug may result in lost event in dom0 and invoke
> dom0 hang eventually.

Hmm, this really does not explain why everything is rock solid after disabling HPET broadcast? And the problem occurred with every kernel (xenfied, pvops, all versions). Please correct me if I am wrong.

> To workaround this bug, you can disable irqbalance in dom0. Good luck!

As far as I know I am not using irq balancing (certainly not using the irqbalance daemon).

Regards Andreas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Andreas Kinzler
2010-Sep-30 10:16 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 29.09.2010 21:50, Jeremy Fitzhardinge wrote:
>> It is a Supermicro X8SIL-F, Intel Xeon 3450 system.
> The big problem I had initially was instability with the integrated
> ethernet until I disabled PCIe ASPM. The symptom was that the ethernet
> devices would disappear (ie, their PCI config space would start to read
> all 0xff...)

I know that this is a known problem of Intel 82574L chips (on the X8SIL) - it is discussed on "Intel Wired Ethernet" (http://sourceforge.net/projects/e1000/). That is why I tested different NICs (Intel ET Server Adapter (82576 [igb]) and Realtek 8168) and the problem remained. So I can say with certainty that the NIC and/or its power management is not the problem.

I also spent extensive time changing hardware components. I used a different mainboard (ASUS P7F-M), a different power supply, changed the CPU, changed NICs (see above) - the problems remained.

> That's exactly what my main test/devel machine is. It has been very
> stable for me with xen-unstable.

We have a second Supermicro X8SIL-F, Intel Xeon 3450 system which only runs Linux PVM domains and it is totally stable (without my HPET patch). So I think, as with all timing/race/deadlock/... issues, it depends on what you do on your system. Let me give you my crash "recipe" [quite reliable ;-)]:

Have two HVMs (called win1, win2) with Windows 7 x64 installed (do install everything twice, never clone; VM config attached). Install GPLPV 0.11.0.213, iometer 2006.07.27 and prime95 25.11 x64. On both systems start the prime95 torture test (in-place large FFT) and, using the Windows task manager, set the CPU affinity of the prime95 process on win1 to use only CPU1. On win2 do the same thing but use only CPU0. Then start iometer on both VMs with the following parameters: have a second virtual disk in both VMs (so each Windows has two virtual disks, one for Windows and one for iometer), use "# of outstanding I/Os" = 4, access spec = "All in one". Wait some minutes. Crash!

Regards Andreas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Sep-30 17:12 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 09/30/2010 03:16 AM, Andreas Kinzler wrote:> On 29.09.2010 21:50, Jeremy Fitzhardinge wrote: >>> It is a Supermicro X8SIL-F, Intel Xeon 3450 system. >> The big problem I had initially was instability with the integrated >> ethernet until I disabled PCIe ASPM. The symptom was that the ethernet >> devices would disappear (ie, their PCI config space would start to read >> all 0xff...) > > I know that this is a known problem of Intel 82574L chips (on X8SIL) - > it is discussed on "Intel Wired Ethernet" > (http://sourceforge.net/projects/e1000/).Aha, specifically http://sourceforge.net/tracker/index.php?func=detail&aid=2908463&group_id=42302&atid=447449, in which several people invoke me, but nobody bothered to tell me that this bug existed on sf :/> That is why I tested different NICs (Intel ET Server Adapter (82576 > [igb]) and Realtek 8168) and the problem remained. So I can say with > certainty that the NIC and/or its power management is not the problem.OK.> > I also spend extensive time changing hardware components. I used a > different mainboard (ASUS P7F-M), a different power supply, changed > CPU, changed NICs (see above) - problems remained. > > > That''s exactly what my main test/devel machine is. It has been very > > stable for me with xen-unstable. > > We have a second Supermicro X8SIL-F, Intel Xeon 3450 system which only > runs Linux PVM domains and it is totally stable (without my HPET > patch). So I think as with all timing/race/deadlock/... issues it > depends on what you do on your system. Let me give you my crash > "recipe" [quite reliable ;-)]OK. My machine is mostly running PV domains, with some low-intensity hvm ones.> > Have two HVMs (called win1, win2) with Windows 7 x64 installed (do > install everything twice, never clone, VM config attached). Install > GPLPV 0.11.0.213, iometer 2006.07.27, prime95 25.11 x64. On both > systems: start prime95 torture test (in-place large FFT) and using > Windows task manager set CPU affinity on win1 of process prime95 to > use only CPU1. On win2 do the same thing but to use only CPU0. Then > start iometer on both VMs using the following parameters: have a > second virtual disk in both VMs (so every windows has 2 virtual disks, > one for Windows and one for iometer), use "# of outstanding I/Os" = 4, > access spec = "All in one". Wait some minutes. Crash!Yes, that''s a very different workload from mine. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Zhang, Xiantao
2010-Oct-01 04:14 UTC
RE: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
Andreas Kinzler wrote:
> On 30.09.2010 07:00, Zhang, Xiantao wrote:
>> Maybe you can disable pirq_set_affinity to have a try with the following patch.
>> It may trigger IRQ migration in hypervisor, and the IRQ migration logic about (especailly shared)
>> level-triggered ioapic IRQ is not well tested because of no users before. After intoducing the
>> pirq_set_affinity in #Cset21625, the logic is used frequently when vcpu migration occurs
>
> I am using Xen 4.0.1 which is c/s 21324 so I should not be affected?

Which Cset was in use when you collected the suspicious 'irr=1' log? Xen-4.0.1 or 22068? In addition, did you always see the above strange log for every hang? You know, IRQ 16 is assigned a relatively big vector (216); if it is not correctly acked, the other interrupt sources will be masked automatically, so dom0 may hang. Another thing to try is to hack assign_irq_vector to allocate a small vector for IRQ 16, so that when the aacraid controller has something wrong you still have a chance to log on to dom0 and get more information. Besides, could you enable MSI for the aacraid controller to have a try?

>> Besides, there is a bug in event driver which is fixed in latest pv_ops dom0, seems the dom0
>> you are using doesn't include the fix. This bug may result in lost event in dom0 and invoke
>> dom0 hang eventually.
>
> Hmm, this really does not explain why everything is rock solid after disabling HPET broadcast?
> And the problem occurred with every kernel (xenfied, pvops, all versions). Please correct me
> if I am wrong

Just a guess: HPET broadcast may not be the real killer, and it just exposes the bug accidentally, according to the log you attached.

>> To workaround this bug, you can disable irqbalance in dom0. Good luck!
>
> As far as I know I am not using irq balancing (certainly not using the irqbalance daemon).

Okay.
Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2010-Dec-31 14:31 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On Thu, Sep 09, 2010 at 11:20:51AM +0200, Andreas Kinzler wrote:> I am talking a while (via email) with Jan now to track the following > problem and he suggested that I report the problem on xen-devel: >...> > I also applied a small patch which disables HPET broadcast. The machine > is now running > for 110 hours without a crash while normally it crashes within a few > minutes. Is there > something wrong (race, deadlock) with HPET broadcasts in relation to > blocked interrupt > reception (see above)? >Hello, Was this issue resolved? Just wondering since many people have reported it on xen-users list.. -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andreas Kinzler
2011-Jan-09 19:10 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 31.12.2010 15:31, Pasi Kärkkäinen wrote:
> On Thu, Sep 09, 2010 at 11:20:51AM +0200, Andreas Kinzler wrote:
>> I am talking a while (via email) with Jan now to track the following
>> problem and he suggested that I report the problem on xen-devel:
>> I also applied a small patch which disables HPET broadcast. The machine
>> is now running for 110 hours without a crash while normally it crashes
>> within a few minutes. Is there something wrong (race, deadlock) with
>> HPET broadcasts in relation to blocked interrupt reception (see above)?
>
> Hello,
> Was this issue resolved? Just wondering since many people
> have reported it on xen-users list..
> -- Pasi

To my knowledge: not at all. Somehow none of the developers took care
of it.

Regards Andreas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2011-Jan-09 19:21 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On Sun, Jan 09, 2011 at 08:10:52PM +0100, Andreas Kinzler wrote:
> On 31.12.2010 15:31, Pasi Kärkkäinen wrote:
>> On Thu, Sep 09, 2010 at 11:20:51AM +0200, Andreas Kinzler wrote:
>>> I am talking a while (via email) with Jan now to track the following
>>> problem and he suggested that I report the problem on xen-devel:
>>> I also applied a small patch which disables HPET broadcast. The machine
>>> is now running for 110 hours without a crash while normally it crashes
>>> within a few minutes. Is there something wrong (race, deadlock) with
>>> HPET broadcasts in relation to blocked interrupt reception (see above)?
>
>> Hello,
>> Was this issue resolved? Just wondering since many people
>> have reported it on xen-users list..
>> -- Pasi
>
> To my knowledge: not at all. Somehow none of the developers took care
> of it.
>

So you can still reproduce it with the latest xen-4.0-testing.hg (Xen
4.0.2-rc1-pre) and the latest xen/stable-2.6.32.x pvops dom0 kernel
(2.6.32.27)?

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2011-Jan-09 20:04 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 09/01/2011 19:21, "Pasi Kärkkäinen" <pasik@iki.fi> wrote:
>>> Was this issue resolved? Just wondering since many people
>>> have reported it on xen-users list..
>>> -- Pasi
>>
>> To my knowledge: not at all. Somehow none of the developers took care
>> of it.
>>
>
> So you can still reproduce it with the latest xen-4.0-testing.hg (Xen
> 4.0.2-rc1-pre) and the latest xen/stable-2.6.32.x pvops dom0 kernel
> (2.6.32.27)?

Interested in the latest xen-unstable.hg too. With both trees frozen for
the 4.0/4.1 releases very soon, we should disable HPET broadcast if the
bug is still reproducible and no 'proper' fix is forthcoming from the
original authors.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
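Until a proper fix or a code-level disable lands, an administrator can also
avoid the HPET broadcast path by keeping the hypervisor out of the deep
C-states that require it (HPET broadcast is only engaged as a wakeup timer
when the local APIC timer stops in deep sleep states). The sketch below uses
the existing Xen boot options cpuidle and max_cstate; the specific values
are illustrative and not a recommendation from this thread:

    # GRUB legacy entry - keep Xen out of deep C-states so the local APIC
    # timer keeps running and HPET broadcast is never needed
    kernel /boot/xen.gz cpuidle=off
    # or, less drastically, cap the C-state depth:
    # kernel /boot/xen.gz max_cstate=1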
Andreas Kinzler
2011-Jan-19 10:19 UTC
Re: [Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
On 09.01.2011 21:04, Keir Fraser wrote:
>>>> Was this issue resolved? Just wondering since many people
>>>> have reported it on xen-users list..
>>> To my knowledge: not at all. Somehow none of the developers took care
>>> of it.
>> So you can still reproduce it with the latest xen-4.0-testing.hg (Xen
>> 4.0.2-rc1-pre) and the latest xen/stable-2.6.32.x pvops dom0 kernel
>> (2.6.32.27)?
> Interested in the latest xen-unstable.hg too. With both trees frozen for
> the 4.0/4.1 releases very soon, we should disable HPET broadcast if the
> bug is still reproducible and no 'proper' fix is forthcoming from the
> original authors.
> -- Keir

I spent some hours testing, but for some reason I was unable to reproduce
the crash even with the old configuration and my own crash recipe
(http://lists.xensource.com/archives/html/xen-devel/2010-09/msg01755.html).
Quite odd. However, pvops0 2.6.32.18 is still the latest version working at
all on my systems [see the large thread "Xen dom0 crash: "d0:v0: unhandled
page fault (ec=0000)""].

Regards Andreas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel