thr3ads.net - Xen devel - [Xen-devel] Network dies and kernel errors [Jul 2011]

John McMonagle

2011-Jul-25 19:18 UTC

[Xen-devel] Network dies and kernel errors

I have a new AMD 6100-based server:
http://www.supermicro.com/Aplus/system/2U/2022/AS-2022G-URF.cfm
It runs Debian squeeze with the Debian 2.6.32 Xen kernel
and Xen 4.1.1 built from source from xen.org.

I'm seeing two errors.
During boot I get this:

[    0.004823] ------------[ cut here ]------------
[    0.004833] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/arch/x86/xen/enlighten.c:726 init_hw_perf_events+0x32d/0x3cd()
[    0.004838] Hardware name: H8DGU
[    0.004841] Modules linked in:
[    0.004847] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1
[    0.004850] Call Trace:
[    0.004857]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
[    0.004862]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
[    0.004870]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
[    0.004875]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
[    0.004881]  [<ffffffff813044dc>] ? identify_cpu+0x2f7/0x300
[    0.004888]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[    0.004895]  [<ffffffff810e81d5>] ? kmem_cache_alloc+0x8c/0xf0
[    0.004900]  [<ffffffff81510a16>] ? identify_boot_cpu+0x15/0x3e
[    0.004904]  [<ffffffff81510baa>] ? check_bugs+0x9/0x2e
[    0.004910]  [<ffffffff81509cce>] ? start_kernel+0x3cd/0x3e8
[    0.004915]  [<ffffffff8150bc93>] ? xen_start_kernel+0x586/0x58a
[    0.004926] ---[ end trace a7919e7f17c0a725 ]---
[    0.004930] ... version:                0
[    0.004932] ... bit width:              48
[    0.004935] ... generic registers:      4
[    0.004938] ... value mask:             0000ffffffffffff
[    0.004940] ... max period:             00007fffffffffff
[    0.004943] ... fixed-purpose events:   0
[    0.004946] ... event mask:             000000000000000f
 
Concerning the enlighten.c error: while it does not seem to cause problems, it
concerns me that it may be related to power management, or may be the real cause
of the network problems that are getting my attention.

I got the debian kernel source and line 726 is the line with the *****

static void xen_clts(void)
{
        struct multicall_space mcs;  *****

        mcs = xen_mc_entry(0);

        MULTI_fpu_taskswitch(mcs.mc, 0);

        xen_mc_issue(PARAVIRT_LAZY_CPU);
}

Means nothing to me :-(

What is it?
What can be done?

The next one may not be Xen-related, but I have only had the problem after running a domU.
After a while I get a kernel error and networking stops.
This is the error:
[ 1411.813376] ------------[ cut here ]------------
[ 1411.813398] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
[ 1411.813410] Hardware name: H8DGU
[ 1411.813417] NETDEV WATCHDOG: peth0 (igb): transmit queue 1 timed out
[ 1411.813424] Modules linked in: xt_physdev iptable_filter tun ip_tables x_tables bridge stp sg sr_mod cdrom xfs exportfs ipmi_si ipmi_devintf ipmi_watchdog ipmi_msghandler xen_evtchn blktap xenfs loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse joydev evdev serio_raw i2c_piix4 edac_core k10temp edac_mce_amd i2c_core processor button acpi_processor ext4 mbcache jbd2 crc16 usbhid hid dm_mod raid1 md_mod sd_mod crc_t10dif ata_generic usb_storage pata_atiixp ahci ohci_hcd libata ehci_hcd usbcore nls_base scsi_mod igb dca thermal thermal_sys [last unloaded: scsi_wait_scan]
[ 1411.813656] Pid: 4, comm: ksoftirqd/0 Tainted: G        W  2.6.32-5-xen-amd64 #1
[ 1411.813664] Call Trace:
[ 1411.813671]  <IRQ>  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
[ 1411.813697]  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
[ 1411.813711]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
[ 1411.813724]  [<ffffffff81272d60>] ? dev_watchdog+0x0/0x194
[ 1411.813736]  [<ffffffff8104ef88>] ? warn_slowpath_fmt+0x51/0x59
[ 1411.813751]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
[ 1411.813762]  [<ffffffff8104b41e>] ? try_to_wake_up+0x289/0x29b
[ 1411.813778]  [<ffffffff81272d34>] ? netif_tx_lock+0x3d/0x69
[ 1411.813791]  [<ffffffff8125d7da>] ? netdev_drivername+0x3b/0x40
[ 1411.813803]  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
[ 1411.813816]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[ 1411.813827]  [<ffffffff81040e42>] ? check_preempt_wakeup+0x0/0x268
[ 1411.813841]  [<ffffffff8105b5ef>] ? run_timer_softirq+0x1c9/0x268
[ 1411.813855]  [<ffffffff81054c9b>] ? __do_softirq+0xdd/0x1a6
[ 1411.813867]  [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
[ 1411.813873]  <EOI>  [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
[ 1411.813893]  [<ffffffff810548c2>] ? ksoftirqd+0x5f/0xd3
[ 1411.813905]  [<ffffffff81054863>] ? ksoftirqd+0x0/0xd3
[ 1411.813915]  [<ffffffff81065c39>] ? kthread+0x79/0x81
[ 1411.813926]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[ 1411.813937]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[ 1411.813948]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[ 1411.813958]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
[ 1411.813966] ---[ end trace a7919e7f17c0a727 ]---
[ 1412.052253] eth0: port 1(peth0) entering disabled state
[ 1635.796207] frontend_changed: backend/vbd/3/768: prepare for reconnect
[ 1647.137513] eth0: port 3(vif3.0) entering disabled state
[ 1647.157527] eth0: port 3(vif3.0) entering disabled state
 Kernel logging (proc) stopped.

In this case dom0 locked up. Sometimes just networking stops, and sometimes
networking recovers.

It looks like the NIC uses MSI-X interrupts.

Concerning the igb error, I have tried the following, one at a time:
New igb driver from the Intel site.
Kernel parameter pcie_aspm=off
ethtool -K eth0 tx off (on dom0)
ethtool -K eth0 gro off (on dom0)

It has never died while running iperf between dom0 or a domU and an external host.
It never died during a network backup.

It usually takes at least a few hours, and it has never made it a day while running a domU.
Wish I could get it to die faster :-)
Any ideas?
I'm pretty much down to trying different network cards.

Any ideas?

John



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Jul-29 15:03 UTC

Re: [Xen-devel] Network dies and kernel errors

On Mon, Jul 25, 2011 at 02:18:21PM -0500, John McMonagle wrote:
> Have a new amd 6100 based server.
> http://www.supermicro.com/Aplus/system/2U/2022/AS-2022G-URF.cfm
> Running debian squeeze with debian 2.6.32 xen kernel
> Running xen 4.1.1 built from source from xen.org
> 
> I'm seeing 2 errors.
> during boot get this:
> 
> [    0.004823] ------------[ cut here ]------------
> [    0.004833] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/arch/x86/xen/enlighten.c:726 init_hw_perf_events+0x32d/0x3cd()
> [    0.004838] Hardware name: H8DGU
> [    0.004841] Modules linked in:
> [    0.004847] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1
> [    0.004850] Call Trace:
> [    0.004857]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
> [    0.004862]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
> [    0.004870]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> [    0.004875]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
> [    0.004881]  [<ffffffff813044dc>] ? identify_cpu+0x2f7/0x300
> [    0.004888]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> [    0.004895]  [<ffffffff810e81d5>] ? kmem_cache_alloc+0x8c/0xf0
> [    0.004900]  [<ffffffff81510a16>] ? identify_boot_cpu+0x15/0x3e
> [    0.004904]  [<ffffffff81510baa>] ? check_bugs+0x9/0x2e
> [    0.004910]  [<ffffffff81509cce>] ? start_kernel+0x3cd/0x3e8
> [    0.004915]  [<ffffffff8150bc93>] ? xen_start_kernel+0x586/0x58a

You can ignore that one. It just means that you can't do profiling, which we haven't
yet up-ported.

..

> Then next one may not be xen but I only had the problem after running a domu.
> After a while I get kernel error and networking stops.

And another user with a bnx2 driver seems to see a similar problem. Let me CC
them here.

> This is the error:
> [ 1411.813376] ------------[ cut here ]------------
> [ 1411.813398] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()

OK, this one is more worrisome.

> [ 1411.813410] Hardware name: H8DGU
> [ 1411.813417] NETDEV WATCHDOG: peth0 (igb): transmit queue 1 timed out
> [ 1411.813424] Modules linked in: xt_physdev iptable_filter tun ip_tables x_tables bridge stp sg sr_mod cdrom xfs exportfs ipmi_si ipmi_devintf ipmi_watchdog ipmi_msghandler xen_evtchn blktap xenfs loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse joydev evdev serio_raw i2c_piix4 edac_core k10temp edac_mce_amd i2c_core processor button acpi_processor ext4 mbcache jbd2 crc16 usbhid hid dm_mod raid1 md_mod sd_mod crc_t10dif ata_generic usb_storage pata_atiixp ahci ohci_hcd libata ehci_hcd usbcore nls_base scsi_mod igb dca thermal thermal_sys [last unloaded: scsi_wait_scan]
> [ 1411.813656] Pid: 4, comm: ksoftirqd/0 Tainted: G        W  2.6.32-5-xen-amd64 #1
> [ 1411.813664] Call Trace:
> [ 1411.813671]  <IRQ>  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
> [ 1411.813697]  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
> [ 1411.813711]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> [ 1411.813724]  [<ffffffff81272d60>] ? dev_watchdog+0x0/0x194
> [ 1411.813736]  [<ffffffff8104ef88>] ? warn_slowpath_fmt+0x51/0x59
> [ 1411.813751]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
> [ 1411.813762]  [<ffffffff8104b41e>] ? try_to_wake_up+0x289/0x29b
> [ 1411.813778]  [<ffffffff81272d34>] ? netif_tx_lock+0x3d/0x69
> [ 1411.813791]  [<ffffffff8125d7da>] ? netdev_drivername+0x3b/0x40
> [ 1411.813803]  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
> [ 1411.813816]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
> [ 1411.813827]  [<ffffffff81040e42>] ? check_preempt_wakeup+0x0/0x268
> [ 1411.813841]  [<ffffffff8105b5ef>] ? run_timer_softirq+0x1c9/0x268
> [ 1411.813855]  [<ffffffff81054c9b>] ? __do_softirq+0xdd/0x1a6
> [ 1411.813867]  [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
> [ 1411.813873]  <EOI>  [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
> [ 1411.813893]  [<ffffffff810548c2>] ? ksoftirqd+0x5f/0xd3
> [ 1411.813905]  [<ffffffff81054863>] ? ksoftirqd+0x0/0xd3
> [ 1411.813915]  [<ffffffff81065c39>] ? kthread+0x79/0x81
> [ 1411.813926]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> [ 1411.813937]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
> [ 1411.813948]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> [ 1411.813958]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> [ 1411.813966] ---[ end trace a7919e7f17c0a727 ]---
> [ 1412.052253] eth0: port 1(peth0) entering disabled state
> [ 1635.796207] frontend_changed: backend/vbd/3/768: prepare for reconnect
> [ 1647.137513] eth0: port 3(vif3.0) entering disabled state
> [ 1647.157527] eth0: port 3(vif3.0) entering disabled state
>  Kernel logging (proc) stopped.
> 
> In this case dom0 locked up. Some times just networking stops and some times
> networking recovers.
> 
> Looks like it uses msi-x interrupts.
> 
> Concerning igb error I have tried the following  one at a time:
> New igb driver from Intel site.
> kernel parameter  pcie_aspm=off
> ethtool -K eth0 tx off  on dom0
> ethtool -K eth0 gro off  on dom0 
> 
OK.

> It has never died doing iperf from dom0 or domu  <> external.
> Never died during network backup.
> 
> Usually takes a least a few hours and has never made it a day running a domu.
> Wish I could get it to die faster :-)
> Any ideas?
> I'm pretty much down to trying different network cards

Did you try that? Did that make any difference?

> Any ideas?

There is a Xen parameter called 'noirqbalance'. Try that. Also see if you can
limit the CPUs in dom0 using these two arguments on the Xen hypervisor command line:

dom0_vcpus=2 dom0_vcpus_pin=1


It would be interesting to narrow down _when_ you trigger this failure, because we
can poll Xen to see what the MSIs are ('xl debug-keys M') _before_ and _after_ your
failure to see if something is amiss.

Mainly to figure out if the vectors are moving around the CPUs (or not):

(XEN)  MSI    29 vec=21 lowest  edge   assert  log lowest dest=00000001 mask=0/0/-1

and also 'xl debug-keys i' to see if the domain has ACK-ed the interrupt:
(XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000001 vec:21 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(----),

(the last '----' might have something else in it - if so, that is a sign that
dom0 hasn't picked up the event/vector).
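
A rough way to apply that check to saved console output is sketched below. It is only an illustration, not part of Xen or xl: the helper names (parse_irq_line, unacked_domains) are made up here, the regular expression is fitted to the single IRQ line quoted in this thread, and treating the number after the colon as the pirq is an assumption. The only rule it encodes is the one stated above: anything other than '----' in the domain-list entry deserves a closer look.

```python
import re

# Fitted to the "(XEN) IRQ:" line quoted in this thread; other Xen versions
# may print a different layout, so treat this pattern as an assumption.
IRQ_RE = re.compile(
    r"IRQ:\s*(?P<irq>\d+)\s+affinity:(?P<affinity>[0-9a-f,]+)\s+vec:(?P<vec>[0-9a-f]+)"
    r".*?status=(?P<status>[0-9a-f]+).*?domain-list=(?P<domains>.*)$"
)
# One "domain:number(flags)" entry inside domain-list.
DOMAIN_RE = re.compile(r"(\d+):\s*(\d+)\(([^)]*)\)")


def parse_irq_line(line):
    """Return a dict describing one IRQ line, or None if the line does not match."""
    m = IRQ_RE.search(line)
    if m is None:
        return None
    domains = [
        # "pirq" is an assumed label for the number after the colon.
        {"domain": int(dom), "pirq": int(num), "flags": flags}
        for dom, num, flags in DOMAIN_RE.findall(m.group("domains"))
    ]
    return {
        "irq": int(m.group("irq")),
        "affinity": m.group("affinity"),
        "vec": m.group("vec"),
        "status": m.group("status"),
        "domains": domains,
    }


def unacked_domains(entry):
    """Domain entries whose flag field is anything other than '----'."""
    return [d for d in entry["domains"] if d["flags"] != "----"]
```

Feeding it each IRQ line from a capture taken before the failure and one taken after would show whether the flag field for dom0 changed.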



John McMonagle

2011-Jul-29 15:38 UTC

Re: [Xen-devel] Network dies and kernel errors

On Friday, July 29, 2011 10:03:48 am Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 25, 2011 at 02:18:21PM -0500, John McMonagle wrote:
> > Have a new amd 6100 based server.
> > http://www.supermicro.com/Aplus/system/2U/2022/AS-2022G-URF.cfm
> > Running debian squeeze with debian 2.6.32 xen kernel
> > Running xen 4.1.1 built from source from xen.org
> > 
> > I'm seeing 2 errors.
> > during boot get this:
> > 
> > [    0.004823] ------------[ cut here ]------------
> > [    0.004833] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/arch/x86/xen/enlighten.c:726 init_hw_perf_events+0x32d/0x3cd()
> > [    0.004838] Hardware name: H8DGU
> > [    0.004841] Modules linked in:
> > [    0.004847] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1
> > [    0.004850] Call Trace:
> > [    0.004857]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
> > [    0.004862]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
> > [    0.004870]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > [    0.004875]  [<ffffffff81510efc>] ? init_hw_perf_events+0x32d/0x3cd
> > [    0.004881]  [<ffffffff813044dc>] ? identify_cpu+0x2f7/0x300
> > [    0.004888]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> > [    0.004895]  [<ffffffff810e81d5>] ? kmem_cache_alloc+0x8c/0xf0
> > [    0.004900]  [<ffffffff81510a16>] ? identify_boot_cpu+0x15/0x3e
> > [    0.004904]  [<ffffffff81510baa>] ? check_bugs+0x9/0x2e
> > [    0.004910]  [<ffffffff81509cce>] ? start_kernel+0x3cd/0x3e8
> > [    0.004915]  [<ffffffff8150bc93>] ? xen_start_kernel+0x586/0x58a
> 
> You can ignore that one. It just means that you can't do profiling which we
> haven't yet up-ported.
> 
> ..
> 
> > Then next one may not be xen but I only had the problem after running a
> > domu. After a while I get kernel error and networking stops.
> 
> And some other user with a bnx2 driver seems to see a similar problem. Let
> me CC them here.
> 
> > This is the error:
> > [ 1411.813376] ------------[ cut here ]------------
> > [ 1411.813398] WARNING: at /build/buildd-linux-2.6_2.6.32-35-amd64-aZSlKL/linux-2.6-2.6.32/debian/build/source_amd64_xen/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
> 
> OK, this is one is more worrysome.
> 
> > [ 1411.813410] Hardware name: H8DGU
> > [ 1411.813417] NETDEV WATCHDOG: peth0 (igb): transmit queue 1 timed out
> > [ 1411.813424] Modules linked in: xt_physdev iptable_filter tun ip_tables x_tables bridge stp sg sr_mod cdrom xfs exportfs ipmi_si ipmi_devintf ipmi_watchdog ipmi_msghandler xen_evtchn blktap xenfs loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse joydev evdev serio_raw i2c_piix4 edac_core k10temp edac_mce_amd i2c_core processor button acpi_processor ext4 mbcache jbd2 crc16 usbhid hid dm_mod raid1 md_mod sd_mod crc_t10dif ata_generic usb_storage pata_atiixp ahci ohci_hcd libata ehci_hcd usbcore nls_base scsi_mod igb dca thermal thermal_sys [last unloaded: scsi_wait_scan]
> > [ 1411.813656] Pid: 4, comm: ksoftirqd/0 Tainted: G        W  2.6.32-5-xen-amd64 #1
> > [ 1411.813664] Call Trace:
> > [ 1411.813671]  <IRQ>  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
> > [ 1411.813697]  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
> > [ 1411.813711]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > [ 1411.813724]  [<ffffffff81272d60>] ? dev_watchdog+0x0/0x194
> > [ 1411.813736]  [<ffffffff8104ef88>] ? warn_slowpath_fmt+0x51/0x59
> > [ 1411.813751]  [<ffffffff8130d42a>] ? _spin_unlock_irqrestore+0xd/0xe
> > [ 1411.813762]  [<ffffffff8104b41e>] ? try_to_wake_up+0x289/0x29b
> > [ 1411.813778]  [<ffffffff81272d34>] ? netif_tx_lock+0x3d/0x69
> > [ 1411.813791]  [<ffffffff8125d7da>] ? netdev_drivername+0x3b/0x40
> > [ 1411.813803]  [<ffffffff81272e42>] ? dev_watchdog+0xe2/0x194
> > [ 1411.813816]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
> > [ 1411.813827]  [<ffffffff81040e42>] ? check_preempt_wakeup+0x0/0x268
> > [ 1411.813841]  [<ffffffff8105b5ef>] ? run_timer_softirq+0x1c9/0x268
> > [ 1411.813855]  [<ffffffff81054c9b>] ? __do_softirq+0xdd/0x1a6
> > [ 1411.813867]  [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
> > [ 1411.813873]  <EOI>  [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
> > [ 1411.813893]  [<ffffffff810548c2>] ? ksoftirqd+0x5f/0xd3
> > [ 1411.813905]  [<ffffffff81054863>] ? ksoftirqd+0x0/0xd3
> > [ 1411.813915]  [<ffffffff81065c39>] ? kthread+0x79/0x81
> > [ 1411.813926]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> > [ 1411.813937]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
> > [ 1411.813948]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> > [ 1411.813958]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> > [ 1411.813966] ---[ end trace a7919e7f17c0a727 ]---
> > [ 1412.052253] eth0: port 1(peth0) entering disabled state
> > [ 1635.796207] frontend_changed: backend/vbd/3/768: prepare for reconnect
> > [ 1647.137513] eth0: port 3(vif3.0) entering disabled state
> > [ 1647.157527] eth0: port 3(vif3.0) entering disabled state
> > 
> >  Kernel logging (proc) stopped.
> > 
> > In this case dom0 locked up. Some times just networking stops and some
> > times networking recovers.
> > 
> > Looks like it uses msi-x interrupts.
> > 
> > Concerning igb error I have tried the following  one at a time:
> > New igb driver from Intel site.
> > kernel parameter  pcie_aspm=off
> > ethtool -K eth0 tx off  on dom0
> > ethtool -K eth0 gro off  on dom0
> 
> OK.
> 
> > It has never died doing iperf from dom0 or domu  <> external.
> > Never died during network backup.
> > 
> > Usually takes a least a few hours and has never made it a day running a
> > domu. Wish I could get it to die faster :-)
> > Any ideas?
> > I'm pretty much down to trying different network cards
> 
> Did you try that? Did that make any difference?
Not tested yet, though I did install one.

I think I found a way to keep it running.
With the new igb driver that I built from the current Intel source, I added the
module parameter IntMode=1.

This puts it in MSI mode. It was in MSI-X mode.
It has never died with that setting.
It has now been up for over a day.
I have no real experience with MSI-X. I think it's the first time I have seen a
driver use MSI-X interrupts.
Maybe that gives you more ideas?

> 
> > Any ideas?
> 
> There is a Xen parameter called 'noirqbalance'. Try that. Also see if you
> can limit the CPUs in the dom0 using these two arguments on Xen
> hypervisor:

Should I turn off the irqbalance daemon also?
Just in case you wonder, it does without it.

> dom0_vcpus=2 dom0_vcpus_pin=1
> 
> 
> It would be interesting to narrow down _when_ you trigger this failure. B/c
> we can pull Xen to see what the MSI's are 'xl debug-keys M' _before_ and
> _after_ your failure to see if something is amiss.
> 
> Mainly to figure out if the vectors are moving around the CPUs (or not)
> 
> (XEN)  MSI    29 vec=21 lowest  edge   assert  log lowest dest=00000001
> mask=0/0/-1
> 
> and also 'xl debug-keys i' to see if the domain has ACK-ed the interrupt:
> (XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000001 vec:21
> type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(----),
> 
> (the last '----' might have something else in in them - if so that is a
> sign that dom0 hasn't picked up the event/vector).

Much of my frustration is that I have not found a way to get it to fail other
than waiting a long time :-(

Thanks

John




Konrad Rzeszutek Wilk

2011-Jul-29 17:31 UTC

Re: [Xen-devel] Network dies and kernel errors

> > Did you try that? Did that make any difference?
> 
> Not tested I did install one.
> 
> I think I found a way to keep it running.
> On the new igb driver I built from new intel source added module parameter 
> IntMode=1.
> 
> This puts it in msi mode. It was in msi-x mode.
> It's never died with that setting.
> It's up now over a day.
> No real experience with msi-x. I think it's the first time I have seen a
> driver use msi-x interrupts.
> Maybe that gives you more ideas?

That was my thought - the MSI-X interrupts somehow aren't being ACKed properly.
But I don't know if the issue is with Dom0 or Xen.

> 
> > 
> > > Any ideas?
> > 
> > There is a Xen parameter called 'noirqbalance'. Try that. Also see if you
> > can limit the CPUs in the dom0 using these two arguments on Xen
> > hypervisor:
> > 
> Should I turn off the irqbalence daemon also?

Sure.

> Just in case you wonder it does with out it.
> 
> > dom0_vcpus=2 dom0_vcpus_pin=1
> > 
> > 
> > It would be interesting to narrow down _when_ you trigger this failure. B/c
> > we can pull Xen to see what the MSI's are 'xl debug-keys M' _before_ and
> > _after_ your failure to see if something is amiss.
> > 
> > Mainly to figure out if the vectors are moving around the CPUs (or not)
> > 
> > (XEN)  MSI    29 vec=21 lowest  edge   assert  log lowest dest=00000001
> > mask=0/0/-1
> > 
> > and also 'xl debug-keys i' to see if the domain has ACK-ed the interrupt:
> > (XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000001 vec:21
> > type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(----),
> > 
> > (the last '----' might have something else in in them - if so that is a
> > sign that dom0 hasn't picked up the event/vector).
> 
> Much of my frustration is that I have not found a way to get it to fail other
> than waiting a long time :-(

Ah, that sucks. Well, just make a nice shell script that will run those continuously
(and also 'xl dmesg') and pipe the log to a file.
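
One possible shape for such a logger is sketched below, written in Python rather than shell. It is only an illustration under stated assumptions: the log file name, the 60-second interval, and the function names (run, snapshot, main) are made up here. The only commands it issues are the three named in this thread: 'xl debug-keys M', 'xl debug-keys i', and 'xl dmesg'.

```python
import subprocess
import time
from datetime import datetime, timezone

# The three commands mentioned in the thread; the debug keys print into the
# Xen console ring, which 'xl dmesg' then dumps.
COMMANDS = (
    ["xl", "debug-keys", "M"],
    ["xl", "debug-keys", "i"],
    ["xl", "dmesg"],
)


def run(cmd):
    """Run one command and return its combined stdout and stderr as text."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


def snapshot(log, runner=run):
    """Write one timestamped section per command to the open log file."""
    stamp = datetime.now(timezone.utc).isoformat()
    for cmd in COMMANDS:
        log.write(f"=== {stamp} {' '.join(cmd)} ===\n")
        log.write(runner(cmd))
    log.flush()


def main(path="xen-irq-watch.log", interval=60):
    """Append a snapshot to the log every `interval` seconds until interrupted."""
    with open(path, "a") as log:
        while True:
            snapshot(log)
            time.sleep(interval)
```

Calling main() as root in dom0 starts the loop. Each snapshot contains the whole console ring again, so the file grows quickly; the timestamps in the section headers make it possible to find the last snapshot before and the first one after a failure.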


John McMonagle

2011-Jul-29 21:12 UTC

Re: [Xen-devel] Network dies and kernel errors

On Friday, July 29, 2011 12:31:29 pm you wrote:
> > > Did you try that? Did that make any difference?
> > 
> > Not tested I did install one.
> > 
> > I think I found a way to keep it running.
> > On the new igb driver I built from new intel source added module
> > parameter IntMode=1.
> > 
> > This puts it in msi mode. It was in msi-x mode.
> > It's never died with that setting.
> > It's up now over a day.
> > No real experience with msi-x. I think it's the first time I have seen a
> > driver use msi-x interrupts.
> > Maybe that gives you more ideas?
> 
> That was my thought - the MSI-X aren't somehow being ACKed properly. But I
> don't know if the issue with Dom0 or Xen.
> 
> > > > Any ideas?
> > > 
> > > There is a Xen parameter called 'noirqbalance'. Try that. Also see if
> > > you can limit the CPUs in the dom0 using these two arguments on Xen
> > > hypervisor:
> > 
> > Should I turn off the irqbalence daemon also?
> 
> Sure.
> 
> > Just in case you wonder it does with out it.
> > 
> > > dom0_vcpus=2 dom0_vcpus_pin=1
> > > 
> > > 
> > > It would be interesting to narrow down _when_ you trigger this failure.
> > > B/c we can pull Xen to see what the MSI's are 'xl debug-keys M'
> > > _before_ and _after_ your failure to see if something is amiss.
> > > 
> > > Mainly to figure out if the vectors are moving around the CPUs (or not)
> > > 
> > > (XEN)  MSI    29 vec=21 lowest  edge   assert  log lowest dest=00000001
> > > mask=0/0/-1
> > > 
> > > and also 'xl debug-keys i' to see if the domain has ACK-ed the interrupt:
> > > (XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000001 vec:21
> > > type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(----),
> > > 
> > > (the last '----' might have something else in in them - if so that is a
> > > sign that dom0 hasn't picked up the event/vector).
> > 
> > Much of my frustration is that I have not found a way to get it to fail
> > other than waiting a long time :-(
> 
> Ah that sucks. Well, just make a nice shell script that will run those
> continously (and also 'xl dmesg') and pipe the log to a file.

I was just setting up to run your test.
About 10 minutes after removing irqbalance I lost networking.
I'm remote, and a few minutes later I lost IPMI SOL, so odds are I lost the
serial port interrupt also. So I went to iKVM.

I was able to restore the network with:
ifdown xenbr0
rmmod igb
modprobe igb
ifup xenbr0

The running domU still had no networking.

Attached are:
dom0.dmesg.gz   dom0 dmesg. I see nothing in particular myself.
xen.dmesg.gz    xl dmesg. I had run the M and i debug keys earlier and ran them again just
before running xl dmesg, so it should contain before and after.
I captured these before bringing the network back up.

I'll reboot and run as you requested.

John




John McMonagle

2011-Jul-31 23:21 UTC

Re: [Xen-devel] Network dies and kernel errors

Konrad

I ran as you requested, and after 2 days it's still up.

Attached are xen.dmesg and dom0.dmesg; all the command-line parameters are in
them.

At this point I'd be happy if it was online with a reasonably efficient, stable
configuration.

Supermicro has been helpful on other issues but not with this.
I suspect I'll need to have some idea what the problem is to be able to get it
sent to the right person.

Any suggestions?

John


On Friday, July 29, 2011 12:31:29 pm Konrad Rzeszutek Wilk wrote:
> > > Did you try that? Did that make any difference?
> > 
> > Not tested I did install one.
> > 
> > I think I found a way to keep it running.
> > On the new igb driver I built from new intel source added module
> > parameter IntMode=1.
> > 
> > This puts it in msi mode. It was in msi-x mode.
> > It''s never died with that setting.
> > It''s up now over a day.
> > No real experience with msi-x. I think it''s the first time I
have seen a
> > driver use msi-x interrupts.
> > Maybe that gives you more ideas?
> 
> That was my thought - the MSI-X aren''t somehow being ACKed
properly. But I
> don''t know if the issue with Dom0 or Xen.
> 
> > > > Any ideas?
> > > 
> > > There is a Xen parameter called ''noirqbalance''
. Try that. Also see if
> > > you can limit the CPUs in the dom0 using these two arguments on
Xen
> > 
> > > hypervisor:
> > Should I turn off the irqbalence daemon also?
> 
> Sure.
> 
> > Just in case you wonder it does with out it.
> > 
> > > dom0_vcpus=2 dom0_vcpus_pin=1
> > > 
> > > 
> > > It would be interesting to narrow down _when_ you trigger this
failure.
> > > B/c we can pull Xen to see what the MSI''s are
''xl debug-keys M''
> > > _before_ and _after_ your failure to see if something is amiss.
> > > 
> > > Mainly to figure out if the vectors are moving around the CPUs
(or not)
> > > 
> > > (XEN)  MSI    29 vec=21 lowest  edge   assert  log lowest
dest=00000001
> > > mask=0/0/-1
> > > 
> > > and also ''xl debug-keys i'' to see if the domain
has ACK-ed the
> > > interrupt: (XEN)    IRQ:  29
> > > affinity:00000000,00000000,00000000,00000001 vec:21 type=PCI-MSI
> > >   status=00000010 in-flight=0 domain-list=0:275(----),
> > > 
> > > (the last ''----'' might have something else in
in them - if so that is a
> > > sign that dom0 hasn''t picked up the event/vector).
> > 
> > Much of my frustration is that I have not found a way to get it to
fail
> > other than waiting a long time :-(
> 
> Ah that sucks. Well, just make a nice shell script that will run those
> continously (and also ''xl dmesg'') and pipe the log to a
file.



Konrad Rzeszutek Wilk

2011-Aug-02 16:17 UTC

Re: [Xen-devel] Network dies and kernel errors

On Sun, Jul 31, 2011 at 06:21:47PM -0500, John McMonagle wrote:
> Konrad
> 
> I ran as you requested and after 2 days it's still up.

Good!

> Attached are xen.dmesg and dom0.dmesg  all the command line parameters are in
> them. 
> At this point I'd be happy if was on line with a reasonably efficient stable
> configuration.


(XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
(XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
(XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
(XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00004000 vec:b5 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:273(----),
(XEN)  MSI    31 vec=39  fixed  edge   assert phys    cpu dest=00000020 mask=1/0/0
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000100 vec:39 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(----),

The affinity (and the corresponding vector) moves from one CPU to another,
and then dom0 somehow does not get it. Do you have another of these
boxes available remotely to debug this further?
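
Spotting that kind of move in a long capture by eye is tedious, so a small helper along these lines might be useful. It is a sketch only: the function name msi_moves is made up here, and the pattern is fitted to the MSI lines quoted in this message rather than to any documented Xen format. It reports every point where the vec or dest field of a given MSI differs from the previous line for the same MSI.

```python
import re

# Fitted to the "(XEN) MSI" lines quoted above; this layout is an assumption
# about the capture, not a documented format.
MSI_RE = re.compile(
    r"MSI\s+(?P<msi>\d+)\s+vec=(?P<vec>[0-9a-f]+).*?dest=(?P<dest>[0-9a-f]+)"
)


def msi_moves(lines):
    """Return (msi, old_state, new_state) for each change of (vec, dest)."""
    last = {}
    moves = []
    for line in lines:
        m = MSI_RE.search(line)
        if m is None:
            continue  # IRQ lines and unrelated console output are skipped
        msi = int(m.group("msi"))
        state = (m.group("vec"), m.group("dest"))
        if msi in last and last[msi] != state:
            moves.append((msi, last[msi], state))
        last[msi] = state
    return moves
```

Run over the lines above, it would report one move for MSI 31, from vec ad with dest 00000023 to vec 39 with dest 00000020.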



John McMonagle

2011-Aug-02 18:07 UTC

Re: [Xen-devel] Network dies and kernel errors

On Tuesday, August 02, 2011 11:17:36 am Konrad Rzeszutek Wilk wrote:
> On Sun, Jul 31, 2011 at 06:21:47PM -0500, John McMonagle wrote:
> > Konrad
> > 
> > I ran as you requested and after 2 days it's still up.
> 
> Good!
> 
> > Attached are xen.dmesg and dom0.dmesg  all the command line parameters
> > are in them.
> > 
> > At this point I'd be happy if was on line with a reasonably efficient
> > stable configuration.
> 
> So it looks like from your runs (before you had irqbalance):
> 
> (XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
> (XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
> (XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
> (XEN)  MSI    31 vec=ad  fixed  edge   assert phys    cpu dest=00000023 mask=1/0/0
> (XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00004000 vec:b5 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:273(----),
> (XEN)  MSI    31 vec=39  fixed  edge   assert phys    cpu dest=00000020 mask=1/0/0
> (XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000100 vec:39 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(----),
> 
> The affinity (and the corresponding vector) moves from one CPU to another
> and then dom0 somehow does not get it. Do you have another of these
> boxes available remotly to debug this further?

Konrad

Just one box, but I'm willing to try more things.
A shame, as this is otherwise a really nice box for the price.
I sent Supermicro another email with updated info.
I'm not holding my breath :-(
They did fix the IPMI firmware for me, so I guess I can hope for a fix.

In my last test, shouldn't dom0_vcpus=2 have been dom0_max_vcpus=2?
Dom0 still has 16 CPUs.

I'd prefer not pinning dom0, but stability is more important :-)

Thanks for all the help so far.

John
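
As a rough illustration of the option-name question above, the following sketch scans a hypervisor command line for the dom0 vCPU options. `dom0_max_vcpus` and `dom0_vcpus_pin` are real Xen boot options; treating `dom0_vcpus` as an unrecognised name that the hypervisor ignores is an assumption drawn from the observation that dom0 still had 16 CPUs. The function names and the sample command lines are hypothetical, not taken from the attached dmesg files.

```python
def parse_xen_cmdline(cmdline):
    """Split a hypervisor command line into {option: value}, or True for bare flags."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


def check_dom0_vcpu_options(cmdline):
    """Return human-readable notes about dom0 vCPU options on the command line."""
    opts = parse_xen_cmdline(cmdline)
    notes = []
    if "dom0_vcpus" in opts:
        # Assumption: this spelling is not recognised, so it has no effect.
        notes.append("dom0_vcpus is probably ignored; use dom0_max_vcpus")
    if "dom0_max_vcpus" not in opts:
        notes.append("dom0_max_vcpus not set; dom0 vCPU count is not limited")
    if "dom0_vcpus_pin" not in opts:
        notes.append("dom0_vcpus_pin not set; dom0 vCPUs are not pinned")
    return notes


# Hypothetical command lines, for illustration only.
for line in ("dom0_mem=2048M dom0_vcpus=2",
             "dom0_mem=2048M dom0_max_vcpus=2 dom0_vcpus_pin"):
    print(line, "->", check_dom0_vcpu_options(line))
```

The actual hypervisor command line is printed near the top of the Xen boot log (xen.dmesg), so that is the string to check.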


MaoXiaoyun

2011-Aug-03 09:01 UTC


RE: [Xen-devel] Network dies and kernel errors

In my test environment, irqbalance is already stopped.

So my bnx bug may not be related to this.

Many thanks.
 
> Date: Tue, 2 Aug 2011 12:17:36 -0400
> From: konrad.wilk@oracle.com
> To: johnm@advocap.org
> CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Network dies and kernel errors
> 
> On Sun, Jul 31, 2011 at 06:21:47PM -0500, John McMonagle wrote:
> > Konrad
> > 
> > I ran as you requested and after 2 days it's still up.
> 
> Good!
> > 
> > Attached are xen.dmesg and dom0.dmesg all the command line parameters are in
> > them. 
> 
> > At this point I'd be happy if was on line with a reasonably efficient stable
> > configuration.
> 
> So it looks like from your runs (before you had irqbalance):
> 
> (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> (XEN) IRQ: 31 affinity:00000000,00000000,00000000,00004000 vec:b5 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:273(----),
> (XEN) MSI 31 vec=39 fixed edge assert phys cpu dest=00000020 mask=1/0/0
> (XEN) IRQ: 31 affinity:00000000,00000000,00000000,00000100 vec:39 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
> 
> The affinity (and the corresponding vector) moves from one CPU to another
> and then dom0 somehow does not get it. Do you have another of these
> boxes available remotely to debug this further?

Konrad Rzeszutek Wilk

2011-Aug-03 13:37 UTC


Re: [Xen-devel] Network dies and kernel errors

On Wed, Aug 03, 2011 at 05:01:38PM +0800, MaoXiaoyun wrote:
> 
> in my test environment, the irqbalance is already stopped.
>  
> So my bnx bug may not related  to this.

Did you use 'noirqbalance' on the Xen hypervisor line?

>  
> Many thanks.
>  
> 
> > Date: Tue, 2 Aug 2011 12:17:36 -0400
> > From: konrad.wilk@oracle.com
> > To: johnm@advocap.org
> > CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
> > Subject: Re: [Xen-devel] Network dies and kernel errors
> > 
> > On Sun, Jul 31, 2011 at 06:21:47PM -0500, John McMonagle wrote:
> > > Konrad
> > > 
> > > I ran as you requested and after 2 days it's still up.
> > 
> > Good!
> > > 
> > > Attached are xen.dmesg and dom0.dmesg all the command line parameters are in
> > > them. 
> > 
> > > At this point I'd be happy if was on line with a reasonably efficient stable
> > > configuration.
> > 
> > So it looks like from your runs (before you had irqbalance):
> > 
> > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > (XEN) IRQ: 31 affinity:00000000,00000000,00000000,00004000 vec:b5 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:273(----),
> > (XEN) MSI 31 vec=39 fixed edge assert phys cpu dest=00000020 mask=1/0/0
> > (XEN) IRQ: 31 affinity:00000000,00000000,00000000,00000100 vec:39 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
> > 
> > The affinity (and the corresponding vector) moves from one CPU to another
> > and then dom0 somehow does not get it. Do you have another of these
> > boxes available remotely to debug this further?
> > 

MaoXiaoyun

2011-Aug-04 03:55 UTC


RE: [Xen-devel] Network dies and kernel errors

> Date: Wed, 3 Aug 2011 09:37:57 -0400
> From: konrad.wilk@oracle.com
> To: tinnycloud@hotmail.com
> CC: johnm@advocap.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Network dies and kernel errors
> 
> On Wed, Aug 03, 2011 at 05:01:38PM +0800, MaoXiaoyun wrote:
> > 
> > in my test environment, the irqbalance is already stopped.
> > 
> > So my bnx bug may not related to this.
> 
> 
> Did you use 'noirqbalance' on the Xen hypervisor line?

No, I just stopped the irqbalance service. Is that any different from "noirqbalance"?
If so, I would like to give it a try.

Thanks.
> > 
> > Many thanks.
> > 
> > 
> > > Date: Tue, 2 Aug 2011 12:17:36 -0400
> > > From: konrad.wilk@oracle.com
> > > To: johnm@advocap.org
> > > CC: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
> > > Subject: Re: [Xen-devel] Network dies and kernel errors
> > > 
> > > On Sun, Jul 31, 2011 at 06:21:47PM -0500, John McMonagle wrote:
> > > > Konrad
> > > > 
> > > > I ran as you requested and after 2 days it's still up.
> > > 
> > > Good!
> > > > 
> > > > Attached are xen.dmesg and dom0.dmesg all the command line parameters are in
> > > > them. 
> > > 
> > > > At this point I'd be happy if was on line with a reasonably efficient stable
> > > > configuration.
> > > 
> > > So it looks like from your runs (before you had irqbalance):
> > > 
> > > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > > (XEN) MSI 31 vec=ad fixed edge assert phys cpu dest=00000023 mask=1/0/0
> > > (XEN) IRQ: 31 affinity:00000000,00000000,00000000,00004000 vec:b5 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:273(----),
> > > (XEN) MSI 31 vec=39 fixed edge assert phys cpu dest=00000020 mask=1/0/0
> > > (XEN) IRQ: 31 affinity:00000000,00000000,00000000,00000100 vec:39 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
> > > 
> > > The affinity (and the corresponding vector) moves from one CPU to another
> > > and then dom0 somehow does not get it. Do you have another of these
> > > boxes available remotely to debug this further?
> > > 
> > 
> 

Konrad Rzeszutek Wilk

2011-Aug-04 18:03 UTC


Re: [Xen-devel] Network dies and kernel errors

> > > in my test environment, the irqbalance is already stopped.
> > > 
> > > So my bnx bug may not related to this.
> > 
> > 
> > Did you use 'noirqbalance' on the Xen hypervisor line?
>  
> No. I just stop irqbalance service, any difference with "noirqbalance"?
> If so, I would like to have a try.

No, that is OK.
