Gianni Tedesco

2010-Oct-08 09:33 UTC

[Xen-devel] MSI badness in xen-unstable

Hi,

I've been trying to boot Stefano's minimal dom0 kernel from
git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
2.6.36-rc1-initial-domain-v2+pat

On xen-unstable, I get the following WARN_ON()s from Xen when bringing
up the NICs, then the machine hangs forever when trying to log in
either over serial or NIC.

(XEN) Xen WARN at msi.c:649
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c48015b450>] pci_enable_msi+0x466/0x945
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000da000000   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480247940
(XEN) rbp: ffff83023ff2fdc8   rsp: ffff83023ff2fd08   r8:  ffff83023fff4004
(XEN) r9:  ffff830000000000   r10: ffff82c48020d040   r11: 0000000000000217
(XEN) r12: 0000000000000149   r13: ffff83023ff7caa0   r14: ffff83023ff2fe10
(XEN) r15: 0000000000000009   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 0000000236c1a000   cr2: 0000003e746f8050
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83023ff2fd08:
(XEN)    800000023510b067 000000000023510b 000883023fecc000 ffff83023ff7cb38
(XEN)    ffff83020000c000 ffff83023ff2ff18 ffff83023ff2ff18 00000000000da00c
(XEN)    00000000da00c000 000000a23ff2fd68 0000000000000000 00000000da00c000
(XEN)    000000023ff2fdb8 0000000000000000 00000000000da00c 00000002000000a0
(XEN)    0000000000000000 ffff8302101b7540 ffff82c48011fe91 ffff83023fecc000
(XEN)    0000000000000113 0000000000000025 00000000ffffffed ffff83023ff81300
(XEN)    ffff83023ff2fe48 ffff82c48015cd07 0000000000000cfc 0000000000000025
(XEN)    ffff83023ff2fea8 000000000000044c 0000000000000094 ffff83023ff7caa0
(XEN)    0000000000000246 ffff83023ff2fe28 ffff82c48011fe91 ffff88002db9d568
(XEN)    0000000000000113 ffff83023fecc000 0000000000000025 ffff83023fecc190
(XEN)    ffff83023ff2fef8 ffff82c4801700a2 ffff830200000000 ffff830200000004
(XEN)    ffffffff8127d3d6 ffff83023ff2fea8 0000000000007ff0 ffffffffffffffff
(XEN)    0000000000000002 0000000000000000 00000000da000000 aaaaaaaaaaaaaaaa
(XEN)    0000000000000002 0000000000000025 00000000da000000 0000000000000000
(XEN)    ffff83023ff2fed8 ffff8300bf4c8000 0000000000000011 ffff88002efa3a00
(XEN)    0000000000000011 00000000000000a0 00007cfdc00d00c7 ffff82c4801fb8e2
(XEN)    ffffffff8100142a 0000000000000021 00000000000000a0 0000000000000011
(XEN)    ffff88002efa3a00 0000000000000011 ffff88002db9d608 00000000000007c9
(XEN)    0000000000000217 0000000000000000 ffff88002bab02a0 ffff88002e45ac00
(XEN)    0000000000000021 ffffffff8100142a 0000000000000000 ffff88002db9d568
(XEN) Xen call trace:
(XEN)    [<ffff82c48015b450>] pci_enable_msi+0x466/0x945
(XEN)    [<ffff82c48015cd07>] map_domain_pirq+0x28a/0x377
(XEN)    [<ffff82c4801700a2>] do_physdev_op+0x7f2/0x1040
(XEN)    [<ffff82c4801fb8e2>] syscall_enter+0xf2/0x14c
(XEN)    
(XEN) Xen WARN at msi.c:649
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c48015b4ce>] pci_enable_msi+0x4e4/0x945
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 000000000000e000   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480247940
(XEN) rbp: ffff83023ff2fdc8   rsp: ffff83023ff2fd08   r8:  ffff83023fff4004
(XEN) r9:  ffff830000000000   r10: ffff82c48020d040   r11: 0000000000000217
(XEN) r12: 0000000000000149   r13: ffff83023ff7caa0   r14: ffff83023ff2fe10
(XEN) r15: 0000000000000009   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 0000000236c1a000   cr2: 0000003e746f8050
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83023ff2fd08:
(XEN)    800000023510b067 000000000023510b 000883023fecc000 ffff83023ff7cb38
(XEN)    ffff83020000c000 ffff83023ff2ff18 ffff83023ff2ff18 00000000000da00c
(XEN)    00000000da00c000 000000a23ff2fd68 0000000000000000 00000000da00c000
(XEN)    000000023ff2fdb8 0000000000000000 00000000000da00c 00000002000000a0
(XEN)    0000000000000000 ffff8302101b7540 ffff82c48011fe91 ffff83023fecc000
(XEN)    0000000000000113 0000000000000025 00000000ffffffed ffff83023ff81300
(XEN)    ffff83023ff2fe48 ffff82c48015cd07 0000000000000cfc 0000000000000025
(XEN)    ffff83023ff2fea8 000000000000044c 0000000000000094 ffff83023ff7caa0
(XEN)    0000000000000246 ffff83023ff2fe28 ffff82c48011fe91 ffff88002db9d568
(XEN)    0000000000000113 ffff83023fecc000 0000000000000025 ffff83023fecc190
(XEN)    ffff83023ff2fef8 ffff82c4801700a2 ffff830200000000 ffff830200000004
(XEN)    ffffffff8127d3d6 ffff83023ff2fea8 0000000000007ff0 ffffffffffffffff
(XEN)    0000000000000002 0000000000000000 00000000da000000 aaaaaaaaaaaaaaaa
(XEN)    0000000000000002 0000000000000025 00000000da000000 0000000000000000
(XEN)    ffff83023ff2fed8 ffff8300bf4c8000 0000000000000011 ffff88002efa3a00
(XEN)    0000000000000011 00000000000000a0 00007cfdc00d00c7 ffff82c4801fb8e2
(XEN)    ffffffff8100142a 0000000000000021 00000000000000a0 0000000000000011
(XEN)    ffff88002efa3a00 0000000000000011 ffff88002db9d608 00000000000007c9
(XEN)    0000000000000217 0000000000000000 ffff88002bab02a0 ffff88002e45ac00
(XEN)    0000000000000021 ffffffff8100142a 0000000000000000 ffff88002db9d568
(XEN) Xen call trace:
(XEN)    [<ffff82c48015b4ce>] pci_enable_msi+0x4e4/0x945
(XEN)    [<ffff82c48015cd07>] map_domain_pirq+0x28a/0x377
(XEN)    [<ffff82c4801700a2>] do_physdev_op+0x7f2/0x1040
(XEN)    [<ffff82c4801fb8e2>] syscall_enter+0xf2/0x14c
(XEN)
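
A note on reading the call traces: each entry has the form [<address>] symbol+0xoffset/0xsize, so subtracting the offset from the address gives the start of the function. The following is a minimal user-space sketch of that arithmetic, not Xen code; `struct trace_entry`, `parse_trace_line` and `function_start` are names made up for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One "[<addr>] symbol+0xoff/0xsize" entry from a Xen call trace. */
struct trace_entry {
    uint64_t addr;      /* address inside the function */
    char symbol[64];    /* function name */
    uint64_t offset;    /* distance of addr from the function start */
    uint64_t size;      /* length of the function in bytes */
};

/* Hypothetical helper: parse one trace line; returns 0 on success, -1 otherwise. */
int parse_trace_line(const char *line, struct trace_entry *out)
{
    const char *p;

    assert(line != NULL && out != NULL);
    p = strstr(line, "[<");          /* skip the "(XEN)" prefix and padding */
    if (p == NULL)
        return -1;
    if (sscanf(p, "[<%" SCNx64 ">] %63[^+]+0x%" SCNx64 "/0x%" SCNx64,
               &out->addr, out->symbol, &out->offset, &out->size) != 4)
        return -1;
    return 0;
}

/* Start address of the function that contains the traced address. */
uint64_t function_start(const struct trace_entry *e)
{
    return e->addr - e->offset;
}
```

Applied to the two pci_enable_msi entries above (offsets 0x466 and 0x4e4), both resolve to the same function start, so the two warnings fire at different offsets inside the same function.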

If I disable MSI in the kernel then these messages go away, but then the
kernel hangs forever while bringing up the NICs and therefore never
gets to the login prompt.

Everything works fine if I use xen-3.4.4-pre, so it leads me to believe
it's a hypervisor bug?

And when I say "fine", what I mean is that I can log in and run stuff at
almost bare-metal performance, except that ssh feels as if the machine is
somewhere in New Zealand and I am on a 56K modem. It's characterized by
long pauses (over serial and ssh) and then all my key-presses get
responded to at once.

Any ideas?

Thanks


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Gianni Tedesco

2010-Oct-08 10:01 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
> If I disable MSI in the kernel then these messages go away but then the
> kernel hangs forever while bringing up the NICs and therefore never
> gets to the login prompt.

Actually, as an additional data point, with MSI disabled it also hangs
forever on bare-metal...

Gianni


Gianni Tedesco

2010-Oct-08 10:03 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Fri, 2010-10-08 at 11:01 +0100, Gianni Tedesco wrote:
> On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
> > If I disable MSI in the kernel then these messages go away but then the
> > kernel hangs forever while bringing up the NICs and therefore never
> > gets to the login prompt.
>
> Actually as an additional data-point, with MSI disabled it also hangs
> forever on bare-metal...

Strike that, my mistake, s/bare-metal/xen-3.4.2 from xenserver/ -
xenserver works fine on this hardware - as does Stefano's kernel on
bare metal...

Gianni


Gianni Tedesco

2010-Oct-11 17:12 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
> Hi,
>
> I've been trying to boot Stefano's minimal dom0 kernel from
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
> 2.6.36-rc1-initial-domain-v2+pat
>
> On xen-unstable, I get the following WARN_ON()s from Xen when bringing
> up the NICs, then the machine hangs forever when trying to log in either
> over serial or NIC.
>
> (XEN) Xen WARN at msi.c:649

Hmm, so this appears not to be an issue with the XCP kernel; in that case I
get the warnings but everything still works fine.

I will investigate further when I have some time.

Gianni


Bruce Edge

2010-Oct-11 21:05 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Mon, Oct 11, 2010 at 10:12 AM, Gianni Tedesco
<gianni.tedesco@citrix.com> wrote:
> On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
>> Hi,
>>
>> I've been trying to boot Stefano's minimal dom0 kernel from
>> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
>> 2.6.36-rc1-initial-domain-v2+pat
>>
>> On xen-unstable, I get the following WARN_ON()s from Xen when bringing
>> up the NICs, then the machine hangs forever when trying to log in either
>> over serial or NIC.
>>
>> (XEN) Xen WARN at msi.c:649

I get the same Xen WARN messages using the current pvops/xen-next with
xen-unstable. Here's the complete list for one boot, grep'd for WARN:

(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN) Xen WARN at msi.c:656
(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN) Xen WARN at msi.c:656
(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN) Xen WARN at msi.c:656
(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN)    0000000080287db8 0(XEN) Xen WARN at msi.c:636
(XEN) Xen WARN at msi.c:649
(XEN) Xen WARN at msi.c:656
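
Tallying a list like this by source line takes only a few lines of C. This is just a sketch: `warn_site` and `count_site` are hypothetical helpers, and the marker is searched for anywhere in the line because console output can interleave, as in the "0000000080287db8 0(XEN) Xen WARN" line above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the msi.c line number named in a
 * "Xen WARN at msi.c:N" message, or -1 if the log line has no such message.
 */
int warn_site(const char *log_line)
{
    static const char marker[] = "Xen WARN at msi.c:";
    const char *p;
    int site;

    assert(log_line != NULL);
    p = strstr(log_line, marker);    /* the marker may appear mid-line */
    if (p == NULL)
        return -1;
    if (sscanf(p + strlen(marker), "%d", &site) != 1)
        return -1;
    return site;
}

/* Count how many of the n log lines report a warning at the given site. */
int count_site(const char *const *lines, int n, int site)
{
    int i, hits = 0;

    for (i = 0; i < n; i++)
        if (warn_site(lines[i]) == site)
            hits++;
    return hits;
}
```

For the 18 lines listed above, this counts msi.c:636 and msi.c:649 seven times each and msi.c:656 four times.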

The complete boot sequence is attached.

I do get a login at the end of the boot sequence, though.
My situation goes pear-shaped when I try to start a PV domU. The dom0
locks up after printing this on the console:

(XEN) tmem: all pools frozen for all domains
(XEN) tmem: all pools thawed for all domains
(XEN) tmem: all pools frozen for all domains
(XEN) tmem: all pools thawed for all domains
mapping kernel into physical memory
about to get started...

then prints these once a minute:
[  589.490894] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

The xen console is still active and I can generate a diag dump, also attached.

This dom0 lockup behavior started with pv-ops 2.6.32.21 and persists all
the way to .24, rendering the later pvops kernels unusable for dom0.
The 2.6.32.18 kernel is the last one that functioned as a dom0.

This behavior is consistent across platforms: HP ProLiant 380DL G6 and
G7, as well as i7 Supermicros.

-Bruce
Bruce Edge

2010-Oct-16 16:14 UTC

Re: [Xen-devel] MSI badness in xen-unstable

The latest xen-unstable, 22240, has the same "(XEN) Xen WARN at
msi.c:636" messages with associated stack traces.

I spent a little more time working with this version and, except for
these disconcerting messages, which do look like they are triggered by
the Ethernet card discovery, the system appears functional.
In all cases the first occurrence is immediately after the NIC discovery:

  e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
  e1000e: Copyright (c) 1999-2008 Intel Corporation.
  xen: registering gsi 16 triggering 0 polarity 1
  xen_allocate_pirq: returning irq 16 for gsi 16
  xen: --> irq=16
  Already setup the GSI :16
  e1000e 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
  e1000e 0000:06:00.0: setting latency timer to 64
    alloc irq_desc for 493 on node 0
    alloc kstat_irqs on node 0
  (XEN) Xen WARN at msi.c:636
  (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
  ....

In case it's a NIC-specific issue, I'm seeing it with both
    06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
and
    02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
NICs.

-Bruce

Sander Eikelenboom

2010-Oct-16 16:29 UTC

Re: [Xen-devel] MSI badness in xen-unstable

Hi Bruce,

I tripped over the same warning trying to solve my freezes.
Jan Beulich has posted a patch which is not in xen-unstable yet:
[Xen-devel] [PATCH] x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a

Signed-off-by: Jan Beulich <jbeulich@novell.com>

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -549,14 +549,14 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
         return 0;
     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
     {
-        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        addr &= PCI_BASE_ADDRESS_MEM_MASK;
         if ( ++bir >= limit )
             return 0;
         return addr |
                ((u64)pci_conf_read32(bus, slot, func,
                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
     }
-    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+    return addr & PCI_BASE_ADDRESS_MEM_MASK;
 }
 
 /**
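
To see why the inverted mask matters, here is a minimal user-space sketch, not the Xen code itself. It assumes the mask constants have the same values as in Linux's pci_regs.h (the low four bits of a memory BAR are flag bits, the rest is the base address); the three helper functions and the sample BAR value are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, as in Linux's pci_regs.h; Xen's copy is assumed to match. */
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06u
#define PCI_BASE_ADDRESS_MEM_TYPE_64   0x04u
#define PCI_BASE_ADDRESS_MEM_MASK      (~(uint32_t)0x0f)

/* Hypothetical helper: pre-patch expression, keeps only the low flag bits. */
uint32_t bar_base_inverted(uint32_t bar)
{
    return bar & ~PCI_BASE_ADDRESS_MEM_MASK;
}

/* Hypothetical helper: patched expression, keeps the base address bits. */
uint32_t bar_base_fixed(uint32_t bar)
{
    return bar & PCI_BASE_ADDRESS_MEM_MASK;
}

/*
 * Hypothetical helper: combine the low and high dwords of a 64-bit memory
 * BAR the way the patched code does, given both raw register values.
 */
uint64_t bar64_base(uint32_t lo, uint32_t hi)
{
    assert((lo & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64);
    return (uint64_t)bar_base_fixed(lo) | ((uint64_t)hi << 32);
}
```

With a sample 64-bit memory BAR whose low dword reads 0xda000004, the pre-patch expression yields 0x4 (just the type flags), while the patched one yields the base address 0xda000000.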



That fixes the warning, but my machine still keeps freezing nonetheless
(but it also does so with pci=nomsi, so it's not MSI-specific in my
case).

--

Sander


-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Bruce Edge

2010-Oct-16 17:14 UTC

Re: [Xen-devel] MSI badness in xen-unstable

Hi Sander,

Thank you.  I tried it against 4.1.0-22240 with no effect.
I confirmed I had the right patch:

0 %> hg diff  xen/arch/x86/msi.c

diff -r 38ad3633ecaf xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c        Wed Oct 13 12:01:30 2010 +0100
+++ b/xen/arch/x86/msi.c        Sat Oct 16 10:12:31 2010 -0700
@@ -549,14 +549,14 @@
         return 0;
     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
     {
-        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        addr &= PCI_BASE_ADDRESS_MEM_MASK;
         if ( ++bir >= limit )
             return 0;
         return addr |
                ((u64)pci_conf_read32(bus, slot, func,
                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
     }
-    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+    return addr & PCI_BASE_ADDRESS_MEM_MASK;
 }

 /**

The boot-time MSI WARN messages were unchanged.

-Bruce
Sander Eikelenboom

2010-Oct-16 17:25 UTC

Re: [Xen-devel] MSI badness in xen-unstable

There are probably more problems; you could also try a xen-unstable from
before the commit that changed this code (msi.c).
Another thing that could make it easier to debug would be to put some
printks around the WARN_ONs in msi.c at the line numbers that gave the
warnings, showing both parts of the equation in the WARN_ON.
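
One way to do that, sketched in user-space C so it can be tried outside the hypervisor. The real conditions at msi.c:636, 649 and 656 are not shown in this thread, so `warn_on_mismatch` and `WARN_ON_MISMATCH` are hypothetical names, and `fprintf` stands in for the hypervisor's printk():

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* User-space stand-in; inside Xen this would be the real printk(). */
#define printk(...) fprintf(stderr, __VA_ARGS__)

/*
 * Hypothetical helper (not part of Xen): print both sides of a comparison
 * when they differ, and return nonzero so the caller can still warn.
 */
int warn_on_mismatch(const char *what, uint64_t lhs, uint64_t rhs,
                     const char *file, int line)
{
    assert(what != NULL && file != NULL);
    if (lhs == rhs)
        return 0;
    printk("%s:%d: %s: lhs=%#" PRIx64 " rhs=%#" PRIx64 "\n",
           file, line, what, lhs, rhs);
    return 1;
}

/* Records the call site automatically, like WARN_ON does. */
#define WARN_ON_MISMATCH(what, lhs, rhs) \
    warn_on_mismatch((what), (uint64_t)(lhs), (uint64_t)(rhs), \
                     __FILE__, __LINE__)
```

In the hypervisor, the return value could be fed to the existing WARN_ON(), so the stack trace is still produced but is preceded by a line showing which two values disagreed.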

--

Sander

>>>   e1000e 0000:06:00.0: setting latency timer to 64
>>>     alloc irq_desc for 493 on node 0
>>>     alloc kstat_irqs on node 0
>>>   (XEN) Xen WARN at msi.c:636
>>>   (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
>>> ....
>>
>>> In case it''s a NIC specific issue, I''m seeing it
with both
>>>     06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
>>> Network Connection
>>> and
>>>     02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II
>>> BCM5709 Gigabit Ethernet (rev 20)
>>> NICs
>>
>>> -Bruce
>>
>>
>>
>>
>>
>> --
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it
>>
>>


-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Sander Eikelenboom

2010-Oct-16 17:25 UTC

Re: [Xen-devel] MSI badness in xen-unstable

Probably there are more problems; you could also try a xen-unstable from before
the commit that changed this code (msi.c).
Another thing that could make it easier to debug would be to put some
printk's around the WARN_ON's in msi.c at the line numbers
that gave the warnings, showing both sides of the comparison in the WARN_ON.
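
A rough standalone sketch of that idea follows. It is not Xen code: `warn_if_differs` is a hypothetical helper, and `printf()` stands in for the hypervisor's `printk()`. The point is only that each side is evaluated once and both values are printed when they differ.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Xen): compare two 64-bit values once
 * and, if they differ, print both sides so the log shows why the check
 * fired. Returns 1 on mismatch, 0 otherwise.
 */
static int warn_if_differs(const char *what, uint64_t expected, uint64_t actual)
{
    if (expected == actual)
        return 0;

    /* In the hypervisor this would be printk(); printf() stands in here. */
    printf("WARN %s: expected=%#" PRIx64 " actual=%#" PRIx64 "\n",
           what, expected, actual);
    return 1;
}
```

In msi.c this might be called as `warn_if_differs("table_base", msi->table_base, read_pci_mem_bar(bus, slot, func, bir))`, so the BAR is read only once and the value printed is the value that was compared.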

--

Sander

Saturday, October 16, 2010, 7:14:11 PM, you wrote:
> On Sat, Oct 16, 2010 at 9:29 AM, Sander Eikelenboom
> <linux@eikelenboom.it> wrote:
>> Hi Bruce,
>>
>> I tripped over the same warning trying to solve my freezes.
>> Jan Beulich has posted a patch which is not in xen-unstable yet:
>> [Xen-devel] [PATCH] x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a
>>
>> Signed-off-by: Jan Beulich <jbeulich@novell.com>
>>
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -549,14 +549,14 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
>>         return 0;
>>     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>>     {
>> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>>         if ( ++bir >= limit )
>>             return 0;
>>         return addr |
>>                ((u64)pci_conf_read32(bus, slot, func,
>>                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>>     }
>> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>>  }
>>
>>  /**
>>
>>
>>
>> That fixes the warn, but my machine still keeps freezing nonetheless.
>> (but it also does so with pci=nomsi so it's not MSI-specific in my case)
>>
>> --
>>
>> Sander
> Hi Sander,
> Thank you.  I tried it against 4.1.0-22240 with no effect.
> I confirmed I had the right patch:
> 0 %> hg diff  xen/arch/x86/msi.c
> diff -r 38ad3633ecaf xen/arch/x86/msi.c
> --- a/xen/arch/x86/msi.c        Wed Oct 13 12:01:30 2010 +0100
> +++ b/xen/arch/x86/msi.c        Sat Oct 16 10:12:31 2010 -0700
> @@ -549,14 +549,14 @@
>          return 0;
>      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>      {
> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>          if ( ++bir >= limit )
>              return 0;
>          return addr |
>                 ((u64)pci_conf_read32(bus, slot, func,
>                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>      }
> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>  }
>  /**
> The boot time msi warn messages were unchanged.
> -Bruce
>>
>> Saturday, October 16, 2010, 6:14:17 PM, you wrote:
>>
>>> On Mon, Oct 11, 2010 at 2:05 PM, Bruce Edge
<bruce.edge@gmail.com> wrote:
>>>> On Mon, Oct 11, 2010 at 10:12 AM, Gianni Tedesco
>>>> <gianni.tedesco@citrix.com> wrote:
>>>>> On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I''ve been trying to boot stefano''s
minimal dom0 kernel from
>>>>>>
git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
>>>>>> 2.6.36-rc1-initial-domain-v2+pat
>>>>>>
>>>>>> On xen-unstable, I get the following
WARN_ON()''s from Xen when bringing
>>>>>> up the NIC''s, then the machine hangs forever
when trying to login either
>>>>>> over serial or NIC.
>>>>>>
>>>>>> (XEN) Xen WARN at msi.c:649
>>>>
>>>> I get the same Xen WARN messages using the current
pvops/xen-next with
>>>> xen-unstable, here''s the complete list for one boot,
grep''d for WARN:
>>>>
>>>> (XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN) Xen WARN at msi.c:656
>>>> (XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN) Xen WARN at msi.c:656
>>>> (XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN) Xen WARN at msi.c:656
>>>> (XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN)    0000000080287db8 0(XEN) Xen WARN at msi.c:636
>>>> (XEN) Xen WARN at msi.c:649
>>>> (XEN) Xen WARN at msi.c:656
>>>>
>>>> The complete boot seq is attached.
>>>>
>>>> I do get a login at the end of the boot seq though.
>>>> My situation goes pear shaped when I try start a pv domU. The
dom0
>>>> locks up after printing this on the console:
>>>>
>>>> (XEN) tmem: all pools frozen for all domains
>>>> (XEN) tmem: all pools thawed for all domains
>>>> (XEN) tmem: all pools frozen for all domains
>>>> (XEN) tmem: all pools thawed for all domains
>>>> mapping kernel into physical memory
>>>> about to get started...
>>>>
>>>> then prints these once a minute:
>>>> [  589.490894] BUG: soft lockup - CPU#0 stuck for 61s!
[swapper:0]
>>>>
>>>> The xen console is still active and I can generate a diag dump,
also attached.
>>>>
>>>> This dom0 lockup behavior started with pv-ops 2.6.32.21, all
the way
>>>> to .24, rendering the later pvops kernels unusable for dom0.
>>>> The 2.6.32.18 kernel is the last one that functioned as a dom0.
>>>>
>>>> This behavior is consistent on platforms, HP proliant 380DL G6,
and
>>>> G7, as well as i7 supermicros.
>>>>
>>>> -Bruce
>>>>
>>>>>
>>>>> Hmm so this appears not to be an issue with XCP kernel, in
that case I
>>>>> get the warnings but everything still works fine.
>>>>>
>>>>> I will investigate further when I have some time.
>>>>>
>>>>> Gianni
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Xen-devel mailing list
>>>>> Xen-devel@lists.xensource.com
>>>>> http://lists.xensource.com/xen-devel
>>>>>
>>>>
>>
>>> The latest xen-unstable, 22240 has the same "  (XEN) Xen WARN
at
>>> msi.c:636 " messages with associated stack traces.
>>
>>> I spent a little more time working with this version, and except
for
>>> these disconcerting messages, which do look like they are initiated
by
>>> the ethernet card discovery, the system appears functional.
>>> In all cases the first occurrence is immediately after the NIC
discovery:
>>
>>>  e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
>>> | e1000e: Copyright (c) 1999-2008 Intel Corporation.
>>> | xen: registering gsi 16 triggering 0 polarity 1
>>> | xen_allocate_pirq: returning irq 16 for gsi 16
>>>   xen: --> irq=16
>>>   Already setup the GSI :16
>>>   e1000e 0000:06:00.0: PCI INT A -> GSI 16 (level, low) ->
IRQ 16
>>>   e1000e 0000:06:00.0: setting latency timer to 64
>>>     alloc irq_desc for 493 on node 0
>>>     alloc kstat_irqs on node 0
>>>   (XEN) Xen WARN at msi.c:636
>>>   (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
>>> ....
>>
>>> In case it''s a NIC specific issue, I''m seeing it
with both
>>>     06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
>>> Network Connection
>>> and
>>>     02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II
>>> BCM5709 Gigabit Ethernet (rev 20)
>>> NICs
>>
>>> -Bruce
>>
>>
>>
>>
>>
>> --
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it
>>
>>


-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it


Bruce Edge

2010-Oct-17 20:19 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Sat, Oct 16, 2010 at 10:26 AM, Sander Eikelenboom
<linux@eikelenboom.it> wrote:
>
> Probably there are more problems; you could also try a xen-unstable from
> before the commit that changed this code (msi.c).
> Another thing that could make it easier to debug would be to put some
> printk's around the WARN_ON's in msi.c at the line numbers
> that gave the warnings, showing both sides of the comparison in the WARN_ON.
>
Good idea.

Here's the debug stuff I added (so the printk output will make sense):

diff -r 3a5755249361 xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c        Thu Oct 14 12:46:29 2010 +0100
+++ b/xen/arch/x86/msi.c        Sun Oct 17 13:18:06 2010 -0700
@@ -549,14 +549,14 @@
         return 0;
     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
     {
-        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        addr &= PCI_BASE_ADDRESS_MEM_MASK;
         if ( ++bir >= limit )
             return 0;
         return addr |
                ((u64)pci_conf_read32(bus, slot, func,
                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
     }
-    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+    return addr & PCI_BASE_ADDRESS_MEM_MASK;
 }

 /**
@@ -633,7 +633,15 @@
         u32 pba_offset;

         ASSERT(!dev->msix_used_entries);
-        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir)); // XXX
+        if(msi->table_base != read_pci_mem_bar(bus, slot, func, bir)); {
+                       printk( "==================================================\n");
+                       printk( "msi->table_base != read_pci_mem_bar(bus, slot, func, bir)\n");
+                       printk( "msi->table_base = %0lx\n", msi->table_base );
+                       printk( "read_pci_mem_bar = %0lx\n", read_pci_mem_bar(bus, slot, func, bir) );
+                       printk( "bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+                       printk( "==================================================\n\n");
+               }

         dev->msix_nr_entries = nr_entries;
         dev->msix_table.first = PFN_DOWN(table_paddr);
@@ -646,14 +654,27 @@
                                      msix_pba_offset_reg(pos));
         bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
         pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
-        WARN_ON(!pba_paddr);
+        WARN_ON(!pba_paddr); // XXX
+        if (!pba_paddr) {
+                       printk( "==================================================\n");
+                       printk( "No pba_addr: bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+                       printk( "==================================================\n\n");
+               }
         pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;

         dev->msix_pba.first = PFN_DOWN(pba_paddr);
         dev->msix_pba.last = PFN_DOWN(pba_paddr +
                                       BITS_TO_LONGS(nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
-                                        dev->msix_pba.last));
+                                        dev->msix_pba.last)); // XXX
+        if ( ! rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                        dev->msix_pba.last)) {
+                       printk( "==================================================\n");
+                       printk( "rangeset_overlaps_range\n" );
+                       printk( "mmio_ro_ranges = %p, dev->msix_pba.first = %0lx, dev->msix_pba.last = %0lx\n",
+                                       mmio_ro_ranges, dev->msix_pba.first, dev->msix_pba.last);
+                       printk( "==================================================\n\n");
+               }

         if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
                                 dev->msix_table.last) )
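
One detail worth checking whenever a WARN_ON() condition is copied into an `if`: a `;` placed directly after the closing parenthesis ends the `if` statement, so the braces that follow form an ordinary block that always runs. A minimal standalone sketch contrasting the two forms (the function names are hypothetical and not from msi.c):

```c
/*
 * Illustration only; these two functions are hypothetical and exist
 * just to contrast the two forms. Each returns how many times the
 * braced block ran.
 */
static int block_runs_with_stray_semicolon(int cond)
{
    int runs = 0;

    if (cond);      /* the ';' is the whole body of the if */
    {
        runs++;     /* plain block: always executed */
    }
    return runs;
}

static int block_runs_guarded(int cond)
{
    int runs = 0;

    if (cond) {
        runs++;     /* executed only when cond is true */
    }
    return runs;
}
```

Likewise, the guard has to test the same condition as the WARN_ON() it accompanies; a negated guard (`if ( ! ... )`) would print for the case that did not warn.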

The boot log from this patched msi.c is attached. Let me know what
else I can add to help track down this issue.
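
For reference, a small standalone sketch of what the first hunk of the patch above changes. The constant values are the ones in Linux's pci_regs.h and are assumed, not verified, to match Xen's copy; the function names and the sample BAR value are made up for illustration.

```c
#include <stdint.h>

/* Values as in Linux's pci_regs.h; assumed to match the Xen header. */
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06u
#define PCI_BASE_ADDRESS_MEM_TYPE_64   0x04u
#define PCI_BASE_ADDRESS_MEM_MASK      (~0x0fu)

/*
 * Hypothetical helper: derive the base address from the raw BAR dwords.
 * The mask keeps the address bits and drops the four low flag bits;
 * for a 64-bit BAR the next dword supplies the upper 32 bits.
 */
static uint64_t bar_base(uint32_t lo, uint32_t hi)
{
    uint64_t addr = lo & PCI_BASE_ADDRESS_MEM_MASK;

    if ((lo & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64)
        addr |= (uint64_t)hi << 32;
    return addr;
}

/* The inverted mask keeps only the flag bits and discards the address. */
static uint64_t bar_base_inverted(uint32_t lo)
{
    return lo & ~PCI_BASE_ADDRESS_MEM_MASK;
}
```

With a low dword of 0xda00000c (a made-up 64-bit BAR) and a zero high dword, `bar_base()` gives 0xda000000, while the inverted mask leaves only the flag bits, 0xc.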

Also, here's the PCI config of dom0, although I think it's the NICs
that are responsible for this:

00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 12)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 12)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 12)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 12)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 12)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 12)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 12)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 12)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 12)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 12)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 12)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.1 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 2
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
01:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
04:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
04:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
05:00.0 SCSI storage controller: LSI Logic / Symbios Logic MegaRAID SAS 8208ELP/8208ELP (rev 08)
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
08:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)
ff:00.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture Generic Non-Core Registers (rev 04)
ff:00.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture System Address Decoder (rev 04)
ff:02.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 0 (rev 04)
ff:02.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 0 (rev 04)
ff:03.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller (rev 04)
ff:03.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Target Address Decoder (rev 04)
ff:03.4 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Test Registers (rev 04)
ff:04.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Control Registers (rev 04)
ff:04.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Address Registers (rev 04)
ff:04.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Rank Registers (rev 04)
ff:04.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Thermal Control Registers (rev 04)
ff:05.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Control Registers (rev 04)
ff:05.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Address Registers (rev 04)
ff:05.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Rank Registers (rev 04)
ff:05.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Thermal Control Registers (rev 04)
ff:06.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Control Registers (rev 04)
ff:06.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Address Registers (rev 04)
ff:06.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Rank Registers (rev 04)
ff:06.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Thermal Control Registers (rev 04)

Thanks

-Bruce
>
> --
>
> Sander
>
> Saturday, October 16, 2010, 7:14:11 PM, you wrote:
>
> > On Sat, Oct 16, 2010 at 9:29 AM, Sander Eikelenboom
> > <linux@eikelenboom.it> wrote:
> >> Hi Bruce,
> >>
> >> I tripped over the same warning trying to solve my freezes.
> >> Jan Beulich has posted a patch which is not in xen-unstable yet:
> >> [Xen-devel] [PATCH] x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@novell.com>
> >>
> >> --- a/xen/arch/x86/msi.c
> >> +++ b/xen/arch/x86/msi.c
> >> @@ -549,14 +549,14 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
> >>         return 0;
> >>     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
> >>     {
> >> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
> >> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
> >>         if ( ++bir >= limit )
> >>             return 0;
> >>         return addr |
> >>                ((u64)pci_conf_read32(bus, slot, func,
> >>                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
> >>     }
> >> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
> >> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
> >>  }
> >>
> >>  /**
> >>
> >>
> >>
> >> That fixes the warn, but my machine still keeps freezing nonetheless.
> >> (but it also does so with pci=nomsi so it's not MSI-specific in my case)
> >>
> >> --
> >>
> >> Sander
>
> > Hi Sander,
>
> > Thank you.  I tried it against 4.1.0-22240 with no effect.
> > I confirmed I had the right patch:
>
> > 0 %> hg diff  xen/arch/x86/msi.c
>
> > diff -r 38ad3633ecaf xen/arch/x86/msi.c
> > --- a/xen/arch/x86/msi.c        Wed Oct 13 12:01:30 2010 +0100
> > +++ b/xen/arch/x86/msi.c        Sat Oct 16 10:12:31 2010 -0700
> > @@ -549,14 +549,14 @@
> >          return 0;
> >      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
> >      {
> > -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
> > +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
> >          if ( ++bir >= limit )
> >              return 0;
> >          return addr |
> >                 ((u64)pci_conf_read32(bus, slot, func,
> >                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
> >      }
> > -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
> > +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
> >  }
>
> >  /**
>
> > The boot time msi warn messages were unchanged.
>
> > -Bruce
>
> >>
> >> Saturday, October 16, 2010, 6:14:17 PM, you wrote:
> >>
> >>> On Mon, Oct 11, 2010 at 2:05 PM, Bruce Edge
<bruce.edge@gmail.com> wrote:
> >>>> On Mon, Oct 11, 2010 at 10:12 AM, Gianni Tedesco
> >>>> <gianni.tedesco@citrix.com> wrote:
> >>>>> On Fri, 2010-10-08 at 10:33 +0100, Gianni Tedesco
wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I''ve been trying to boot
stefano''s minimal dom0 kernel from
> >>>>>>
git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
> >>>>>> 2.6.36-rc1-initial-domain-v2+pat
> >>>>>>
> >>>>>> On xen-unstable, I get the following
WARN_ON()''s from Xen when bringing
> >>>>>> up the NIC''s, then the machine hangs
forever when trying to login either
> >>>>>> over serial or NIC.
> >>>>>>
> >>>>>> (XEN) Xen WARN at msi.c:649
> >>>>
> >>>> I get the same Xen WARN messages using the current
pvops/xen-next with
> >>>> xen-unstable, here''s the complete list for one
boot, grep''d for WARN:
> >>>>
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>> (XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN)    0000000080287db8 0(XEN) Xen WARN at msi.c:636
> >>>> (XEN) Xen WARN at msi.c:649
> >>>> (XEN) Xen WARN at msi.c:656
> >>>>
> >>>> The complete boot seq is attached.
> >>>>
> >>>> I do get a login at the end of the boot seq though.
> >>>> My situation goes pear shaped when I try start a pv domU.
The dom0
> >>>> locks up after printing this on the console:
> >>>>
> >>>> (XEN) tmem: all pools frozen for all domains
> >>>> (XEN) tmem: all pools thawed for all domains
> >>>> (XEN) tmem: all pools frozen for all domains
> >>>> (XEN) tmem: all pools thawed for all domains
> >>>> mapping kernel into physical memory
> >>>> about to get started...
> >>>>
> >>>> then prints these once a minute:
> >>>> [  589.490894] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> >>>>
> >>>> The xen console is still active and I can generate a diag dump, also attached.
> >>>>
> >>>> This dom0 lockup behavior started with pv-ops 2.6.32.21, all the way
> >>>> to .24, rendering the later pvops kernels unusable for dom0.
> >>>> The 2.6.32.18 kernel is the last one that functioned as a dom0.
> >>>>
> >>>> This behavior is consistent on platforms, HP proliant 380DL G6, and
> >>>> G7, as well as i7 supermicros.
> >>>>
> >>>> -Bruce
> >>>>
> >>>>>
> >>>>> Hmm so this appears not to be an issue with XCP kernel, in that case I
> >>>>> get the warnings but everything still works fine.
> >>>>>
> >>>>> I will investigate further when I have some time.
> >>>>>
> >>>>> Gianni
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Xen-devel mailing list
> >>>>> Xen-devel@lists.xensource.com
> >>>>> http://lists.xensource.com/xen-devel
> >>>>>
> >>>>
> >>
> >>> The latest xen-unstable, 22240 has the same "  (XEN) Xen WARN at
> >>> msi.c:636 " messages with associated stack traces.
> >>
> >>> I spent a little more time working with this version, and except for
> >>> these disconcerting messages, which do look like they are initiated by
> >>> the ethernet card discovery, the system appears functional.
> >>> In all cases the first occurrence is immediately after the NIC discovery:
> >>
> >>>  e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
> >>> | e1000e: Copyright (c) 1999-2008 Intel Corporation.
> >>> | xen: registering gsi 16 triggering 0 polarity 1
> >>> | xen_allocate_pirq: returning irq 16 for gsi 16
> >>>   xen: --> irq=16
> >>>   Already setup the GSI :16
> >>>   e1000e 0000:06:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> >>>   e1000e 0000:06:00.0: setting latency timer to 64
> >>>     alloc irq_desc for 493 on node 0
> >>>     alloc kstat_irqs on node 0
> >>>   (XEN) Xen WARN at msi.c:636
> >>>   (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
> >>> ....
> >>
> >>> In case it's a NIC specific issue, I'm seeing it with both
> >>>     06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit
> >>> Network Connection
> >>> and
> >>>     02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II
> >>> BCM5709 Gigabit Ethernet (rev 20)
> >>> NICs
> >>
> >>> -Bruce
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>  Sander                            mailto:linux@eikelenboom.it
> >>
> >>
>
>
>
> --
> Best regards,
>  Sander                            mailto:linux@eikelenboom.it
>

Bruce Edge

2010-Oct-17 22:33 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Sun, Oct 17, 2010 at 1:19 PM, Bruce Edge <bruce.edge@gmail.com> wrote:
> On Sat, Oct 16, 2010 at 10:26 AM, Sander Eikelenboom
> <linux@eikelenboom.it> wrote:
>>
>> Probably there are more problems; you could also try a xen-unstable
>> from before the commit that changed this code (msi.c).
>> Another thing that could make it easier to debug would be to put some
>> printk's around the WARN_ON's in msi.c at the line numbers that gave
>> the warnings, showing both parts of the equation in the WARN_ON.
>>
>
> Good idea.
>
> Here's the debug stuff I added (so the printk output will make sense):

Apologies, jumped the gun on the post, trying to do too many things at
once. Ignore it, use this diff & output instead.

Fixed errors in the printk logic. Here's the diff:

diff -r 3a5755249361 xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c        Thu Oct 14 12:46:29 2010 +0100
+++ b/xen/arch/x86/msi.c        Sun Oct 17 15:32:05 2010 -0700
@@ -549,14 +549,14 @@
         return 0;
     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
     {
-        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        addr &= PCI_BASE_ADDRESS_MEM_MASK;
         if ( ++bir >= limit )
             return 0;
         return addr |
                ((u64)pci_conf_read32(bus, slot, func,
                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
     }
-    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+    return addr & PCI_BASE_ADDRESS_MEM_MASK;
 }

 /**
@@ -634,6 +634,14 @@

         ASSERT(!dev->msix_used_entries);
         WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+        if(msi->table_base == read_pci_mem_bar(bus, slot, func, bir)) { // XXX
+                       printk( "==================================================\n");
+                       printk( "msi->table_base != read_pci_mem_bar(bus, slot, func, bir)\n");
+                       printk( "msi->table_base = %0lx\n", msi->table_base );
+                       printk( "read_pci_mem_bar = %0lx\n", read_pci_mem_bar(bus, slot, func, bir) );
+                       printk( "bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+                       printk( "==================================================\n\n");
+               }

         dev->msix_nr_entries = nr_entries;
         dev->msix_table.first = PFN_DOWN(table_paddr);
@@ -647,6 +655,11 @@
         bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
         pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
         WARN_ON(!pba_paddr);
+        if (!pba_paddr) { // XXX
+                       printk( "==================================================\n");
+                       printk( "No pba_addr: bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
+                       printk( "==================================================\n\n");
+               }
         pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;

         dev->msix_pba.first = PFN_DOWN(pba_paddr);
@@ -654,6 +667,14 @@
                                       BITS_TO_LONGS(nr_entries) - 1);
         WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
                                         dev->msix_pba.last));
+        if ( rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                        dev->msix_pba.last)) { // XXX
+                       printk( "==================================================\n");
+                       printk( "rangeset_overlaps_range\n" );
+                       printk( "mmio_ro_ranges = %p, dev->msix_pba.first = %0lx, dev->msix_pba.last = %0lx\n",
+                                       mmio_ro_ranges, dev->msix_pba.first, dev->msix_pba.last);
+                       printk( "==================================================\n\n");
+               }

         if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
                                 dev->msix_table.last) )


The updated boot log is attached.

-Bruce
> Also, here's the PCI config of dom0, although I think it's the NICs
> that are responsible for this:
>
> 00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI
> Port (rev 12)
> 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
> Express Root Port 1 (rev 12)
> 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
> Express Root Port 3 (rev 12)
> 00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express
> Root Port 5 (rev 12)
> 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
> Express Root Port 7 (rev 12)
> 00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
> Express Root Port 9 (rev 12)
> 00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management
> Registers (rev 12)
> 00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch
> Pad Registers (rev 12)
> 00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status
> and RAS Registers (rev 12)
> 00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 12)
> 00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset
> QuickData Technology Device (rev 12)
> 00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
> UHCI Controller #4
> 00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
> UHCI Controller #5
> 00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
> UHCI Controller #6
> 00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2
> EHCI Controller #2
> 00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI
> Express Root Port 1
> 00:1c.1 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 2
> 00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
> UHCI Controller #1
> 00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
> UHCI Controller #2
> 00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB
> UHCI Controller #3
> 00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2
> EHCI Controller #1
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
> 00:1f.2 RAID bus controller: Intel Corporation 82801 SATA RAID Controller
> 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
> 01:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 01:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 04:00.0 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 04:00.1 Fibre Channel: PMC-Sierra Inc. Device 8032 (rev 05)
> 05:00.0 SCSI storage controller: LSI Logic / Symbios Logic MegaRAID
> SAS 8208ELP/8208ELP (rev 08)
> 06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 08:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW
> WPCM450 (rev 0a)
> ff:00.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath
> Architecture Generic Non-Core Registers (rev 04)
> ff:00.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath
> Architecture System Address Decoder (rev 04)
> ff:02.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 0 (rev 04)
> ff:02.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 0 (rev 04)
> ff:03.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller (rev 04)
> ff:03.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Target Address Decoder (rev 04)
> ff:03.4 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Test Registers (rev 04)
> ff:04.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 0 Control Registers (rev 04)
> ff:04.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 0 Address Registers (rev 04)
> ff:04.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 0 Rank Registers (rev 04)
> ff:04.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 0 Thermal Control Registers (rev 04)
> ff:05.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 1 Control Registers (rev 04)
> ff:05.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 1 Address Registers (rev 04)
> ff:05.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 1 Rank Registers (rev 04)
> ff:05.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 1 Thermal Control Registers (rev 04)
> ff:06.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 2 Control Registers (rev 04)
> ff:06.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 2 Address Registers (rev 04)
> ff:06.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 2 Rank Registers (rev 04)
> ff:06.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated
> Memory Controller Channel 2 Thermal Control Registers (rev 04)
>
> Thanks
>
> -Bruce
>
>>
>> --
>>
>> Sander
>>
>> Saturday, October 16, 2010, 7:14:11 PM, you wrote:
>>
>> > On Sat, Oct 16, 2010 at 9:29 AM, Sander Eikelenboom
>> > <linux@eikelenboom.it> wrote:
>> >> Hi Bruce,
>> >>
>> >> I tripped over the same warning trying to solve my freezes.
>> >> Jan Beulich has posted a patch which is not in xen-unstable yet:
>> >> [Xen-devel] [PATCH] x86/msi: fix inverted masks in c/s 22182:68cc3c514a0a
>> >>
>> >> Signed-off-by: Jan Beulich <jbeulich@novell.com>
>> >>
>> >> --- a/xen/arch/x86/msi.c
>> >> +++ b/xen/arch/x86/msi.c
>> >> @@ -549,14 +549,14 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
>> >>         return 0;
>> >>     if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>> >>     {
>> >> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>> >> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>> >>         if ( ++bir >= limit )
>> >>             return 0;
>> >>         return addr |
>> >>                ((u64)pci_conf_read32(bus, slot, func,
>> >>                                      PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>> >>     }
>> >> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>> >> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>> >>  }
>> >>
>> >>  /**
>> >>
>> >>
>> >>
>> >> That fixes the warn, but my machine still keeps freezing nonetheless.
>> >> (but it also does so with pci=nomsi so it's not msi specific in my case)
>> >>
>> >> --
>> >>
>> >> Sander
>>
>> > Hi Sander,
>>
>> > Thank you.  I tried it against 4.1.0-22240 with no effect.
>> > I confirmed I had the right patch:
>>
>> > 0 %> hg diff  xen/arch/x86/msi.c
>>
>> > diff -r 38ad3633ecaf xen/arch/x86/msi.c
>> > --- a/xen/arch/x86/msi.c        Wed Oct 13 12:01:30 2010 +0100
>> > +++ b/xen/arch/x86/msi.c        Sat Oct 16 10:12:31 2010 -0700
>> > @@ -549,14 +549,14 @@
>> >          return 0;
>> >      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>> >      {
>> > -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>> > +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>> >          if ( ++bir >= limit )
>> >              return 0;
>> >          return addr |
>> >                 ((u64)pci_conf_read32(bus, slot, func,
>> >                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>> >      }
>> > -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>> > +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>> >  }
>>
>> >  /**
>>
>> > The boot time msi warn messages were unchanged.
>>
>> > -Bruce
>>
>>
>>
>>
>> --
>> Best regards,
>>  Sander                            mailto:linux@eikelenboom.it
>>
>

Jan Beulich

2010-Oct-18 08:24 UTC

Re: [Xen-devel] MSI badness in xen-unstable

>>> On 18.10.10 at 00:33, Bruce Edge <bruce.edge@gmail.com> wrote:
> diff -r 3a5755249361 xen/arch/x86/msi.c
> --- a/xen/arch/x86/msi.c        Thu Oct 14 12:46:29 2010 +0100
> +++ b/xen/arch/x86/msi.c        Sun Oct 17 15:32:05 2010 -0700
> @@ -549,14 +549,14 @@
>          return 0;
>      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>      {
> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>          if ( ++bir >= limit )
>              return 0;
>          return addr |
>                 ((u64)pci_conf_read32(bus, slot, func,
>                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>      }
> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>  }
> 
>  /**
> @@ -634,6 +634,14 @@
> 
>          ASSERT(!dev->msix_used_entries);
>          WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
> +        if(msi->table_base == read_pci_mem_bar(bus, slot, func, bir)) { // XXX

Did you perhaps mean != here? The log you provided shows the two
values to be identical when these printk()s get executed (which raises
the question of how the warning could get triggered, all the more so on
line 635 when it sits on line 636 according to the patch).

> +                       printk( "==================================================\n");
> +                       printk( "msi->table_base != read_pci_mem_bar(bus, slot, func, bir)\n");
> +                       printk( "msi->table_base = %0lx\n", msi->table_base );
> +                       printk( "read_pci_mem_bar = %0lx\n", read_pci_mem_bar(bus, slot, func, bir) );
> +                       printk( "bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
> +                       printk( "==================================================\n\n");
> +               }
> 
>          dev->msix_nr_entries = nr_entries;
>          dev->msix_table.first = PFN_DOWN(table_paddr);
> @@ -647,6 +655,11 @@
>          bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
>          pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
>          WARN_ON(!pba_paddr);

Similar here: the warning sits on line 657, but the log shows warnings
only on lines 635, 639, and 660. Something's out of sync here.

> +        if (!pba_paddr) { // XXX
> +                       printk( "==================================================\n");
> +                       printk( "No pba_addr: bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
> +                       printk( "==================================================\n\n");
> +               }
>          pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
> 
>          dev->msix_pba.first = PFN_DOWN(pba_paddr);
> @@ -654,6 +667,14 @@
>                                        BITS_TO_LONGS(nr_entries) - 1);
>          WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
>                                          dev->msix_pba.last));
> +        if ( rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
> +                                        dev->msix_pba.last)) { // XXX
> +                       printk( "==================================================\n");
> +                       printk( "rangeset_overlaps_range\n" );
> +                       printk( "mmio_ro_ranges = %p, dev->msix_pba.first = %0lx, dev->msix_pba.last = %0lx\n",
> +                                       mmio_ro_ranges, dev->msix_pba.first, dev->msix_pba.last);
> +                       printk( "==================================================\n\n");
> +               }
> 
>          if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
>                                  dev->msix_table.last) )

Jan


Bruce Edge

2010-Oct-18 16:32 UTC

Re: [Xen-devel] MSI badness in xen-unstable

On Mon, Oct 18, 2010 at 1:24 AM, Jan Beulich <JBeulich@novell.com> wrote:
>>>> On 18.10.10 at 00:33, Bruce Edge <bruce.edge@gmail.com> wrote:
>> diff -r 3a5755249361 xen/arch/x86/msi.c
>> --- a/xen/arch/x86/msi.c        Thu Oct 14 12:46:29 2010 +0100
>> +++ b/xen/arch/x86/msi.c        Sun Oct 17 15:32:05 2010 -0700
>> @@ -549,14 +549,14 @@
>>          return 0;
>>      if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
>>      {
>> -        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>> +        addr &= PCI_BASE_ADDRESS_MEM_MASK;
>>          if ( ++bir >= limit )
>>              return 0;
>>          return addr |
>>                 ((u64)pci_conf_read32(bus, slot, func,
>>                                       PCI_BASE_ADDRESS_0 + bir * 4) << 32);
>>      }
>> -    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>> +    return addr & PCI_BASE_ADDRESS_MEM_MASK;
>>  }
>>
>>  /**
>> @@ -634,6 +634,14 @@
>>
>>          ASSERT(!dev->msix_used_entries);
>>          WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
>> +        if(msi->table_base == read_pci_mem_bar(bus, slot, func, bir)) { // XXX
>
> Did you perhaps mean != here? The log you provided shows the two
> values to be identical when these printk()s get executed (which raises
> the question of how the warning could get triggered, all the more so on
> line 635 when it sits on line 636 according to the patch).
>
>> +                       printk( "==================================================\n");
>> +                       printk( "msi->table_base != read_pci_mem_bar(bus, slot, func, bir)\n");
>> +                       printk( "msi->table_base = %0lx\n", msi->table_base );
>> +                       printk( "read_pci_mem_bar = %0lx\n", read_pci_mem_bar(bus, slot, func, bir) );
>> +                       printk( "bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
>> +                       printk( "==================================================\n\n");
>> +               }
>>
>>          dev->msix_nr_entries = nr_entries;
>>          dev->msix_table.first = PFN_DOWN(table_paddr);
>> @@ -647,6 +655,11 @@
>>          bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
>>          pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
>>          WARN_ON(!pba_paddr);
>
> Similar here: the warning sits on line 657, but the log shows warnings
> only on lines 635, 639, and 660. Something's out of sync here.
>
>> +        if (!pba_paddr) { // XXX
>> +                       printk( "==================================================\n");
>> +                       printk( "No pba_addr: bus=%0x, slot=%0x, func=%0x, bir=%0x\n", bus, slot, func, bir);
>> +                       printk( "==================================================\n\n");
>> +               }
>>          pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
>>
>>          dev->msix_pba.first = PFN_DOWN(pba_paddr);
>> @@ -654,6 +667,14 @@
>>                                        BITS_TO_LONGS(nr_entries) - 1);
>>          WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
>>                                          dev->msix_pba.last));
>> +        if ( rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
>> +                                        dev->msix_pba.last)) { // XXX
>> +                       printk( "==================================================\n");
>> +                       printk( "rangeset_overlaps_range\n" );
>> +                       printk( "mmio_ro_ranges = %p, dev->msix_pba.first = %0lx, dev->msix_pba.last = %0lx\n",
>> +                                       mmio_ro_ranges, dev->msix_pba.first, dev->msix_pba.last);
>> +                       printk( "==================================================\n\n");
>> +               }
>>
>>          if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
>>                                  dev->msix_table.last) )
>
> Jan
>
>
Jan,
You're right, wrong again. I was thrown by the fact that the last "Xen WARN
at msi.c:636" prints no debug data, so I thought I had the sense wrong. I
don't know what happened to the last WARN's printks.

Anyway, here's the output again, this time with msi.c as well so you can
correlate the line numbers without patching.

kjournald starting.  Commit interval 5 seconds
init: ureadahead main process (424) terminated with status 5
(XEN) Xen WARN at msi.c:636
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015b9d5>] pci_enable_msi+0x476/0xacf
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82c480287ea8   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480249940
(XEN) rbp: ffff82c480287dc8   rsp: ffff82c480287d08   r8:  ffff83011fff4004
(XEN) r9:  ffff830000000000   r10: ffff82c48020e7e0   r11: 0000000000000217
(XEN) r12: 0000000000000000   r13: ffff83011ff7fed0   r14: ffff82c480287e10
(XEN) r15: 0000000000000001   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 0000000116b2b000   cr2: 00007f9bbe730000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287d08:
(XEN)    ffff82c480287d18 ffff83011feca000 000082c480287e28 ffff83011ff7ff68
(XEN)    ffff82c400002000 ffff82c480287f18 00000000000fa0fe 00000000fa0fe000
(XEN)    0000008200000246 00000000fa0fe000 0000000080120257 0000000000000016
(XEN)    00000000fa0fc000 00000000000fa0fe 0000000000000080 000000001fee1228
(XEN)    0000000000000016 ffff83011a625700 ffff82c48011ffb1 ffff83011feca000
(XEN)    0000000000000109 0000000000000027 00000000ffffffed ffff83011ff81400
(XEN)    ffff82c480287e48 ffff82c48015d417 ffff82c480287f18 0000000000000027
(XEN)    ffff82c480287ea8 0000000000000424 000000000000009c ffff83011ff7fed0
(XEN)    0000000000000246 ffff82c480287e28 ffff82c48011ffb1 ffff88003cba3a08
(XEN)    0000000000000109 ffff83011feca000 0000000000000027 ffff83011feca190
(XEN)    ffff82c480287ef8 ffff82c48017089b ffff82c400000000 ffff82c400000004
(XEN)    ffffffff813c05b4 ffff82c480287ea8 0000000000007ff0 ffffffffffffffff
(XEN)    000000b000000000 0000000000000000 00000000fa0fc000 aaaaaaaaaaaaaaaa
(XEN)    000000b000000000 0000000000000027 00000000fa0fc000 0000000000000000
(XEN)    ffff82c480287ed8 ffff8300df4ce000 00000000000001e7 0000000000000011
(XEN)    ffff88003d4ebec0 0000000000000080 00007d3b7fd780c7 ffff82c4801fd012
(XEN)    ffffffff8100942a 0000000000000021 0000000000000080 ffff88003d4ebec0
(XEN)    0000000000000011 00000000000001e7 ffff88003cba3aa8 0000000000007ff0
(XEN)    0000000000000217 ffffffff819c4760 000000000000000a ffff88000312c9e0
(XEN)    0000000000000021 ffffffff8100942a ffff88003c8d846c ffff88003cba3a08
(XEN) Xen call trace:
(XEN)    [<ffff82c48015b9d5>] pci_enable_msi+0x476/0xacf
(XEN)    [<ffff82c48015d417>] map_domain_pirq+0x28e/0x37b
(XEN)    [<ffff82c48017089b>] do_physdev_op+0x7fb/0x1050
(XEN)    [<ffff82c4801fd012>] syscall_enter+0xf2/0x14c
(XEN)
(XEN) ==================================================
(XEN) msi->table_base != read_pci_mem_bar(bus, slot, func, bir)
(XEN) msi->table_base = fa0fc000
(XEN) read_pci_mem_bar = 0
(XEN) bus=0, slot=16, func=0, bir=0
(XEN) ==================================================
(XEN)
(XEN) Xen WARN at msi.c:657
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015bb06>] pci_enable_msi+0x5a7/0xacf
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480249940
(XEN) rbp: ffff82c480287dc8   rsp: ffff82c480287d08   r8:  ffff82c4802bf390
(XEN) r9:  0000000000000000   r10: 00000000fffffffe   r11: ffff82c480209260
(XEN) r12: 0000000000003000   r13: ffff83011ff7fed0   r14: ffff82c480287e10
(XEN) r15: 0000000000000001   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 0000000116b2b000   cr2: 00007f9bbe730000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287d08:
(XEN)    ffff82c480287d18 ffff83011feca000 000082c480287e28 ffff83011ff7ff68
(XEN)    ffff82c400002000 ffff82c480287f18 00000000000fa0fe 00000000fa0fe000
(XEN)    0000008200000246 00000000fa0fe000 0000000080120257 0000000000000016
(XEN)    00000000fa0fc000 00000000000fa0fe 0000000000000000 000000001fee1228
(XEN)    0000000000000016 ffff83011a625700 ffff82c48011ffb1 ffff83011feca000
(XEN)    0000000000000109 0000000000000027 00000000ffffffed ffff83011ff81400
(XEN)    ffff82c480287e48 ffff82c48015d417 ffff82c480287f18 0000000000000027
(XEN)    ffff82c480287ea8 0000000000000424 000000000000009c ffff83011ff7fed0
(XEN)    0000000000000246 ffff82c480287e28 ffff82c48011ffb1 ffff88003cba3a08
(XEN)    0000000000000109 ffff83011feca000 0000000000000027 ffff83011feca190
(XEN)    ffff82c480287ef8 ffff82c48017089b ffff82c400000000 ffff82c400000004
(XEN)    ffffffff813c05b4 ffff82c480287ea8 0000000000007ff0 ffffffffffffffff
(XEN)    000000b000000000 0000000000000000 00000000fa0fc000 aaaaaaaaaaaaaaaa
(XEN)    000000b000000000 0000000000000027 00000000fa0fc000 0000000000000000
(XEN)    ffff82c480287ed8 ffff8300df4ce000 00000000000001e7 0000000000000011
(XEN)    ffff88003d4ebec0 0000000000000080 00007d3b7fd780c7 ffff82c4801fd012
(XEN)    ffffffff8100942a 0000000000000021 0000000000000080 ffff88003d4ebec0
(XEN)    0000000000000011 00000000000001e7 ffff88003cba3aa8 0000000000007ff0
(XEN)    0000000000000217 ffffffff819c4760 000000000000000a ffff88000312c9e0
(XEN)    0000000000000021 ffffffff8100942a ffff88003c8d846c ffff88003cba3a08
(XEN) Xen call trace:
(XEN)    [<ffff82c48015bb06>] pci_enable_msi+0x5a7/0xacf
(XEN)    [<ffff82c48015d417>] map_domain_pirq+0x28e/0x37b
(XEN)    [<ffff82c48017089b>] do_physdev_op+0x7fb/0x1050
(XEN)    [<ffff82c4801fd012>] syscall_enter+0xf2/0x14c
(XEN)
(XEN) =================================================
(XEN) No pba_addr: bus=0, slot=16, func=0, bir=0
(XEN) bus=0, slot=16, func=0, bir=0
(XEN) =================================================
(XEN)
(XEN) Xen WARN at msi.c:636
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015b9d5>] pci_enable_msi+0x476/0xacf
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82c480287ea8   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480249940
(XEN) rbp: ffff82c480287dc8   rsp: ffff82c480287d08   r8:  ffff83011fff4004
(XEN) r9:  ffff830000000000   r10: ffff82c48020e7e0   r11: 0000000000000217
(XEN) r12: 0000000000000000   r13: ffff83011ff7e010   r14: ffff82c480287e10
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000116b2b000   cr2: 00007f9bbe730000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287d08:
(XEN)    0000000000000082 ffff82c480287d28 000082c480120257 ffff83011ff7e0a8
(XEN)    ffff82c400002000 ffff82c480287f18 00000000000fa0fa 00000000fa0fa000
(XEN)    0000008200000246 00000000fa0fa000 0000000080120257 0000000100000016
(XEN)    00000000fa0f8000 00000000000fa0fa 0000000000000080 000000001fee1228
(XEN)    0000000100000016 ffff83011a625840 ffff82c48011ffb1 ffff83011feca000
(XEN)    0000000000000108 0000000000000028 00000000ffffffed ffff83011ff81480
(XEN)    ffff82c480287e48 ffff82c48015d417 ffff82c480287f18 0000000000000028
(XEN)    ffff82c480287ea8 0000000000000420 00000000000000a0 ffff83011ff7e010
(XEN)    0000000000000246 ffff82c480287e28 ffff82c48011ffb1 ffff88003cba3a08
(XEN)    0000000000000108 ffff83011feca000 0000000000000028 ffff83011feca190
(XEN)    ffff82c480287ef8 ffff82c48017089b ffff82c400000000 ffff82c400000004
(XEN)    ffffffff813c05b4 ffff82c480287ea8 0000000000007ff0 ffffffffffffffff
(XEN)    000000b100000000 0000000000000000 00000000fa0f8000 aaaaaaaaaaaaaaaa
(XEN)    000000b100000000 0000000000000028 00000000fa0f8000 0000000000000000
(XEN)    0000000000000cfc ffff8300df4ce000 00000000000001e6 0000000000000011
(XEN)    ffff8800023df900 0000000000000080 00007d3b7fd780c7 ffff82c4801fd012
(XEN)    ffffffff8100942a 0000000000000021 0000000000000080 ffff8800023df900
(XEN)    0000000000000011 00000000000001e6 ffff88003cba3aa8 0000000000007ff0
(XEN)    0000000000000217 ffffffff819c4760 000000000000000a ffff88000312c9e0
(XEN)    0000000000000021 ffffffff8100942a ffff88003c61486c ffff88003cba3a08
(XEN) Xen call trace:
(XEN)    [<ffff82c48015b9d5>] pci_enable_msi+0x476/0xacf
(XEN)    [<ffff82c48015d417>] map_domain_pirq+0x28e/0x37b
(XEN)    [<ffff82c48017089b>] do_physdev_op+0x7fb/0x1050
(XEN)    [<ffff82c4801fd012>] syscall_enter+0xf2/0x14c
(XEN)
(XEN) =================================================
(XEN) msi->table_base != read_pci_mem_bar(bus, slot, func, bir)
(XEN) msi->table_base = fa0f8000
(XEN) read_pci_mem_bar = 0
(XEN) bus=0, slot=16, func=1, bir=0
(XEN) =================================================
(XEN)
(XEN) Xen WARN at msi.c:657
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015bb06>] pci_enable_msi+0x5a7/0xacf
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480249940
(XEN) rbp: ffff82c480287dc8   rsp: ffff82c480287d08   r8:  ffff82c4802bf390
(XEN) r9:  0000000000000000   r10: 00000000fffffffe   r11: ffff82c480209260
(XEN) r12: 0000000000003000   r13: ffff83011ff7e010   r14: ffff82c480287e10
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000116b2b000   cr2: 00007f9bbe730000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287d08:
(XEN)    0000000000000082 ffff82c480287d28 000082c480120257 ffff83011ff7e0a8
(XEN)    ffff82c400002000 ffff82c480287f18 00000000000fa0fa 00000000fa0fa000
(XEN)    0000008200000246 00000000fa0fa000 0000000080120257 0000000100000016
(XEN)    00000000fa0f8000 00000000000fa0fa 0000000000000000 000000001fee1228
(XEN)    0000000100000016 ffff83011a625840 ffff82c48011ffb1 ffff83011feca000
(XEN)    0000000000000108 0000000000000028 00000000ffffffed ffff83011ff81480
(XEN)    ffff82c480287e48 ffff82c48015d417 ffff82c480287f18 0000000000000028
(XEN)    ffff82c480287ea8 0000000000000420 00000000000000a0 ffff83011ff7e010
(XEN)    0000000000000246 ffff82c480287e28 ffff82c48011ffb1 ffff88003cba3a08
(XEN)    0000000000000108 ffff83011feca000 0000000000000028 ffff83011feca190
(XEN)    ffff82c480287ef8 ffff82c48017089b ffff82c400000000 ffff82c400000004
(XEN)    ffffffff813c05b4 ffff82c480287ea8 0000000000007ff0 ffffffffffffffff
(XEN)    000000b100000000 0000000000000000 00000000fa0f8000 aaaaaaaaaaaaaaaa
(XEN)    000000b100000000 0000000000000028 00000000fa0f8000 0000000000000000
(XEN)    0000000000000cfc ffff8300df4ce000 00000000000001e6 0000000000000011
(XEN)    ffff8800023df900 0000000000000080 00007d3b7fd780c7 ffff82c4801fd012
(XEN)    ffffffff8100942a 0000000000000021 0000000000000080 ffff8800023df900
(XEN)    0000000000000011 00000000000001e6 ffff88003cba3aa8 0000000000007ff0
(XEN)    0000000000000217 ffffffff819c4760 000000000000000a ffff88000312c9e0
(XEN)    0000000000000021 ffffffff8100942a ffff88003c61486c ffff88003cba3a08
(XEN) Xen call trace:
(XEN)    [<ffff82c48015bb06>] pci_enable_msi+0x5a7/0xacf
(XEN)    [<ffff82c48015d417>] map_domain_pirq+0x28e/0x37b
(XEN)    [<ffff82c48017089b>] do_physdev_op+0x7fb/0x1050
(XEN)    [<ffff82c4801fd012>] syscall_enter+0xf2/0x14c
(XEN)
(XEN) =================================================
(XEN) No pba_addr: bus=0, slot=16, func=1, bir=0
(XEN) bus=0, slot=16, func=1, bir=0
(XEN) =================================================
(XEN)
(XEN) Xen WARN at msi.c:670
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015bba2>] pci_enable_msi+0x643/0xacf
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: 0000000000000000   rcx: ffff83011fee1148
(XEN) rdx: ffff83011a625750   rsi: 0000000000000003   rdi: ffff83011fee1148
(XEN) rbp: ffff82c480287dc8   rsp: ffff82c480287d08   r8:  ffff82c4802bf390
(XEN) r9:  0000000000000000   r10: 00000000fffffffe   r11: ffff82c480209260
(XEN) r12: 0000000000003000   r13: ffff83011ff7e010   r14: ffff82c480287e10
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000116b2b000   cr2: 00007f9bbe730000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287d08:
(XEN)    0000000000000082 ffff82c480287d28 000082c480120257 ffff83011ff7e0a8
(XEN)    ffff82c400002000 ffff82c480287f18 00000000000fa0fa 00000000fa0fa000
(XEN)    0000008200000246 00000000fa0fa000 0000000080120257 0000000100000016
(XEN)    00000000fa0f8000 00000000000fa0fa 0000000000000000 000000001fee1228
(XEN)    0000000100000016 ffff83011a625840 ffff82c48011ffb1 ffff83011feca000
(XEN)    0000000000000108 0000000000000028 00000000ffffffed ffff83011ff81480
(XEN)    ffff82c480287e48 ffff82c48015d417 ffff82c480287f18 0000000000000028
(XEN)    ffff82c480287ea8 0000000000000420 00000000000000a0 ffff83011ff7e010
(XEN)    0000000000000246 ffff82c480287e28 ffff82c48011ffb1 ffff88003cba3a08
(XEN)    0000000000000108 ffff83011feca000 0000000000000028 ffff83011feca190
(XEN)    ffff82c480287ef8 ffff82c48017089b ffff82c400000000 ffff82c400000004
(XEN)    ffffffff813c05b4 ffff82c480287ea8 0000000000007ff0 ffffffffffffffff
(XEN)    000000b100000000 0000000000000000 00000000fa0f8000 aaaaaaaaaaaaaaaa
(XEN)    000000b100000000 0000000000000028 00000000fa0f8000 0000000000000000
(XEN)    0000000000000cfc ffff8300df4ce000 00000000000001e6 0000000000000011
(XEN)    ffff8800023df900 0000000000000080 00007d3b7fd780c7 ffff82c4801fd012
(XEN)    ffffffff8100942a 0000000000000021 0000000000000080 ffff8800023df900
(XEN)    0000000000000011 00000000000001e6 ffff88003cba3aa8 0000000000007ff0
(XEN)    0000000000000217 ffffffff819c4760 000000000000000a ffff88000312c9e0
(XEN)    0000000000000021 ffffffff8100942a ffff88003c61486c ffff88003cba3a08
(XEN) Xen call trace:
(XEN)    [<ffff82c48015bba2>] pci_enable_msi+0x643/0xacf
(XEN)    [<ffff82c48015d417>] map_domain_pirq+0x28e/0x37b
(XEN)    [<ffff82c48017089b>] do_physdev_op+0x7fb/0x1050
(XEN)    [<ffff82c4801fd012>] syscall_enter+0xf2/0x14c
(XEN)
(XEN) =================================================
(XEN) rangeset_overlaps_range
(XEN) mmio_ro_ranges = ffff83011fee1120, dev->msix_pba.first = 3, dev->msix_pba.last = 3
(XEN) =================================================
(XEN)
(XEN) Xen WARN at msi.c:636
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015b9d5>] pci_enable_msi+0x476/0xacf
(XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82c480287ea8   rcx: 000000000000000c
(XEN) rdx: 0000000000000cfe   rsi: 0000000000000286   rdi: ffff82c480249940
(XEN) rbp: ffff82c480287dc8   rsp: ffff82c480287d08   r8:  ffff83011fff4004
(XEN) r9:  ffff830000000000   r10: ffff82c48020e7e0   r11: 0000000000000213
(XEN) r12: 0000000000000000   r13: ffff83011ff7e0e0   r14: ffff82c480287e10
(XEN) r15: 0000000000000001   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 0000000116b2b000   cr2: 00007ff12c5471c0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480287d08:
(XEN)    ffff82c480287d18 ffff83011feca000 000082c480287e28 ffff83011ff7e178
(XEN)    ffff82c400002000 ffff82c480287f18 00000000000fa0f6 00000000fa0f6000
(XEN)    0000008200000246 00000000fa0f6000 0000000080120257 0000000200000016
(XEN)    00000000fa0f4000 00000000000fa0f6 0000000000000080 000000001fee1228
(XEN)    0000000200000016 ffff83011a625950 ffff82c48011ffb1 ffff83011feca000
(XEN)    0000000000000107 0000000000000029 00000000ffffffed ffff83011ff81500
(XEN)    ffff82c480287e48 ffff82c48015d417 ffff82c480287f18 0000000000000029
(XEN)    ffff82c480287ea8 000000000000041c 00000000000000a4 ffff83011ff7e0e0
(XEN)    0000000000000246 ffff82c480287e28 ffff82c48011ffb1 ffff88003cba3a08
(XEN)    0000000000000107 ffff83011feca000 0000000000000029 ffff83011feca190
(XEN)    ffff82c480287ef8 ffff82c48017089b ffff82c400000000 ffff8sd
7:0:0:0: [sdbfsck from util-linux-ng 2.17.2
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
e2fsck 1.41.11 (14-Mar-2010)

Looks like this is the device it's complaining about now:

00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 12)

Here's the detail on it and its adjacent companion for comparison.


00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 12)
        Subsystem: Super Micro Computer Inc Device f580
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fa0fc000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable+ Mask- TabSize=1
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [90] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
                LnkCap: Port #0, Speed unknown, Width x0, ASPM
unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train-
SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [e0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 12)
        Subsystem: Super Micro Computer Inc Device f580
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin B routed to IRQ 17
        Region 0: Memory at fa0f8000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable+ Mask- TabSize=1
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [90] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
                LnkCap: Port #0, Speed unknown, Width x0, ASPM
unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train-
SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [e0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma
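
The lspci numbers above can be read against the hypervisor log with a little arithmetic. What follows is an illustration only, not code from Xen: msix_pfn is a hypothetical helper and 4 KiB pages are assumed. Per the PCI specification, the low three bits of the MSI-X Table and PBA registers select the BAR (the BIR) and the remaining bits are the offset into that BAR. If the BAR base is taken as 0, as the "read_pci_mem_bar = 0" lines report, the PBA of 00:16.1 lands on page frame 3, which is consistent with the "dev->msix_pba.first = 3" line in the log; with the real base of fa0f8000 it would be page frame fa0fb.

```c
#include <stdint.h>

#define PAGE_SHIFT     12    /* assumption: 4 KiB pages */
#define MSIX_BIR_MASK  0x7u  /* low 3 bits of the Table/PBA register pick the BAR */

/*
 * Hypothetical helper (not a Xen function): page frame number of an MSI-X
 * structure, given the base address of its BAR and the raw Table or PBA
 * register value (offset with the BIR in the low bits).
 */
static uint64_t msix_pfn(uint64_t bar_base, uint32_t offset_reg)
{
    /* Strip the BIR bits, add the offset to the BAR base, drop the page offset. */
    return (bar_base + (offset_reg & ~MSIX_BIR_MASK)) >> PAGE_SHIFT;
}
```

For 00:16.0, msix_pfn(0xfa0fc000, 0x2000) gives fa0fe, the same value that shows up in the stack dumps for that function.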


-Bruce




Gianni Tedesco

2010-Oct-18 17:16 UTC


Re: [Xen-devel] MSI badness in xen-unstable

On Sat, 2010-10-16 at 18:25 +0100, Sander Eikelenboom wrote:
> Probably there are more problems, you could also try a xen-unstable
> from before the commit that changed this code (msi.c)
> Another thing that could make it easier to debug would be to put some
> printk's around the WARN_ON's in msi.c at the line numbers that gave
> the warnings, showing both parts of the equation in the WARN_ON

Yes, I am still getting WARN's after the inverted masks patch too.
Bruce's patch was line-wrap mangled but I instrumented the WARN that I'm
hitting based on that. The device in question is a Broadcom NetXtreme II
- there are two installed in the box but only one of them is brought up.
The WARN's happen when the interface is brought up for DHCP.

(XEN) ================================================
(XEN) msi->table_base != read_pci_mem_bar(bus, slot, func, bir)
(XEN) msi->table_base = da000000
(XEN) read_pci_mem_bar = 0
(XEN) bus=2, slot=0, func=0, bir=0
(XEN) ================================================

(XEN) ================================================
(XEN) No pba_addr: bus=2, slot=0, func=0, bir=0
(XEN) ================================================

The problem appears to be as simple as read_pci_mem_bar() returning
zero. This can only happen for a few possible reasons and in my case
what I got was:

pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) is not one of:
	PCI_HEADER_TYPE_NORMAL
	PCI_HEADER_TYPE_BRIDGE
	PCI_HEADER_TYPE_CARDBUS

Thereby bailing in the switch statement. It seems that the problem here
is that the multi-function bit (0x80) was not being masked out. Does the
following patch work for you guys?

diff -r fc2242ac90e1 xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c	Mon Oct 18 11:31:47 2010 +0100
+++ b/xen/arch/x86/msi.c	Mon Oct 18 18:14:22 2010 +0100
@@ -527,7 +527,7 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
     u8 limit;
     u32 addr;
 
-    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
+    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) & 0x7f )
     {
     case PCI_HEADER_TYPE_NORMAL:
         limit = 6;
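
The effect of the one-line change can be reproduced outside the hypervisor. The sketch below is not the Xen function: bar_limit and the two mask macro names are made up for illustration, and only the three header-type constants come from the message above (their values 0, 1 and 2 are the standard PCI encodings). Bit 7 of the header-type byte flags a multi-function device, so function 0 of such a device typically reads back 0x80 rather than 0x00 and falls through to the default case unless that bit is masked off. The BAR counts for the bridge and CardBus cases (2 and 1) follow the standard header layouts; the patch context itself only shows the value 6.

```c
#include <stdint.h>

/* Header-type encodings from the PCI specification (config offset 0x0e). */
#define PCI_HEADER_TYPE_NORMAL   0x00
#define PCI_HEADER_TYPE_BRIDGE   0x01
#define PCI_HEADER_TYPE_CARDBUS  0x02

/* Illustrative names, not taken from the Xen tree. */
#define HDR_TYPE_LAYOUT_MASK     0x7f  /* bits 6:0 select the header layout */
#define HDR_TYPE_MULTI_FUNCTION  0x80  /* bit 7 marks a multi-function device */

/*
 * Hypothetical helper: number of BAR slots for a raw header-type byte,
 * or 0 if the layout is not recognised. mask_mf_bit selects between the
 * behaviour before the patch (0) and after it (non-zero).
 */
static unsigned int bar_limit(uint8_t raw_header_type, int mask_mf_bit)
{
    uint8_t type = mask_mf_bit ? (raw_header_type & HDR_TYPE_LAYOUT_MASK)
                               : raw_header_type;

    switch ( type )
    {
    case PCI_HEADER_TYPE_NORMAL:
        return 6;
    case PCI_HEADER_TYPE_BRIDGE:
        return 2;
    case PCI_HEADER_TYPE_CARDBUS:
        return 1;
    default:
        return 0;   /* unknown layout: the caller sees a BAR address of 0 */
    }
}
```

With a raw value of 0x80, bar_limit(0x80, 0) returns 0 (the bail-out path that produced the warnings), while bar_limit(0x80, 1) returns 6.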



FYI: This is function 0 of my multi-function bnx2 NIC. I notice your
affected devices were also multi-function.

02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit
Ethernet (rev 20)
	Subsystem: Dell Device 02a3
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at da000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
		Product Name: Broadcom NetXtreme II Ethernet Controller
		Read-only fields:
			[PN] Part number: BCM95716C1
			[EC] Engineering changes: 220197-3
			[SN] Serial number: 0123456789
			[MN] Manufacture ID: 31 30 32 38
			[V0] Vendor specific: 5.0.13
			[RV] Reserved: checksum good, 22 byte(s) reserved
		End
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
		Vector table: BAR=0 offset=0000c000
		PBA: BAR=0 offset=0000e000
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us, L1
<2us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt-
ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable
De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [100 v1] Device Serial Number a4-ba-db-ff-fe-4d-11-0b
	Capabilities: [110 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [150 v1] Power Budgeting <?>
	Capabilities: [160 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Kernel driver in use: bnx2
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit
Ethernet (rev 20)
	Subsystem: Dell Device 02a3
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 17
	Region 0: Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
		Product Name: Broadcom NetXtreme II Ethernet Controller
		Read-only fields:
			[PN] Part number: BCM95716C1
			[EC] Engineering changes: 220197-3
			[SN] Serial number: 0123456789
			[MN] Manufacture ID: 31 30 32 38
			[V0] Vendor specific: 5.0.13
			[RV] Reserved: checksum good, 22 byte(s) reserved
		End
	Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
		Vector table: BAR=0 offset=0000c000
		PBA: BAR=0 offset=0000e000
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us, L1
<2us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt-
ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable
De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [100 v1] Device Serial Number a4-ba-db-ff-fe-4d-11-0c
	Capabilities: [110 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC-
UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [150 v1] Power Budgeting <?>
	Capabilities: [160 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Kernel driver in use: bnx2




Bruce Edge

2010-Oct-18 17:29 UTC


Re: [Xen-devel] MSI badness in xen-unstable

On Mon, Oct 18, 2010 at 10:16 AM, Gianni Tedesco <gianni.tedesco@citrix.com> wrote:
> On Sat, 2010-10-16 at 18:25 +0100, Sander Eikelenboom wrote:
>> Probably there are more problems, you could also try a xen-unstable
>> from before the commit that changed this code (msi.c)
>> Another thing that could make it easier to debug would be to put some
>> printk's around the WARN_ON's in msi.c at the line numbers that gave
>> the warnings, showing both parts of the equation in the WARN_ON
>
> Yes, I am still getting WARN's after the inverted masks patch too.
> Bruce's patch was line-wrap mangled but I instrumented the WARN that I'm
> hitting based on that. The device in question is a Broadcom NetXtreme II
> - there are two installed in the box but only one of them is brought up.
> The WARN's happen when the interface is brought up for DHCP.
>
> (XEN) ================================================
> (XEN) msi->table_base != read_pci_mem_bar(bus, slot, func, bir)
> (XEN) msi->table_base = da000000
> (XEN) read_pci_mem_bar = 0
> (XEN) bus=2, slot=0, func=0, bir=0
> (XEN) ================================================
>
> (XEN) ================================================
> (XEN) No pba_addr: bus=2, slot=0, func=0, bir=0
> (XEN) ================================================
>
> The problem appears to be as simple as read_pci_mem_bar() returning
> zero. This can only happen for a few possible reasons and in my case
> what I got was:
>
> pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) is not one of:
>        PCI_HEADER_TYPE_NORMAL
>        PCI_HEADER_TYPE_BRIDGE
>        PCI_HEADER_TYPE_CARDBUS
>
> Thereby bailing in the switch statement. It seems that the problem here
> is that the multi-function bit (0x80) was not being masked out. Does the
> following patch work for you guys?
Nice!
Boots clean now, no WARNs at all.

Thanks

-Bruce
>
> diff -r fc2242ac90e1 xen/arch/x86/msi.c
> --- a/xen/arch/x86/msi.c        Mon Oct 18 11:31:47 2010 +0100
> +++ b/xen/arch/x86/msi.c        Mon Oct 18 18:14:22 2010 +0100
> @@ -527,7 +527,7 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
>     u8 limit;
>     u32 addr;
>
> -    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
> +    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) & 0x7f )
>     {
>     case PCI_HEADER_TYPE_NORMAL:
>         limit = 6;
>
>
>
> FYI: This is function 0 of my multi-function bnx2 NIC. I notice your
> affected devices were also multi-function
>
> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716
Gigabit Ethernet (rev 20)
>        Subsystem: Dell Device 02a3
>        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0, Cache Line Size: 64 bytes
>        Interrupt: pin A routed to IRQ 16
>        Region 0: Memory at da000000 (64-bit, non-prefetchable) [size=32M]
>        Capabilities: [48] Power Management version 3
>                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>        Capabilities: [50] Vital Product Data
>                Product Name: Broadcom NetXtreme II Ethernet Controller
>                Read-only fields:
>                        [PN] Part number: BCM95716C1
>                        [EC] Engineering changes: 220197-3
>                        [SN] Serial number: 0123456789
>                        [MN] Manufacture ID: 31 30 32 38
>                        [V0] Vendor specific: 5.0.13
>                        [RV] Reserved: checksum good, 22 byte(s) reserved
>                End
>        Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
>                Address: 0000000000000000  Data: 0000
>        Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
>                Vector table: BAR=0 offset=0000c000
>                PBA: BAR=0 offset=0000e000
>        Capabilities: [ac] Express (v2) Endpoint, MSI 00
>                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s
<4us, L1 <64us
>                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+
Unsupported+
>                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
>                        MaxPayload 128 bytes, MaxReadReq 512 bytes
>                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+
TransPend-
>                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1,
Latency L0 <2us, L1 <2us
>                        ClockPM- Surprise- LLActRep- BwNot-
>                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
CommClk+
>                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
>                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
>                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
>                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
>                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
>                         Compliance De-emphasis: -6dB
>                LnkSta2: Current De-emphasis Level: -6dB
>        Capabilities: [100 v1] Device Serial Number a4-ba-db-ff-fe-4d-11-0b
>        Capabilities: [110 v1] Advanced Error Reporting
>                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
>                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
>                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>        Capabilities: [150 v1] Power Budgeting <?>
>        Capabilities: [160 v1] Virtual Channel
>                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>                Arb:    Fixed- WRR32- WRR64- WRR128-
>                Ctrl:   ArbSelect=Fixed
>                Status: InProgress-
>                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                        Status: NegoPending- InProgress-
>        Kernel driver in use: bnx2
> 02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
>        Subsystem: Dell Device 02a3
>        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0, Cache Line Size: 64 bytes
>        Interrupt: pin B routed to IRQ 17
>        Region 0: Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
>        Capabilities: [48] Power Management version 3
>                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>        Capabilities: [50] Vital Product Data
>                Product Name: Broadcom NetXtreme II Ethernet Controller
>                Read-only fields:
>                        [PN] Part number: BCM95716C1
>                        [EC] Engineering changes: 220197-3
>                        [SN] Serial number: 0123456789
>                        [MN] Manufacture ID: 31 30 32 38
>                        [V0] Vendor specific: 5.0.13
>                        [RV] Reserved: checksum good, 22 byte(s) reserved
>                End
>        Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
>                Address: 0000000000000000  Data: 0000
>        Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
>                Vector table: BAR=0 offset=0000c000
>                PBA: BAR=0 offset=0000e000
>        Capabilities: [ac] Express (v2) Endpoint, MSI 00
>                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
>                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
>                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
>                        MaxPayload 128 bytes, MaxReadReq 512 bytes
>                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us, L1 <2us
>                        ClockPM- Surprise- LLActRep- BwNot-
>                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
>                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
>                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
>                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                         Compliance De-emphasis: -6dB
>                LnkSta2: Current De-emphasis Level: -6dB
>        Capabilities: [100 v1] Device Serial Number a4-ba-db-ff-fe-4d-11-0c
>        Capabilities: [110 v1] Advanced Error Reporting
>                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
>                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
>                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>        Capabilities: [150 v1] Power Budgeting <?>
>        Capabilities: [160 v1] Virtual Channel
>                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>                Arb:    Fixed- WRR32- WRR64- WRR128-
>                Ctrl:   ArbSelect=Fixed
>                Status: InProgress-
>                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                        Status: NegoPending- InProgress-
>        Kernel driver in use: bnx2

Jan Beulich

2010-Oct-19 07:18 UTC

Re: [Xen-devel] MSI badness in xen-unstable

>>> On 18.10.10 at 19:16, Gianni Tedesco <gianni.tedesco@citrix.com> wrote:
> --- a/xen/arch/x86/msi.c	Mon Oct 18 11:31:47 2010 +0100
> +++ b/xen/arch/x86/msi.c	Mon Oct 18 18:14:22 2010 +0100
> @@ -527,7 +527,7 @@ static u64 read_pci_mem_bar(u8 bus, u8 s
>      u8 limit;
>      u32 addr;
>  
> -    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
> +    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) & 0x7f )
>      {
>      case PCI_HEADER_TYPE_NORMAL:
>          limit = 6;
Ah, yes, of course!

Acked-by: Jan Beulich <jbeulich@novell.com>
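
For readers following the patch: bit 7 of the PCI Header Type register is the multi-function flag, and only bits 6:0 encode the header layout. The bnx2 NIC in the lspci output above exposes two functions (02:00.0 and 02:00.1), so its header type presumably reads 0x80 rather than 0x00, which would make the unmasked switch fall through to the default case. The following is a minimal, standalone sketch of that effect, not the Xen code itself. The helper names and the HDR_* macros are hypothetical, and the BAR count of 2 for bridges comes from the PCI type 1 header layout rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Header Type register: bits 6:0 = header layout, bit 7 = multi-function. */
#define PCI_HEADER_TYPE_NORMAL 0x00
#define PCI_HEADER_TYPE_BRIDGE 0x01
#define HDR_LAYOUT_MASK        0x7f  /* hypothetical name for the 0x7f mask */
#define HDR_MULTI_FUNCTION     0x80  /* hypothetical name for bit 7 */

/* Pre-patch behaviour: compare the raw register value. */
static unsigned int bar_limit_unmasked(uint8_t header_type)
{
    switch (header_type) {
    case PCI_HEADER_TYPE_NORMAL:
        return 6;   /* type 0 header: six BARs */
    case PCI_HEADER_TYPE_BRIDGE:
        return 2;   /* type 1 header: two BARs */
    default:
        return 0;   /* unrecognised layout: no BARs can be read */
    }
}

/* Post-patch behaviour: strip the multi-function bit first. */
static unsigned int bar_limit_masked(uint8_t header_type)
{
    return bar_limit_unmasked(header_type & HDR_LAYOUT_MASK);
}

/* A multi-function endpoint reports 0x80 (type 0 plus bit 7). */
static void check_multi_function_case(void)
{
    uint8_t raw = PCI_HEADER_TYPE_NORMAL | HDR_MULTI_FUNCTION;

    assert(bar_limit_unmasked(raw) == 0);  /* BAR lookup is skipped */
    assert(bar_limit_masked(raw) == 6);    /* BAR lookup proceeds   */
}
```

Calling check_multi_function_case() passes silently: for a raw value of 0x80 the unmasked lookup reports no BARs, while the masked lookup treats the device as a normal type 0 header.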


