thr3ads.net - Pkg xen devel - regression ioatdma 3.3 [Jan 2012]


William Dauchy

2012-Jan-27 13:31 UTC

regression ioatdma 3.3

Hello,

I'm having trouble loading the IOATDMA module under Xen 4.1.2 with a
Linux 3.3 dom0.

CONFIG_INTEL_IOATDMA=m
CONFIG_IGB=y

It was working with Linux 3.1.5; the regression seems to date from
Linux 3.2. I tried a `git bisect`, but I'm running into other
regressions that make debugging harder.

Here is the call trace when loading the module in dom0:

dca service started, version 1.12.1
ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
xen: registering gsi 43 triggering 0 polarity 1
xen: --> pirq=43 -> irq=43 (gsi=43)
------------[ cut here ]------------
kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
invalid opcode: 0000 [#1] SMP
Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
button

Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell
  C6100           /0D61XP
EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
EIP is at __cleanup+0x154/0x160 [ioatdma]
EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
Stack:
 eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
 eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
 00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
 [<c10347cb>] tasklet_action+0x9b/0xb0
 [<c10350ab>] __do_softirq+0x7b/0x110
 [<c1035030>] ? irq_enter+0x70/0x70
 <IRQ>
 [<c1034e7e>] ? irq_exit+0x6e/0xa0
 [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
 [<c1322907>] ? xen_do_upcall+0x7/0xc
 [<c10013a7>] ? hypercall_page+0x3a7/0x1000
 [<c1006172>] ? xen_safe_halt+0x12/0x20
 [<c1010582>] ? default_idle+0x32/0x60
 [<c1008596>] ? cpu_idle+0x66/0xa0
 [<c130bd58>] ? rest_init+0x58/0x60
 [<c14237d2>] ? start_kernel+0x2e4/0x2ea
 [<c142331d>] ? kernel_init+0x11b/0x11b
 [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
 [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
---[ end trace 902e93593e49fa50 ]---
Kernel panic - not syncing: Fatal exception in interrupt


Does anybody have any clue?

Regards,
-- 
William

Konrad Rzeszutek Wilk

2012-Jan-27 14:47 UTC

Re: regression ioatdma 3.3

On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:
> Hello,
> 
> I'm having trouble loading the IOATDMA module under Xen 4.1.2 with a
> Linux 3.3 dom0.
So you are using the rc1 version? What exact git commit are you using?
> 
> CONFIG_INTEL_IOATDMA=m
> CONFIG_IGB=y
> 
> It was working with Linux 3.1.5; the regression seems to date from
> Linux 3.2. I tried a `git bisect`, but I'm running into other
3.2 you say? This below is 3.3?
> regressions that make debugging harder.
Such as?
> 
> Here is the call trace when loading the module in dom0:
Is the problem present with baremetal (same exact kernel?)
Do you see this if you run a 64-bit dom0?
> 
> dca service started, version 1.12.1
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
> xen: registering gsi 43 triggering 0 polarity 1
> xen: --> pirq=43 -> irq=43 (gsi=43)
> ------------[ cut here ]------------
> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
> button
> 
> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell
>   C6100           /0D61XP
> EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
> EIP is at __cleanup+0x154/0x160 [ioatdma]
> EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
> ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
> Stack:
>  eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
>  eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
>  00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
> Call Trace:
>  [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
>  [<c10347cb>] tasklet_action+0x9b/0xb0
>  [<c10350ab>] __do_softirq+0x7b/0x110
>  [<c1035030>] ? irq_enter+0x70/0x70
>  <IRQ>
>  [<c1034e7e>] ? irq_exit+0x6e/0xa0
>  [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
>  [<c1322907>] ? xen_do_upcall+0x7/0xc
>  [<c10013a7>] ? hypercall_page+0x3a7/0x1000
>  [<c1006172>] ? xen_safe_halt+0x12/0x20
>  [<c1010582>] ? default_idle+0x32/0x60
>  [<c1008596>] ? cpu_idle+0x66/0xa0
>  [<c130bd58>] ? rest_init+0x58/0x60
>  [<c14237d2>] ? start_kernel+0x2e4/0x2ea
>  [<c142331d>] ? kernel_init+0x11b/0x11b
>  [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
>  [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
> Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
> f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
> EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
> ---[ end trace 902e93593e49fa50 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> Does anybody have any clue?
> 
> Regards,
> -- 
> William

William Dauchy

2012-Jan-27 15:02 UTC

Re: regression ioatdma 3.3

On Fri, Jan 27, 2012 at 3:47 PM, Konrad Rzeszutek Wilk
<konrad@darnok.org> wrote:
> So you are using the rc1 version? What exact git commit are you using?
I pulled the latest revision, 74ea15d.
> 3.2 you say? This below is 3.3?
Yes. I was using a 3.1 kernel; after upgrading to 3.2 I hit the problem,
and I thought it best to report it against the latest 3.3-rc
kernel.
> Is the problem present with baremetal (same exact kernel?)
I did test with a bare-metal kernel and didn't get any problem, so
it seems to be a Xen problem.
> Do you see this if you run a 64-bit dom0?
I didn't test this.

-- 
William

Jonathan Nieder

2012-Feb-19 22:31 UTC

Re: regression ioatdma 3.3

forwarded 660554 http://thread.gmane.org/gmane.comp.emulators.xen.devel/121604
quit
(cc-ing Thomas, since he ran into the same bug)
Hi,

Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:
>> I'm having trouble loading the IOATDMA module under Xen 4.1.2 with a
>> Linux 3.3 dom0.
>
> So you are using the rc1 version? What exact git commit are you using?
Broken:

 v3.2.6 + Debian patches (zigo)
 v3.3-rc2~22 (William)

Not broken:

 v3.1.8 + Debian patches, presumably (zigo)
 v3.1.5 (William)

[...]
>> Here is the call trace when loading the module in dom0:
>
> Is the problem present with baremetal (same exact kernel?)
No.
> Do you see this if you run a 64-bit dom0?
I'm guessing not, just based on the crazy coincidence that both
reports were with 32-bit kernels.  But who knows. ;-)

[...]
>> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
>> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
>> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
>> button
>> 
>> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell
>>   C6100           /0D61XP

Line 163 of dma_v2.c is the BUG_ON at the end of this loop in __cleanup():

	active = ioat2_ring_active(ioat);
	for (i = 0; i < active && !seen_current; i++) {
		...
		if (tx->phys == phys_complete)
			seen_current = true;
	}
	...
	BUG_ON(active && !seen_current); /* no active descs have written a completion? */
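To make the failure condition concrete, here is a userspace model of the quoted loop. It is a hypothetical sketch: the ring is a plain array without the driver's circular head/tail indexing, and `demo_desc`, `demo_seen_current()` and `demo_would_bug()` are illustrative names, not ioatdma functions. It shows that the BUG_ON fires whenever descriptors are active but none of their bus addresses equals the completion address reported by the hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring descriptor; only its bus address matters here. */
struct demo_desc {
	uint64_t phys;	/* bus (DMA) address of the hardware descriptor */
};

/*
 * Walk the 'active' descriptors in order and report whether any of them
 * is the one the engine says it completed (mirrors the quoted loop).
 */
static bool demo_seen_current(const struct demo_desc *ring, size_t active,
			      uint64_t phys_complete)
{
	bool seen_current = false;

	for (size_t i = 0; i < active && !seen_current; i++) {
		if (ring[i].phys == phys_complete)
			seen_current = true;
	}
	return seen_current;
}

/* True when BUG_ON(active && !seen_current) would trigger. */
static bool demo_would_bug(const struct demo_desc *ring, size_t active,
			   uint64_t phys_complete)
{
	return active && !demo_seen_current(ring, active, phys_complete);
}
```

For example, with two active descriptors at bus addresses 0x1000 and 0x1040, a completion of 0x1040 is found, while a completion that matches neither (for instance because it was mangled on the way in) leaves `seen_current` false and would trip the BUG_ON.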

Any hints for tracking it down?

Thanks,
Jonathan

Jonathan Nieder

2012-Feb-20 18:16 UTC

Re: regression ioatdma 3.3

> Konrad Rzeszutek Wilk wrote:
>> Do you see this if you run a 64-bit dom0?
Looks like no.  Thomas reports[1]:
> I just tried with the amd64 kernel and Xen, and I didn't see any
> issue.
>
> However, it is important that Xen 4.1 + Linux 3.2 works with a 32-bit
> kernel, because that is the most optimized configuration (e.g. 64-bit
> hypervisor, 32-bit kernel and 32-bit userland).
Maybe Andres's patches are relevant.

Hope that helps,
Jonathan

[1] http://bugs.debian.org/660554#25

Thomas Goirand

2012-Feb-25 07:46 UTC

Re: regression ioatdma 3.3

On 02/21/2012 02:16 AM, Jonathan Nieder wrote:
>> I just tried with the amd64 kernel and Xen, and I didn't see
>> any issue.
>>
>> However, it is important that Xen 4.1 + Linux 3.2 works with a 32-bit
>> kernel, because that is the most optimized configuration (e.g. 64-bit
>> hypervisor, 32-bit kernel and 32-bit userland).
>>     
> Maybe Andres's patches are relevant.
>
> Hope that helps,
> Jonathan
>
> [1] http://bugs.debian.org/660554#25

Hi,

Which patches are you referring to? Is there anything I can do to help
test or investigate this? Should this be reported to LKML? How
can I find out who the author of this driver is?

Thomas Goirand (zigo)

William Dauchy

2012-Feb-25 21:13 UTC

Re: regression ioatdma 3.3

Hi Thomas,

On Sat, Feb 25, 2012 at 8:46 AM, Thomas Goirand <thomas@goirand.fr>
wrote:
> How can I find out who the author of this driver is?
I don't think the problem is in the driver itself, because it
works without Xen.
I'm also looking for hints to fix the problem.

-- 
William

Dan Williams

2012-Mar-05 15:38 UTC

[Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2

On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo at debian.org>
wrote:
> I will do my best to provide it ASAP. Should I compile with BUG_ON so
> you see it crashing, as per the original code, or just with WARN_ON, so
> you also see further things in dmesg?
Yes, replacing with a WARN_ON might allow it to skid after the crash
and give a bit more information.
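Dan's suggestion trades a fatal stop for a logged warning. In the kernel, BUG_ON() on x86 raises the "invalid opcode" oops seen in the trace (fatal here because `__cleanup` runs in a tasklet), whereas WARN_ON() prints a backtrace and evaluates to the condition, so execution continues. The userspace sketch below uses hypothetical `DEMO_*` macros and a `demo_warn()` helper, not the kernel's own definitions, to show why the WARN_ON variant lets more state reach the log.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper behind DEMO_WARN_ON(): log the failure and carry on. */
static int demo_warn(int failed, const char *file, int line)
{
	if (failed)
		fprintf(stderr, "WARNING at %s:%d\n", file, line);
	return failed;
}

/* BUG_ON-style: the first failed check stops everything. */
#define DEMO_BUG_ON(cond)						\
	do {								\
		if (cond) {						\
			fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
			abort();					\
		}							\
	} while (0)

/* WARN_ON-style: log, keep running, and evaluate to the condition. */
#define DEMO_WARN_ON(cond) demo_warn(!!(cond), __FILE__, __LINE__)

/* Check several (active, seen_current) pairs; returns how many failed. */
static int demo_check_all(const int *active, const int *seen, int n)
{
	int failures = 0;

	for (int i = 0; i < n; i++)
		if (DEMO_WARN_ON(active[i] && !seen[i]))
			failures++;	/* execution continues, so later state can still be dumped */
	return failures;
}
```

Swapping DEMO_WARN_ON for DEMO_BUG_ON in the loop would abort at the first failing pair, which is the userspace analogue of the panic that hid everything after the first bad completion.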

Thank you for grabbing this info.

--
Dan

Thomas Goirand

2012-Mar-06 09:20 UTC

[Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2

On 03/05/2012 11:38 PM, Dan Williams wrote:
> On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo at debian.org> wrote:
>> I will do my best to provide it ASAP. Should I compile with BUG_ON so
>> you see it crashing, as per the original code, or just with WARN_ON, so
>> you also see further things in dmesg?
> 
> Yes, replacing with a WARN_ON might allow it to skid after the crash
> and give a bit more information.
> 
> Thank you for grabbing this info.
> 
> --
> Dan
Hi Dan,

Please find attached the log you asked for, with WARN_ON instead of
BUG_ON, and with #define DEBUG added to both dma.c and dma_v2.c.

Let me know if you want me to do more, or if you want access to
my server (in which case, send me a public SSH key and sign your
email with PGP).

Thomas

P.S.: I compressed dmesg.txt because on Debian lists a message of 40K or
more requires administrator moderation, which I want to avoid.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.txt.gz
Type: application/x-gzip
Size: 20336 bytes
Desc: not available
URL:
<http://lists.alioth.debian.org/pipermail/pkg-xen-devel/attachments/20120306/f7433628/attachment-0001.bin>

Bastian Blank

2012-Mar-06 10:33 UTC

Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2

On Tue, Mar 06, 2012 at 05:20:54PM +0800, Thomas Goirand wrote:
> Please find attached the log you asked for, with WARN_ON instead of
> BUG_ON, and with #define DEBUG added to both dma.c and dma_v2.c.
| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x0 ctl: 0x0 (op: 0 int_en: 0 compl: 0)
| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x31 ctl: 0x9 (op: 0 int_en: 1 compl: 1)

*Counting* 9 hex digits, i.e. > 2^32. What did I say?
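Bastian's digit count is easy to check mechanically: nine hex digits means the value is at least 16^8 = 2^32, so it needs at least 33 bits, more than a 32-bit unsigned long on an i386 kernel can hold. The helpers below are a hypothetical userspace sketch, not ioatdma code; they count significant bits, test for the 4GB boundary, and show what a 32-bit variable keeps of such an address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of significant bits in x (0 for x == 0). */
static unsigned demo_bit_width(uint64_t x)
{
	unsigned bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* True if addr cannot be represented in 32 bits, i.e. it lies above 4GB. */
static bool demo_above_4g(uint64_t addr)
{
	return addr > UINT32_MAX;
}

/*
 * What survives when a 64-bit bus address is stored in a 32-bit variable,
 * such as an unsigned long on i386: the upper bits are silently dropped.
 */
static uint32_t demo_truncate32(uint64_t addr)
{
	return (uint32_t)addr;
}
```

With the descriptor address from the log, `demo_bit_width(0x27ea6d040)` is 34 and `demo_truncate32()` leaves 0x7ea6d040, a different address: the kind of mismatch that would leave `seen_current` false in the loop Jonathan quoted.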

Bastian

-- 
The joys of love made her human and the agonies of love destroyed her.
		-- Spock, "Requiem for Methuselah", stardate 5842.8

Konrad Rzeszutek Wilk

2012-Mar-13 16:49 UTC

Re: [Xen-devel] [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2

On Tue, Mar 06, 2012 at 06:39:12AM -0800, Ian Campbell wrote:
> On Tue, 2012-03-06 at 06:14 -0800, Dan Williams wrote:
> > [    9.276817] ioatdma 0000:00:16.4: desc[0]:
> > (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0
> > int_en: 1 compl: 1)
> > ...
> > [    9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion:
> > phys_complete: 0xcc7000
> > 
> > Thanks, this clearly shows that our descriptors are above 4GB and that
> > the driver truncates the completion word.
> > 
> > Is this new behavior for xen?
> 
> Xen makes a distinction between physical addresses and DMA addresses, and
> the latter can potentially be anywhere in the machine's real address
> space, while the former is what GFP_KERNEL etc. controls.
> 
> You are using pci_pool_alloc, which is the correct API to use for these
> things since its purpose is to handle cases where PHYS != DMA addr by
> exposing the DMA address to the caller. As part of that you should also
> be using dma_addr_t for DMA addresses since that is the type which is
> defined to handle the appropriate DMA address size on the platform.
> 
> I think this DMA!=PHYS can also be true of some non-x86 architectures
Especially SPARC.
> without Xen too, but I guess ioat is quite x86-specific? In any case it
> is wrong, or at least non-portable, to use unsigned long for these
> addresses even though it happens on x86 that physaddr == dma addr
> (usually).
I think with Intel VT-d they can be different; the bus addresses returned
do seem different.
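The point made above, that the address a device sees need not equal the address the kernel treats as physical, can be illustrated with a toy version of the pseudo-physical-to-machine (p2m) lookup a Xen PV guest depends on. This is a hypothetical sketch: the table, frame numbers and `demo_phys_to_machine()` helper are made up, 4KiB pages are assumed, and the real Xen, swiotlb and VT-d paths are considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assume 4KiB pages */

/*
 * Toy p2m lookup: p2m[pfn] holds the machine frame backing guest
 * pseudo-physical frame pfn.  Returns 0 and stores the machine (bus)
 * address in *machine, or -1 if the frame has no entry.
 */
static int demo_phys_to_machine(const uint64_t *p2m, size_t nr_frames,
				uint64_t gphys, uint64_t *machine)
{
	uint64_t pfn = gphys >> DEMO_PAGE_SHIFT;
	uint64_t offset = gphys & ((UINT64_C(1) << DEMO_PAGE_SHIFT) - 1);

	if (pfn >= nr_frames)
		return -1;	/* no mapping for this frame */
	*machine = (p2m[pfn] << DEMO_PAGE_SHIFT) | offset;
	return 0;
}
```

With a table whose first entry is machine frame 0x300cc7, guest address 0x40 translates to 0x300cc7040: a low guest address backed by memory above 4GB. On bare metal the mapping is the identity, so the same address would have stayed 0x40, which fits the observation that the bug only showed up under Xen.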

Williams, Dan J

2012-Mar-24 03:34 UTC

[Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2

On Fri, Mar 23, 2012 at 7:25 PM, William Dauchy <wdauchy at gmail.com>
wrote:
> Hi Dan,
>
> On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams at intel.com> wrote:
>> Thanks for the debug help, does this patch fix the issue for you?
>
> I successfully tested your patch and it works fine. Thanks again for your work.
>
> Reported-by: William Dauchy <wdauchy at gmail.com>
> Tested-by: William Dauchy <wdauchy at gmail.com>
Great, thanks for the test.

--
Dan
