Hello,

I have some troubles loading the IOATDMA module under xen4.1.2 and a
linux dom0 3.3

CONFIG_INTEL_IOATDMA=m
CONFIG_IGB=y

It was working with linux 3.1.5. The regression seems to be since
linux 3.2. I tried to do a `git bisect` but I'm facing other
regressions which make the debug harder.

Here is the call trace when loading the module in dom0:

dca service started, version 1.12.1
ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
xen: registering gsi 43 triggering 0 polarity 1
xen: --> pirq=43 -> irq=43 (gsi=43)
------------[ cut here ]------------
kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
invalid opcode: 0000 [#1] SMP
Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
button

Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell C6100 /0D61XP
EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
EIP is at __cleanup+0x154/0x160 [ioatdma]
EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
Stack:
 eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
 eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
 00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
 [<c10347cb>] tasklet_action+0x9b/0xb0
 [<c10350ab>] __do_softirq+0x7b/0x110
 [<c1035030>] ? irq_enter+0x70/0x70
 <IRQ>
 [<c1034e7e>] ? irq_exit+0x6e/0xa0
 [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
 [<c1322907>] ? xen_do_upcall+0x7/0xc
 [<c10013a7>] ? hypercall_page+0x3a7/0x1000
 [<c1006172>] ? xen_safe_halt+0x12/0x20
 [<c1010582>] ? default_idle+0x32/0x60
 [<c1008596>] ? cpu_idle+0x66/0xa0
 [<c130bd58>] ? rest_init+0x58/0x60
 [<c14237d2>] ? start_kernel+0x2e4/0x2ea
 [<c142331d>] ? kernel_init+0x11b/0x11b
 [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
 [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
---[ end trace 902e93593e49fa50 ]---
Kernel panic - not syncing: Fatal exception in interrupt

Does anybody have any clue?

Regards,
--
William
On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:
> Hello,
>
> I have some troubles loading the IOATDMA module under xen4.1.2 and a
> linux dom0 3.3

So you are using the rc1 version? What exact git commit are you using?

>
> CONFIG_INTEL_IOATDMA=m
> CONFIG_IGB=y
>
> It was working with linux 3.1.5. The regression seems to be since
> linux 3.2. I tried to do a `git bisect` but I'm facing other

3.2 you say? This below is 3.3?

> regressions which make the debug harder.

Such as?

>
> Here is the call trace when loading the module in dom0:

Is the problem present with baremetal (same exact kernel?)

Do you see this if you run a 64-bit dom0?

>
> dca service started, version 1.12.1
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
> xen: registering gsi 43 triggering 0 polarity 1
> xen: --> pirq=43 -> irq=43 (gsi=43)
> ------------[ cut here ]------------
> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
> button
>
> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell
> C6100 /0D61XP
> EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
> EIP is at __cleanup+0x154/0x160 [ioatdma]
> EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
> ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
> Stack:
>  eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
>  eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
>  00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
> Call Trace:
>  [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
>  [<c10347cb>] tasklet_action+0x9b/0xb0
>  [<c10350ab>] __do_softirq+0x7b/0x110
>  [<c1035030>] ? irq_enter+0x70/0x70
>  <IRQ>
>  [<c1034e7e>] ? irq_exit+0x6e/0xa0
>  [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
>  [<c1322907>] ? xen_do_upcall+0x7/0xc
>  [<c10013a7>] ? hypercall_page+0x3a7/0x1000
>  [<c1006172>] ? xen_safe_halt+0x12/0x20
>  [<c1010582>] ? default_idle+0x32/0x60
>  [<c1008596>] ? cpu_idle+0x66/0xa0
>  [<c130bd58>] ? rest_init+0x58/0x60
>  [<c14237d2>] ? start_kernel+0x2e4/0x2ea
>  [<c142331d>] ? kernel_init+0x11b/0x11b
>  [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
>  [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
> Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
> f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
> EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
> ---[ end trace 902e93593e49fa50 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
>
> Does anybody have any clue?
>
> Regards,
> --
> William
On Fri, Jan 27, 2012 at 3:47 PM, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> So you are using the rc1 version? What exact git commit are you using?

I pulled the last revision 74ea15d

> 3.2 you say? This below is 3.3?

Yes. I was using a 3.1 kernel. After an upgrade to 3.2 I got the problem
and thought it was good to report the problem with the last 3.3-rc
kernel.

> Is the problem present with baremetal (same exact kernel?)

I did test with a baremetal kernel and didn't get any problem. So it
seems to come from a Xen problem.

> Do you see this if you run a 64-bit dom0?

I didn't test this.

--
William
forwarded 660554 http://thread.gmane.org/gmane.comp.emulators.xen.devel/121604
quit

(cc-ing Thomas, since he ran into the same bug)

Hi,

Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:

>> I have some troubles loading the IOATDMA module under xen4.1.2 and a
>> linux dom0 3.3
>
> So you are using the rc1 version? What exact git commit are you using?

Broken:

  v3.2.6 + Debian patches (zigo)
  v3.3-rc2~22 (William)

Not broken:

  v3.1.8 + Debian patches, presumably (zigo)
  v3.1.5 (William)

[...]
>> Here is the call trace when loading the module in dom0:
>
> Is the problem present with baremetal (same exact kernel?)

No.

> Do you see this if you run a 64-bit dom0?

I'm guessing not, just based on the crazy coincidence that both reports
were with 32-bit kernels. But who knows. ;-)

[...]
>> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
>> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
>> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
>> button
>>
>> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell C6100 /0D61XP

This is

	active = ioat2_ring_active(ioat);
	for (i = 0; i < active && !seen_current; i++) {
		...
		if (tx->phys == phys_complete)
			seen_current = true;
	}
	...
	BUG_ON(active && !seen_current); /* no active descs have written a completion? */

Any hints for tracking it down?

Thanks,
Jonathan
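
To make the failing check easier to reason about, here is a minimal user-space sketch of the logic Jonathan quotes. It is not the driver code: `cleanup_would_bug`, the array of descriptor addresses, and the 64-bit types are assumptions chosen for illustration. It only models the condition `active && !seen_current`, that is, the assertion fires when there are active descriptors and none of their addresses equals the completion address that was read back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the quoted check, not the ioatdma source.
 * desc_phys:     addresses of the active descriptors (illustrative values)
 * active:        number of active descriptors
 * phys_complete: completion address as read back by the driver
 * Returns true when the BUG_ON condition (active && !seen_current) holds.
 */
static bool cleanup_would_bug(const uint64_t *desc_phys, size_t active,
			      uint64_t phys_complete)
{
	bool seen_current = false;
	size_t i;

	/* Same loop shape as the quoted snippet: stop at the first match. */
	for (i = 0; i < active && !seen_current; i++) {
		if (desc_phys[i] == phys_complete)
			seen_current = true;
	}

	return active != 0 && !seen_current;
}
```

With two active descriptors, a completion address equal to either of them returns false, while any address that matches neither returns true. An empty ring never trips the check.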
> Konrad Rzeszutek Wilk wrote:

>> Do you see this if you run a 64-bit dom0?

Looks like no. Thomas reports[1]:

> I just tried with the amd64 kernel and Xen, and I didn't see any issue.
>
> However, it is important that Xen 4.1 + Linux 3.2 works with a 32 bits
> kernel, because that is the most optimized configuration (eg: 64 bits
> hypervisor, 32 bits kernel and 32 bits userland).

Maybe Andres's patches are relevant.

Hope that helps,
Jonathan

[1] http://bugs.debian.org/660554#25
On 02/21/2012 02:16 AM, Jonathan Nieder wrote:
>> I just tried with the amd64 kernel and Xen, and I didn't see any issue.
>>
>> However, it is important that Xen 4.1 + Linux 3.2 works with a 32 bits
>> kernel, because that is the most optimized configuration (eg: 64 bits
>> hypervisor, 32 bits kernel and 32 bits userland).
>>
> Maybe Andres's patches are relevant.
>
> Hope that helps,
> Jonathan
>
> [1] http://bugs.debian.org/660554#25
>

Hi,

Which patch are you referring to?

Is there anything I can do to help testing/investigating this? Should
this be reported in the LKML? How can I find who's the author of this
driver?

Thomas Goirand (zigo)
Hi Thomas,

On Sat, Feb 25, 2012 at 8:46 AM, Thomas Goirand <thomas@goirand.fr> wrote:
> How can I find who's the author of this driver?

I don't think the problem is related to the driver itself, because it
is working without xen.
I'm also looking for hints to fix the problem.

--
William
Dan Williams
2012-Mar-05 15:38 UTC
[Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo at debian.org> wrote:
> I will do my best to provide it ASAP. Should I compile with BUG_ON so
> you see it crashing, as per the original code, or just with WARN_ON, so
> you also see further things in dmesg?

Yes, replacing with a WARN_ON might allow it to skid after the crash
and give a bit more information.

Thank you for grabbing this info.

--
Dan
Thomas Goirand
2012-Mar-06 09:20 UTC
[Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
On 03/05/2012 11:38 PM, Dan Williams wrote:
> On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo at debian.org> wrote:
>> I will do my best to provide it ASAP. Should I compile with BUG_ON so
>> you see it crashing, as per the original code, or just with WARN_ON, so
>> you also see further things in dmesg?
>
> Yes, replacing with a WARN_ON might allow it to skid after the crash
> and give a bit more information.
>
> Thank you for grabbing this info.
>
> --
> Dan

Hi Dan,

Please find attached the log that you asked me, with WARN_ON instead of
BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.

Let me know if you want me to do more, or if you want to have access to
my server (in which case, provide me a public ssh key and sign your
email with PGP).

Thomas

P.S: I compressed the dmesg.txt because on debian lists if a message is
>= 40K, it requires administrator moderation, which I want to avoid.

Attachment: dmesg.txt.gz (application/x-gzip, 20336 bytes)
URL: <http://lists.alioth.debian.org/pipermail/pkg-xen-devel/attachments/20120306/f7433628/attachment-0001.bin>
Bastian Blank
2012-Mar-06 10:33 UTC
Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
On Tue, Mar 06, 2012 at 05:20:54PM +0800, Thomas Goirand wrote:
> Please find attached the log that you asked me, with WARN_ON instead of
> BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.

| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x0 ctl: 0x0 (op: 0 int_en: 0 compl: 0)
| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x31 ctl: 0x9 (op: 0 int_en: 1 compl: 1)

*counting* 9 hex digits, aka > 2^32. What did I say?

Bastian

--
The joys of love made her human and the agonies of love destroyed her.
		-- Spock, "Requiem for Methuselah", stardate 5842.8
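
Bastian's digit count can be checked mechanically. The following is a small sketch, assuming only standard C; `exceeds_32bit` is a hypothetical helper name, and the constant in the usage note is the descriptor address copied from the log lines above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when an address does not fit in 32 bits,
 * i.e. when it is at or above 2^32 (4GB).
 */
static bool exceeds_32bit(uint64_t addr)
{
	return addr > UINT32_MAX;
}
```

Calling `exceeds_32bit(0x27ea6d040ULL)` returns true: nine hex digits need more than 32 bits, so this descriptor sits above the 4GB boundary.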
Konrad Rzeszutek Wilk
2012-Mar-13 16:49 UTC
Re: [Xen-devel] [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
On Tue, Mar 06, 2012 at 06:39:12AM -0800, Ian Campbell wrote:
> On Tue, 2012-03-06 at 06:14 -0800, Dan Williams wrote:
> > [ 9.276817] ioatdma 0000:00:16.4: desc[0]:
> > (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0
> > int_en: 1 compl: 1)
> > ...
> > [ 9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion:
> > phys_complete: 0xcc7000
> >
> > Thanks, this clearly shows that our descriptors are above 4GB and that
> > the driver truncates the completion word.
> >
> > Is this new behavior for xen?
>
> Xen makes a distinction between physical addresses and DMA addresses and
> the latter can potentially be anywhere in the machine's real address
> space while the former is what GFP_KERNEL etc controls.
>
> You are using pci_pool_alloc which is the correct API to use for these
> things since its purpose is to handle cases where PHYS != DMA addr by
> exposing the DMA address to the caller. As part of that you should also
> be using dma_addr_t for DMA addresses since that is the type which is
> defined to handle the appropriate DMA address size on the platform.
>
> I think this DMA!=PHYS can also be true of some non-x86 architectures

Especially SPARC.

> without Xen too but I guess ioat is quite x86 specific? In any case it
> is wrong, or at least non-portable, to use unsigned long for these
> addresses even though it happens on x86 that physaddr == dma addr
> (usually).

I think with the Intel VT-d that can be different. The bus addresses
returned do seem different.
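
The truncation Dan points out can be reproduced outside the kernel. This is a minimal sketch, assuming a 32-bit kernel where `unsigned long` is 32 bits wide and the DMA address is 64 bits wide; `ulong32_t`, `dma_addr64_t` and `store_in_ulong32` are hypothetical stand-ins for illustration, not driver or kernel identifiers.

```c
#include <stdint.h>

/* Assumption: models unsigned long on a 32-bit kernel. */
typedef uint32_t ulong32_t;
/* Assumption: models a 64-bit wide dma_addr_t. */
typedef uint64_t dma_addr64_t;

/*
 * Hypothetical helper: stores a DMA address in a 32-bit variable,
 * silently dropping everything above bit 31.
 */
static ulong32_t store_in_ulong32(dma_addr64_t addr)
{
	return (ulong32_t)addr;
}
```

Feeding it the descriptor address from the log, `store_in_ulong32(0x300cc7000ULL)` yields `0xcc7000`, the same value the log reports as `phys_complete`. That value can never compare equal to the full descriptor address `0x300cc7000`, which is consistent with the cleanup check not finding the completed descriptor.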
Williams, Dan J
2012-Mar-24 03:34 UTC
[Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
On Fri, Mar 23, 2012 at 7:25 PM, William Dauchy <wdauchy at gmail.com> wrote:
> Hi Dan,
>
> On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams at intel.com> wrote:
>> Thanks for the debug help, does this patch fix the issue for you?
>
> I successfully tested your patch and it works fine. Thanks again for your work.
>
> Reported-by: William Dauchy <wdauchy at gmail.com>
> Tested-by: William Dauchy <wdauchy at gmail.com>

Great, thanks for the test.

--
Dan