Ward Vandewege
2011-Jan-28 18:58 UTC
[Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
Hi list,

I'm having some problems trying to pass through a Mellanox ConnectX HCA to a
domU. This is on Xen 4.0.1, with the latest Debian Testing packages:

  ii xen-hypervisor-4.0-amd64        4.0.1-2
  ii linux-image-2.6.32-5-xen-amd64  2.6.32-30

The hardware is a Supermicro H8DGT-HIBQF, BIOS revision 1.0c (dated
10/29/10). It has two AMD Opteron 6128 CPUs, for a total of 16 cores. The
machine has 32GiB of RAM. The Mellanox adapter looks like this in the dom0:

02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
        Subsystem: Super Micro Computer Inc Device 0048
        Flags: fast devsel, IRQ 19
        Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
        Memory at fc800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: pciback

I've attached the output of xm dmesg (xm.dmesg.txt). I have the following in
the domU config file:

pci = ['0000:02:00.0']

I've attached the boot log from trying to boot the same kernel as an HVM
guest (testsqueezehvm.bootlog.txt).
Doing so generates these four lines of output in xm dmesg:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c0c0

The mlx4_core driver in the domU is not happy:

[    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[    0.411879] mlx4_core: Initializing 0000:00:00.0
[    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.417527] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

When trying to boot a PV domU with kernel options iommu=soft and
swiotlb=force, the output is slightly different. The full bootlog is attached
(testsqueeze.bootlog.txt). Here's the relevant excerpt:

[    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2 (August 4, 2010)
[    0.441696] mlx4_core: Initializing 0000:00:00.0
[    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X interrupt IRQ 54).
[    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
[    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate interrupt (IRQ 54), aborting.
[    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing problem?
[    2.916920] mlx4_core: probe of 0000:00:00.0 failed with error -16

And xm dmesg quickly fills up with many, many lines like this:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43020
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43060
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430a0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430c0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430e0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43100
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43120
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43140
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43160
...

Booting a PV domU with only the swiotlb=force option makes the output much
more like the HVM output.

Any thoughts on what could be going on here?

Thanks,
Ward.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jan-28 19:27 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
> The mlx4_core driver in the domU is not happy:
>
> [    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
> [    0.411879] mlx4_core: Initializing 0000:00:00.0
> [    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> [    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> [    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command
> interface revision 0.
> [    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
> [    1.417527] mlx4_core 0000:00:00.0: This driver version supports only
> revisions 2 to 3.
> [    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
>
> When trying to boot a PV domU with kernel options iommu=soft and
> swiotlb=force, the output is slightly different. The full bootlog is attached

Don't use swiotlb=force unless necessary.

> (testsqueeze.bootlog.txt). Here's the relevant excerpt:

That is b/c you are missing iommu=pv on the Xen hypervisor line, and you
might need to make sure your driver is using the VM_IO flag.

There was some discussion on LKML about this and they proposed a patch that
wasn't necessary. Don't remember the details but I can look that up next
week.

> [    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2
> (August 4, 2010)
> [    0.441696] mlx4_core: Initializing 0000:00:00.0
> [    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> [    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> [    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X
> interrupt IRQ 54).
> [    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
> [    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate
> interrupt (IRQ 54), aborting.
> [    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing
Ward Vandewege
2011-Jan-28 20:38 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
On Fri, Jan 28, 2011 at 02:27:42PM -0500, Konrad Rzeszutek Wilk wrote:
> > The mlx4_core driver in the domU is not happy:
> >
> > [    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
> > [    0.411879] mlx4_core: Initializing 0000:00:00.0
> > [    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> > [    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> > [    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command
> > interface revision 0.
> > [    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
> > [    1.417527] mlx4_core 0000:00:00.0: This driver version supports only
> > revisions 2 to 3.
> > [    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
> >
> > When trying to boot a PV domU with kernel options iommu=soft and
> > swiotlb=force, the output is slightly different. The full bootlog is attached
>
> Don't use swiotlb=force unless necessary.

OK; I just tried it without (for PV), same result:

[    0.420448] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2 (August 4, 2010)
[    0.420462] mlx4_core: Initializing 0000:00:00.0
[    0.420804] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.421477] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    1.429824] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    1.429858] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.429876] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    1.429895] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
and

(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x2c03000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x2c03040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x2c03080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x2c030c0

> That is b/c you are missing iommu=pv on the Xen hypervisor line, and

Hmm, I do have it:

# xm dmesg | grep iommu
(XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug

But maybe it's not being picked up?

> you might need to make sure your driver is using the VM_IO flag.
>
> There was some discussion on LKML about this and they proposed
> a patch that wasn't necessary. Don't remember the details but I can
> look that up next week.

Do you mean this thread?
http://xen.1045712.n5.nabble.com/Infiniband-from-userland-in-dom0-process-killed-bad-pagetable-td3259124.html

Thanks,
Ward.
Konrad Rzeszutek Wilk
2011-Jan-31 18:45 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
> Hmm, I do have it:

Indeed you do. Good!

> # xm dmesg | grep iommu
> (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
>
> But maybe it's not being picked up?

You should see something about passthrough in the log... though that might
only be the case if you are using Intel VT-d? Not sure.

> > you might need to make sure your driver is using the VM_IO flag.
> >
> > There was some discussion on LKML about this and they proposed
> > a patch that wasn't necessary. Don't remember the details but I can
> > look that up next week.

Found it... it was from Vivien, but in another thread:
http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
Ward Vandewege
2011-Jan-31 19:51 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
Hi Konrad,

On Mon, Jan 31, 2011 at 01:45:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > Hmm, I do have it:
>
> Indeed you do. Good!
>
> > # xm dmesg | grep iommu
> > (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
> >
> > But maybe it's not being picked up?
>
> You should see something about passthrough in the log... though that might
> only be the case if you are using Intel VT-d? Not sure.

This seems related:

(XEN) HVM: ASIDs enabled.
(XEN) HVM: SVM enabled
(XEN) HVM: Hardware Assisted Paging detected.
(XEN) AMD-Vi: IOMMU 0 Enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method

I've attached the full xm dmesg output.

> > > you might need to make sure your driver is using the VM_IO flag.
> > >
> > > There was some discussion on LKML about this and they proposed
> > > a patch that wasn't necessary. Don't remember the details but I can
> > > look that up next week.
>
> Found it... it was from Vivien, but in another thread:
> http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html

Ah. Is your

  devel/p2m-identity.v4.5

still the one I should test with to see if it fixes this problem? I see
you've got newer versions (up to v4.7) now too.

Or has this patch meanwhile been pushed into the kernel?

Thanks,
Ward.
Konrad Rzeszutek Wilk
2011-Jan-31 20:03 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
On Mon, Jan 31, 2011 at 02:51:54PM -0500, Ward Vandewege wrote:
> Hi Konrad,
>
> On Mon, Jan 31, 2011 at 01:45:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Hmm, I do have it:
> >
> > Indeed you do. Good!
> >
> > > # xm dmesg | grep iommu
> > > (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
> > >
> > > But maybe it's not being picked up?
> >
> > You should see something about passthrough in the log... though that might
> > only be the case if you are using Intel VT-d? Not sure.
>
> This seems related:
>
> (XEN) HVM: ASIDs enabled.
> (XEN) HVM: SVM enabled
> (XEN) HVM: Hardware Assisted Paging detected.
> (XEN) AMD-Vi: IOMMU 0 Enabled.
> (XEN) I/O virtualisation enabled
> (XEN)  - Dom0 mode: Relaxed
> (XEN) Total of 16 processors activated.
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using new ACK method
>
> I've attached the full xm dmesg output.
>
> > > > you might need to make sure your driver is using the VM_IO flag.
> > > >
> > > > There was some discussion on LKML about this and they proposed
> > > > a patch that wasn't necessary. Don't remember the details but I can
> > > > look that up next week.
> >
> > Found it... it was from Vivien, but in another thread:
> > http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
>
> Ah. Is your
>
>   devel/p2m-identity.v4.5
>
> still the one I should test with to see if it fixes this problem? I see
> you've got newer versions (up to v4.7) now too.

It has a bug that I am working on. I would just look for the VM_IO flag and
see if it has been applied somewhere. Or vice-versa - look for where it has
_not_ been applied.

> Or has this patch meanwhile been pushed into the kernel?

Not yet.
Ward Vandewege
2011-Feb-03 23:24 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
On Mon, Jan 31, 2011 at 03:03:22PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > you might need to make sure your driver is using the VM_IO flag.
> > > > >
> > > > > There was some discussion on LKML about this and they proposed
> > > > > a patch that wasn't necessary. Don't remember the details but I can
> > > > > look that up next week.
> > >
> > > Found it... it was from Vivien, but in another thread:
> > > http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
> >
> > Ah. Is your
> >
> >   devel/p2m-identity.v4.5
> >
> > still the one I should test with to see if it fixes this problem? I see
> > you've got newer versions (up to v4.7) now too.
>
> It has a bug that I am working on. I would just look for the VM_IO flag
> and see if it has been applied somewhere. Or vice-versa - look for where
> it has _not_ been applied.

There are no VM_IO references in the mlx4 driver (the one from OFED 1.5.2).
Analogous to what Vivien did, I added

--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -548,6 +548,8 @@
 		return -EINVAL;

 	if (vma->vm_pgoff == 0) {
+		vma->vm_flags |= VM_IO;
+		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

 		if (io_remap_pfn_range(vma, vma->vm_start,
@@ -555,6 +557,8 @@
 				       PAGE_SIZE, vma->vm_page_prot))
 			return -EAGAIN;
 	} else if (vma->vm_pgoff == 1 && dev->dev->caps.bf_reg_size != 0) {
+		vma->vm_flags |= VM_IO;
+		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 		vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);

 		if (io_remap_pfn_range(vma, vma->vm_start,

But that didn't change a thing. The driver still complains when loaded:

[    1.984843] mlx4_core: Initializing 0000:00:00.0
[    1.985007] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    1.985007] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    2.994953] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    2.994997] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    2.995058] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    2.995087] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

And it still generates this in Xen's dmesg on the dom0:

[ 2862.038307] pciback: vpci: 0000:02:00.0: assign to virtual slot 0
[ 2862.041910] pciback 0000:02:00.0: device has been assigned to another domain! Over-writting the ownership, but beware.
[ 2863.076729] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
[ 2863.097501] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
[ 2864.863782] pciback 0000:02:00.0: enabling device (0000 -> 0002)
[ 2864.864217] xen_allocate_pirq: returning irq 19 for gsi 19
[ 2864.864867] Already setup the GSI :19
[ 2864.865232] pciback 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0

I guess there must be something else going on, and/or the above change is not
the right one.

Thanks,
Ward.
Konrad Rzeszutek Wilk
2011-Feb-07 16:41 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
Joerg,

Any idea what this error might signify?

> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0

We have been stabbing in the dark enabling certain knobs, but I am just
curious - the fault address - that is the real physical address, right?
From the looks of it, it looks like a normal RAM region, not the PCI BAR
space - the AMD-Vi chipset doesn't really distinguish between those, or does
it?

Ward, can you post your lspci -vvv -s 02:00.0 output? I am curious to see
what the PCI BAR space is.

On Thu, Feb 03, 2011 at 06:24:33PM -0500, Ward Vandewege wrote:
> On Mon, Jan 31, 2011 at 03:03:22PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > you might need to make sure your driver is using the VM_IO flag.
> > > > > >
> > > > > > There was some discussion on LKML about this and they proposed
> > > > > > a patch that wasn't necessary. Don't remember the details but I can
> > > > > > look that up next week.
> > > >
> > > > Found it... it was from Vivien, but in another thread:
> > > > http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
> > >
> > > Ah. Is your
> > >
> > >   devel/p2m-identity.v4.5
> > >
> > > still the one I should test with to see if it fixes this problem? I see
> > > you've got newer versions (up to v4.7) now too.
> >
> > It has a bug that I am working on. I would just look for the VM_IO flag
> > and see if it has been applied somewhere. Or vice-versa - look for where
> > it has _not_ been applied.
>
> There are no VM_IO references in the mlx4 driver (the one from OFED 1.5.2).
> Analogous to what Vivien did, I added
>
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -548,6 +548,8 @@
>  		return -EINVAL;
>
>  	if (vma->vm_pgoff == 0) {
> +		vma->vm_flags |= VM_IO;
> +		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>  		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>
>  		if (io_remap_pfn_range(vma, vma->vm_start,
> @@ -555,6 +557,8 @@
>  				       PAGE_SIZE, vma->vm_page_prot))
>  			return -EAGAIN;
>  	} else if (vma->vm_pgoff == 1 && dev->dev->caps.bf_reg_size != 0) {
> +		vma->vm_flags |= VM_IO;
> +		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>  		vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);
>
>  		if (io_remap_pfn_range(vma, vma->vm_start,
>
> But that didn't change a thing. The driver still complains when loaded:
>
> [    1.984843] mlx4_core: Initializing 0000:00:00.0
> [    1.985007] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> [    1.985007] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> [    2.994953] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
> [    2.994997] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
> [    2.995058] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
> [    2.995087] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
>
> And it still generates this in Xen's dmesg on the dom0:
>
> [ 2862.038307] pciback: vpci: 0000:02:00.0: assign to virtual slot 0
> [ 2862.041910] pciback 0000:02:00.0: device has been assigned to another domain! Over-writting the ownership, but beware.
> [ 2863.076729] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
> [ 2863.097501] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
> [ 2864.863782] pciback 0000:02:00.0: enabling device (0000 -> 0002)
> [ 2864.864217] xen_allocate_pirq: returning irq 19 for gsi 19
> [ 2864.864867] Already setup the GSI :19
> [ 2864.865232] pciback 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0
>
> I guess there must be something else going on, and/or the above change is not
> the right one.
>
> Thanks,
> Ward.
Roedel, Joerg
2011-Feb-07 17:03 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
On Mon, Feb 07, 2011 at 11:41:33AM -0500, Konrad Rzeszutek Wilk wrote:
> Any idea what this error might signify?
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0
>
> We have been stabbing in the dark enabling certain knobs, but I am just
> curious - the fault address - that is the real physical address, right?
> From the looks of it, it looks like a normal RAM region, not the PCI BAR
> space - the AMD-Vi chipset doesn't really distinguish between those, or
> does it?

The fault address is io-virtual, so this is not a RAM physical address.
Basically, this is the address the device sent a request to and which the
IOMMU tried to re-map. You should look into the guest memory layout to find
out what might be at those addresses.

	Joerg

-- 
AMD Operating System Research Center
Advanced Micro Devices GmbH, Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
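[Editorial note for readers of this archive: the "device id" field in those
fault lines is the 16-bit PCIe requester ID (bus/device/function packed
together), so 0x200 decodes back to 02:00.0 - the ConnectX HCA being passed
through. A quick sketch of the decoding; the helper function below is
illustrative, not part of any tool in this thread:]

```python
def decode_requester_id(rid: int) -> str:
    """Decode a 16-bit PCIe requester ID, as printed in the
    AMD_IOV fault lines, into bus:device.function form."""
    bus = (rid >> 8) & 0xFF   # bits 15:8 - bus number
    dev = (rid >> 3) & 0x1F   # bits 7:3  - device number
    fn = rid & 0x7            # bits 2:0  - function number
    return f"{bus:02x}:{dev:02x}.{fn}"

print(decode_requester_id(0x200))  # -> 02:00.0
```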
Ward Vandewege
2011-Feb-07 17:42 UTC
Re: [Xen-devel] AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
On Mon, Feb 07, 2011 at 11:41:33AM -0500, Konrad Rzeszutek Wilk wrote:
> Joerg,
>
> Any idea what this error might signify?
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0
>
> We have been stabbing in the dark enabling certain knobs, but I am just
> curious - the fault address - that is the real physical address, right?
> From the looks of it, it looks like a normal RAM region, not the PCI BAR
> space - the AMD-Vi chipset doesn't really distinguish between those, or
> does it?
>
> Ward, can you post your lspci -vvv -s 02:00.0 output? I am curious to see
> what the PCI BAR space is.

Of course, here it is. Booted into 2.6.32-5-xen-amd64 #1 SMP Wed Jan 12
05:46:49 UTC 2011 x86_64 GNU/Linux, from the dom0:

# lspci -vvv -s 02:00.0
02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
	Subsystem: Super Micro Computer Inc Device 0048
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: Memory at fc800000 (64-bit, prefetchable) [size=8M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
		pcilib: sysfs_read_vpd: read failed: Connection timed out
		Not readable
	Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
		Vector table: BAR=0 offset=0007c000
		PBA: BAR=0 offset=0007d000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:
			MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #8, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Kernel driver in use: pciback

Thanks,
Ward.
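[Editorial note for readers of this archive: a quick cross-check of Konrad's
BAR question. From the lspci output above, the two BARs cover
0xfea00000-0xfeafffff (1M) and 0xfc800000-0xfcffffff (8M), and none of the
fault addresses reported earlier in the thread fall inside either window -
consistent with Joerg's explanation that they are io-virtual DMA target
addresses, not BAR accesses. A small sketch, using only values copied from
this thread:]

```python
# BAR windows from the lspci output above: (base, size in bytes)
BARS = [(0xFEA00000, 1 << 20),    # Region 0, 1M non-prefetchable
        (0xFC800000, 8 << 20)]    # Region 2, 8M prefetchable

# A few fault addresses from the AMD_IOV: IO_PAGE_FALT lines
FAULTS = [0x255C000, 0x2C03000, 0x7E7CA000, 0x70A4309170A43000]

def in_bar_space(addr: int) -> bool:
    """True if addr falls inside one of the device's BAR windows."""
    return any(base <= addr < base + size for base, size in BARS)

for addr in FAULTS:
    print(f"{addr:#x}: {'inside' if in_bar_space(addr) else 'outside'} BAR space")
```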