We're running 64-bit Xen 4.1.1 and a 32-bit Linux 3.0.4 Dom0 (Linux 3.1 shows the same symptom).

Several PCI drivers are unable to use DMA. Most fall back to PIO, but in two instances the network drivers (e1000 and pcnet32) abort. The same kernel running on the same hardware without Xen works fine.

Digging through the code, in swiotlb-xen.c I find "DMA_BIT_MASK(32)" (0x00000000ffffffff) compared to "xen_virt_to_bus(xen_io_tlb_end - 1)", which resolves to 0x1,20fd,feff. Since the address is larger than the mask, DMA is declared unsupportable.

In talking with others I hear Linux handles this situation with bounce buffers. Is there a config setting I've missed to enable that for Xen? (Config file attached.)

Relevant slice of source callback list:
  xen_swiotlb_dma_supported (drivers/xen/swiotlb-xen.c: line 591)
  dma_supported (arch/x86/kernel/pci-dma.c: line 199)
  dma_set_mask (arch/x86/kernel/pci-dma.c: line 59)
  e1000_probe (drivers/net/e1000/e1000_main.c: line 986)

Relevant patches:
  https://lkml.org/lkml/2011/9/1/100, "[PATCH v2] xen: x86_32: do not enable iterrupts when returning from exception in interrupt context"
  http://xen.1045712.n5.nabble.com/PATCH-mm-sync-vmalloc-address-space-page-tables-in-alloc-vm-area-td4757995.html "[PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()" (this patch was reverted for 3.1 but this is 3.0.4)
  and an additional 2048 NR_IRQS to support (as I understand it) all the virtual devices we might support with 50 guests.

Not so relevant patches in md, nbd and loop.

lspci:
00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 09)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 09)
00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 09)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 09)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 80332 [Dobson] I/O processor (A-Segment Bridge) (rev 06)
01:00.2 PCI bridge: Intel Corporation 80332 [Dobson] I/O processor (B-Segment Bridge) (rev 06)
02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
05:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
05:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
06:07.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
07:08.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
09:05.0 Class ff00: Dell Remote Access Card 4 Daughter Card
09:05.1 Class ff00: Dell Remote Access Card 4 Daughter Card Virtual UART
09:05.2 Class ff00: Dell Remote Access Card 4 Daughter Card SMIC interface
09:06.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
09:0d.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]

06:07.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
	Subsystem: Dell PRO/1000 MT Network Connection
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin A routed to IRQ 64
	Region 0: Memory at dfae0000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at dcc0 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4] PCI-X non-bridge device
		Command: DPERE- ERO+ RBC=512 OST=1
		Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-

Log file attached.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
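
For readers following the arithmetic in the report above, here is a minimal userspace sketch of the mask comparison it describes. It is not the kernel code: dma_bit_mask() is a stand-in for the kernel's DMA_BIT_MASK() macro, and tlb_end_fits_mask() is a hypothetical helper that only mirrors the comparison quoted from swiotlb-xen.c.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for DMA_BIT_MASK(n): a value with the low n bits set. */
static uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

/*
 * Hypothetical helper mirroring the reported comparison: the bounce buffer
 * is usable by a device only if the bus address of the buffer's last byte
 * does not exceed the device's DMA mask.
 */
static int tlb_end_fits_mask(uint64_t tlb_end_bus, uint64_t mask)
{
	return tlb_end_bus <= mask;
}
```

With the value from the report, tlb_end_fits_mask(0x120fdfeffULL, dma_bit_mask(32)) evaluates to 0, which corresponds to DMA being declared unsupportable for a 32-bit device.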
On Fri, Dec 09, 2011 at 08:19:47PM +0000, Taylor, Neal E wrote:
> We're running 64-bit Xen 4.1.1 and a 32-bit Linux 3.0.4 Dom0 (Linux 3.1 shows the same symptom).

Hm, 32-bit. Did it work if the Dom0 was 64-bit?

> Several PCI drivers are unable to use DMA. Most fall back to PIO, but in two instances the network drivers (e1000 and pcnet32) abort. The same kernel running on the same hardware without Xen works fine.
>
> Digging through the code, in swiotlb-xen.c I find "DMA_BIT_MASK(32)" (0x00000000ffffffff) compared to "xen_virt_to_bus(xen_io_tlb_end - 1)", which resolves to 0x1,20fd,feff. Since the address is larger than the mask, DMA is declared unsupportable.

<blinks> xen_io_tlb_end resolved to 120fdfeff? That is a definite bug. Can you attach your full bootup serial log with 'debug loglevel=8' parameters on the Linux line, please?

> In talking with others I hear Linux handles this situation with bounce buffers. Is there a config setting I've missed to enable that for Xen? (Config file attached.)

The Xen SWIOTLB is enabled by default, so it is on, but xen_virt_to_bus(xen_io_tlb_end - 1) _MUST_ never be above 4GB. In your case it is, which is bad. It is rather surprising, as I had never seen this happen.
We're having trouble getting a serial log. Are there other ways to capture the information you need?

Attached is a dmesg with 'debug loglevel 8' set on the kernel line... actually, with

#define DEFAULT_MESSAGE_LOGLEVEL 8

set near the top of printk.c as well, since I wasn't seeing any difference in the log files with loglevel set to 8.

Neal

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
Sent: Friday, December 09, 2011 12:30 PM
To: Taylor, Neal E
Cc: xen-devel; Kalev, Leonid; Dave, Tushar N
Subject: Re: [Xen-devel] Buffers not reachable by PCI

.. snip..
On Mon, Dec 12, 2011 at 10:11:20PM +0000, Taylor, Neal E wrote:
> We're having trouble getting a serial log. Are there other ways to capture the information you need?
>
> Attached is a dmesg with 'debug loglevel 8' set on the kernel line... actually, with
>
> #define DEFAULT_MESSAGE_LOGLEVEL 8
>
> set near the top of printk.c as well, since I wasn't seeing any difference in the log files with loglevel set to 8.

I needed this:

[    0.000000] Reserving virtual address space above 0xff800000
[    0.000000] Linux version 3.0.4-36.xen0 (root@nt-dev-Cent55-32) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Mon Dec 12 14:54:39 EST 2011
[    0.000000] released 0 pages of unused memory
[    0.000000] 1-1 mapping on a0->100
[    0.000000] 1-1 mapping on bffc0->100000
[    0.000000] Set 262304 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000020000000 (usable)
[    0.000000]  Xen: 0000000020000000 - 00000000bffc0000 (unusable)
[    0.000000]  Xen: 00000000bffc0000 - 00000000bffcfc00 (ACPI data)
[    0.000000]  Xen: 00000000bffcfc00 - 00000000bffff000 (reserved)
[    0.000000]  Xen: 00000000e0000000 - 00000000fec90000 (reserved)
[    0.000000]  Xen: 00000000fed00000 - 00000000fed00400 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee10000 (reserved)
[    0.000000]  Xen: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 00000001dffc0000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.3 present.
[    0.000000] DMI: Dell Computer Corporation PowerEdge 1850/0RC130, BIOS A05 01/09/2006
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] last_pfn = 0x1dffc0 max_arch_pfn = 0x1000000
[    0.000000] found SMP MP-table at [c00fe710] fe710
[    0.000000] initial memory mapped : 0 - 023ff000
[    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
[    0.000000] init_memory_mapping: 0000000000000000-00000000373fe000
[    0.000000]  0000000000 - 00373fe000 page 4k
[    0.000000] kernel direct mapping tables up to 373fe000 @ 2242000-23ff000
[    0.000000] xen: setting RW the range 23ea000 - 23ff000
[    0.000000] RAMDISK: 016fb000 - 01ab2000
.. snip..
[    0.000000] Placing 64MB software IO TLB between d832cf00 - dc32cf00
[    0.000000] software IO TLB at phys 0x1832cf00 - 0x1c32cf00

And that tells me that 1) it is allocated above 4GB, which from a PFN perspective is not a big deal, as the MFNs are below 4GB, but 2) it messes up the other drivers, which expect the SWIOTLB to be under 4GB.

Try this patch:

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 8e964b9..600b53c 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -166,7 +166,7 @@ retry:
 	/*
 	 * Get IO TLB memory from any location.
 	 */
-	xen_io_tlb_start = alloc_bootmem(bytes);
+	xen_io_tlb_start = alloc_bootmem_low(bytes);
 	if (!xen_io_tlb_start) {
 		m = "Cannot allocate Xen-SWIOTLB buffer!\n";
 		goto error;
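
As an aside, the relationship between the two "software IO TLB" log lines can be checked with a little arithmetic. The sketch below is userspace C, not kernel code, and it assumes the default 32-bit x86 3G/1G split, where lowmem is mapped at 0xC0000000 (the log values are consistent with that, but the thread does not state it). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default 3G/1G split, so the kernel's direct map starts here. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Lowmem virtual address to (pseudo-)physical address on 32-bit x86. */
static uint32_t lowmem_virt_to_phys(uint32_t vaddr)
{
	assert(vaddr >= SKETCH_PAGE_OFFSET);
	return vaddr - SKETCH_PAGE_OFFSET;
}

/* True if the address lies below the 4GB boundary. */
static int below_4g(uint64_t addr)
{
	return addr < (1ULL << 32);
}
```

Under that assumption, d832cf00 and dc32cf00 map to 0x1832cf00 and 0x1c32cf00, exactly 64MB apart and both below 4GB. On a Xen PV kernel these are pseudo-physical addresses; the value compared against the DMA mask is the machine (bus) address obtained in a further translation step.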
I'm not seeing any difference in symptom or log. Also, where is the log telling you that it's allocated above 4GB?

Same snippet of the new log:

[    0.000000] Reserving virtual address space above 0xff800000
[    0.000000] Linux version 3.0.4-37.xen0 (root@nt-dev-Cent55-32) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Dec 13 12:19:21 EST 2011
[    0.000000] released 0 pages of unused memory
[    0.000000] Set 262304 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000020000000 (usable)
[    0.000000]  Xen: 0000000020000000 - 00000000bffc0000 (unusable)
[    0.000000]  Xen: 00000000bffc0000 - 00000000bffcfc00 (ACPI data)
[    0.000000]  Xen: 00000000bffcfc00 - 00000000bffff000 (reserved)
[    0.000000]  Xen: 00000000e0000000 - 00000000fec90000 (reserved)
[    0.000000]  Xen: 00000000fed00000 - 00000000fed00400 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee10000 (reserved)
[    0.000000]  Xen: 00000000ffb00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 00000001dffc0000 (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.3 present.
[    0.000000] last_pfn = 0x1dffc0 max_arch_pfn = 0x1000000
[    0.000000] found SMP MP-table at [c00fe710] fe710
[    0.000000] init_memory_mapping: 0000000000000000-00000000373fe000
[    0.000000] RAMDISK: 016fb000 - 01ab2000
.. snip..
[    0.000000] Placing 64MB software IO TLB between d832cf00 - dc32cf00
[    0.000000] software IO TLB at phys 0x1832cf00 - 0x1c32cf00

And, to let you double-check that I'm not chasing an error on my part, this is where and how I'm seeing the above-4GB address.

Last lines of drivers/xen/swiotlb-xen.c:

/*
 * Return whether the given device DMA address mask can be supported
 * properly.  For example, if your device can only drive the low 24-bits
 * during bus mastering, then you would pass 0x00ffffff as the mask to
 * this function.
 */
int
xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
{
	phys_addr_t phys;
	dma_addr_t tlb_end;

	phys = virt_to_phys(xen_io_tlb_end);
	tlb_end = xen_virt_to_bus(xen_io_tlb_end - 1);
	printk(KERN_ERR "NT_DBG: xen_io_tlb_end: %lx, converted to 'phys': %Lx, then 'bus': %Lx for answer %d.\n", xen_io_tlb_end, phys, tlb_end, (tlb_end <= mask));
	return tlb_end <= mask;
//	return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
}
EXPORT_SYMBOL_GPL(xen_swiotlb_dma_supported);

Neal

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.r.wilk@gmail.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Monday, December 12, 2011 4:19 PM
To: Taylor, Neal E
Cc: xen-devel; Kalev, Leonid; Dave, Tushar N
Subject: Re: [Xen-devel] Buffers not reachable by PCI

.. snip..
Is it the translation that's in error?

Modeled after the translation in xen_swiotlb_dma_supported that's used for the problematic comparison, I added "the same" translation to swiotlb_print_info. I don't understand the results, as dend - dstart is vastly larger than pend - pstart.

void swiotlb_print_info(void)
{
	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
	phys_addr_t pstart, pend;
+	dma_addr_t dstart, dend;
+	pstart = virt_to_phys(io_tlb_start);
+	pend = virt_to_phys(io_tlb_end);
+	dstart = phys_to_machine(XPADDR(pstart)).maddr; dend = phys_to_machine(XPADDR(pend)).maddr;

	printk(KERN_INFO "Placing %luMB software IO TLB between %p - %p\n",
	       bytes >> 20, io_tlb_start, io_tlb_end);
	printk(KERN_INFO "software IO TLB at phys %#llx - %#llx\n",
	       (unsigned long long)pstart,
	       (unsigned long long)pend);
+	printk(KERN_INFO "software IO TLB at bus %#llx - %#llx\n",
+	       (unsigned long long)dstart,
+	       (unsigned long long)dend);
}

Yields:

[    0.000000] Placing 64MB software IO TLB between d832cf00 - dc32cf00
[    0.000000] software IO TLB at phys 0x1832cf00 - 0x1c32cf00
[    0.000000] software IO TLB at bus 0x1c0f00 - 0x120fdff00

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.r.wilk@gmail.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Monday, December 12, 2011 4:19 PM
To: Taylor, Neal E
Cc: xen-devel; Kalev, Leonid; Dave, Tushar N
Subject: Re: [Xen-devel] Buffers not reachable by PCI

.. snip..
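
One way to see why dend - dstart need not equal pend - pstart is that the phys-to-machine translation works per page: each pseudo-physical frame is looked up independently, and only the offset within the page carries over. The sketch below is a toy userspace model under that assumption. toy_phys_to_machine() and the p2m[] array are hypothetical illustrations, not Xen's real P2M structure or the kernel's phys_to_machine().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ULL << TOY_PAGE_SHIFT) - 1)

/*
 * Toy translation: look up the machine frame for the pseudo-physical frame
 * in p2m[] (indexed by PFN), then re-attach the offset within the page.
 */
static uint64_t toy_phys_to_machine(const uint64_t *p2m, size_t nr_pfns,
				    uint64_t paddr)
{
	uint64_t pfn = paddr >> TOY_PAGE_SHIFT;

	assert(pfn < nr_pfns);
	return (p2m[pfn] << TOY_PAGE_SHIFT) | (paddr & TOY_PAGE_MASK);
}
```

For example, with a three-entry toy table { 0x1c0, 0x1c1, 0x120fdf }, the addresses 0x0f00 and 0x2f00 translate to 0x1c0f00 and 0x120fdff00. The inputs are 8KB apart while the outputs are gigabytes apart, and the 0xf00 page offset is preserved, as it is in the phys and bus values in the log above.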
On Tue, Dec 13, 2011 at 10:17:50PM +0000, Taylor, Neal E wrote:
>
> Is it the translation that's in error?
>
> Modeled after the translation in xen_swiotlb_dma_supported that's used for the problematic comparison, I added "the same" translation to swiotlb_print_info. I don't understand the results, as dend - dstart is vastly larger than pend - pstart.

You might want to instrument the xen_swiotlb_fixup code to get an idea. But basically there are "chunks" of 2MB (I think) of contiguous memory that are swizzled in to the memory that starts at io_tlb_start. But all of that memory SHOULD be under the 4GB limit (set by max_dma_bits). Sadly, in your case one of those "chunks" ends up being past the 4GB limit, which should never happen. Or if it did happen, it would print out "Failed to get contiguous memory for DMA from.." But you don't get any of that.

To get a good idea of this, you could do something like this:

	unsigned long mfn, next_mfn;

	mfn = PFN_DOWN(phys_to_machine(XPADDR(pstart)).maddr);
	for (i = pstart; i < pend;) {
		next_mfn = PFN_DOWN(phys_to_machine(XPADDR(i)).maddr);
		if (next_mfn == mfn+1) {
			mfn++;
		} else {
			printk(KERN_INFO "MFN 0x%lx->0x%lx\n", mfn, next_mfn);
			mfn = next_mfn;
		}
		i+=PAGE_SIZE;
	}

which should print you those "chunks", if my logic here is right.

Can you send me your 'xl info' (or 'xl dmesg'), please? I tried to reproduce this with a 3.0.4 kernel on an 8GB machine and couldn't reproduce it. Hm, will look in your .config in case there is something funky there.
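
The loop suggested above can be tried outside the kernel. The following is a userspace sketch of the same run-detection logic, assuming the per-page MFNs have already been collected into an array. print_mfn_breaks() is a hypothetical name, and printf() stands in for printk().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Walk an array of MFNs (one entry per page) and print a line whenever the
 * next MFN is not the previous one plus one, i.e. at each break between
 * machine-contiguous runs. Returns the number of lines printed.
 */
static size_t print_mfn_breaks(const unsigned long *mfns, size_t n)
{
	unsigned long mfn;
	size_t i, breaks = 0;

	assert(n > 0);
	mfn = mfns[0];
	for (i = 0; i < n; i++) {
		unsigned long next_mfn = mfns[i];

		if (next_mfn == mfn + 1) {
			mfn++;		/* still inside the current run */
		} else {
			printf("MFN 0x%lx->0x%lx\n", mfn, next_mfn);
			mfn = next_mfn;	/* a new run starts here */
			breaks++;
		}
	}
	return breaks;
}
```

Each printed line reads "last MFN of the previous run -> first MFN of the next run". Because the first page is compared against itself, the first iteration always prints a line of the form "MFN X->X".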
Instrumentation results: [ 0.000000] MFN 0x1c0->0x1c0 [ 0.000000] MFN 0x1ff->0x180 [ 0.000000] MFN 0x1bf->0x140 [ 0.000000] MFN 0x17f->0x100 [ 0.000000] MFN 0x13f->0x3c0 [ 0.000000] MFN 0x3ff->0x380 [ 0.000000] MFN 0x3bf->0x340 [ 0.000000] MFN 0x37f->0x300 [ 0.000000] MFN 0x33f->0x2c0 [ 0.000000] MFN 0x2ff->0x280 [ 0.000000] MFN 0x2bf->0x240 [ 0.000000] MFN 0x27f->0x200 [ 0.000000] MFN 0x23f->0x7c0 [ 0.000000] MFN 0x7ff->0x780 [ 0.000000] MFN 0x7bf->0x740 [ 0.000000] MFN 0x77f->0x700 [ 0.000000] MFN 0x73f->0x6c0 [ 0.000000] MFN 0x6ff->0x680 [ 0.000000] MFN 0x6bf->0x640 [ 0.000000] MFN 0x67f->0x600 [ 0.000000] MFN 0x63f->0x5c0 [ 0.000000] MFN 0x5ff->0x580 [ 0.000000] MFN 0x5bf->0x540 [ 0.000000] MFN 0x57f->0x500 [ 0.000000] MFN 0x53f->0x4c0 [ 0.000000] MFN 0x4ff->0x480 [ 0.000000] MFN 0x4bf->0x440 [ 0.000000] MFN 0x47f->0x400 [ 0.000000] MFN 0x43f->0xfc0 [ 0.000000] MFN 0xfff->0xf80 [ 0.000000] MFN 0xfbf->0xf40 [ 0.000000] MFN 0xf7f->0xf00 [ 0.000000] MFN 0xf3f->0xec0 [ 0.000000] MFN 0xeff->0xe80 [ 0.000000] MFN 0xebf->0xe40 [ 0.000000] MFN 0xe7f->0xe00 [ 0.000000] MFN 0xe3f->0xdc0 [ 0.000000] MFN 0xdff->0xd80 [ 0.000000] MFN 0xdbf->0xd40 [ 0.000000] MFN 0xd7f->0xd00 [ 0.000000] MFN 0xd3f->0xcc0 [ 0.000000] MFN 0xcff->0xc80 [ 0.000000] MFN 0xcbf->0xc40 [ 0.000000] MFN 0xc7f->0xc00 [ 0.000000] MFN 0xc3f->0xbc0 [ 0.000000] MFN 0xbff->0xb80 [ 0.000000] MFN 0xbbf->0xb40 [ 0.000000] MFN 0xb7f->0xb00 [ 0.000000] MFN 0xb3f->0xac0 [ 0.000000] MFN 0xaff->0xa80 [ 0.000000] MFN 0xabf->0xa40 [ 0.000000] MFN 0xa7f->0xa00 [ 0.000000] MFN 0xa3f->0x9c0 [ 0.000000] MFN 0x9ff->0x980 [ 0.000000] MFN 0x9bf->0x940 [ 0.000000] MFN 0x97f->0x900 [ 0.000000] MFN 0x93f->0x8c0 [ 0.000000] MFN 0x8ff->0x880 [ 0.000000] MFN 0x8bf->0x840 [ 0.000000] MFN 0x87f->0x800 [ 0.000000] MFN 0x83f->0x1fc0 [ 0.000000] MFN 0x1fff->0x1f80 [ 0.000000] MFN 0x1fbf->0x1f40 [ 0.000000] MFN 0x1f7f->0x1f00 [ 0.000000] MFN 0x1f3f->0x1ec0 [ 0.000000] MFN 0x1eff->0x1e80 [ 0.000000] MFN 0x1ebf->0x1e40 [ 0.000000] MFN 
0x1e7f->0x1e00 [ 0.000000] MFN 0x1e3f->0x1dc0 [ 0.000000] MFN 0x1dff->0x1d80 [ 0.000000] MFN 0x1dbf->0x1d40 [ 0.000000] MFN 0x1d7f->0x1d00 [ 0.000000] MFN 0x1d3f->0x1cc0 [ 0.000000] MFN 0x1cff->0x1c80 [ 0.000000] MFN 0x1cbf->0x1c40 [ 0.000000] MFN 0x1c7f->0x1c00 [ 0.000000] MFN 0x1c3f->0x1bc0 [ 0.000000] MFN 0x1bff->0x1b80 [ 0.000000] MFN 0x1bbf->0x1b40 [ 0.000000] MFN 0x1b7f->0x1b00 [ 0.000000] MFN 0x1b3f->0x1ac0 [ 0.000000] MFN 0x1aff->0x1a80 [ 0.000000] MFN 0x1abf->0x1a40 [ 0.000000] MFN 0x1a7f->0x1a00 [ 0.000000] MFN 0x1a3f->0x19c0 [ 0.000000] MFN 0x19ff->0x1980 [ 0.000000] MFN 0x19bf->0x1940 [ 0.000000] MFN 0x197f->0x1900 [ 0.000000] MFN 0x193f->0x18c0 [ 0.000000] MFN 0x18ff->0x1880 [ 0.000000] MFN 0x18bf->0x1840 [ 0.000000] MFN 0x187f->0x1800 [ 0.000000] MFN 0x183f->0x17c0 [ 0.000000] MFN 0x17ff->0x1780 [ 0.000000] MFN 0x17bf->0x1740 [ 0.000000] MFN 0x177f->0x1700 [ 0.000000] MFN 0x173f->0x16c0 [ 0.000000] MFN 0x16ff->0x1680 [ 0.000000] MFN 0x16bf->0x1640 [ 0.000000] MFN 0x167f->0x1600 [ 0.000000] MFN 0x163f->0x15c0 [ 0.000000] MFN 0x15ff->0x1580 [ 0.000000] MFN 0x15bf->0x1540 [ 0.000000] MFN 0x157f->0x1500 [ 0.000000] MFN 0x153f->0x14c0 [ 0.000000] MFN 0x14ff->0x1480 [ 0.000000] MFN 0x14bf->0x1440 [ 0.000000] MFN 0x147f->0x1400 [ 0.000000] MFN 0x143f->0x13c0 [ 0.000000] MFN 0x13ff->0x1380 [ 0.000000] MFN 0x13bf->0x1340 [ 0.000000] MFN 0x137f->0x1300 [ 0.000000] MFN 0x133f->0x12c0 [ 0.000000] MFN 0x12ff->0x1280 [ 0.000000] MFN 0x12bf->0x1240 [ 0.000000] MFN 0x127f->0x1200 [ 0.000000] MFN 0x123f->0x11c0 [ 0.000000] MFN 0x11ff->0x1180 [ 0.000000] MFN 0x11bf->0x1140 [ 0.000000] MFN 0x117f->0x1100 [ 0.000000] MFN 0x113f->0x10c0 [ 0.000000] MFN 0x10ff->0x1080 [ 0.000000] MFN 0x10bf->0x1040 [ 0.000000] MFN 0x107f->0x1000 [ 0.000000] MFN 0x103f->0x3fc0 [ 0.000000] MFN 0x3fff->0x3f80 [ 0.000000] MFN 0x3fbf->0x3f40 [ 0.000000] MFN 0x3f7f->0x3f00 [ 0.000000] MFN 0x3f3f->0x3ec0 [ 0.000000] MFN 0x3eff->0x3e80 [ 0.000000] MFN 0x3ebf->0x3e40 [ 0.000000] MFN 0x3e7f->0x3e00 
[ 0.000000] MFN 0x3e3f->0x3dc0 [ 0.000000] MFN 0x3dff->0x3d80 [ 0.000000] MFN 0x3dbf->0x3d40 [ 0.000000] MFN 0x3d7f->0x3d00 [ 0.000000] MFN 0x3d3f->0x3cc0 [ 0.000000] MFN 0x3cff->0x3c80 [ 0.000000] MFN 0x3cbf->0x3c40 [ 0.000000] MFN 0x3c7f->0x3c00 [ 0.000000] MFN 0x3c3f->0x3bc0 [ 0.000000] MFN 0x3bff->0x3b80 [ 0.000000] MFN 0x3bbf->0x3b40 [ 0.000000] MFN 0x3b7f->0x3b00 [ 0.000000] MFN 0x3b3f->0x3ac0 [ 0.000000] MFN 0x3aff->0x3a80 [ 0.000000] MFN 0x3abf->0x3a40 [ 0.000000] MFN 0x3a7f->0x3a00 [ 0.000000] MFN 0x3a3f->0x39c0 [ 0.000000] MFN 0x39ff->0x3980 [ 0.000000] MFN 0x39bf->0x3940 [ 0.000000] MFN 0x397f->0x3900 [ 0.000000] MFN 0x393f->0x38c0 [ 0.000000] MFN 0x38ff->0x3880 [ 0.000000] MFN 0x38bf->0x3840 [ 0.000000] MFN 0x387f->0x3800 [ 0.000000] MFN 0x383f->0x37c0 [ 0.000000] MFN 0x37ff->0x3780 [ 0.000000] MFN 0x37bf->0x3740 [ 0.000000] MFN 0x377f->0x3700 [ 0.000000] MFN 0x373f->0x36c0 [ 0.000000] MFN 0x36ff->0x3680 [ 0.000000] MFN 0x36bf->0x3640 [ 0.000000] MFN 0x367f->0x3600 [ 0.000000] MFN 0x363f->0x35c0 [ 0.000000] MFN 0x35ff->0x3580 [ 0.000000] MFN 0x35bf->0x3540 [ 0.000000] MFN 0x357f->0x3500 [ 0.000000] MFN 0x353f->0x34c0 [ 0.000000] MFN 0x34ff->0x3480 [ 0.000000] MFN 0x34bf->0x3440 [ 0.000000] MFN 0x347f->0x3400 [ 0.000000] MFN 0x343f->0x33c0 [ 0.000000] MFN 0x33ff->0x3380 [ 0.000000] MFN 0x33bf->0x3340 [ 0.000000] MFN 0x337f->0x3300 [ 0.000000] MFN 0x333f->0x32c0 [ 0.000000] MFN 0x32ff->0x3280 [ 0.000000] MFN 0x32bf->0x3240 [ 0.000000] MFN 0x327f->0x3200 [ 0.000000] MFN 0x323f->0x31c0 [ 0.000000] MFN 0x31ff->0x3180 [ 0.000000] MFN 0x31bf->0x3140 [ 0.000000] MFN 0x317f->0x3100 [ 0.000000] MFN 0x313f->0x30c0 [ 0.000000] MFN 0x30ff->0x3080 [ 0.000000] MFN 0x30bf->0x3040 [ 0.000000] MFN 0x307f->0x3000 [ 0.000000] MFN 0x303f->0x2fc0 [ 0.000000] MFN 0x2fff->0x2f80 [ 0.000000] MFN 0x2fbf->0x2f40 [ 0.000000] MFN 0x2f7f->0x2f00 [ 0.000000] MFN 0x2f3f->0x2ec0 [ 0.000000] MFN 0x2eff->0x2e80 [ 0.000000] MFN 0x2ebf->0x2e40 [ 0.000000] MFN 0x2e7f->0x2e00 [ 0.000000] MFN 
0x2e3f->0x2dc0 [ 0.000000] MFN 0x2dff->0x2d80 [ 0.000000] MFN 0x2dbf->0x2d40 [ 0.000000] MFN 0x2d7f->0x2d00 [ 0.000000] MFN 0x2d3f->0x2cc0 [ 0.000000] MFN 0x2cff->0x2c80 [ 0.000000] MFN 0x2cbf->0x2c40 [ 0.000000] MFN 0x2c7f->0x2c00 [ 0.000000] MFN 0x2c3f->0x2bc0 [ 0.000000] MFN 0x2bff->0x2b80 [ 0.000000] MFN 0x2bbf->0x2b40 [ 0.000000] MFN 0x2b7f->0x2b00 [ 0.000000] MFN 0x2b3f->0x2ac0 [ 0.000000] MFN 0x2aff->0x2a80 [ 0.000000] MFN 0x2abf->0x2a40 [ 0.000000] MFN 0x2a7f->0x2a00 [ 0.000000] MFN 0x2a3f->0x29c0 [ 0.000000] MFN 0x29ff->0x2980 [ 0.000000] MFN 0x29bf->0x2940 [ 0.000000] MFN 0x297f->0x2900 [ 0.000000] MFN 0x293f->0x28c0 [ 0.000000] MFN 0x28ff->0x2880 [ 0.000000] MFN 0x28bf->0x2840 [ 0.000000] MFN 0x287f->0x2800 [ 0.000000] MFN 0x283f->0x27c0 [ 0.000000] MFN 0x27ff->0x2780 [ 0.000000] MFN 0x27bf->0x2740 [ 0.000000] MFN 0x277f->0x2700 [ 0.000000] MFN 0x273f->0x26c0 [ 0.000000] MFN 0x26ff->0x2680 [ 0.000000] MFN 0x26bf->0x2640 [ 0.000000] MFN 0x267f->0x2600 [ 0.000000] MFN 0x263f->0x25c0 [ 0.000000] MFN 0x25ff->0x2580 [ 0.000000] MFN 0x25bf->0x2540 [ 0.000000] MFN 0x257f->0x2500 [ 0.000000] MFN 0x253f->0x24c0 [ 0.000000] MFN 0x24ff->0x2480 [ 0.000000] MFN 0x24bf->0x2440 [ 0.000000] MFN 0x247f->0x2400 [ 0.000000] MFN 0x243f->0x23c0 [ 0.000000] MFN 0x23ff->0x2380 [ 0.000000] MFN 0x23bf->0x2340 [ 0.000000] MFN 0x237f->0x2300 [ 0.000000] MFN 0x233f->0x22c0 [ 0.000000] MFN 0x22ff->0x2280 [ 0.000000] MFN 0x22bf->0x2240 [ 0.000000] MFN 0x227f->0x2200 [ 0.000000] MFN 0x223f->0x21c0 [ 0.000000] MFN 0x21ff->0x2180 [ 0.000000] MFN 0x21bf->0x2140 [ 0.000000] MFN 0x217f->0x2100 [ 0.000000] MFN 0x213f->0x20c0 [ 0.000000] MFN 0x20ff->0x2080 [ 0.000000] MFN 0x20bf->0x2040 [ 0.000000] MFN 0x207f->0x2000 [ 0.000000] MFN 0x203f->0x7fc0 [ 0.000000] MFN 0x7fff->0x7f80 [ 0.000000] MFN 0x7fbf->0x7f40 [ 0.000000] MFN 0x7f7f->0x7f00 [ 0.000000] Placing 64MB software IO TLB between d832cf00 - dc32cf00 [ 0.000000] software IO TLB at phys 0x1832cf00 - 0x1c32cf00 [ 0.000000] software IO 
TLB at bus 0x1c0f00 - 0x120fdff00
[ 0.000000] Initializing HighMem for node 0 (000373fe:001dffc0)
[ 0.000000] Memory: 384804k/7864064k available (2338k kernel code, 139036k reserved, 1197k data, 372k init, 0k highmem)

Actual code in xen_swiotlb_init after the xen_swiotlb_fixup call:

    {
        phys_addr_t pstart, pend;
        unsigned long mfn, next_mfn;
        int i;

        pstart = virt_to_phys(xen_io_tlb_start);
        pend = virt_to_phys(xen_io_tlb_end);

        mfn = PFN_DOWN(phys_to_machine(XPADDR(pstart)).maddr);
        for (i = pstart; i < pend;) {
            next_mfn = PFN_DOWN(phys_to_machine(XPADDR(i)).maddr);
            if (next_mfn == mfn+1) {
                mfn++;
            } else {
                printk(KERN_INFO "MFN 0x%lx->0x%lx\n", mfn, next_mfn);
                mfn = next_mfn;
            }
            i += PAGE_SIZE;
        }
    }

We're not getting so far along as to be running xend yet. That normally gets started remotely, and without networks it doesn't happen. I can probably do that tomorrow.

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.r.wilk@gmail.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Tuesday, December 13, 2011 3:28 PM
To: Taylor, Neal E
Cc: xen-devel; Kalev, Leonid; Dave, Tushar N
Subject: Re: [Xen-devel] Buffers not reachable by PCI

On Tue, Dec 13, 2011 at 10:17:50PM +0000, Taylor, Neal E wrote:
> Is it the translation that's in error?
>
> Modeled after the translation in xen_swiotlb_dma_supported that's used for the problematic comparison, I added "the same" translation to swiotlb_print. I don't understand the results as dend - dstart is vastly larger than pend - pstart.

You might want to instrument the xen_swiotlb_fixup code to get an idea. But basically there are "chunks" of 2MB (I think) of contiguous memory that are swizzled in to the memory that starts at io_tlb_start. But all of that memory SHOULD be under the 4GB limit (set by max_dma_bits). Sadly in your case one of those "chunks" ends up being past the 4GB limit - which should never happen. Or if it did happen it would print out "Failed to get contiguous memory for DMA from.." But you don't get any of that.

To get a good idea of this, you could do something like this:

    unsigned long mfn, next_mfn;

    mfn = PFN_DOWN(phys_to_machine(XPADDR(pstart)).maddr);
    for (i = pstart; i < pend;) {
        next_mfn = PFN_DOWN(phys_to_machine(XPADDR(i)).maddr);
        if (next_mfn == mfn+1) {
            mfn++;
        } else {
            printk(KERN_INFO "MFN 0x%lx->0x%lx\n", mfn, next_mfn);
            mfn = next_mfn;
        }
        i += PAGE_SIZE;
    }

which should print you those "chunks", if my logic here is right.

Can you send me your 'xl info' (or 'xl dmesg'), please? I tried to reproduce this with a 3.0.4 kernel on an 8GB machine and I couldn't reproduce this. Hm, will look in your .config in case there is something funky there.
>>> On 14.12.11 at 01:38, "Taylor, Neal E" <Neal.Taylor@ca.com> wrote:
> [ 0.000000] MFN 0x7f7f->0x7f00

This is clearly indicating the last chunk ends well below the 4G boundary.

> [ 0.000000] Placing 64MB software IO TLB between d832cf00 - dc32cf00
> [ 0.000000] software IO TLB at phys 0x1832cf00 - 0x1c32cf00
> [ 0.000000] software IO TLB at bus 0x1c0f00 - 0x120fdff00

Consequently, the question is how you got to this value, or what changed between the first and last quoted printouts.

Jan
The last quoted printout is calculated the same way as the test that fails, which leads me to question the computation's validity.

The test that fails is (from drivers/xen/swiotlb-xen.c):

    xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
    {
        return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
    }

"xen_virt_to_bus" in turn:

    static dma_addr_t xen_virt_to_bus(void *address)
    {
        return xen_phys_to_bus(virt_to_phys(address));
    }

Now, "virt_to_phys" (out of arch/x86/include/asm/io.h) is defined as follows but carries a comment bothersome to this usage of it, as we're basically dealing with a DMA "transfer" and calling from a device driver:

    /**
     * virt_to_phys - map virtual addresses to physical
     * @address: address to remap
     *
     * The returned physical address is the physical (CPU) mapping for
     * the memory address given. It is only valid to use this function on
     * addresses directly mapped or allocated via kmalloc.
     *
     * This function does not give bus mappings for DMA transfers. In
     * almost all conceivable cases a device driver should not be using
     * this function
     */
    static inline phys_addr_t virt_to_phys(volatile void *address)
    {
        return __pa(address);
    }

Going a couple of steps further, "__pa" is defined (in arch/x86/include/asm/page.h) as:

    #define __pa(x) __phys_addr((unsigned long)(x))

and "__phys_addr" (in arch/x86/mm/physaddr.c) as:

    unsigned long __phys_addr(unsigned long x)
    {
        if (x >= __START_KERNEL_map) {
            x -= __START_KERNEL_map;
            VIRTUAL_BUG_ON(x >= KERNEL_IMAGE_SIZE);
            x += phys_base;
        } else {
            VIRTUAL_BUG_ON(x < PAGE_OFFSET);
            x -= PAGE_OFFSET;
            VIRTUAL_BUG_ON(!phys_addr_valid(x));
        }
        return x;
    }
    EXPORT_SYMBOL(__phys_addr);

So, if "virt_to_phys" isn't yielding a valid test of whether addresses will be reachable from a PCI device, what's the correct way to test it, and should xen_swiotlb_dma_supported be updated to the correct way (I think so) or a new function be created?

Neal

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com]
Sent: Wednesday, December 14, 2011 1:20 AM
To: Taylor, Neal E
Cc: Kalev, Leonid; Konrad Rzeszutek Wilk; Tushar N Dave; xen-devel
Subject: Re: [Xen-devel] Buffers not reachable by PCI
On 12/14/2011 06:42 PM, Taylor, Neal E wrote:
> The last quoted printout is calculated the same way as the test that fails which leads me to question the computation's validity.

The computation validity seems OK to me (the 'phys' address is, as correctly stated in the comments, not suitable for DMA, but it is first translated via xen_phys_to_bus, which gives the machine address - and on x86 that is also the bus address).

I have a STUPID question, though: why are the start and end addresses of the swiotlb memory area not page-aligned???

The io_tlb_end address is one byte PAST the valid area, which is why the xen_swiotlb_dma_supported function uses (xen_io_tlb_end-1). However, this will work properly only if the value is page-aligned. If it isn't, then decrementing it by one will keep the value in the same page (which is one page past the last valid one).

The function that allocates the memory uses alloc_bootmem(), which provides just cache-aligned memory (not page-aligned).

If it is OK for the swiotlb area not to be page-aligned, then the xen_swiotlb_dma_supported should use (xen_io_tlb_end - (PAGE_SIZE-1)).

If the memory should be in fact aligned, then the allocation must be changed to alloc_bootmem_pages() (which is the same as alloc_bootmem, but page-aligned).

-- 
Leonid Kalev
CA Technologies
Principal Software Engineer
Tel: +972 4 825 3952
Mobile: +972 54 4631508
Leonid.Kalev@ca.com
On Wed, Dec 14, 2011 at 09:20:10AM +0000, Jan Beulich wrote:
> >>> On 14.12.11 at 01:38, "Taylor, Neal E" <Neal.Taylor@ca.com> wrote:
> > [ 0.000000] MFN 0x7f7f->0x7f00
>
> This is clearly indicating the last chunk ends well below the 4G boundary.
>
> > [ 0.000000] Placing 64MB software IO TLB between d832cf00 - dc32cf00
> > [ 0.000000] software IO TLB at phys 0x1832cf00 - 0x1c32cf00
> > [ 0.000000] software IO TLB at bus 0x1c0f00 - 0x120fdff00
>
> Consequently, the question is how you got to this value, or what
> changed between the first and last quoted printouts.

<nods>

Neal, I would also strongly recommend you try v3.0.6 - as there are a couple of important fixes in it:

310fef9 xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
0208b80 xen: use maximum reservation to limit amount of usable RAM
d63c8a0 mm: sync vmalloc address space page tables in alloc_vm_area()
0b129e1 xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.
1f51b5d xen: x86_32: do not enable iterrupts when returning from exception in interrupt context

Especially the dom0_mem one - which I think you are using but the values are not latching on. This is separate from the issue you are hitting but I do wonder if they might have an impact.
On Wed, Dec 14, 2011 at 06:42:07PM +0000, Kalev, Leonid wrote:
> On 12/14/2011 06:42 PM, Taylor, Neal E wrote:
> > The last quoted printout is calculated the same way as the test that fails which leads me to question the computation's validity.
>
> The computation validity seems OK to me (the 'phys' address is, as correctly stated
> in the comments, not suitable for DMA, but it is first translated via
> xen_phys_to_bus, which gives the machine address - and on x86 that is also the bus
> address)
>
> I have a STUPID question, though: why are the start and end addresses of the swiotlb
> memory area not page-aligned???

That is not a stupid question.

> The io_tlb_end address is one byte PAST the valid area, which is why the
> xen_swiotlb_dma_supported function uses (xen_io_tlb_end-1). However, this will work
> properly only if the value is page-aligned. If it isn't, then decrementing it by one
> will keep the value in the same page (which is one page past the last valid one).
>
> The function that allocates the memory uses alloc_bootmem(), which provides just
> cache-aligned memory (not page-aligned).

Yup. It is actually funny (sad?), b/c I am the committer for e79f86b2ef9c0a8c47225217c1018b7d3d90101c, which adds something like this:

    alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(int)))

in the swiotlb code, but I completely missed doing it for the Xen SWIOTLB. <sigh>

I think this patch:

From 47409eecc08effe20fc4aa0da899dd6ac475cb0b Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 14 Dec 2011 20:48:01 -0500
Subject: [PATCH] xen/swiotlb: Use page alignment for early buffer allocation.

This piggybacks on git commit e79f86b2ef9c0a8c47225217c1018b7d3d90101c
"swiotlb: Use page alignment for early buffer allocation" which:

 "We could call free_bootmem_late() if swiotlb is not used, and
  it will shrink to page alignment.

  So alloc them with page alignment at first, to avoid lose two pages"

Reported-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/swiotlb-xen.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 8e964b9..5c8e445 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -166,7 +166,8 @@ retry:
 	/*
 	 * Get IO TLB memory from any location.
 	 */
-	xen_io_tlb_start = alloc_bootmem(bytes);
+	xen_io_tlb_start = alloc_bootmem_pages(PAGE_ALIGN(bytes));
+
 	if (!xen_io_tlb_start) {
 		m = "Cannot allocate Xen-SWIOTLB buffer!\n";
 		goto error;
@@ -179,7 +180,7 @@ retry:
 			       bytes,
 			       xen_io_tlb_nslabs);
 	if (rc) {
-		free_bootmem(__pa(xen_io_tlb_start), bytes);
+		free_bootmem(__pa(xen_io_tlb_start), PAGE_ALIGN(bytes));
 		m = "Failed to get contiguous memory for DMA from Xen!\n"\
 		    "You either: don't have the permissions, do not have"\
 		    " enough free memory under 4GB, or the hypervisor memory"\
-- 
1.7.1

is in order.

> If it is OK for the swiotlb area not to be page-aligned, then the
> xen_swiotlb_dma_supported should use (xen_io_tlb_end - (PAGE_SIZE-1))

Let's make it page aligned.

> If the memory should be in fact aligned, then the allocation must be changed to
> alloc_bootmem_pages() (which is the same as alloc_bootmem, but page-aligned).
At Leonid's suggestion (and provision of a patch) I tried it with page-aligned memory and it works. Is this "the" real solution or just a change that happens to make things work for the case I have?

The patch I used is:

--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -162,7 +162,7 @@
 	/*
 	 * Get IO TLB memory from any location.
 	 */
-	xen_io_tlb_start = alloc_bootmem(bytes);
+	xen_io_tlb_start = alloc_bootmem_pages(bytes);
 	if (!xen_io_tlb_start)
 		panic("Cannot allocate SWIOTLB buffer");

The test that was failing now passes, finding the "bus" address of 7f3ffff to be within a 32-bit mask, and the last few MFN entries are:

[ 0.000000] MFN 0x20ff->0x2080
[ 0.000000] MFN 0x20bf->0x2040
[ 0.000000] MFN 0x207f->0x2000
[ 0.000000] MFN 0x203f->0x7fc0
[ 0.000000] MFN 0x7fff->0x7f80
[ 0.000000] MFN 0x7fbf->0x7f40
[ 0.000000] MFN 0x7f7f->0x7f00
[ 0.000000] Placing 64MB software IO TLB between d832c000 - dc32c000
[ 0.000000] software IO TLB at phys 0x1832c000 - 0x1c32c000
[ 0.000000] software IO TLB at bus 0x1c0000 - 0x120fdf000

(This last address, 0x120fdf000, hasn't been through the final "phys_to_machine" translation. My error. Sorry.)

Neal

-----Original Message-----
From: Kalev, Leonid
Sent: Wednesday, December 14, 2011 10:42 AM
To: Taylor, Neal E
Cc: Jan Beulich; Konrad Rzeszutek Wilk; Tushar N Dave; xen-devel
Subject: Re: [Xen-devel] Buffers not reachable by PCI
Missed this when I was composing the previous e-mail. Your answer is here. Thank you.

Neal

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.r.wilk@gmail.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Wednesday, December 14, 2011 6:00 PM
To: Kalev, Leonid; konrad.wilk@oracle.com
Cc: Taylor, Neal E; Jan Beulich; Tushar N Dave; xen-devel
Subject: Re: [Xen-devel] Buffers not reachable by PCI
On Thu, Dec 15, 2011 at 02:19:55AM +0000, Taylor, Neal E wrote:
> Missed this when I was composing the previous e-mail. Your answer is here. Thank you.

Can I put 'Tested-by: Neal ..' on this patch:

> From 47409eecc08effe20fc4aa0da899dd6ac475cb0b Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Wed, 14 Dec 2011 20:48:01 -0500
> Subject: [PATCH] xen/swiotlb: Use page alignment for early buffer allocation.
Sure! But only if 3.0.4 is recent enough. However, that version does not have the code for the "free_bootmem" section of this patch, so only the first line was actually tested. (Not that I could test the negative case if the code existed.)

I'll test on a more recent version (your pick) tomorrow, should you prefer.

Neal

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
Sent: Wednesday, December 14, 2011 6:30 PM
To: Taylor, Neal E
Cc: Kalev, Leonid; konrad.wilk@oracle.com; xen-devel; Tushar N Dave; Jan Beulich
Subject: Re: [Xen-devel] Buffers not reachable by PCI
On Thu, Dec 15, 2011 at 02:40:32AM +0000, Taylor, Neal E wrote:
> Sure! But only if 3.0.4 is recent enough. However, that version does not have the code for the "free_bootmem" section of this patch so only the first line was actually tested. (Not that I could test the negative case if the code existed.)
>
> I'll test on a more recent version (your pick) tomorrow, should you prefer.

Just as long as your modified code has this:

    alloc_bootmem_pages(PAGE_ALIGN(bytes));

then that is great! (The free_bootmem we can ignore for the 3.0.4 kernel).
The modified code has that. It's tested. It works.

Neal

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
Sent: Thursday, December 15, 2011 8:12 AM
To: Taylor, Neal E
Cc: Konrad Rzeszutek Wilk; Kalev, Leonid; xen-devel; Tushar N Dave; Jan Beulich
Subject: Re: [Xen-devel] Buffers not reachable by PCI