Hi, In current linux-2.6.18-xen.h 64-bit, it looks like we''re doing potentially large amounts of unnecessary IO bounce-buffering. I noticed this when trying to pin down some odd behaviour concerning changeset 15405 in xen-3.1-testing: http://xenbits.xensource.com/xen-3.1-testing.hg?rev/30033a637942 # User kfraser@localhost.localdomain # Date 1183985110 -3600 # Node ID 30033a6379428f57269455f0963841743c6d5e46 # Parent 9cdade953890a612734d97b3abc21513a1a9cf6d x86: dma_map_sg() must handle multi-page segments. Signed-off-by: Keir Fraser <keir@xensource.com> which adds a test when we do swiotlb_map_sg(): --- a/linux-2.6-xen-sparse/arch/i386/kernel/swiotlb.c Fri Sep 14 16:33:44 2007 +0100 +++ b/linux-2.6-xen-sparse/arch/i386/kernel/swiotlb.c Mon Jul 09 13:45:10 2007 +0100 @@ -572,7 +572,9 @@ swiotlb_map_sg(struct device *hwdev, str for (i = 0; i < nelems; i++, sg++) { dev_addr = SG_ENT_PHYS_ADDRESS(sg); - if (address_needs_mapping(hwdev, dev_addr)) { + if (range_straddles_page_boundary(page_to_pseudophys(sg->page) + + sg->offset, sg->length) + || address_needs_mapping(hwdev, dev_addr)) { buffer.page = sg->page; This is still present in linux-2.6.18-xen.hg''s lib/swiotlb-xen.c; but it seems to conflict with a different test that we use when we merge bios together for multi-page disk IO, where we have include/asm-x86_64/mach-xen/asm/io.h: #define page_to_pseudophys(page) ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT) #define page_to_phys(page) (phys_to_machine(page_to_pseudophys(page))) #define page_to_bus(page) (phys_to_machine(page_to_pseudophys(page))) #define bio_to_pseudophys(bio) (page_to_pseudophys(bio_page((bio))) + \ (unsigned long) bio_offset((bio))) #define bvec_to_pseudophys(bv) (page_to_pseudophys((bv)->bv_page) + \ (unsigned long) (bv)->bv_offset) #define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \ (((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2))) && \ ((bvec_to_pseudophys((vec1)) + (vec1)->bv_len) == \ bvec_to_pseudophys((vec2)))) This test in io.h basically means that bios can, and will, span page boundaries if the pages in question are both physical- and machine- contiguous; but the range_straddles_page_boundary test does NOT check for machine-contiguous pages (it only tests whether pages are in a xen_create_contiguous_region() region, not whether pages happen to be contiguous by chance in non-guaranteed-contig regions.) So we''ll create bios which span multiple pages, assuming that that''s more efficient; but then we''ll force them to be copied because they span multiple pages! Have I missed something, or is this an oversight in the swiotlb.c patch above? The reason I discovered this is that including the patch seems to be causing SWIOTLB overflows on some hardware. Adding an extra test to swiotlb_map_sg() to catch sg segments which are truly machine-contig and don''t need mapping, and avoid copying those, cures those symptoms for me, but I suspect there may be other issues lurking in this area. Cheers, Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Yes, the swiotlb check could be relaxed. Do we really do many mergeable page-straddling I/Os in practice though? I would expect most block I/O to be scattered by the page cache. -- Keir On 14/3/08 15:50, "Stephen C. Tweedie" <sct@redhat.com> wrote:> So we''ll create bios which span multiple pages, assuming that that''s > more efficient; but then we''ll force them to be copied because they span > multiple pages! > > Have I missed something, or is this an oversight in the swiotlb.c patch > above? > > The reason I discovered this is that including the patch seems to be > causing SWIOTLB overflows on some hardware. Adding an extra test to > swiotlb_map_sg() to catch sg segments which are truly machine-contig and > don''t need mapping, and avoid copying those, cures those symptoms for > me, but I suspect there may be other issues lurking in this area._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, On Fri, 2008-03-14 at 16:26 +0000, Keir Fraser wrote:> Yes, the swiotlb check could be relaxed. Do we really do many mergeable > page-straddling I/Os in practice though? I would expect most block I/O to be > scattered by the page cache.I would, too --- but it appears to be a genuine issue at boot time at least, when the page cache has not yet become too fragmented. I''m still trying to understand the corner cases involved. Some hardware seems to have more trouble than others --- it could well be special-case sg segments such as DMA drain buffers which are causing the trouble for those, in which case doing the copy for every map_sg() will potentially be a significant problem. (MarkMC hit that same problem bringing the dom0 patches onto 2.6.25 which has reworked drain buffer code.) --Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 14/3/08 16:34, "Stephen C. Tweedie" <sct@redhat.com> wrote:>> Yes, the swiotlb check could be relaxed. Do we really do many mergeable >> page-straddling I/Os in practice though? I would expect most block I/O to be >> scattered by the page cache. > > I would, too --- but it appears to be a genuine issue at boot time at > least, when the page cache has not yet become too fragmented. > > I''m still trying to understand the corner cases involved. Some > hardware seems to have more trouble than others --- it could well be > special-case sg segments such as DMA drain buffers which are causing the > trouble for those, in which case doing the copy for every map_sg() will > potentially be a significant problem. (MarkMC hit that same problem > bringing the dom0 patches onto 2.6.25 which has reworked drain buffer > code.)It certainly seems sensible to improve the page-discontiguity check. Possibly we can stick it inside page_straddles_boundary() (perhaps with a name change) and fix every caller. Also, I agree it sounds like something fishier is going on and this is perhaps only a workaround that fortuitously works in this case... :-( -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, On Fri, 2008-03-14 at 16:39 +0000, Keir Fraser wrote:> > I''m still trying to understand the corner cases involved. Some > > hardware seems to have more trouble than others --- it could well be > > special-case sg segments such as DMA drain buffers which are causing the > > trouble for those, in which case doing the copy for every map_sg() will > > potentially be a significant problem. (MarkMC hit that same problem > > bringing the dom0 patches onto 2.6.25 which has reworked drain buffer > > code.) > > It certainly seems sensible to improve the page-discontiguity check. > Possibly we can stick it inside page_straddles_boundary() (perhaps with a > name change) and fix every caller.Patch below does exactly that, and seems to fix the problem for me. I can no longer provoke a BUG() simply by booting with swiotlb=off, and other boxes which were showing the SWIOTLB-full regression are fixed. Also, isn''t the contig bitmap completely superfluous with this new check?> Also, I agree it sounds like something fishier is going on and this is > perhaps only a workaround that fortuitously works in this case... :-(I _think_ the new test makes things pretty solid. But I''m not sure whether it leaves us open to some corner cases that might end up with a lot of potential swiotlb traffic once we add the test. The bright side is, the only cases that may risk that are cases which are actually broken without the change, and might BUG the kernel unnecessarily if we have swiotlb=off. Tested against RHEL-5 kernels, but I''ve also checked that it applies and builds on current linux-2.6.18-xen.hg too, it''s the same code here. Cheers, Stephen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel