It seems the problems running mthca in a Xen domU have uncovered a bug in mthca: mthca uses dma_sync_single() in mthca_arbel_write_mtt_seg() and mthca_arbel_map_phys_fmr() to sync the MTTs that get written. However, Documentation/DMA-API.txt says:

    void
    dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
                    enum dma_data_direction direction)

    synchronise a single contiguous or scatter/gather mapping.  All the
    parameters must be the same as those passed into the single mapping
    API.

and mthca is *not* following this clear rule: it is trying to sync only a subrange of the mapping. Later on in the document, there is:

    void
    dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
                          unsigned long offset, size_t size,
                          enum dma_data_direction direction)

    does a partial sync, starting at offset and continuing for size.  You
    must be careful to observe the cache alignment and width when doing
    anything like this.  You must also be extra careful about accessing
    memory you intend to sync partially.

but that is in a section dealing with non-consistent memory, so it's not entirely clear to me whether it's kosher to use this as mthca wants.

The other alternative is to put the MTT table in coherent memory, just like the MPT table. That might be the best solution, I suppose...

Michael, anyone else, thoughts on this?

 - R.
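To make the violated rule concrete, here is a minimal sketch of the two patterns (hypothetical names and sizes, not the actual mthca code):

    #include <linux/dma-mapping.h>

    /* Sketch only -- hypothetical names, not the actual mthca code. */
    static void mtt_sync_sketch(struct device *dev, void *table,
                                size_t table_size,
                                size_t seg_offset, size_t seg_size)
    {
            dma_addr_t handle;

            /* The table is mapped once as a single streaming mapping. */
            handle = dma_map_single(dev, table, table_size, DMA_TO_DEVICE);

            /* ... CPU writes MTT entries at table + seg_offset ... */

            /* Legal per DMA-API.txt: same parameters as the mapping call. */
            dma_sync_single(dev, handle, table_size, DMA_TO_DEVICE);

            /* What mthca effectively does (the bug): sync only the segment
             * just written.  The handle and size no longer match those
             * passed to dma_map_single(). */
            dma_sync_single(dev, handle + seg_offset, seg_size,
                            DMA_TO_DEVICE);

            dma_unmap_single(dev, handle, table_size, DMA_TO_DEVICE);
    }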
Roland Dreier
2007-Jul-09 21:29 UTC
[Xen-devel] Re: [ofa-general] mthca use of dma_sync_single is bogus
>     void
>     dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
>                           unsigned long offset, size_t size,
>                           enum dma_data_direction direction)

It seems the document has bitrotted a little, since dma_sync_single_range() doesn't actually exist for most architectures; what is really implemented is dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device(). But assuming those are usable in our situation, they seem to be exactly what we want.

I'll try to get clarification from the DMA API experts (and also fix the documentation in the kernel).

Unfortunately it seems like the kernel's swiotlb does not implement the full DMA API, so this won't actually fix Xen :(.

 - R.
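Assuming those variants are indeed usable here, the calls would look something like this (a sketch with hypothetical names; dma_handle is the handle for the whole original mapping, and seg_offset/seg_size describe just the segment being written):

    /* Sketch -- hypothetical names, not a proposed patch. */
    static void write_mtt_seg_sketch(struct device *dev,
                                     dma_addr_t dma_handle,
                                     void *table, size_t seg_offset,
                                     size_t seg_size)
    {
            /* Give the range back to the CPU before writing it... */
            dma_sync_single_range_for_cpu(dev, dma_handle, seg_offset,
                                          seg_size, DMA_TO_DEVICE);

            /* ... CPU writes MTT entries at table + seg_offset ... */

            /* ...then hand the same range back to the device. */
            dma_sync_single_range_for_device(dev, dma_handle, seg_offset,
                                             seg_size, DMA_TO_DEVICE);
    }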
> One thought is that if you *do* move to dma_sync_single_range() then
> lib/swiotlb.c still needs fixing.  It's buggy in that
> swiotlb_sync_single_range(dma_addr, offset) calls
> swiotlb_sync_single(dma_addr+offset), and this will fail if the offset is
> large enough that it ends up dereferencing a different slot index in
> io_tlb_orig_addr.

Yes, I realized the same thing (our emails crossed).

> So, I should be able to get my swiotlb workaround fixes accepted upstream as
> a genuine bug fix. :-)

Yeah, seems so.

> dma_sync_single_range() looks to me to be the right thing for you to be
> using.  But I'm not a DMA-API expert.

Yes, I'll try to get confirmation from James Bottomley and/or Dave Miller that it is the right thing to do (and also fix the documentation to match what the kernel actually implements).

 - R.
One thought is that if you *do* move to dma_sync_single_range() then lib/swiotlb.c still needs fixing. It's buggy in that swiotlb_sync_single_range(dma_addr, offset) calls swiotlb_sync_single(dma_addr+offset), and this will fail if the offset is large enough that it ends up dereferencing a different slot index in io_tlb_orig_addr.

So, I should be able to get my swiotlb workaround fixes accepted upstream as a genuine bug fix. :-)

dma_sync_single_range() looks to me to be the right thing for you to be using. But I'm not a DMA-API expert.

 -- Keir

On 9/7/07 22:16, "Roland Dreier" <rdreier@cisco.com> wrote:

> It seems the problems running mthca in a Xen domU have uncovered a bug
> in mthca: mthca uses dma_sync_single() in mthca_arbel_write_mtt_seg()
> and mthca_arbel_map_phys_fmr() to sync the MTTs that get written.
> However, Documentation/DMA-API.txt says:
>
>     void
>     dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
>                     enum dma_data_direction direction)
>
>     synchronise a single contiguous or scatter/gather mapping.  All the
>     parameters must be the same as those passed into the single mapping
>     API.
>
> and mthca is *not* following this clear rule: it is trying to sync
> only a subrange of the mapping.  Later on in the document, there is:
>
>     void
>     dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
>                           unsigned long offset, size_t size,
>                           enum dma_data_direction direction)
>
>     does a partial sync, starting at offset and continuing for size.  You
>     must be careful to observe the cache alignment and width when doing
>     anything like this.  You must also be extra careful about accessing
>     memory you intend to sync partially.
>
> but that is in a section dealing with non-consistent memory, so it's
> not entirely clear to me whether it's kosher to use this as mthca
> wants.
>
> The other alternative is to put the MTT table in coherent memory, just
> like the MPT table.  That might be the best solution, I suppose...
>
> Michael, anyone else, thoughts on this?
>
> - R.
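For readers without lib/swiotlb.c handy, the failure mode is roughly the following (a conceptual sketch with made-up names, not the literal source). swiotlb locates its bookkeeping entry for a bounced mapping by a slot index derived from the bus address, so offsetting the address before the lookup can land in a neighbouring slot:

    /* Conceptual sketch of the swiotlb lookup, not the literal source. */
    static char *slot_orig_addr(dma_addr_t dma_addr)
    {
            int index = (dma_addr - io_tlb_bus_start) >> IO_TLB_SHIFT;

            return io_tlb_orig_addr[index]; /* original buffer for that slot */
    }

    /* Passing dma_addr + offset with a large enough offset crosses a
     * (1 << IO_TLB_SHIFT)-byte slot boundary, so the lookup returns the
     * original address of a *different* slot and the bounce copy goes
     * to the wrong place. */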
Keir Fraser
2007-Jul-09 21:36 UTC
[Xen-devel] Re: [ofa-general] mthca use of dma_sync_single is bogus
On 9/7/07 22:29, "Roland Dreier" <rdreier@cisco.com> wrote:

> Unfortunately it seems like the kernel's swiotlb does not implement
> the full DMA API, so this won't actually fix Xen :(.

It implements the sync_single_range_for_{cpu,device} functions. But we use our own swiotlb implementation anyway: arch/i386/kernel/swiotlb.c in a Xen-patched tree is used by both i386/xen and x64/xen. We haven't yet merged with the main lib/swiotlb.c.

 -- Keir
Michael S. Tsirkin
2007-Jul-09 21:39 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
> Quoting Roland Dreier <rdreier@cisco.com>:
> Subject: mthca use of dma_sync_single is bogus
>
> It seems the problems running mthca in a Xen domU have uncovered a bug
> in mthca: mthca uses dma_sync_single() in mthca_arbel_write_mtt_seg()
> and mthca_arbel_map_phys_fmr() to sync the MTTs that get written.
> However, Documentation/DMA-API.txt says:
>
>     void
>     dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
>                     enum dma_data_direction direction)
>
>     synchronise a single contiguous or scatter/gather mapping.  All the
>     parameters must be the same as those passed into the single mapping
>     API.
>
> and mthca is *not* following this clear rule: it is trying to sync
> only a subrange of the mapping.

Yes, this looks like a bug.

> Later on in the document, there is:
>
>     void
>     dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
>                           unsigned long offset, size_t size,
>                           enum dma_data_direction direction)
>
>     does a partial sync, starting at offset and continuing for size.  You
>     must be careful to observe the cache alignment and width when doing
>     anything like this.  You must also be extra careful about accessing
>     memory you intend to sync partially.
>
> but that is in a section dealing with non-consistent memory, so it's
> not entirely clear to me whether it's kosher to use this as mthca
> wants.

This is under "Part II - Advanced dma_ usage" - I don't think it's dealing with non-consistent memory only (e.g. dma_declare_coherent_memory is there), and this looks like a good fit. Most functions here work for both consistent and non-consistent memory... What makes you suspicious?

> The other alternative is to put the MTT table in coherent memory, just
> like the MPT table.  That might be the best solution, I suppose...
>
> Michael, anyone else, thoughts on this?

Certainly easy... I'm concerned that MTTs need a fair amount of memory, while the amount of coherent memory might be limited. Not that non-coherent memory systems are widespread...

 -- MST
> >     void
> >     dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
> >                           unsigned long offset, size_t size,
> >                           enum dma_data_direction direction)

> This is under "Part II - Advanced dma_ usage" - I don't think it's dealing
> with non-consistent memory only (e.g. dma_declare_coherent_memory is there),
> and this looks like a good fit.  Most functions here work for both
> consistent and non-consistent memory...  What makes you suspicious?

I was suspicious because it is described between the main noncoherent API stuff and dma_cache_sync(). But I think it is probably OK.

Unfortunately it is not that good a fit for our current code, since we use pci_map_sg() to do the DMA mapping on the MTT memory instead of dma_map_single().

> I'm concerned that MTTs need a fair amount of memory,
> while the amount of coherent memory might be limited.
> Not that non-coherent memory systems are widespread ...

Yes, for example on ppc 4xx the amount of coherent memory is quite small by default (address space for non-cached mappings is actually what is limited, but it amounts to the same thing).

Maybe the least bad solution is to change to using dma_map_single() instead of pci_map_sg() in mthca_memfree.c.

 - R.
Michael S. Tsirkin
2007-Jul-10 07:15 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
> Quoting Roland Dreier <rdreier@cisco.com>:
> Subject: Re: mthca use of dma_sync_single is bogus
>
> > >     void
> > >     dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
> > >                           unsigned long offset, size_t size,
> > >                           enum dma_data_direction direction)
>
> > This is under "Part II - Advanced dma_ usage" - I don't think it's dealing
> > with non-consistent memory only (e.g. dma_declare_coherent_memory is
> > there), and this looks like a good fit.  Most functions here work for both
> > consistent and non-consistent memory...  What makes you suspicious?
>
> I was suspicious because it is described between the main noncoherent
> API stuff and dma_cache_sync().  But I think it is probably OK.
>
> Unfortunately it is not that good a fit for our current code, since we
> use pci_map_sg() to do the DMA mapping on the MTT memory instead of
> dma_map_single().
>
> > I'm concerned that MTTs need a fair amount of memory,
> > while the amount of coherent memory might be limited.
> > Not that non-coherent memory systems are widespread ...
>
> Yes, for example on ppc 4xx the amount of coherent memory is quite
> small by default (address space for non-cached mappings is actually
> what is limited, but it amounts to the same thing).
>
> Maybe the least bad solution is to change to using dma_map_single()
> instead of pci_map_sg() in mthca_memfree.c.

Hmm. What makes you think dma_sync_single_range() can't be used on memory mapped by pci_map_sg()/dma_map_sg()?

 -- MST
Lukas Hejtmanek
2007-Jul-10 14:14 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
On Mon, Jul 09, 2007 at 11:48:06PM -0700, Roland Dreier wrote:
> Yes, for example on ppc 4xx the amount of coherent memory is quite
> small by default (address space for non-cached mappings is actually
> what is limited, but it amounts to the same thing).
>
> Maybe the least bad solution is to change to using dma_map_single()
> instead of pci_map_sg() in mthca_memfree.c.

And what about the attached patch to mthca_memfree? It replaces alloc_pages() with pci_alloc_consistent(). Using it, I can enable FMR and the driver runs fine.

Indeed, it does not solve the problem with dma_sync_single() per se; on the other hand, with pci_alloc_consistent() swiotlb is not needed, so dma_sync_single() does nothing. But I agree it is not a conceptual fix.

--
Lukáš Hejtmánek
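The substitution being described is roughly this (a sketch with hypothetical names; the actual patch touches the ICM chunk allocation in mthca_memfree.c):

    /* Sketch of the substitution (hypothetical names). */
    static void *icm_alloc_sketch(struct pci_dev *pdev, int order,
                                  dma_addr_t *dma_handle)
    {
            /* Before: streaming memory from alloc_pages(), which then
             * needs pci_map_sg() plus a sync after every CPU write:
             *
             *     page = alloc_pages(gfp_mask, order);
             */

            /* After: coherent memory -- device and CPU agree without
             * explicit syncs, and swiotlb bouncing never happens. */
            return pci_alloc_consistent(pdev, PAGE_SIZE << order,
                                        dma_handle);
    }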
> What makes you think dma_sync_single_range() can't be used on memory
> mapped by pci_map_sg()/dma_map_sg()?

The fact that it's dma_sync_*SINGLE*_range, and that there's a separate dma_sync_sg() function defined in DMA-API.txt.

 - R.
Michael S. Tsirkin
2007-Jul-10 17:11 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
> Quoting Roland Dreier <rdreier@cisco.com>:
> Subject: Re: mthca use of dma_sync_single is bogus
>
> > What makes you think dma_sync_single_range() can't be used on memory
> > mapped by pci_map_sg()/dma_map_sg()?
>
> The fact that it's dma_sync_*SINGLE*_range, and that there's a
> separate dma_sync_sg() function defined in DMA-API.txt.

Aha. I looked at the code a bit. Basically it seems that some architectures use the dma handle and some the virtual address to flush the cache; that's where the requirement that the same parameters be used for sync single as for map single comes from.

So it seems that this requirement does not apply to s/g, and that we can just build a scatterlist structure and do dma_sync_sg()?

 -- MST
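What Michael is floating here would look something like the following sketch (hypothetical names; note that the reply argues this is not actually allowed, since the scatterlist being synced is not the one that was mapped):

    #include <linux/scatterlist.h>

    /* Sketch of the idea (hypothetical names) -- build a new one-entry
     * scatterlist over just the written segment and sync that, even
     * though the memory was originally mapped via pci_map_sg(). */
    static void sync_seg_via_sg(struct device *dev, void *table,
                                size_t seg_offset, size_t seg_size)
    {
            struct scatterlist sg;

            sg_init_one(&sg, (char *)table + seg_offset, seg_size);
            dma_sync_sg_for_device(dev, &sg, 1, DMA_TO_DEVICE);
    }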
> And what about the attached patch to mthca_memfree?  It replaces
> alloc_pages() with pci_alloc_consistent().  Using it, I can enable FMR
> and the driver runs fine.

As Michael said, this uses a lot of consistent memory. Probably too much on some systems.

 - R.
Roland Dreier
2007-Jul-10 18:09 UTC
[Xen-devel] Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
> Aha.  I looked at the code a bit.
> Basically it seems that some architectures use the dma handle
> and some the virtual address to flush the cache; that's
> where the requirement that the same parameters be used for
> sync single as for map single comes from.
>
> So it seems that this requirement does not apply to s/g, and that we can
> just build a scatterlist structure and do dma_sync_sg()?

The statement

    synchronise a single contiguous or scatter/gather mapping.  All the
    parameters must be the same as those passed into the single mapping
    API.

in DMA-API.txt is also clearly attached to dma_sync_sg(). So I don't think it's a good idea to rely on being able to sync a different scatterlist than the one that was originally mapped.

It actually doesn't look too bad to replace our use of pci_map_sg() with dma_map_single(), at least at first glance. I'll try to write a patch later.

 - R.
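The replacement Roland has in mind would look roughly like this (a sketch with hypothetical field names, not the eventual patch):

    /* Sketch -- hypothetical struct and field names. */
    static void map_chunk_and_sync_seg(struct pci_dev *pdev,
                                       struct icm_chunk *chunk,
                                       size_t seg_offset, size_t seg_size)
    {
            /* Map the physically contiguous chunk once as a single
             * streaming mapping instead of going through pci_map_sg()... */
            chunk->dma_addr = dma_map_single(&pdev->dev, chunk->buf,
                                             chunk->size,
                                             DMA_BIDIRECTIONAL);

            /* ...so partial syncs of one MTT segment inside the chunk
             * become legal via the range variants, with the offset and
             * size relative to the original mapping. */
            dma_sync_single_range_for_device(&pdev->dev, chunk->dma_addr,
                                             seg_offset, seg_size,
                                             DMA_BIDIRECTIONAL);
    }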
Michael S. Tsirkin
2007-Jul-10 18:30 UTC
[Xen-devel] Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
> Quoting Roland Dreier <rdreier@cisco.com>:
> Subject: Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
>
> > Aha.  I looked at the code a bit.
> > Basically it seems that some architectures use the dma handle
> > and some the virtual address to flush the cache; that's
> > where the requirement that the same parameters be used for
> > sync single as for map single comes from.
> >
> > So it seems that this requirement does not apply to s/g, and that we can
> > just build a scatterlist structure and do dma_sync_sg()?
>
> The statement
>
>     synchronise a single contiguous or scatter/gather mapping.  All the
>     parameters must be the same as those passed into the single mapping
>     API.
>
> in DMA-API.txt is also clearly attached to dma_sync_sg().  So I don't
> think it's a good idea to rely on being able to sync a different
> scatterlist than the one that was originally mapped.

Hmm. This means there's no way to sync a range within a mapping created with map_sg?

> It actually doesn't look too bad to replace our use of pci_map_sg()
> with dma_map_single(), at least at first glance.  I'll try to write a
> patch later.

Well, the reason map_sg is there is presumably because on some architectures it's worth it to try to make the region contiguous in DMA space. But I agree this seems the lesser evil at this point...

 -- MST
Lukas Hejtmanek
2007-Jul-10 19:00 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
On Tue, Jul 10, 2007 at 11:06:29AM -0700, Roland Dreier wrote:
> > And what about the attached patch to mthca_memfree?  It replaces
> > alloc_pages() with pci_alloc_consistent().  Using it, I can enable FMR
> > and the driver runs fine.
>
> As Michael said, this uses a lot of consistent memory.  Probably too
> much on some systems.

I think he spoke about coherent memory, didn't he? On i386/x86_64, consistent and coherent are the same, but on some architectures they are not, and I think that using consistent memory (in particular pci_alloc_consistent()) is exactly what should be used here. Keir also recommended using it.

And moreover, it avoids using swiotlb and bounce buffers, I think. Am I right, Keir?

--
Lukáš Hejtmánek
> I think he spoke about coherent memory, didn't he?  On i386/x86_64,
> consistent and coherent are the same, but on some architectures they are
> not, and I think that using consistent memory (in particular
> pci_alloc_consistent()) is exactly what should be used here.  Keir also
> recommended using it.

coherent and consistent are synonyms. It's confusing because there is pci_alloc_consistent(), which is in general just a wrapper for dma_alloc_coherent().

> And moreover, it avoids using swiotlb and bounce buffers, I think.  Am I
> right, Keir?

Yes, but I'm not really willing to make things worse for standard i386 just to make Xen work a little better.

 - R.
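For reference, the generic wrapper looks like this (as in include/asm-generic/pci-dma-compat.h of this era; some architectures provide their own implementation instead):

    static inline void *
    pci_alloc_consistent(struct pci_dev *hwdev, size_t size,
                         dma_addr_t *dma_handle)
    {
            /* Fall through to the dma_ API on the underlying device. */
            return dma_alloc_coherent(hwdev == NULL ? NULL : &hwdev->dev,
                                      size, dma_handle, GFP_ATOMIC);
    }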
Lukas Hejtmanek
2007-Jul-10 19:16 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
On Tue, Jul 10, 2007 at 12:08:43PM -0700, Roland Dreier wrote:
> coherent and consistent are synonyms.  It's confusing because there is
> pci_alloc_consistent(), which is in general just a wrapper for
> dma_alloc_coherent().

According to DMA-mapping.txt they are not. Alpha, M68000 without MMU, PPC, Sparc, Sparc64, and V850 have their own implementations of pci_alloc_consistent(). Yes, on i386, pci_alloc_consistent() is just a wrapper for dma_alloc_coherent().

> Yes, but I'm not really willing to make things worse for standard i386
> just to make Xen work a little better.

So, what about some #ifdefs? E.g., allow a config option for Xen optimizations?

--
Lukáš Hejtmánek
> > coherent and consistent are synonyms.  It's confusing because there is
> > pci_alloc_consistent(), which is in general just a wrapper for
> > dma_alloc_coherent().
>
> According to DMA-mapping.txt they are not.  Alpha, M68000 without MMU,
> PPC, Sparc, Sparc64, and V850 have their own implementations of
> pci_alloc_consistent().
>
> Yes, on i386, pci_alloc_consistent() is just a wrapper for
> dma_alloc_coherent().

Sorry, I was being a little confusing. The implementations may be different, but in general there is no real difference between consistent and coherent memory. Using either pci_alloc_consistent() or dma_alloc_coherent() will exhaust the same small pool of address space on powerpc 4xx, for example.

> So, what about some #ifdefs?  E.g., allow a config option for Xen
> optimizations?

Seems pretty ugly, especially given that Xen is not upstream. I think the Xen tree should just carry such patches, at least until Xen is merged. Even then I'm quite dubious about having two code paths for this.

 - R.
Roland Dreier
2007-Jul-10 19:25 UTC
[Xen-devel] Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
> Hmm.  This means there's no way to sync a range within a mapping
> created with map_sg?

It doesn't seem that there is one right now, at least.

> > It actually doesn't look too bad to replace our use of pci_map_sg()
> > with dma_map_single(), at least at first glance.  I'll try to write a
> > patch later.
>
> Well, the reason map_sg is there is presumably because on some
> architectures it's worth it to try to make the region contiguous in DMA
> space.  But I agree this seems the lesser evil at this point ...

Given that we're already trying to allocate big chunks of physically contiguous memory, I think that any virtual merging we get is likely to be of very small benefit. It is kind of a shame to give this up, though.

 - R.
Lukas Hejtmanek
2007-Jul-10 19:36 UTC
[Xen-devel] Re: mthca use of dma_sync_single is bogus
On Tue, Jul 10, 2007 at 12:24:02PM -0700, Roland Dreier wrote:
> Sorry, I was being a little confusing.  The implementations may be
> different, but in general there is no real difference between consistent
> and coherent memory.  Using either pci_alloc_consistent() or
> dma_alloc_coherent() will exhaust the same small pool of address space
> on powerpc 4xx, for example.

I thought that "consistent" only refers to a physically contiguous area, whereas "coherent" refers to memory where no barriers need to be used. But I may be wrong.

Anyway, with my patch, I can turn off swiotlb and I'm still able to load ib_mthca cleanly in DomU. On the other hand, Xen bugs me about a DMA bug in ib_ipoib; there may be another problem with dma_sync_single().

> Seems pretty ugly, especially given that Xen is not upstream.  I think
> the Xen tree should just carry such patches, at least until Xen is
> merged.  Even then I'm quite dubious about having two code paths for this.

OK, I will keep it for my own use.

--
Lukáš Hejtmánek
Michael S. Tsirkin
2007-Jul-18 13:36 UTC
[Xen-devel] Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
> Quoting Roland Dreier <rdreier@cisco.com>:
> Subject: Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
>
> > Hmm.  This means there's no way to sync a range within a mapping
> > created with map_sg?
>
> It doesn't seem that there is one right now, at least.
>
> > > It actually doesn't look too bad to replace our use of pci_map_sg()
> > > with dma_map_single(), at least at first glance.  I'll try to write a
> > > patch later.
> >
> > Well, the reason map_sg is there is presumably because on some
> > architectures it's worth it to try to make the region contiguous in DMA
> > space.  But I agree this seems the lesser evil at this point ...
>
> Given that we're already trying to allocate big chunks of physically
> contiguous memory, I think that any virtual merging we get is likely
> to be of very small benefit.
>
> It is kind of a shame to give this up, though.

Did we reach any conclusion? Are you switching to map_single?

 -- MST
Roland Dreier
2007-Jul-18 15:12 UTC
[Xen-devel] Re: [ofa-general] Re: mthca use of dma_sync_single is bogus
> Did we reach any conclusion?  Are you switching to map_single?

I haven't had a chance to work on it yet, but I don't see a better alternative.