Stéphane Marchesin
2014-Jun-25 04:00 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On Tue, Jun 24, 2014 at 6:25 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
> On Tuesday, 24.06.2014 at 14:27 +0200, Maarten Lankhorst wrote:
>> On 24-06-14 14:23, Alexandre Courbot wrote:
>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >>>>> From: Lucas Stach <dev at lynxeye.de>
>> >>>>>
>> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >>>>> caches need to be flushed and invalidated explicitly at the
>> >>>>> appropriate places. Introduce two small helpers to make things
>> >>>>> easy for TTM-based drivers.
>> >>>>
>> >>>> Have you run this with DMA API debugging enabled? I suspect you haven't,
>> >>>> and I recommend that you do.
>> >>>
>> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >>> 162621
>> >>>
>> >>> (??????? ???)
>> >>
>> >> *puts table back on its feet*
>> >>
>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >>
>> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> the device, and invalidate them for the other direction. Of course if the
>> >> device has caches on its side as well the opposite operation must also be
>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> > ... and it really sucks. Basically if we cannot use the DMA API here
>> > we will lose the convenience of having a portable API that does just
>> > the right thing for the underlying platform. Without it we would have
>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> > have support for ARM.
>> >
>> > The usage of the DMA API that we are doing might be illegal, but in
>> > essence it does exactly what we need - at least for ARM. What are the
>> > alternatives?
>> Convert TTM to use the dma api? :-)
>
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.
>
> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

CMA comes with its own set of (severe) limitations though, in particular
it's not possible to map arbitrary CPU pages into the GPU without
incurring a copy, you add arbitrary memory limits etc. Overall that's not
really a good pick for the long term...

Stéphane
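The rule quoted in the message above (flush the CPU caches when syncing for the device, invalidate them for the other direction) can be illustrated with a toy user-space model in C. This is only a sketch under stated assumptions: it is not kernel code, it does not use the DMA API, and struct toy_buf, toy_sync_for_device() and toy_sync_for_cpu() are hypothetical names invented here. It models a write-back CPU cache as a second copy of the buffer.

```c
#include <assert.h>
#include <string.h>

#define TOY_BUF_SIZE 4

/*
 * Toy model of a non-coherent buffer (hypothetical, for illustration only).
 * "ram" is what the device sees, "cache" is what the CPU sees.
 */
struct toy_buf {
    unsigned char ram[TOY_BUF_SIZE];
    unsigned char cache[TOY_BUF_SIZE];
    int dirty; /* non-zero when the cache holds CPU writes not yet in ram */
};

/* CPU accesses go through the cache only. */
static void cpu_write(struct toy_buf *b, int i, unsigned char v)
{
    b->cache[i] = v;
    b->dirty = 1;
}

static unsigned char cpu_read(const struct toy_buf *b, int i)
{
    return b->cache[i];
}

/* Device accesses bypass the CPU cache and touch ram directly. */
static void dev_write(struct toy_buf *b, int i, unsigned char v)
{
    b->ram[i] = v;
}

static unsigned char dev_read(const struct toy_buf *b, int i)
{
    return b->ram[i];
}

/* "Flush": write dirty cache contents back so the device can see them. */
static void toy_sync_for_device(struct toy_buf *b)
{
    if (b->dirty) {
        memcpy(b->ram, b->cache, TOY_BUF_SIZE);
        b->dirty = 0;
    }
}

/*
 * "Invalidate": discard the cached copy. The model refills the cache from
 * ram immediately, which stands in for the refetch on the next CPU read.
 */
static void toy_sync_for_cpu(struct toy_buf *b)
{
    memcpy(b->cache, b->ram, TOY_BUF_SIZE);
    b->dirty = 0;
}
```

In this model a value stored with cpu_write() is not returned by dev_read() until toy_sync_for_device() has run, and a value stored with dev_write() is not returned by cpu_read() until toy_sync_for_cpu() has run. Real hardware works per cache line and has ordering requirements that this sketch ignores.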
Alexandre Courbot
2014-Jun-26 14:53 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On Wed, Jun 25, 2014 at 1:00 PM, Stéphane Marchesin <stephane.marchesin at gmail.com> wrote:
> On Tue, Jun 24, 2014 at 6:25 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
>> On Tuesday, 24.06.2014 at 14:27 +0200, Maarten Lankhorst wrote:
>>> On 24-06-14 14:23, Alexandre Courbot wrote:
>>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
>>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>> >>>>> From: Lucas Stach <dev at lynxeye.de>
>>> >>>>>
>>> >>>>> On architectures for which access to GPU memory is non-coherent,
>>> >>>>> caches need to be flushed and invalidated explicitly at the
>>> >>>>> appropriate places. Introduce two small helpers to make things
>>> >>>>> easy for TTM-based drivers.
>>> >>>>
>>> >>>> Have you run this with DMA API debugging enabled? I suspect you haven't,
>>> >>>> and I recommend that you do.
>>> >>>
>>> >>> # cat /sys/kernel/debug/dma-api/error_count
>>> >>> 162621
>>> >>>
>>> >>> (??????? ???)
>>> >>
>>> >> *puts table back on its feet*
>>> >>
>>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>>> >>
>>> >> The only alternative I see here is to flush the CPU caches when syncing for
>>> >> the device, and invalidate them for the other direction. Of course if the
>>> >> device has caches on its side as well the opposite operation must also be
>>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>>> > ... and it really sucks. Basically if we cannot use the DMA API here
>>> > we will lose the convenience of having a portable API that does just
>>> > the right thing for the underlying platform. Without it we would have
>>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>>> > have support for ARM.
>>> >
>>> > The usage of the DMA API that we are doing might be illegal, but in
>>> > essence it does exactly what we need - at least for ARM. What are the
>>> > alternatives?
>>> Convert TTM to use the dma api? :-)
>>
>> Actually TTM already has a page alloc backend using the DMA API. It's
>> just not used for the standard case right now.
>>
>> I would argue that we should just use this page allocator (which has the
>> side effect of getting pages from CMA if available -> you are actually
>> free to change the caching) and do away with the other allocator in the
>> ARM case.
>
> CMA comes with its own set of (severe) limitations though, in
> particular it's not possible to map arbitrary CPU pages into the GPU
> without incurring a copy, you add arbitrary memory limits etc. Overall
> that's not really a good pick for the long term...

We don't plan to rely on CMA for too long. IOMMU support is on the way
and should make our life easier, although no matter the source of
memory, we will still have the issue of the lowmem mappings.

So far it sounds like CMA is the only way to "undo" them, so in the end
it may come down to whether or not the multi-mapping constraint applies
to TK1. I will tap into our internal sources of knowledge to try and
figure this one out.
Russell King - ARM Linux
2014-Jun-26 16:10 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On Thu, Jun 26, 2014 at 11:53:20PM +0900, Alexandre Courbot wrote:
> We don't plan to rely on CMA for too long. IOMMU support is on the way
> and should make our life easier, although no matter the source of
> memory, we will still have the issue of the lowmem mappings.

When it comes to DMA memory, talking about lowmem vs highmem is utterly
meaningless. The lowmem/highmem split is entirely a software concept
and is completely adjustable. An extreme example is that you can boot
any platform with more than 32MB of memory with 32MB of lowmem and the
remainder as highmem.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
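
The point that the lowmem/highmem split is a software choice can be shown with a few lines of arithmetic in C. This is a hedged sketch, not how the kernel computes its memory layout: struct mem_split and split_memory() are hypothetical names, and the 2048MB and 760MB figures in the note below are made-up example values.

```c
#include <stdint.h>

/* Result of dividing physical RAM into lowmem and highmem (sizes in MB). */
struct mem_split {
    uint64_t lowmem_mb;
    uint64_t highmem_mb;
};

/*
 * Hypothetical helper: lowmem is capped at lowmem_limit_mb, and whatever
 * RAM is left over is treated as highmem. Only the limit changes between
 * configurations; the physical memory is the same.
 */
static struct mem_split split_memory(uint64_t total_mb, uint64_t lowmem_limit_mb)
{
    struct mem_split s;

    s.lowmem_mb = total_mb < lowmem_limit_mb ? total_mb : lowmem_limit_mb;
    s.highmem_mb = total_mb - s.lowmem_mb;
    return s;
}
```

With these assumed numbers, split_memory(2048, 32) yields 32MB of lowmem and 2016MB of highmem, while split_memory(2048, 760) yields 760MB and 1288MB for the same RAM. A given page can land on either side of the boundary depending on the chosen limit.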