Maarten Lankhorst
2014-Jun-24 12:27 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On 24-06-14 14:23, Alexandre Courbot wrote:
> On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
>> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>>>> From: Lucas Stach <dev at lynxeye.de>
>>>>>
>>>>> On architectures for which access to GPU memory is non-coherent,
>>>>> caches need to be flushed and invalidated explicitly at the
>>>>> appropriate places. Introduce two small helpers to make things
>>>>> easy for TTM-based drivers.
>>>>
>>>> Have you run this with DMA API debugging enabled? I suspect you haven't,
>>>> and I recommend that you do.
>>>
>>> # cat /sys/kernel/debug/dma-api/error_count
>>> 162621
>>>
>>> [table-flip emoticon]
>>
>> *puts table back on its feet*
>>
>> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> use the DMA API to sync it. Thanks Russell for pointing it out.
>>
>> The only alternative I see here is to flush the CPU caches when syncing for
>> the device, and invalidate them for the other direction. Of course if the
>> device has caches on its side as well the opposite operation must also be
>> done for it. Guess the only way is to handle it all by ourselves here. :/
> ... and it really sucks. Basically if we cannot use the DMA API here
> we will lose the convenience of having a portable API that does just
> the right thing for the underlying platform. Without it we would have
> to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> have support for ARM.
>
> The usage of the DMA API that we are doing might be illegal, but in
> essence it does exactly what we need - at least for ARM. What are the
> alternatives?

Convert TTM to use the dma api? :-)

~Maarten
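As an aside, the rule quoted above (flush when syncing for the device, invalidate when syncing for the CPU) can be illustrated with a small user-space toy model. This is only a sketch and not code from the patch or from the kernel: the `toy_*` names are hypothetical, "cache" stands in for the CPU's view of a buffer, and "ram" stands in for what a non-coherent device would see.

```c
#include <stdio.h>
#include <string.h>

#define TOY_BUF_SIZE 16

/*
 * Toy model, not kernel code: "cache" is the CPU's view of the buffer,
 * "ram" is the device's view. All names here are hypothetical.
 */
struct toy_buf {
    char cache[TOY_BUF_SIZE];
    char ram[TOY_BUF_SIZE];
    int dirty; /* nonzero when cache holds CPU writes not yet in ram */
};

void toy_buf_init(struct toy_buf *b)
{
    memset(b, 0, sizeof(*b));
}

/* CPU accesses only touch the cache. */
void toy_cpu_write(struct toy_buf *b, const char *s)
{
    snprintf(b->cache, sizeof(b->cache), "%s", s);
    b->dirty = 1;
}

const char *toy_cpu_read(const struct toy_buf *b)
{
    return b->cache;
}

/* Device accesses only touch ram. */
void toy_device_write(struct toy_buf *b, const char *s)
{
    snprintf(b->ram, sizeof(b->ram), "%s", s);
}

const char *toy_device_read(const struct toy_buf *b)
{
    return b->ram;
}

/* Sync for the device: write dirty cached data back to ram (flush). */
void toy_sync_for_device(struct toy_buf *b)
{
    if (b->dirty) {
        memcpy(b->ram, b->cache, sizeof(b->ram));
        b->dirty = 0;
    }
}

/* Sync for the CPU: drop the cached copy and refetch from ram (invalidate). */
void toy_sync_for_cpu(struct toy_buf *b)
{
    memcpy(b->cache, b->ram, sizeof(b->cache));
    b->dirty = 0;
}
```

In this model, a device read issued before `toy_sync_for_device()` misses the CPU's latest write, and a CPU read issued before `toy_sync_for_cpu()` returns stale data. That is the behaviour the explicit sync helpers are meant to prevent on non-coherent hardware.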
Lucas Stach
2014-Jun-24 13:25 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On Tuesday, 24.06.2014, 14:27 +0200, Maarten Lankhorst wrote:
> On 24-06-14 14:23, Alexandre Courbot wrote:
> > The usage of the DMA API that we are doing might be illegal, but in
> > essence it does exactly what we need - at least for ARM. What are the
> > alternatives?
> Convert TTM to use the dma api? :-)

Actually TTM already has a page alloc backend using the DMA API. It's
just not used for the standard case right now.

I would argue that we should just use this page allocator (which has the
side effect of getting pages from CMA if available -> you are actually
free to change the caching) and do away with the other allocator in the
ARM case.

Regards,
Lucas
--
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
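To make the allocator argument concrete, here is a simplified user-space model of the kind of bookkeeping that DMA API debugging performs. It is a sketch under stated assumptions, not the kernel's implementation: the `toy_dma_*` names, the fixed-size table, and the return values are all hypothetical. The idea is that a sync on memory that was never handed out or mapped through the tracked API counts as an error, which is consistent with the large error_count reported earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_MAPPINGS 8

/* One recorded mapping: a bus/physical range known to the tracker. */
struct toy_mapping {
    uintptr_t addr;
    size_t size;
    int in_use;
};

/* Hypothetical tracker, loosely modelled on the idea behind dma-debug. */
struct toy_dma_debug {
    struct toy_mapping map[TOY_MAX_MAPPINGS];
    unsigned long error_count;
};

void toy_dma_debug_init(struct toy_dma_debug *d)
{
    memset(d, 0, sizeof(*d));
}

/* Record a mapping. Returns 0 on success, -1 if the table is full. */
int toy_dma_map(struct toy_dma_debug *d, uintptr_t addr, size_t size)
{
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
        if (!d->map[i].in_use) {
            d->map[i].addr = addr;
            d->map[i].size = size;
            d->map[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Check a sync request. Returns 0 if the range lies entirely inside a
 * recorded mapping; otherwise bumps error_count and returns -1.
 */
int toy_dma_sync(struct toy_dma_debug *d, uintptr_t addr, size_t size)
{
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
        const struct toy_mapping *m = &d->map[i];

        if (m->in_use && addr >= m->addr && size <= m->size &&
            addr - m->addr <= m->size - size)
            return 0;
    }
    d->error_count++;
    return -1;
}
```

With pages obtained through a DMA-API-backed allocator, the equivalent of `toy_dma_map()` has already happened by the time a driver syncs, so the check passes. With pages allocated behind the API's back, every sync is flagged.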
Alexandre Courbot
2014-Jun-24 13:52 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach at pengutronix.de> wrote:
> On Tuesday, 24.06.2014, 14:27 +0200, Maarten Lankhorst wrote:
>> Convert TTM to use the dma api? :-)
>
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.

Indeed, and Nouveau apparently already makes use of it if CONFIG_SWIOTLB
is set.

> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

Mm? Does it mean that CMA memory is not mapped into lowmem? That would
certainly help in the present case, but I wonder how useful it will be
once the iommu support is in place. We will also need to consider the
performance of such coherent memory for e.g. user-space mappings.

Anyway, I will experiment a bit with this tomorrow, thanks!
Stéphane Marchesin
2014-Jun-25 04:00 UTC
[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers
On Tue, Jun 24, 2014 at 6:25 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.
>
> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

CMA comes with its own set of (severe) limitations though: in particular,
it's not possible to map arbitrary CPU pages into the GPU without
incurring a copy, you add arbitrary memory limits, etc. Overall that's not
really a good pick for the long term...

Stéphane