thr3ads.net - Nouveau - [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers [Jun 2014]

Maarten Lankhorst

2014-Jun-24 12:27 UTC

[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers

On 24-06-14 14:23, Alexandre Courbot wrote:
> On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
>> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>>>>> From: Lucas Stach <dev at lynxeye.de>
>>>>>
>>>>> On architectures for which access to GPU memory is non-coherent,
>>>>> caches need to be flushed and invalidated explicitly at the
>>>>> appropriate places. Introduce two small helpers to make things
>>>>> easy for TTM-based drivers.
>>>>
>>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>>>> and I recommend that you do.
>>>
>>> # cat /sys/kernel/debug/dma-api/error_count
>>> 162621
>>>
>>> (╯°□°）╯︵ ┻━┻
>>
>> *puts table back on its feet*
>>
>> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> use the DMA API to sync it. Thanks Russell for pointing it out.
>>
>> The only alternative I see here is to flush the CPU caches when syncing for
>> the device, and invalidate them for the other direction. Of course if the
>> device has caches on its side as well the opposite operation must also be
>> done for it. Guess the only way is to handle it all by ourselves here. :/
> ... and it really sucks. Basically if we cannot use the DMA API here
> we will lose the convenience of having a portable API that does just
> the right thing for the underlying platform. Without it we would have
> to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> have support for ARM.
>
> The usage of the DMA API that we are doing might be illegal, but in
> essence it does exactly what we need - at least for ARM. What are the
> alternatives?

Convert TTM to use the dma api? :-)

~Maarten
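
The flush/invalidate scheme described in the quoted text can be illustrated with a toy user-space model. This is only a sketch and not kernel code: `toy_buf`, `cpu_write`, `dev_read`, `sync_for_device`, `sync_for_cpu` and the other names are hypothetical, and a real write-back cache works on cache lines rather than on a whole buffer. It assumes a CPU that only ever touches its cache and a device that only ever touches RAM, which is the non-coherent case the patch is about.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 8

/* Toy model: "ram" is what the device sees, "cache" is what the CPU sees. */
struct toy_buf {
    unsigned char ram[BUF_SIZE];
    unsigned char cache[BUF_SIZE];
    int dirty;                      /* cache holds data not yet in ram */
};

/* CPU accesses only touch the cache (write-back, no snooping). */
void cpu_write(struct toy_buf *b, unsigned char v)
{
    memset(b->cache, v, BUF_SIZE);
    b->dirty = 1;
}

unsigned char cpu_read(const struct toy_buf *b)
{
    return b->cache[0];
}

/* Device accesses only touch ram. */
void dev_write(struct toy_buf *b, unsigned char v)
{
    memset(b->ram, v, BUF_SIZE);
}

unsigned char dev_read(const struct toy_buf *b)
{
    return b->ram[0];
}

/* Flush (clean): push dirty cache contents out to ram for the device. */
void sync_for_device(struct toy_buf *b)
{
    if (b->dirty) {
        memcpy(b->ram, b->cache, BUF_SIZE);
        b->dirty = 0;
    }
}

/* Invalidate, modelled as an immediate refetch of ram into the cache. */
void sync_for_cpu(struct toy_buf *b)
{
    memcpy(b->cache, b->ram, BUF_SIZE);
    b->dirty = 0;
}

/* One CPU-to-device handoff followed by one device-to-CPU handoff. */
void toy_demo(void)
{
    struct toy_buf b = {0};

    cpu_write(&b, 0xAA);
    assert(dev_read(&b) == 0x00);   /* device sees stale ram before the flush */
    sync_for_device(&b);
    assert(dev_read(&b) == 0xAA);

    dev_write(&b, 0x55);
    assert(cpu_read(&b) == 0xAA);   /* CPU sees stale cache before the invalidate */
    sync_for_cpu(&b);
    assert(cpu_read(&b) == 0x55);
}
```

Calling `toy_demo()` from a `main()` exercises both directions. The point of the model is the ordering: the flush has to happen after the last CPU write and before the device reads, and the invalidate after the last device write and before the CPU reads.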

Lucas Stach

2014-Jun-24 13:25 UTC

[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers

On Tuesday, 24.06.2014, at 14:27 +0200, Maarten Lankhorst wrote:
> On 24-06-14 14:23, Alexandre Courbot wrote:
> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
> >>>>> From: Lucas Stach <dev at lynxeye.de>
> >>>>>
> >>>>> On architectures for which access to GPU memory is non-coherent,
> >>>>> caches need to be flushed and invalidated explicitly at the
> >>>>> appropriate places. Introduce two small helpers to make things
> >>>>> easy for TTM-based drivers.
> >>>>
> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
> >>>> and I recommend that you do.
> >>>
> >>> # cat /sys/kernel/debug/dma-api/error_count
> >>> 162621
> >>>
> >>> (╯°□°）╯︵ ┻━┻
> >>
> >> *puts table back on its feet*
> >>
> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
> >> use the DMA API to sync it. Thanks Russell for pointing it out.
> >>
> >> The only alternative I see here is to flush the CPU caches when syncing for
> >> the device, and invalidate them for the other direction. Of course if the
> >> device has caches on its side as well the opposite operation must also be
> >> done for it. Guess the only way is to handle it all by ourselves here. :/
> > ... and it really sucks. Basically if we cannot use the DMA API here
> > we will lose the convenience of having a portable API that does just
> > the right thing for the underlying platform. Without it we would have
> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> > have support for ARM.
> >
> > The usage of the DMA API that we are doing might be illegal, but in
> > essence it does exactly what we need - at least for ARM. What are the
> > alternatives?
> Convert TTM to use the dma api? :-)

Actually TTM already has a page alloc backend using the DMA API. It's
just not used for the standard case right now.

I would argue that we should just use this page allocator (which has the
side effect of getting pages from CMA if available -> you are actually
free to change the caching) and do away with the other allocator in the
ARM case.

Regards,
Lucas
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
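
To make the error count quoted earlier in the thread more concrete, here is a toy user-space model of the kind of bookkeeping DMA API debugging performs. It is a sketch under stated assumptions, not the kernel's implementation: `toy_dma_debug`, `toy_map`, `toy_unmap` and `toy_sync` are hypothetical names, and the only rule modelled is that a sync is legal solely on a buffer that was previously mapped through the same interface. Under that rule, syncing pages that came from an allocator the tracker never saw is counted as an error on every call, which is consistent with the conclusion reached in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MAPPINGS 16

/* Toy stand-in for DMA API debugging: it only remembers which buffers
 * were handed out by toy_map() and counts misuse. */
struct toy_dma_debug {
    const void *mapped[MAX_MAPPINGS];
    unsigned long error_count;
};

/* Record a buffer as mapped. Returns 0 on success, -1 if the table is full. */
int toy_map(struct toy_dma_debug *d, const void *buf)
{
    assert(buf != NULL);            /* NULL marks an empty slot */
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (d->mapped[i] == NULL) {
            d->mapped[i] = buf;
            return 0;
        }
    }
    return -1;
}

/* Forget a mapping. Unmapping an unknown buffer counts as an error. */
void toy_unmap(struct toy_dma_debug *d, const void *buf)
{
    assert(buf != NULL);
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (d->mapped[i] == buf) {
            d->mapped[i] = NULL;
            return;
        }
    }
    d->error_count++;
}

/* A sync is only legal on a buffer that is currently mapped. */
void toy_sync(struct toy_dma_debug *d, const void *buf)
{
    assert(buf != NULL);
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (d->mapped[i] == buf)
            return;
    }
    d->error_count++;
}
```

In this model, a buffer obtained through `toy_map()` can be synced any number of times without raising `error_count`, while each sync on an untracked buffer adds one. That is the appeal of allocating through a DMA-API-backed page allocator: the sync calls then operate on memory the API actually knows about.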

Alexandre Courbot

2014-Jun-24 13:52 UTC

[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers

On Tue, Jun 24, 2014 at 10:25 PM, Lucas Stach <l.stach at pengutronix.de> wrote:
> On Tuesday, 24.06.2014, at 14:27 +0200, Maarten Lankhorst wrote:
>> On 24-06-14 14:23, Alexandre Courbot wrote:
>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >>>>> From: Lucas Stach <dev at lynxeye.de>
>> >>>>>
>> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >>>>> caches need to be flushed and invalidated explicitly at the
>> >>>>> appropriate places. Introduce two small helpers to make things
>> >>>>> easy for TTM-based drivers.
>> >>>>
>> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> >>>> and I recommend that you do.
>> >>>
>> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >>> 162621
>> >>>
>> >>> (╯°□°）╯︵ ┻━┻
>> >>
>> >> *puts table back on its feet*
>> >>
>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >>
>> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> the device, and invalidate them for the other direction. Of course if the
>> >> device has caches on its side as well the opposite operation must also be
>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> > ... and it really sucks. Basically if we cannot use the DMA API here
>> > we will lose the convenience of having a portable API that does just
>> > the right thing for the underlying platform. Without it we would have
>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> > have support for ARM.
>> >
>> > The usage of the DMA API that we are doing might be illegal, but in
>> > essence it does exactly what we need - at least for ARM. What are the
>> > alternatives?
>> Convert TTM to use the dma api? :-)
>
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.

Indeed, and Nouveau even already makes use of it if CONFIG_SWIOTLB is
set apparently.

> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

Mm? Does it mean that CMA memory is not mapped into lowmem? That would
certainly help in the present case, but I wonder how useful it will be
once the iommu support is in place. Will also need to consider
performance of such coherent memory for e.g. user-space mappings.

Anyway, I will experiment a bit with this tomorrow, thanks!

Stéphane Marchesin

2014-Jun-25 04:00 UTC

[Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers

On Tue, Jun 24, 2014 at 6:25 AM, Lucas Stach <l.stach at pengutronix.de> wrote:
> On Tuesday, 24.06.2014, at 14:27 +0200, Maarten Lankhorst wrote:
>> On 24-06-14 14:23, Alexandre Courbot wrote:
>> > On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot <acourbot at nvidia.com> wrote:
>> >> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>> >>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
>> >>>> On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
>> >>>>> From: Lucas Stach <dev at lynxeye.de>
>> >>>>>
>> >>>>> On architectures for which access to GPU memory is non-coherent,
>> >>>>> caches need to be flushed and invalidated explicitly at the
>> >>>>> appropriate places. Introduce two small helpers to make things
>> >>>>> easy for TTM-based drivers.
>> >>>>
>> >>>> Have you run this with DMA API debugging enabled?  I suspect you haven't,
>> >>>> and I recommend that you do.
>> >>>
>> >>> # cat /sys/kernel/debug/dma-api/error_count
>> >>> 162621
>> >>>
>> >>> (╯°□°）╯︵ ┻━┻
>> >>
>> >> *puts table back on its feet*
>> >>
>> >> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> >> use the DMA API to sync it. Thanks Russell for pointing it out.
>> >>
>> >> The only alternative I see here is to flush the CPU caches when syncing for
>> >> the device, and invalidate them for the other direction. Of course if the
>> >> device has caches on its side as well the opposite operation must also be
>> >> done for it. Guess the only way is to handle it all by ourselves here. :/
>> > ... and it really sucks. Basically if we cannot use the DMA API here
>> > we will lose the convenience of having a portable API that does just
>> > the right thing for the underlying platform. Without it we would have
>> > to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
>> > have support for ARM.
>> >
>> > The usage of the DMA API that we are doing might be illegal, but in
>> > essence it does exactly what we need - at least for ARM. What are the
>> > alternatives?
>> Convert TTM to use the dma api? :-)
>
> Actually TTM already has a page alloc backend using the DMA API. It's
> just not used for the standard case right now.
>
> I would argue that we should just use this page allocator (which has the
> side effect of getting pages from CMA if available -> you are actually
> free to change the caching) and do away with the other allocator in the
> ARM case.

CMA comes with its own set of (severe) limitations though, in
particular it's not possible to map arbitrary CPU pages into the GPU
without incurring a copy, you add arbitrary memory limits etc. Overall
that's not really a good pick for the long term...

Stéphane
