thr3ads.net - Nouveau - [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences [Jul 2014]


Christian König

2014-Jul-23 06:52 UTC

[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On 23.07.2014 08:40, Maarten Lankhorst wrote:
> On 22-07-14 17:59, Christian König wrote:
>> On 22.07.2014 17:42, Daniel Vetter wrote:
>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>> <christian.koenig at amd.com> wrote:
>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>> function; everything else, like fence->enable_signaling or calling
>>>> fence_signaled() from the driver, is optional.
>>>>
>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>> fence->wait in atomic or interrupt context, and not while holding any global
>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>> to the driver is ok, as long as they don't conflict with anything possibly
>>>> used by their own fence implementation.
>>> Well that's almost what we have right now, with the exception that
>>> drivers are allowed (actually must, for correctness when updating
>>> fences) to hold the ww_mutexes for dma-bufs (or other buffer objects).
>> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>>
>> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
> It's only mandatory to call fence_signal() if the .enable_signaling callback has been called; otherwise you can get away with never signaling a fence at all before dropping the last refcount to it.
> This allows you to keep interrupts disabled when you don't need them.
Can we somehow avoid the need to call fence_signal() at all? The
interrupts, at least on radeon, are way too unreliable for such a thing.
Can enable_signaling fail? What's the reason for fence_signaled() in
the first place?
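The rule Maarten states above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions. `struct model_fence`, `model_fence_enable_signaling()`, `model_driver_irq()` and the shared `hw_seqno` counter are hypothetical stand-ins, not the kernel's `struct fence` API. The model shows that the interrupt path never touches a fence on which nobody enabled signaling, yet that fence still reports correctly when someone polls it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver fence; NOT the kernel's struct fence. */
struct model_fence {
    uint64_t seqno;            /* value the hardware must reach */
    const uint64_t *hw_seqno;  /* last seqno the simulated GPU completed */
    bool signaling_enabled;    /* set once someone asked to be notified */
    bool signaled;             /* sticky flag set by model_fence_signal() */
};

/* ->signaled role: poll hardware state without blocking. */
static bool model_fence_hw_passed(const struct model_fence *f)
{
    return *f->hw_seqno >= f->seqno;
}

/* Core-side signal; a real core would also run registered callbacks. */
static void model_fence_signal(struct model_fence *f)
{
    f->signaled = true;
}

/* Core-side query: check the sticky flag first, then ask the driver. */
static bool model_fence_is_signaled(struct model_fence *f)
{
    if (f->signaled)
        return true;
    if (model_fence_hw_passed(f)) {
        model_fence_signal(f);
        return true;
    }
    return false;
}

/* From this point on the driver owes a model_fence_signal() for f. */
static void model_fence_enable_signaling(struct model_fence *f)
{
    f->signaling_enabled = true;
    /* a real driver would unmask its completion interrupt here */
}

/* Simulated interrupt handler: only fences with signaling enabled are signaled. */
static void model_driver_irq(struct model_fence *fences, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fences[i].signaling_enabled && !fences[i].signaled &&
            model_fence_hw_passed(&fences[i]))
            model_fence_signal(&fences[i]);
    }
}
```

In this model, only fences that went through enable_signaling are owed a signal from the interrupt path. That is what lets a driver leave its completion interrupt off while nobody is waiting.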
>>> Agreed that any shared locks are out of the way (especially stuff like
>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>> really bad here still).
>> Yeah, that's also a point I wanted to note on Maarten's patch. Radeon grabs the read side of its exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>>
>> Assuming that we need to wait in both directions with Prime (e.g. the Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
> In the preliminary patches where I can sync radeon with other GPUs, I've been very careful in all the places that call into fences to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.
That's actually not such a good idea.

In case of a lockup we need to handle the lockup, because otherwise it
could happen that radeon waits for the lockup to be resolved and the
lockup handling needs to wait for a fence that's never signaled because
of the lockup.

Christian.
>
> This is also why fence_is_signaled should never block, and why it trylocks the exclusive_lock. :-) I think lockdep would complain if I grabbed exclusive_lock while blocking in is_signaled.
>
>>> So from the core fence framework I think we already have exactly this,
>>> and we only need to adjust the radeon implementation a bit to make it
>>> less risky and invasive to the radeon driver logic.
>> Agree. Well, the biggest problem I see is the exclusive semaphore I need to take when anything calls into the driver. For the fence code I need to move that down into the fence->signaled handler, because that can now be called from outside the driver.
>>
>> Maarten solved this by telling the driver in the lockup handler (where we grab the write side of the exclusive lock) that all interrupts are already enabled, so that fence->signaled hopefully wouldn't mess with the hardware at all. While this probably works, it just leaves me with a feeling that we are doing something wrong here.
> There is unfortunately no global mechanism to say 'this card is locked up, please don't call into any of my fences', and I don't associate fences with devices, and radeon doesn't keep a global list of fences.
> If all of that existed, it would complicate the interface and its callers a lot, while I like to keep things simple.
> So I did the best I could, and simply prevented the fence calls from fiddling with the hardware. Fortunately GPU lockup is not a common operation. :-)
>
> ~Maarten
>
>
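Maarten makes two points here. First, fence_is_signaled must never block. Second, it only trylocks exclusive_lock, so a lockup handler that holds the write side keeps fence queries away from the hardware. Both can be sketched as below. The lock is a hypothetical single-threaded model (`struct model_excl_lock`), and `hw_reads` merely counts simulated register accesses. The thread describes radeon's real lock as a read/write semaphore; that code is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an exclusive lock: many readers or one writer.
 * Single-threaded sketch only; a kernel driver would use an rw_semaphore. */
struct model_excl_lock {
    int readers;
    bool writer;   /* held by the (simulated) lockup/reset handler */
};

/* Never sleeps: reports failure instead of waiting for the writer. */
static bool model_down_read_trylock(struct model_excl_lock *l)
{
    if (l->writer)
        return false;
    l->readers++;
    return true;
}

static void model_up_read(struct model_excl_lock *l)
{
    l->readers--;
}

struct model_device {
    struct model_excl_lock excl;
    uint64_t hw_seqno;   /* what the GPU last reported */
    int hw_reads;        /* counts simulated register accesses */
};

static uint64_t model_read_hw_seqno(struct model_device *dev)
{
    dev->hw_reads++;
    return dev->hw_seqno;
}

/* ->signaled-style query. It only trylocks; if a reset holds the write
 * side, it answers "not yet" and leaves the hardware alone. */
static bool model_fence_signaled(struct model_device *dev, uint64_t seqno)
{
    bool done;

    if (!model_down_read_trylock(&dev->excl))
        return false;
    done = model_read_hw_seqno(dev) >= seqno;
    model_up_read(&dev->excl);
    return done;
}
```

Answering "not signaled" while the reset holds the lock is acceptable in this model because callers are expected to ask again later. The query never sleeps and never reads the hardware mid-reset.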

Daniel Vetter

2014-Jul-23 07:02 UTC


On Wed, Jul 23, 2014 at 8:52 AM, Christian König
<christian.koenig at amd.com> wrote:
>> In the preliminary patches where I can sync radeon with other GPUs, I've
>> been very careful in all the places that call into fences to make sure that
>> radeon wouldn't try to handle lockups for a different (possibly also radeon)
>> card.
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup, because otherwise it could
> happen that radeon waits for the lockup to be resolved and the lockup
> handling needs to wait for a fence that's never signaled because of the
> lockup.
I thought the plan for now is that each driver handles lockups itself.
So if any batch gets stuck for too long (whether it's our own GPU that's
stuck or whether we're somehow stuck on a fence from a 2nd GPU doesn't
matter), the driver steps in with a reset and signals completion to all
its own fences that have been in that pile-up. As long as each driver
participating in fencing has means to abort/reset, we'll eventually get
unstuck.

Essentially every driver has to guarantee that, assuming dependent
fences all complete eventually, it _will_ complete its own fences
no matter what.

For now this should be good enough, but for ARB_robustness, or people
who care a bit about their compute results, we need reliable
notification to userspace that a reset happened. I think we could add
a new "aborted" fence state for that case and then propagate that. But
given how tricky the code to compute reset victims in i915 is already,
I think we should leave this out for now, and even later on make it
strictly opt-in.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
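Daniel's guarantee is that every driver completes its own fences no matter what once it decides a batch is stuck. It could look roughly like this hypothetical hang check. `model_ring`, `model_hangcheck()` and the time-based threshold are assumptions for illustration. The `aborted` flag only stands in for the opt-in "aborted" fence state he proposes; that state is not part of the framework being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver fence; 'aborted' models the proposed opt-in state. */
struct model_fence {
    uint64_t seqno;
    bool signaled;
    bool aborted;   /* set only when completion was forced by a reset */
};

struct model_ring {
    struct model_fence *pending;   /* fences this driver emitted */
    size_t npending;
    uint64_t last_progress_ms;     /* when the hardware last made progress */
    uint64_t timeout_ms;           /* hang-check threshold (assumed) */
};

/* Called periodically. If the ring made no progress for timeout_ms, the
 * driver resets itself and force-completes every fence it owns, regardless
 * of whether the stall was its own fault or a foreign dependency.
 * Returns how many fences were completed this way. */
static size_t model_hangcheck(struct model_ring *ring, uint64_t now_ms)
{
    size_t forced = 0;

    if (now_ms - ring->last_progress_ms < ring->timeout_ms)
        return 0;

    /* ... a real driver would reset the GPU here ... */
    for (size_t i = 0; i < ring->npending; i++) {
        struct model_fence *f = &ring->pending[i];
        if (!f->signaled) {
            f->aborted = true;
            f->signaled = true;   /* own fences always complete */
            forced++;
        }
    }
    ring->last_progress_ms = now_ms;
    return forced;
}
```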

Maarten Lankhorst

2014-Jul-23 07:06 UTC


On 23-07-14 08:52, Christian König wrote:
> On 23.07.2014 08:40, Maarten Lankhorst wrote:
>> On 22-07-14 17:59, Christian König wrote:
>>> On 22.07.2014 17:42, Daniel Vetter wrote:
>>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>>> function; everything else, like fence->enable_signaling or calling
>>>>> fence_signaled() from the driver, is optional.
>>>>>
>>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>>> fence->wait in atomic or interrupt context, and not while holding any global
>>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>>> to the driver is ok, as long as they don't conflict with anything possibly
>>>>> used by their own fence implementation.
>>>> Well that's almost what we have right now, with the exception that
>>>> drivers are allowed (actually must, for correctness when updating
>>>> fences) to hold the ww_mutexes for dma-bufs (or other buffer objects).
>>> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>>>
>>> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
>> It's only mandatory to call fence_signal() if the .enable_signaling callback has been called; otherwise you can get away with never signaling a fence at all before dropping the last refcount to it.
>> This allows you to keep interrupts disabled when you don't need them.
>
> Can we somehow avoid the need to call fence_signal() at all? The interrupts, at least on radeon, are way too unreliable for such a thing. Can enable_signaling fail? What's the reason for fence_signaled() in the first place?
It doesn't need to be completely reliable, or finish immediately.

And any time wake_up_all(&rdev->fence_queue) is called, all the fences
that were enabled will be rechecked.
>>>> Agreed that any shared locks are out of the way (especially stuff like
>>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>>> really bad here still).
>>> Yeah, that's also a point I wanted to note on Maarten's patch. Radeon grabs the read side of its exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>>>
>>> Assuming that we need to wait in both directions with Prime (e.g. the Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
>> In the preliminary patches where I can sync radeon with other GPUs, I've been very careful in all the places that call into fences to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup, because otherwise it could happen that radeon waits for the lockup to be resolved and the lockup handling needs to wait for a fence that's never signaled because of the lockup.
The lockup handling calls radeon_fence_wait, not the generic fence_wait. It
doesn't call the exported wait function that takes the exclusive_lock in
read mode.
And lockdep should have complained if I screwed that up. :-)

~Maarten
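Maarten argues that the interrupts need not be reliable. This follows from the recheck he describes: every wakeup re-reads the current sequence number and walks all enabled fences, so the next wakeup makes up for a missed interrupt. Below is a hypothetical model in which `model_wake_up_all()` plays the role of `wake_up_all(&rdev->fence_queue)` and a fixed-size array replaces the real wait queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_WAITERS 16

struct model_fence {
    uint64_t seqno;
    bool signaled;
};

/* Hypothetical stand-in for the fence queue: fences with signaling enabled. */
struct model_fence_queue {
    struct model_fence *enabled[MODEL_MAX_WAITERS];
    size_t count;
    uint64_t hw_seqno;   /* current value reported by the simulated GPU */
};

static bool model_enable_signaling(struct model_fence_queue *q, struct model_fence *f)
{
    if (q->count == MODEL_MAX_WAITERS)
        return false;
    q->enabled[q->count++] = f;
    return true;
}

/* Recheck every enabled fence against the current seqno. It does not matter
 * what triggered the wakeup or how many interrupts were missed before it.
 * Returns how many fences were signaled by this call. */
static size_t model_wake_up_all(struct model_fence_queue *q)
{
    size_t kept = 0, signaled = 0;

    for (size_t i = 0; i < q->count; i++) {
        struct model_fence *f = q->enabled[i];
        if (q->hw_seqno >= f->seqno) {
            f->signaled = true;      /* stands in for fence_signal() */
            signaled++;
        } else {
            q->enabled[kept++] = f;  /* still pending: keep it queued */
        }
    }
    q->count = kept;
    return signaled;
}
```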

Daniel Vetter

2014-Jul-23 07:09 UTC


On Wed, Jul 23, 2014 at 9:06 AM, Maarten Lankhorst
<maarten.lankhorst at canonical.com> wrote:
>> Can we somehow avoid the need to call fence_signal() at all? The interrupts, at least on radeon, are way too unreliable for such a thing. Can enable_signaling fail? What's the reason for fence_signaled() in the first place?
> It doesn't need to be completely reliable, or finish immediately.
>
> And any time wake_up_all(&rdev->fence_queue) is called, all the fences that were enabled will be rechecked.
I raised this already somewhere else, but should we have some common
infrastructure in the core fence code to recheck fences periodically?
radeon doesn't seem to be the only hw where this isn't reliable
enough. Of course, timer-based rechecking would only work if the driver
provides the fence->signaled callback to recheck the actual fence state.
-Daniel
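Daniel's suggestion of a core-side periodic recheck could be sketched as a timer callback that walks unsignaled fences and calls each driver's `->signaled` hook. Everything here (`model_fence_ops`, `model_core_poll_tick()`, the `priv` pointer) is hypothetical. The sketch also illustrates his caveat: this kind of polling cannot reach fences whose driver provides no `->signaled` callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fence;

/* Hypothetical subset of a fence ops table. */
struct model_fence_ops {
    /* May be NULL; then the core has no way to poll this fence. */
    bool (*signaled)(struct model_fence *f);
};

struct model_fence {
    const struct model_fence_ops *ops;
    bool signaled;
    void *priv;   /* driver-private data */
};

/* Hypothetical core helper, meant to run from a periodic timer: recheck each
 * unsignaled fence via its driver's ->signaled callback.
 * Returns how many fences this tick found completed. */
static size_t model_core_poll_tick(struct model_fence **fences, size_t n)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        struct model_fence *f = fences[i];
        if (f->signaled || !f->ops->signaled)
            continue;   /* already done, or the driver offers no way to poll */
        if (f->ops->signaled(f)) {
            f->signaled = true;
            found++;
        }
    }
    return found;
}

/* Example driver callback: completion is a flag the "hardware" sets. */
static bool model_flag_signaled(struct model_fence *f)
{
    return *(const bool *)f->priv;
}
```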

Christian König

2014-Jul-23 07:26 UTC


On 23.07.2014 09:06, Maarten Lankhorst wrote:
> On 23-07-14 08:52, Christian König wrote:
>> On 23.07.2014 08:40, Maarten Lankhorst wrote:
>>> On 22-07-14 17:59, Christian König wrote:
>>>> On 22.07.2014 17:42, Daniel Vetter wrote:
>>>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>>>> <christian.koenig at amd.com> wrote:
>>>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>>>> function; everything else, like fence->enable_signaling or calling
>>>>>> fence_signaled() from the driver, is optional.
>>>>>>
>>>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>>>> fence->wait in atomic or interrupt context, and not while holding any global
>>>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>>>> to the driver is ok, as long as they don't conflict with anything possibly
>>>>>> used by their own fence implementation.
>>>>> Well that's almost what we have right now, with the exception that
>>>>> drivers are allowed (actually must, for correctness when updating
>>>>> fences) to hold the ww_mutexes for dma-bufs (or other buffer objects).
>>>> In this case sorry for so much noise. I really haven't looked in so much detail into anything but Maarten's Radeon patches.
>>>>
>>>> But how does that then work right now? My impression was that it's mandatory for drivers to call fence_signaled()?
>>> It's only mandatory to call fence_signal() if the .enable_signaling callback has been called; otherwise you can get away with never signaling a fence at all before dropping the last refcount to it.
>>> This allows you to keep interrupts disabled when you don't need them.
>> Can we somehow avoid the need to call fence_signal() at all? The interrupts, at least on radeon, are way too unreliable for such a thing. Can enable_signaling fail? What's the reason for fence_signaled() in the first place?
> It doesn't need to be completely reliable, or finish immediately.
>
> And any time wake_up_all(&rdev->fence_queue) is called, all the fences that were enabled will be rechecked.
>
>>>>> Agreed that any shared locks are out of the way (especially stuff like
>>>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>>>> really bad here still).
>>>> Yeah, that's also a point I wanted to note on Maarten's patch. Radeon grabs the read side of its exclusive semaphore while waiting for fences (because it assumes that the fence it waits for is a Radeon fence).
>>>>
>>>> Assuming that we need to wait in both directions with Prime (e.g. the Intel driver needs to wait for Radeon to finish rendering and Radeon needs to wait for Intel to finish displaying), this might become a perfect example of locking inversion.
>>> In the preliminary patches where I can sync radeon with other GPUs, I've been very careful in all the places that call into fences to make sure that radeon wouldn't try to handle lockups for a different (possibly also radeon) card.
>> That's actually not such a good idea.
>>
>> In case of a lockup we need to handle the lockup, because otherwise it could happen that radeon waits for the lockup to be resolved and the lockup handling needs to wait for a fence that's never signaled because of the lockup.
> The lockup handling calls radeon_fence_wait, not the generic fence_wait. It doesn't call the exported wait function that takes the exclusive_lock in read mode.
> And lockdep should have complained if I screwed that up. :-)
You screwed it up and lockdep didn't warn you about it :-P

It's not a locking problem I'm talking about here. Radeon's lockup
handling kicks in when anything calls into the driver from the outside.
If you have a fence wait function that's called from the outside but
doesn't handle lockups, you essentially rely on somebody else calling
another radeon function for the lockup to be resolved.

Christian.
>
> ~Maarten
>
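Christian's objection is about who triggers lockup recovery. The sketch below is a hypothetical exported wait that detects a stall and recovers on its own path instead of relying on some other radeon entry point being called. Counting timeout slices without sequence-number progress is an assumed heuristic for illustration, not radeon's actual lockup detection. `model_tick()` simply simulates time passing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model. */
struct model_device {
    uint64_t hw_seqno;       /* last completed seqno */
    uint64_t emitted_seqno;  /* last submitted seqno */
    bool hung;               /* simulated lockup: seqno stops advancing */
    int resets;
};

/* One simulated timeout slice: a healthy GPU makes progress. */
static void model_tick(struct model_device *dev)
{
    if (!dev->hung && dev->hw_seqno < dev->emitted_seqno)
        dev->hw_seqno++;
}

/* Recovery: reset the GPU and treat everything emitted as completed. */
static void model_gpu_reset(struct model_device *dev)
{
    dev->hung = false;
    dev->hw_seqno = dev->emitted_seqno;
    dev->resets++;
}

/* Exported wait as an outside caller would reach it. After stall_limit
 * slices without progress, it recovers itself instead of waiting for some
 * other driver entry point to notice the hang. Returns false if max_slices
 * elapse before seqno is reached. */
static bool model_exported_wait(struct model_device *dev, uint64_t seqno,
                                int max_slices, int stall_limit)
{
    int stalled = 0;

    for (int i = 0; i < max_slices; i++) {
        if (dev->hw_seqno >= seqno)
            return true;
        uint64_t before = dev->hw_seqno;
        model_tick(dev);
        stalled = (dev->hw_seqno == before) ? stalled + 1 : 0;
        if (stalled >= stall_limit) {
            model_gpu_reset(dev);   /* handle the lockup on this path too */
            stalled = 0;
        }
    }
    return dev->hw_seqno >= seqno;
}
```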

Rob Clark

2014-Jul-23 12:35 UTC


On Wed, Jul 23, 2014 at 2:52 AM, Christian König
<christian.koenig at amd.com> wrote:
> On 23.07.2014 08:40, Maarten Lankhorst wrote:
>
>> On 22-07-14 17:59, Christian König wrote:
>>>
>>> On 22.07.2014 17:42, Daniel Vetter wrote:
>>>>
>>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>>
>>>>> Drivers exporting fences need to provide a fence->signaled and a
>>>>> fence->wait function; everything else, like fence->enable_signaling or
>>>>> calling fence_signaled() from the driver, is optional.
>>>>>
>>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>>> fence->wait in atomic or interrupt context, and not while holding any
>>>>> global locking primitives (like mmap_sem etc...). Holding locking
>>>>> primitives local to the driver is ok, as long as they don't conflict
>>>>> with anything possibly used by their own fence implementation.
>>>>
>>>> Well that's almost what we have right now, with the exception that
>>>> drivers are allowed (actually must, for correctness when updating
>>>> fences) to hold the ww_mutexes for dma-bufs (or other buffer objects).
>>>
>>> In this case sorry for so much noise. I really haven't looked in so much
>>> detail into anything but Maarten's Radeon patches.
>>>
>>> But how does that then work right now? My impression was that it's
>>> mandatory for drivers to call fence_signaled()?
>>
>> It's only mandatory to call fence_signal() if the .enable_signaling
>> callback has been called; otherwise you can get away with never signaling
>> a fence at all before dropping the last refcount to it.
>> This allows you to keep interrupts disabled when you don't need them.
>
>
> Can we somehow avoid the need to call fence_signal() at all? The interrupts,
> at least on radeon, are way too unreliable for such a thing. Can
> enable_signaling fail? What's the reason for fence_signaled() in the first
> place?
>
The device you are sharing with may not be able to do hw<->hw
signaling; think about buffer sharing with a camera, for example.

You probably want your ->enable_signaling() to enable whatever
periodic-polling workaround you need to catch missed IRQs (and
then call fence_signal() once you detect the fence has passed).
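Rob suggests having `->enable_signaling()` arm a periodic-polling fallback that calls the signal function once the fence has passed. That idea can be sketched as follows. All names are hypothetical. In this model, a `false` return from `model_enable_signaling()` means "already signaled, nothing to arm"; treat that as an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fence with a fallback poller armed by enable_signaling. */
struct model_fence {
    uint64_t seqno;
    const uint64_t *hw_seqno;  /* last completed seqno of the simulated GPU */
    bool poll_armed;
    bool signaled;
};

static void model_fence_signal(struct model_fence *f)
{
    f->signaled = true;
    f->poll_armed = false;   /* nothing left to watch */
}

/* Arms the fallback poller; a real driver might also unmask its IRQ.
 * Returns false if the fence has already passed (nothing to arm). */
static bool model_enable_signaling(struct model_fence *f)
{
    if (*f->hw_seqno >= f->seqno)
        return false;
    f->poll_armed = true;
    return true;
}

/* Interrupt handler: may never run if the interrupt is lost. */
static void model_irq(struct model_fence *f)
{
    if (!f->signaled && *f->hw_seqno >= f->seqno)
        model_fence_signal(f);
}

/* Periodic timer callback: catches completions whose IRQ went missing. */
static void model_poll_timer(struct model_fence *f)
{
    if (f->poll_armed && *f->hw_seqno >= f->seqno)
        model_fence_signal(f);
}
```

The interrupt path and the poller both end in the same signal call, so it does not matter which of them notices completion first.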

FWIW, I haven't had a chance to read this whole thread yet, but I
expect that a lot of the SoC devices, especially ones with separate
KMS-only display and GPU drivers, will want a callback from the GPU's IRQ
to bang a few display controller registers. I agree that, in general,
callbacks from atomic context are probably something you want to avoid,
but in this particular case I think it is worth the extra complexity.
Nothing is stopping a driver from using a callback that just chucks
something on a workqueue, whereas the inverse is not possible.

BR,
-R
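Rob's argument for allowing callbacks from atomic context rests on an asymmetry. A callback that runs in the interrupt path can always hand its work off to a workqueue, but a callback that only ever runs from a workqueue cannot be made to run earlier. Below is a hypothetical single-threaded model with two callbacks: one writes a display register directly, and the other merely queues deferred work. None of these names belong to a real API.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CBS 4

struct model_fence;
typedef void (*model_fence_cb)(struct model_fence *f, void *data);

/* Hypothetical fence whose callbacks run directly from the signaling path,
 * i.e. potentially from the GPU's interrupt handler. */
struct model_fence {
    model_fence_cb cbs[MODEL_MAX_CBS];
    void *cb_data[MODEL_MAX_CBS];
    size_t ncbs;
};

static int model_fence_add_callback(struct model_fence *f, model_fence_cb cb, void *data)
{
    if (f->ncbs == MODEL_MAX_CBS)
        return -1;
    f->cbs[f->ncbs] = cb;
    f->cb_data[f->ncbs] = data;
    f->ncbs++;
    return 0;
}

/* "IRQ context": every callback runs immediately and must not sleep. */
static void model_fence_signal(struct model_fence *f)
{
    for (size_t i = 0; i < f->ncbs; i++)
        f->cbs[i](f, f->cb_data[i]);
}

/* Display-controller style callback: a quick register write, safe in IRQ context. */
static void model_flip_cb(struct model_fence *f, void *data)
{
    (void)f;
    *(unsigned *)data = 0x1;   /* e.g. latch the new scanout buffer */
}

/* A driver that cannot act in atomic context just queues work instead. */
struct model_work {
    int queued;   /* times queued from the callback */
    int ran;      /* times executed later in process context */
};

static void model_defer_cb(struct model_fence *f, void *data)
{
    (void)f;
    ((struct model_work *)data)->queued++;
}

/* Process-context worker: the heavy, possibly sleeping part goes here. */
static void model_run_work(struct model_work *w)
{
    while (w->queued > 0) {
        w->queued--;
        w->ran++;
    }
}
```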
