thr3ads.net - Nouveau - [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences [Jul 2014]

Christian König

2014-Jul-22 15:59 UTC

[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

On 22.07.2014 17:42, Daniel Vetter wrote:
> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
> <christian.koenig at amd.com> wrote:
>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>> function, everything else like fence->enable_signaling or calling
>> fence_signaled() from the driver is optional.
>>
>> Drivers wanting to use exported fences don't call fence->signaled or
>> fence->wait in atomic or interrupt context, and not with holding any global
>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>> to the driver is ok, as long as they don't conflict with anything possible
>> used by their own fence implementation.
> Well that's almost what we have right now with the exception that
> drivers are allowed (actually must for correctness when updating
> fences) the ww_mutexes for dma-bufs (or other buffer objects).
In this case sorry for so much noise. I really haven't looked in so much 
detail into anything but Maarten's Radeon patches.

But how does that then work right now? My impression was that it's 
mandatory for drivers to call fence_signaled()?
> Locking
> correctness is enforced with some extremely nasty lockdep annotations
> + additional debugging infrastructure enabled with
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
> dma-buf ww_mutexes while updating fences or waiting for them. And
> obviously for ->wait we need non-atomic context, not just
> non-interrupt.
Sounds mostly reasonable, but as for holding the dma-buf ww_mutex, wouldn't
RCU be more appropriate here? E.g. aren't we just interested in the
currently assigned fence being signaled at some point?

Something like: grab ww_mutexes, grab a reference to the current fence
object, release ww_mutex, wait for fence, release reference to the fence
object.

> Agreed that any shared locks are out of the way (especially stuff like
> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
> really bad here still).
Yeah, that's also a point I've wanted to note on Maarten's patch. Radeon
grabs the read side of its exclusive semaphore while waiting for fences
(because it assumes that the fence it waits for is a Radeon fence).

Assuming that we need to wait in both directions with Prime (e.g. Intel
driver needs to wait for Radeon to finish rendering and Radeon needs to
wait for Intel to finish displaying), this might become a perfect
example of locking inversion.
> So from the core fence framework I think we already have exactly this,
> and we only need to adjust the radeon implementation a bit to make it
> less risky and invasive to the radeon driver logic.
Agreed. Well, the biggest problem I see is that exclusive semaphore I need
to take when anything calls into the driver. For the fence code I need
to move that down into the fence->signaled handler, because that can now
be called from outside the driver.

Maarten solved this by telling the driver in the lockup handler (where 
we grab the write side of the exclusive lock) that all interrupts are 
already enabled, so that fence->signaled hopefully wouldn't mess with 
the hardware at all. While this probably works, it just leaves me with a 
feeling that we are doing something wrong here.

Christian.
> -Daniel

Daniel Vetter

2014-Jul-22 16:21 UTC

On Tue, Jul 22, 2014 at 5:59 PM, Christian König
<deathsimple at vodafone.de> wrote:
> On 22.07.2014 17:42, Daniel Vetter wrote:
>
>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>> <christian.koenig at amd.com> wrote:
>>>
>>> Drivers exporting fences need to provide a fence->signaled and a
>>> fence->wait function, everything else like fence->enable_signaling or
>>> calling fence_signaled() from the driver is optional.
>>>
>>> Drivers wanting to use exported fences don't call fence->signaled or
>>> fence->wait in atomic or interrupt context, and not with holding any
>>> global locking primitives (like mmap_sem etc...). Holding locking
>>> primitives local to the driver is ok, as long as they don't conflict
>>> with anything possible used by their own fence implementation.
>>
>> Well that's almost what we have right now with the exception that
>> drivers are allowed (actually must for correctness when updating
>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>
>
> In this case sorry for so much noise. I really haven't looked in so much
> detail into anything but Maarten's Radeon patches.
>
> But how does that then work right now? My impression was that it's mandatory
> for drivers to call fence_signaled()?
Maybe I've mixed things up a bit in my description. There is
fence_signal, which the implementor/exporter of a fence must call when
the fence is completed. If the exporter has an ->enable_signaling
callback, it can delay that call to fence_signal for as long as it
wishes, provided enable_signaling hasn't been called yet. But that's just
an optimization so that irqs don't have to be turned on all the time.

The other function is fence_is_signaled, which is used by code that is
interested in the fence state, together with fence_wait if it wants to
block and not just wants to know the momentary fence state. All the
other functions (the stuff that adds callbacks and the various _locked
and other versions) are just for fancy special cases.
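
As a rough illustration of the split described above, here is a minimal
userspace model in C. It is not the kernel fence code: every name
(model_fence, model_fence_ops, lazy_irq_enable and so on) is made up for
this sketch, and the convention that an enable_signaling callback returning
false means "already completed" is an assumption of the model. It only
shows the idea that the exporter may leave its interrupt off until
somebody asks for signaling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; names echo the thread but this is not kernel code. */
struct model_fence;

struct model_fence_ops {
    /* Optional: called once when somebody starts caring about completion.
     * Assumed convention: return false if the work has already finished. */
    bool (*enable_signaling)(struct model_fence *f);
};

struct model_fence {
    const struct model_fence_ops *ops;
    bool signaled;          /* set by model_fence_signal() */
    bool signaling_enabled; /* enable_signaling was already requested */
    int irq_enabled;        /* stand-in for "hardware interrupt is on" */
};

/* Exporter side: mark the fence as completed. */
static void model_fence_signal(struct model_fence *f)
{
    assert(f != NULL);
    f->signaled = true;
}

/* Consumer side: ask the exporter to start reporting completion. */
static void model_fence_enable_signaling(struct model_fence *f)
{
    assert(f != NULL);
    if (f->signaled || f->signaling_enabled)
        return;
    f->signaling_enabled = true;
    if (f->ops && f->ops->enable_signaling && !f->ops->enable_signaling(f))
        model_fence_signal(f); /* callback says the work is already done */
}

/* Consumer side: momentary state, never blocks. */
static bool model_fence_is_signaled(const struct model_fence *f)
{
    assert(f != NULL);
    return f->signaled;
}

/* Hypothetical exporter callback: switch the interrupt on lazily. */
static bool lazy_irq_enable(struct model_fence *f)
{
    f->irq_enabled = 1;
    return true; /* still pending; the interrupt path signals later */
}

static const struct model_fence_ops lazy_ops = {
    .enable_signaling = lazy_irq_enable,
};
```

In this model a fence created with lazy_ops keeps irq_enabled at 0 until
model_fence_enable_signaling() is called; from then on the exporter is
expected to call model_fence_signal() once the work completes.
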
>> Locking
>> correctness is enforced with some extremely nasty lockdep annotations
>> + additional debugging infrastructure enabled with
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>> dma-buf ww_mutexes while updating fences or waiting for them. And
>> obviously for ->wait we need non-atomic context, not just
>> non-interrupt.
>
>
> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be
> an RCU be more appropriate here? E.g. aren't we just interested that the
> current assigned fence at some point is signaled?
Yeah, as an optimization you can get the set of currently attached
fences to a dma-buf with just rcu. But if you update the set of fences
attached to a dma-buf (e.g. radeon blits the newly rendered frame to a
dma-buf exported by i915 for scanout on i915) then you need a write
lock on that buffer. Which is what the ww_mutex is for, to make sure
that you don't deadlock with i915 doing concurrent ops on the same
underlying buffer.
> Something like grab ww_mutexes, grab a reference to the current fence
> object, release ww_mutex, wait for fence, release reference to the fence
> object.
Yeah, if the only thing you want to do is wait for fences, then the
rcu-protected fence ref grabbing + lockless waiting is all you need.
But e.g. in an execbuf you also need to update fences and maybe deep
down in the reservation code you notice that you need to evict some
stuff and so need to wait on some other guy to finish, and it's too
complicated to drop and reacquire all the locks. Or you simply need to
do a blocking wait on other gpus (because there's no direct hw sync
mechanism) and again dropping locks would needlessly complicate the
code. So I think we should allow this just to avoid too hairy/brittle
(and almost definitely little tested code) in drivers.

Afaik this is also the same way ttm currently handles things wrt
buffer reservation and eviction.
>> Agreed that any shared locks are out of the way (especially stuff like
>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>> really bad here still).
>
>
> Yeah that's also an point I've wanted to note on Maartens patch. Radeon
> grabs the read side of it's exclusive semaphore while waiting for fences
> (because it assumes that the fence it waits for is a Radeon fence).
>
> Assuming that we need to wait in both directions with Prime (e.g. Intel
> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
> for Intel to finish displaying), this might become a perfect example of
> locking inversion.
Fence updates are atomic on a dma-buf, protected by ww_mutex. The neat
trick of ww_mutexes is that they enforce a global ordering, so in your
scenario either i915 or radeon would be first and you can't deadlock.
There is no way to interleave anything even if you have lots of
buffers shared between i915/radeon. Wrt deadlocking it's exactly the
same guarantees as the magic ttm provides for just one driver with
concurrent command submission since it's the same idea.
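
To make the global-ordering argument concrete, here is a toy,
single-threaded model in C of a wait-die style rule. It is a sketch under
assumptions, not the kernel's ww_mutex implementation: the types
(acquire_ctx, toy_ww_mutex), the stamp counter and the three result codes
are invented for illustration. Each acquire context gets a stamp when it
starts; on contention the older context is told to wait, and the younger
one is told to back off (drop everything it holds and retry).

```c
#include <assert.h>
#include <stddef.h>

enum lock_result {
    LOCK_OK,      /* lock acquired */
    LOCK_WAIT,    /* caller is older than the holder: it may wait safely */
    LOCK_BACKOFF  /* caller is younger: release all held locks and retry */
};

/* One locking transaction, e.g. one command submission. */
struct acquire_ctx {
    unsigned long stamp; /* smaller stamp means older context */
};

struct toy_ww_mutex {
    const struct acquire_ctx *holder; /* NULL when unlocked */
};

static unsigned long next_stamp = 1;

static void ctx_init(struct acquire_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->stamp = next_stamp++;
}

static enum lock_result toy_lock(struct toy_ww_mutex *m,
                                 const struct acquire_ctx *ctx)
{
    assert(m != NULL && ctx != NULL);
    assert(m->holder != ctx); /* re-locking is outside this sketch */

    if (m->holder == NULL) {
        m->holder = ctx;
        return LOCK_OK;
    }
    /* Contended: the stamps decide globally who has to yield. */
    return ctx->stamp < m->holder->stamp ? LOCK_WAIT : LOCK_BACKOFF;
}

static void toy_unlock(struct toy_ww_mutex *m)
{
    assert(m != NULL);
    m->holder = NULL;
}
```

With two contexts that each hold one buffer and want the other, only the
younger one ever gets LOCK_BACKOFF, so the classic AB/BA deadlock cannot
form no matter how many buffers the two drivers share.
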
>> So from the core fence framework I think we already have exactly this,
>> and we only need to adjust the radeon implementation a bit to make it
>> less risky and invasive to the radeon driver logic.
>
>
> Agree. Well the biggest problem I see is that exclusive semaphore I need to
> take when anything calls into the driver. For the fence code I need to move
> that down into the fence->signaled handler, cause that now can be called
> from outside the driver.
>
> Maarten solved this by telling the driver in the lockup handler (where we
> grab the write side of the exclusive lock) that all interrupts are already
> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
> at all. While this probably works, it just leaves me with a feeling that we
> are doing something wrong here.
I'm not versed in the details of radeon, but on i915 we can attach a
memory location and cookie value to each fence and just do a memory
fetch to figure out whether the fence has passed or not. So no locking
needed at all. Of course the fence itself needs to lock a reference
onto that memory location, which is a neat piece of integration work
that we still need to tackle in some cases - there are conflicting patch
series all over this ;-)

But like I've said fence->signaled is optional so you don't need this
necessarily, as long as radeon eventually calls fence_signaled once
the fence has completed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Christian König

2014-Jul-22 16:39 UTC

On 22.07.2014 18:21, Daniel Vetter wrote:
> Maybe I've mixed things up a bit in my description. There is
> fence_signal which the implementor/exporter of a fence must call when
> the fence is completed. If the exporter has an ->enable_signaling
> callback it can delay that call to fence_signal for as long as it
> wishes as long as enable_signaling isn't called yet. But that's just
> the optimization to not required irqs to be turned on all the time.
>
> The other function is fence_is_signaled, which is used by code that is
> interested in the fence state, together with fence_wait if it wants to
> block and not just wants to know the momentary fence state. All the
> other functions (the stuff that adds callbacks and the various _locked
> and other versions) are just for fancy special cases.
Well, that's rather bad, because IRQs aren't reliable enough on Radeon HW
for such a thing. Especially on Prime systems and Macs.

That's why we have this fancy HZ/2 timeout on all fence wait operations 
to manually check if the fence is signaled or not.

To guarantee that a fence is signaled after enable_signaling is called 
we would need to fire up a kernel thread which periodically calls 
fence->signaled.
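
The timeout-and-recheck pattern described here can be sketched in userspace
C as follows. This is a simplified model, not the radeon wait code: fake_hw,
polled_fence and wait_with_poll are invented names, and each loop iteration
stands in for one sleep that ended by timeout (HZ/2 in the description
above) rather than by an interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPU ring that finishes after a few reads. */
struct fake_hw {
    int reads_until_done;
};

/* Read the "hardware" state directly, without relying on an interrupt. */
static bool fake_hw_done(struct fake_hw *hw)
{
    if (hw->reads_until_done > 0)
        hw->reads_until_done--;
    return hw->reads_until_done == 0;
}

struct polled_fence {
    bool irq_signaled;  /* would be set by the interrupt handler */
    struct fake_hw *hw;
};

/* Returns how many timeouts elapsed before completion was observed,
 * or -1 if the fence is still pending after max_timeouts. */
static int wait_with_poll(struct polled_fence *f, int max_timeouts)
{
    assert(f != NULL && f->hw != NULL);
    for (int timeouts = 0; timeouts <= max_timeouts; timeouts++) {
        /* The interrupt may have been lost, so also ask the hardware. */
        if (f->irq_signaled || fake_hw_done(f->hw))
            return timeouts;
        /* A real wait would sleep here until woken or until the timeout. */
    }
    return -1;
}
```

The point of the model is that completion is still noticed when
irq_signaled never becomes true; doing the same for fences that nobody is
actively waiting on is what would require the periodic kernel thread
mentioned above.
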

Christian.


Daniel Vetter

2014-Jul-22 16:43 UTC

On Tue, Jul 22, 2014 at 6:21 PM, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> But like I've said fence->signaled is optional so you don't need this
> necessarily, as long as radeon eventually calls fence_signaled once
> the fence has completed.
Actually I've chatted a bit with Maarten about the different ways we
could restrict both the calling context and the implementations for
fence callbacks to avoid surprises. One is certainly that we need
WARN_ON(in_interrupt()) around the wait, enable_signaling and add
callback stuff.

But we also talked about ensuring that the ->signaled callback never
sleeps by wrapping it in a preempt_disable/enable section. Currently
sleeping is allowed in ->signaled, which the radeon implementation
here does. I think it would be reasonable to forbid this and drop
__radeon_fence_signaled.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Maarten Lankhorst

2014-Jul-23 06:40 UTC

On 22-07-14 17:59, Christian König wrote:
> On 22.07.2014 17:42, Daniel Vetter wrote:
>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>> <christian.koenig at amd.com> wrote:
>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>> function, everything else like fence->enable_signaling or calling
>>> fence_signaled() from the driver is optional.
>>>
>>> Drivers wanting to use exported fences don't call fence->signaled or
>>> fence->wait in atomic or interrupt context, and not with holding any global
>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>> to the driver is ok, as long as they don't conflict with anything possible
>>> used by their own fence implementation.
>> Well that's almost what we have right now with the exception that
>> drivers are allowed (actually must for correctness when updating
>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>
> In this case sorry for so much noise. I really haven't looked in so much
> detail into anything but Maarten's Radeon patches.
>
> But how does that then work right now? My impression was that it's mandatory
> for drivers to call fence_signaled()?
It's only mandatory to call fence_signal() if the .enable_signaling callback
has been called; otherwise you can get away with never signaling a fence at
all before dropping the last refcount to it.
This allows you to keep interrupts disabled when you don't need them.
>> Locking
>> correctness is enforced with some extremely nasty lockdep annotations
>> + additional debugging infrastructure enabled with
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. We really need to be able to hold
>> dma-buf ww_mutexes while updating fences or waiting for them. And
>> obviously for ->wait we need non-atomic context, not just
>> non-interrupt.
>
> Sounds mostly reasonable, but for holding the dma-buf ww_mutex, wouldn't be
> an RCU be more appropriate here? E.g. aren't we just interested that the
> current assigned fence at some point is signaled?
You can wait with RCU, without holding the ww_mutex, by calling
reservation_object_wait_timeout_rcu on ttm_bo->resv.
If you don't want to block you could test with
reservation_object_test_signaled_rcu.
Or if you want a copy of all fences without taking locks, try
reservation_object_get_fences_rcu. (The result might be out of date by the
time the function returns if you don't hold the ww_mutex; if you do hold the
ww_mutex you probably don't need to call this function.)

I didn't add non-rcu versions, but using the RCU functions would work with
ww_mutex held too, probably with some small overhead.
> Something like grab ww_mutexes, grab a reference to the current fence
> object, release ww_mutex, wait for fence, release reference to the fence
> object.
This is what I do currently. :-) The reservation_object that's embedded in
TTM gets shared with the dma-buf, so there will be no special case needed for
dma-buf at all, all objects can simply be shared and the synchronization is
handled in the same way.

ttm_bo_reserve and friends automatically end up locking the dma-buf too, and
lockdep works on it.
>
>> Agreed that any shared locks are out of the way (especially stuff like
>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>> really bad here still).
>
> Yeah that's also an point I've wanted to note on Maartens patch. Radeon
> grabs the read side of it's exclusive semaphore while waiting for fences
> (because it assumes that the fence it waits for is a Radeon fence).
>
> Assuming that we need to wait in both directions with Prime (e.g. Intel
> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
> for Intel to finish displaying), this might become a perfect example of
> locking inversion.
In the preliminary patches where I can sync radeon with other GPUs I've
been very careful in all the places that call into fences, to make sure that
radeon wouldn't try to handle lockups for a different (possibly also radeon)
card.

This is also why fence_is_signaled should never block, and why it trylocks the
exclusive_lock. :-) I think lockdep would complain if I grabbed exclusive_lock
while blocking in is_signaled.
>> So from the core fence framework I think we already have exactly this,
>> and we only need to adjust the radeon implementation a bit to make it
>> less risky and invasive to the radeon driver logic.
>
> Agree. Well the biggest problem I see is that exclusive semaphore I need to
> take when anything calls into the driver. For the fence code I need to move
> that down into the fence->signaled handler, cause that now can be called
> from outside the driver.
>
> Maarten solved this by telling the driver in the lockup handler (where we
> grab the write side of the exclusive lock) that all interrupts are already
> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
> at all. While this probably works, it just leaves me with a feeling that we
> are doing something wrong here.
There is unfortunately no global mechanism to say 'this card is locked up,
please don't call into any of my fences', and I don't associate
fences with devices, and radeon doesn't keep a global list of fences.
If all of that existed, it would complicate the interface and its callers a lot,
while I like to keep things simple.
So I did the best I could, and simply prevented the fence calls from fiddling
with the hardware. Fortunately gpu lockup is not a common operation. :-)

~Maarten

Christian König

2014-Jul-23 06:52 UTC

On 23.07.2014 08:40, Maarten Lankhorst wrote:
> On 22-07-14 17:59, Christian König wrote:
>> On 22.07.2014 17:42, Daniel Vetter wrote:
>>> On Tue, Jul 22, 2014 at 5:35 PM, Christian König
>>> <christian.koenig at amd.com> wrote:
>>>> Drivers exporting fences need to provide a fence->signaled and a fence->wait
>>>> function, everything else like fence->enable_signaling or calling
>>>> fence_signaled() from the driver is optional.
>>>>
>>>> Drivers wanting to use exported fences don't call fence->signaled or
>>>> fence->wait in atomic or interrupt context, and not with holding any global
>>>> locking primitives (like mmap_sem etc...). Holding locking primitives local
>>>> to the driver is ok, as long as they don't conflict with anything possible
>>>> used by their own fence implementation.
>>> Well that's almost what we have right now with the exception that
>>> drivers are allowed (actually must for correctness when updating
>>> fences) the ww_mutexes for dma-bufs (or other buffer objects).
>> In this case sorry for so much noise. I really haven't looked in so much
>> detail into anything but Maarten's Radeon patches.
>>
>> But how does that then work right now? My impression was that it's mandatory
>> for drivers to call fence_signaled()?
> It's only mandatory to call fence_signal() if the .enable_signaling callback
> has been called, else you can get away with never calling signaling a
> fence at all before dropping the last refcount to it.
> This allows you to keep interrupts disabled when you don't need them.
Can we somehow avoid the need to call fence_signal() at all? The
interrupts, at least on radeon, are way too unreliable for such a thing.
Can enable_signaling fail? What's the reason for fence_signal() in
the first place?
>>> Agreed that any shared locks are out of the way (especially stuff like
>>> dev->struct_mutex or other non-strictly driver-private stuff, i915 is
>>> really bad here still).
>> Yeah that's also an point I've wanted to note on Maartens patch. Radeon
>> grabs the read side of it's exclusive semaphore while waiting for fences
>> (because it assumes that the fence it waits for is a Radeon fence).
>>
>> Assuming that we need to wait in both directions with Prime (e.g. Intel
>> driver needs to wait for Radeon to finish rendering and Radeon needs to wait
>> for Intel to finish displaying), this might become a perfect example of
>> locking inversion.
> In the preliminary patches where I can sync radeon with other GPU's I've
> been very careful in all the places that call into fences, to make sure that
> radeon wouldn't try to handle lockups for a different (possibly also radeon)
> card.
That's actually not such a good idea.

In case of a lockup we need to handle the lockup, because otherwise it
could happen that radeon waits for the lockup to be resolved while the
lockup handling needs to wait for a fence that's never signaled because
of the lockup.

Christian.
>
> This is also why fence_is_signaled should never block, and why it trylocks the
> exclusive_lock. :-) I think lockdep would complain if I grabbed exclusive_lock
> while blocking in is_signaled.
>
>>> So from the core fence framework I think we already have exactly this,
>>> and we only need to adjust the radeon implementation a bit to make it
>>> less risky and invasive to the radeon driver logic.
>> Agree. Well the biggest problem I see is that exclusive semaphore I need to
>> take when anything calls into the driver. For the fence code I need to move
>> that down into the fence->signaled handler, cause that now can be called
>> from outside the driver.
>>
>> Maarten solved this by telling the driver in the lockup handler (where we
>> grab the write side of the exclusive lock) that all interrupts are already
>> enabled, so that fence->signaled hopefully wouldn't mess with the hardware
>> at all. While this probably works, it just leaves me with a feeling that we
>> are doing something wrong here.
> There is unfortunately no global mechanism to say 'this card is locked up,
> please don't call into any of my fences', and I don't associate
> fences with devices, and radeon doesn't keep a global list of fences.
> If all of that existed, it would complicate the interface and its callers a
> lot, while I like to keep things simple.
> So I did the best I could, and simply prevented the fence calls from fiddling
> with the hardware. Fortunately gpu lockup is not a common operation. :-)
>
> ~Maarten
