thr3ads.net - Nouveau - [Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences [May 2014]

If this information is useful, please help other people find it:
Share via:

Christian König

2014-May-15 15:48 UTC

[Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

Am 15.05.2014 16:18, schrieb Maarten Lankhorst:> op 15-05-14 15:19, Christian K?nig schreef:
>> Am 15.05.2014 15:04, schrieb Maarten Lankhorst:
>>> op 15-05-14 11:42, Christian K?nig schreef:
>>>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
>>>>> op 15-05-14 11:21, Christian K?nig schreef:
>>>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>>>>>> op 14-05-14 17:29, Christian K?nig schreef:
>>>>>>>>> +    /* did fence get signaled after we
enabled the sw irq? */
>>>>>>>>> +    if 
>>>>>>>>>
(atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq)
>>>>>>>>> >= fence->seq) {
>>>>>>>>> +       
radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>> +        return false;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>> +    fence->fence_wake.private = NULL;
>>>>>>>>> +    fence->fence_wake.func =
radeon_fence_check_signaled;
>>>>>>>>> +
__add_wait_queue(&fence->rdev->fence_queue,
>>>>>>>>> &fence->fence_wake);
>>>>>>>>> +    fence_get(f);
>>>>>>>> That looks like a race condition to me. The
fence needs to be
>>>>>>>> added to the wait queue before the check, not
after.
>>>>>>>>
>>>>>>>> Apart from that the whole approach looks like a
really bad idea
>>>>>>>> to me. How for example is lockup detection
supposed to happen
>>>>>>>> with this? 
>>>>>>> It's not a race condition because
fence_queue.lock is held when
>>>>>>> this function is called.
>>>>>> Ah, I see. That's also the reason why you moved the
wake_up_all
>>>>>> out of the processing function.
>>>>> Correct. :-)
>>>>>>> Lockup's a bit of a weird problem, the changes
wouldn't allow
>>>>>>> core ttm code to handle the lockup any more,
>>>>>>> but any driver specific wait code would still
handle this. I did
>>>>>>> this by design, because in future patches the wait
>>>>>>> function may be called from outside of the radeon
driver. The
>>>>>>> official wait function takes a timeout parameter,
>>>>>>> so lockups wouldn't be fatal if the timeout is
set to something
>>>>>>> like 30*HZ for example, it would still return
>>>>>>> and report that the function timed out.
>>>>>> Timeouts help with the detection of the lockup, but not
at all
>>>>>> with the handling of them.
>>>>>>
>>>>>> What we essentially need is a wait callback into the
driver that
>>>>>> is called in non atomic context without any locks held.
>>>>>>
>>>>>> This way we can block for the fence to become signaled
with a
>>>>>> timeout and can then also initiate the reset handling
if necessary.
>>>>>>
>>>>>> The way you designed the interface now means that the
driver
>>>>>> never gets a chance to wait for the hardware to become
idle and
>>>>>> so never has the opportunity to the reset the whole
thing.
>>>>> You could set up a hangcheck timer like intel does, and end
up
>>>>> with a reliable hangcheck detection that doesn't depend
on cpu
>>>>> waits. :-) Or override the default wait function and
restore the
>>>>> old behavior.
>>>>
>>>> Overriding the default wait function sounds better, please 
>>>> implement it this way.
>>>>
>>>> Thanks,
>>>> Christian. 
>>>
>>> Does this modification look sane?
>> Adding the timeout is on my todo list for quite some time as well, so 
>> this part makes sense.
>>
>>> +static long __radeon_fence_wait(struct fence *f, bool intr, long 
>>> timeout)
>>> +{
>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>> +
>>> +    target_seq[fence->ring] = fence->seq;
>>> +    return radeon_fence_wait_seq_timeout(fence->rdev,
target_seq,
>>> intr, timeout);
>>> +}
>> When this call is comming from outside the radeon driver you need to 
>> lock rdev->exclusive_lock here to make sure not to interfere with a 
>> possible reset.
> Ah thanks, I'll add that.
>
>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>      .signaled = __radeon_fence_signaled,
>> Do we still need those callback when we implemented the wait callback?
> .get_timeline_name is used for debugging (trace events).
> .signaled is the non-blocking call to check if the fence is signaled 
> or not.
> .enable_signaling is used for adding callbacks upon fence completion, 
> the default 'fence_default_wait' uses it, so
> when it works no separate implementation is needed unless you want to 
> do more than just waiting.
> It's also used when fence_add_callback is called. i915 can be patched 
> to use it. ;-)
I just meant enable_signaling, the other ones are fine with me. The 
problem with enable_signaling is that it's called with a spin lock held, 
so we can't sleep.

While resetting the GPU could be moved out into a timer the problem here 
is that I can't lock rdev->exclusive_lock in such situations.

This means when i915 would call into radeon to enable signaling for a 
fence we can't make sure that there is not GPU reset running on another 
CPU. And touching the IRQ registers while a reset is going on is a 
really good recipe to lockup the whole system.

Christian.
>
> ~Maarten

Maarten Lankhorst

2014-May-15 15:58 UTC

head link

[Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

op 15-05-14 17:48, Christian K?nig schreef:> Am 15.05.2014 16:18, schrieb Maarten Lankhorst:
>> op 15-05-14 15:19, Christian K?nig schreef:
>>> Am 15.05.2014 15:04, schrieb Maarten Lankhorst:
>>>> op 15-05-14 11:42, Christian K?nig schreef:
>>>>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
>>>>>> op 15-05-14 11:21, Christian K?nig schreef:
>>>>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>>>>>>> op 14-05-14 17:29, Christian K?nig schreef:
>>>>>>>>>> +    /* did fence get signaled after we
enabled the sw irq? */
>>>>>>>>>> +    if
(atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >=
fence->seq) {
>>>>>>>>>> +
radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>> +        return false;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    fence->fence_wake.flags = 0;
>>>>>>>>>> +    fence->fence_wake.private =
NULL;
>>>>>>>>>> +    fence->fence_wake.func =
radeon_fence_check_signaled;
>>>>>>>>>> +
__add_wait_queue(&fence->rdev->fence_queue,
&fence->fence_wake);
>>>>>>>>>> +    fence_get(f);
>>>>>>>>> That looks like a race condition to me. The
fence needs to be added to the wait queue before the check, not after.
>>>>>>>>>
>>>>>>>>> Apart from that the whole approach looks
like a really bad idea to me. How for example is lockup detection supposed to
happen with this?
>>>>>>>> It's not a race condition because
fence_queue.lock is held when this function is called.
>>>>>>> Ah, I see. That's also the reason why you moved
the wake_up_all out of the processing function.
>>>>>> Correct. :-)
>>>>>>>> Lockup's a bit of a weird problem, the
changes wouldn't allow core ttm code to handle the lockup any more,
>>>>>>>> but any driver specific wait code would still
handle this. I did this by design, because in future patches the wait
>>>>>>>> function may be called from outside of the
radeon driver. The official wait function takes a timeout parameter,
>>>>>>>> so lockups wouldn't be fatal if the timeout
is set to something like 30*HZ for example, it would still return
>>>>>>>> and report that the function timed out.
>>>>>>> Timeouts help with the detection of the lockup, but
not at all with the handling of them.
>>>>>>>
>>>>>>> What we essentially need is a wait callback into
the driver that is called in non atomic context without any locks held.
>>>>>>>
>>>>>>> This way we can block for the fence to become
signaled with a timeout and can then also initiate the reset handling if
necessary.
>>>>>>>
>>>>>>> The way you designed the interface now means that
the driver never gets a chance to wait for the hardware to become idle and so
never has the opportunity to the reset the whole thing.
>>>>>> You could set up a hangcheck timer like intel does, and
end up with a reliable hangcheck detection that doesn't depend on cpu waits.
:-) Or override the default wait function and restore the old behavior.
>>>>>
>>>>> Overriding the default wait function sounds better, please
implement it this way.
>>>>>
>>>>> Thanks,
>>>>> Christian. 
>>>>
>>>> Does this modification look sane?
>>> Adding the timeout is on my todo list for quite some time as well,
so this part makes sense.
>>>
>>>> +static long __radeon_fence_wait(struct fence *f, bool intr,
long timeout)
>>>> +{
>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>> +
>>>> +    target_seq[fence->ring] = fence->seq;
>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev,
target_seq, intr, timeout);
>>>> +}
>>> When this call is comming from outside the radeon driver you need
to lock rdev->exclusive_lock here to make sure not to interfere with a
possible reset.
>> Ah thanks, I'll add that.
>>
>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>      .signaled = __radeon_fence_signaled,
>>> Do we still need those callback when we implemented the wait
callback?
>> .get_timeline_name is used for debugging (trace events).
>> .signaled is the non-blocking call to check if the fence is signaled or
not.
>> .enable_signaling is used for adding callbacks upon fence completion,
the default 'fence_default_wait' uses it, so
>> when it works no separate implementation is needed unless you want to
do more than just waiting.
>> It's also used when fence_add_callback is called. i915 can be
patched to use it. ;-)
>
> I just meant enable_signaling, the other ones are fine with me. The problem
with enable_signaling is that it's called with a spin lock held, so we
can't sleep.
>
> While resetting the GPU could be moved out into a timer the problem here is
that I can't lock rdev->exclusive_lock in such situations.
>
> This means when i915 would call into radeon to enable signaling for a fence
we can't make sure that there is not GPU reset running on another CPU. And
touching the IRQ registers while a reset is going on is a really good recipe to
lockup the whole system.If you increase the irq counter on all rings before doing a gpu reset, adjust
the state and call sw_irq_put when done this race could never happen. Or am I
missing something?

~Maarten

Christian König

2014-May-15 16:13 UTC

head link

[Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

Am 15.05.2014 17:58, schrieb Maarten Lankhorst:> op 15-05-14 17:48, Christian K?nig schreef:
>> Am 15.05.2014 16:18, schrieb Maarten Lankhorst:
>>> op 15-05-14 15:19, Christian K?nig schreef:
>>>> Am 15.05.2014 15:04, schrieb Maarten Lankhorst:
>>>>> op 15-05-14 11:42, Christian K?nig schreef:
>>>>>> Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
>>>>>>> op 15-05-14 11:21, Christian K?nig schreef:
>>>>>>>> Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
>>>>>>>>> op 14-05-14 17:29, Christian K?nig schreef:
>>>>>>>>>>> +    /* did fence get signaled
after we enabled the sw irq? */
>>>>>>>>>>> +    if 
>>>>>>>>>>>
(atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >=
>>>>>>>>>>> fence->seq) {
>>>>>>>>>>> +
radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>>>>>>>>>>> +        return false;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    fence->fence_wake.flags =
0;
>>>>>>>>>>> +    fence->fence_wake.private =
NULL;
>>>>>>>>>>> +    fence->fence_wake.func =
radeon_fence_check_signaled;
>>>>>>>>>>> +
__add_wait_queue(&fence->rdev->fence_queue,
>>>>>>>>>>> &fence->fence_wake);
>>>>>>>>>>> +    fence_get(f);
>>>>>>>>>> That looks like a race condition to me.
The fence needs to be
>>>>>>>>>> added to the wait queue before the
check, not after.
>>>>>>>>>>
>>>>>>>>>> Apart from that the whole approach
looks like a really bad
>>>>>>>>>> idea to me. How for example is lockup
detection supposed to
>>>>>>>>>> happen with this? 
>>>>>>>>> It's not a race condition because
fence_queue.lock is held
>>>>>>>>> when this function is called.
>>>>>>>> Ah, I see. That's also the reason why you
moved the wake_up_all
>>>>>>>> out of the processing function.
>>>>>>> Correct. :-)
>>>>>>>>> Lockup's a bit of a weird problem, the
changes wouldn't allow
>>>>>>>>> core ttm code to handle the lockup any
more,
>>>>>>>>> but any driver specific wait code would
still handle this. I
>>>>>>>>> did this by design, because in future
patches the wait
>>>>>>>>> function may be called from outside of the
radeon driver. The
>>>>>>>>> official wait function takes a timeout
parameter,
>>>>>>>>> so lockups wouldn't be fatal if the
timeout is set to
>>>>>>>>> something like 30*HZ for example, it would
still return
>>>>>>>>> and report that the function timed out.
>>>>>>>> Timeouts help with the detection of the lockup,
but not at all
>>>>>>>> with the handling of them.
>>>>>>>>
>>>>>>>> What we essentially need is a wait callback
into the driver
>>>>>>>> that is called in non atomic context without
any locks held.
>>>>>>>>
>>>>>>>> This way we can block for the fence to become
signaled with a
>>>>>>>> timeout and can then also initiate the reset
handling if
>>>>>>>> necessary.
>>>>>>>>
>>>>>>>> The way you designed the interface now means
that the driver
>>>>>>>> never gets a chance to wait for the hardware to
become idle and
>>>>>>>> so never has the opportunity to the reset the
whole thing.
>>>>>>> You could set up a hangcheck timer like intel does,
and end up
>>>>>>> with a reliable hangcheck detection that
doesn't depend on cpu
>>>>>>> waits. :-) Or override the default wait function
and restore the
>>>>>>> old behavior.
>>>>>>
>>>>>> Overriding the default wait function sounds better,
please
>>>>>> implement it this way.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian. 
>>>>>
>>>>> Does this modification look sane?
>>>> Adding the timeout is on my todo list for quite some time as
well,
>>>> so this part makes sense.
>>>>
>>>>> +static long __radeon_fence_wait(struct fence *f, bool
intr, long
>>>>> timeout)
>>>>> +{
>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>> +    u64 target_seq[RADEON_NUM_RINGS] = {};
>>>>> +
>>>>> +    target_seq[fence->ring] = fence->seq;
>>>>> +    return radeon_fence_wait_seq_timeout(fence->rdev,
target_seq,
>>>>> intr, timeout);
>>>>> +}
>>>> When this call is comming from outside the radeon driver you
need
>>>> to lock rdev->exclusive_lock here to make sure not to
interfere
>>>> with a possible reset.
>>> Ah thanks, I'll add that.
>>>
>>>>>      .get_timeline_name = radeon_fence_get_timeline_name,
>>>>>      .enable_signaling = radeon_fence_enable_signaling,
>>>>>      .signaled = __radeon_fence_signaled,
>>>> Do we still need those callback when we implemented the wait
callback?
>>> .get_timeline_name is used for debugging (trace events).
>>> .signaled is the non-blocking call to check if the fence is
signaled
>>> or not.
>>> .enable_signaling is used for adding callbacks upon fence 
>>> completion, the default 'fence_default_wait' uses it, so
>>> when it works no separate implementation is needed unless you want 
>>> to do more than just waiting.
>>> It's also used when fence_add_callback is called. i915 can be 
>>> patched to use it. ;-)
>>
>> I just meant enable_signaling, the other ones are fine with me. The 
>> problem with enable_signaling is that it's called with a spin lock 
>> held, so we can't sleep.
>>
>> While resetting the GPU could be moved out into a timer the problem 
>> here is that I can't lock rdev->exclusive_lock in such
situations.
>>
>> This means when i915 would call into radeon to enable signaling for a 
>> fence we can't make sure that there is not GPU reset running on 
>> another CPU. And touching the IRQ registers while a reset is going on 
>> is a really good recipe to lockup the whole system.
> If you increase the irq counter on all rings before doing a gpu reset, 
> adjust the state and call sw_irq_put when done this race could never 
> happen. Or am I missing something?
>Beside that's being extremely ugly in the case of a hard PCI reset even 
touching any register or just accessing VRAM in this moment can crash 
the box. Just working around the enable/disable of the interrupt here 
won't help us much.

Adding another spin lock won't work so well either, because the reset 
function itself wants to sleep as well.

The only solution I see off hand is making the critical reset code path 
work in atomic context as well, but that's not really doable cause AFAIK 
we need to work with functions from the PCI subsystem and spinning on a 
lock for up to a second is not really desirable also.

Christian.
> ~Maarten
>

Apparently Analagous Threads

Search for more seemingly similar threads

Nouveau - May 2014 - [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

[Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

[Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

[Nouveau] [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

Apparently Analagous Threads