Christian König
2014-Aug-04 15:04 UTC
[Nouveau] [PATCH 09/19] drm/radeon: handle lockup in delayed work, v2
On 04.08.2014 at 16:58, Maarten Lankhorst wrote:
> On 04-08-14 16:45, Christian König wrote:
>> On 04.08.2014 at 16:40, Maarten Lankhorst wrote:
>>> On 04-08-14 16:37, Christian König wrote:
>>>>> It's a pain to deal with GPU reset.
>>>> Yeah, well that's nothing new.
>>>>
>>>>> I've now tried other solutions but that would mean reverting to the old style during GPU lockup recovery, and only running the delayed work when !lockup.
>>>>> But this meant that the timeout was useless to add. I think the cleanest is keeping the v2 patch, because potentially any waiting code can be called during lockup recovery.
>>>> The lockup code itself should never call any waiting code and V2 doesn't seem to handle a couple of cases correctly either.
>>>>
>>>> How about moving the fence waiting out of the reset code?
>>> What cases did I miss then?
>>>
>>> I'm curious how you want to move the fence waiting out of reset, when there are so many places that could potentially wait, like radeon_ib_get can call radeon_sa_bo_new which can do a wait, or radeon_ring_alloc that can wait on radeon_fence_wait_next, etc.
>> The IB test itself doesn't need to be protected by the exclusive lock. Only everything between radeon_save_bios_scratch_regs and radeon_ring_restore.
> I'm not sure about that, what do you want to do if the ring tests fail? Do you have to retake the exclusive lock?

Just set need_reset again and return -EAGAIN, that should have mostly the same effect as what we are doing right now.

Christian.

>
> ~Maarten
Maarten Lankhorst
2014-Aug-04 15:09 UTC
[Nouveau] [PATCH 09/19] drm/radeon: handle lockup in delayed work, v2
On 04-08-14 17:04, Christian König wrote:
> On 04.08.2014 at 16:58, Maarten Lankhorst wrote:
>> On 04-08-14 16:45, Christian König wrote:
>>> On 04.08.2014 at 16:40, Maarten Lankhorst wrote:
>>>> On 04-08-14 16:37, Christian König wrote:
>>>>>> It's a pain to deal with GPU reset.
>>>>> Yeah, well that's nothing new.
>>>>>
>>>>>> I've now tried other solutions but that would mean reverting to the old style during GPU lockup recovery, and only running the delayed work when !lockup.
>>>>>> But this meant that the timeout was useless to add. I think the cleanest is keeping the v2 patch, because potentially any waiting code can be called during lockup recovery.
>>>>> The lockup code itself should never call any waiting code and V2 doesn't seem to handle a couple of cases correctly either.
>>>>>
>>>>> How about moving the fence waiting out of the reset code?
>>>> What cases did I miss then?
>>>>
>>>> I'm curious how you want to move the fence waiting out of reset, when there are so many places that could potentially wait, like radeon_ib_get can call radeon_sa_bo_new which can do a wait, or radeon_ring_alloc that can wait on radeon_fence_wait_next, etc.
>>> The IB test itself doesn't need to be protected by the exclusive lock. Only everything between radeon_save_bios_scratch_regs and radeon_ring_restore.
>> I'm not sure about that, what do you want to do if the ring tests fail? Do you have to retake the exclusive lock?
>
> Just set need_reset again and return -EAGAIN, that should have mostly the same effect as what we are doing right now.

Yeah, except for locking the TTM delayed workqueue, but that bool should be easy to save/restore.

I think this could work.

~Maarten
Christian König
2014-Aug-04 17:04 UTC
[Nouveau] [PATCH 09/19] drm/radeon: handle lockup in delayed work, v2
On 04.08.2014 at 17:09, Maarten Lankhorst wrote:
> On 04-08-14 17:04, Christian König wrote:
>> On 04.08.2014 at 16:58, Maarten Lankhorst wrote:
>>> On 04-08-14 16:45, Christian König wrote:
>>>> On 04.08.2014 at 16:40, Maarten Lankhorst wrote:
>>>>> On 04-08-14 16:37, Christian König wrote:
>>>>>>> It's a pain to deal with GPU reset.
>>>>>> Yeah, well that's nothing new.
>>>>>>
>>>>>>> I've now tried other solutions but that would mean reverting to the old style during GPU lockup recovery, and only running the delayed work when !lockup.
>>>>>>> But this meant that the timeout was useless to add. I think the cleanest is keeping the v2 patch, because potentially any waiting code can be called during lockup recovery.
>>>>>> The lockup code itself should never call any waiting code and V2 doesn't seem to handle a couple of cases correctly either.
>>>>>>
>>>>>> How about moving the fence waiting out of the reset code?
>>>>> What cases did I miss then?
>>>>>
>>>>> I'm curious how you want to move the fence waiting out of reset, when there are so many places that could potentially wait, like radeon_ib_get can call radeon_sa_bo_new which can do a wait, or radeon_ring_alloc that can wait on radeon_fence_wait_next, etc.
>>>> The IB test itself doesn't need to be protected by the exclusive lock. Only everything between radeon_save_bios_scratch_regs and radeon_ring_restore.
>>> I'm not sure about that, what do you want to do if the ring tests fail? Do you have to retake the exclusive lock?
>> Just set need_reset again and return -EAGAIN, that should have mostly the same effect as what we are doing right now.
> Yeah, except for locking the TTM delayed workqueue, but that bool should be easy to save/restore.
> I think this could work.

Actually, you could activate the delayed workqueue much earlier as well. Thinking more about it, that sounds like a bug in the current code, because we probably want the workqueue activated before waiting for the fence.

Christian.

>
> ~Maarten
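
[Editor's sketch] The control flow proposed in this thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the radeon driver code: struct mock_dev and every mock_* function are hypothetical stand-ins invented for illustration, the exclusive lock is modelled as a plain bool, and the TTM delayed workqueue state is modelled as the bool that Maarten suggests saving and restoring. It shows the hardware reset as the only step done under the exclusive lock, the delayed work being re-enabled before the IB test, and a failed IB test setting need_reset and returning -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; NOT the real struct radeon_device. */
struct mock_dev {
    bool need_reset;        /* set when another reset pass is required */
    bool exclusive_held;    /* models the exclusive lock as a flag */
    bool delayed_work_on;   /* models the TTM delayed workqueue state */
    int  failing_ib_tests;  /* test knob: number of IB tests that still fail */
    int  resets_done;       /* counts hardware reset passes */
};

/* Stage 1: the only part that runs under the exclusive lock. In the thread
 * this corresponds to everything between radeon_save_bios_scratch_regs and
 * radeon_ring_restore. No waiting code may run in here. */
static void mock_hw_reset(struct mock_dev *d)
{
    d->exclusive_held = true;
    assert(d->exclusive_held);  /* invariant for this stage */
    d->resets_done++;
    d->exclusive_held = false;
}

/* Stage 2: the IB test, which may wait, runs without the exclusive lock. */
static int mock_ib_test(struct mock_dev *d)
{
    assert(!d->exclusive_held);
    if (d->failing_ib_tests > 0) {
        d->failing_ib_tests--;
        return -EIO;
    }
    return 0;
}

/* One reset pass. Returns 0 on success, or -EAGAIN with need_reset set
 * again so that the caller retries, as suggested in the thread. */
int mock_gpu_reset(struct mock_dev *d)
{
    bool resched = d->delayed_work_on;  /* save the workqueue state */

    d->delayed_work_on = false;
    d->need_reset = false;

    mock_hw_reset(d);

    /* Restore the delayed work before anything that can wait on a fence. */
    d->delayed_work_on = resched;

    if (mock_ib_test(d)) {
        d->need_reset = true;
        return -EAGAIN;
    }
    return 0;
}
```

A caller in this model would simply call mock_gpu_reset() again while it returns -EAGAIN. The exclusive lock is never held across the IB test, so nothing has to be retaken by hand when a ring test fails.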