Christian König
2014-Jul-23 12:36 UTC
[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences
Am 23.07.2014 12:52, schrieb Daniel Vetter:
> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
> <christian.koenig at amd.com> wrote:
>>> And the dma-buf would still have fences belonging to both drivers, and it
>>> would still call from outside the driver.
>>
>> Calling from outside the driver is fine as long as the driver can do
>> everything necessary to complete its work and isn't forced into any ugly
>> hacks and things that are not 100% reliable.
>>
>> So I don't see much other approach than integrating recovery code for
>> non-firing interrupts and some kind of lockup handling into the fence
>> code as well.
> That approach doesn't really work all that well since every driver has
> its own reset semantics. And we're trying to move away from global
> reset to fine-grained reset. So stop-the-world reset is out of
> fashion, at least for i915. As you said, reset is normal in gpus and
> we're trying to make reset less invasive. I really don't see a point
> in imposing a reset scheme upon all drivers and I think you have about
> as much motivation to convert radeon to the scheme used by i915 as
> I'll have for converting to the one used by radeon. If it would fit at
> all.

Oh my! No, I didn't want to suggest any global reset infrastructure.

My idea was more that the fence framework provides a
fence->process_signaling callback that is periodically called after
enable_signaling is called, to trigger manual signal processing in the
driver.

This would be suitable both as a fallback in case of non-working
interrupts and as a chance for any driver to do necessary lockup
handling.

Christian.

> I guess for radeon we just have to add tons of insulation by punting
> all callbacks to work items so that radeon can do whatever it wants.
> Plus start a delayed_work based fallback when ->enable_signalling is
> called to make sure we work on platforms that lack interrupts.
> -Daniel
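[A minimal sketch of what the proposed hook could look like in the common fence ops of that era (struct fence, pre-dma_fence rename). Only the name process_signaling comes from the mail above; the placement and semantics are illustrative assumptions, not an existing interface:]

/* Sketch only: the proposed driver hook, as interpreted from the mail
 * above. The framework would poll every fence with signaling enabled
 * until it signals; nothing here exists in the real fence API. */
struct fence_ops {
	bool (*enable_signaling)(struct fence *fence);
	bool (*signaled)(struct fence *fence);
	/* proposed: called periodically after enable_signaling() so the
	 * driver can manually check hardware state, signal completed
	 * fences and start lockup handling if nothing moves. */
	void (*process_signaling)(struct fence *fence);
	/* ... remaining ops (wait, release, ...) elided ... */
};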
Daniel Vetter
2014-Jul-23 12:42 UTC
[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences
On Wed, Jul 23, 2014 at 2:36 PM, Christian König
<christian.koenig at amd.com> wrote:
> My idea was more that the fence framework provides a
> fence->process_signaling callback that is periodically called after
> enable_signaling is called, to trigger manual signal processing in the
> driver.
>
> This would be suitable both as a fallback in case of non-working
> interrupts and as a chance for any driver to do necessary lockup
> handling.

Imo that should be an implementation detail of the fence provider. So in
->enable_signaling radeon needs to arm a delayed work to regularly check
fences and signal them if the irq failed. If it's a common need we might
provide some shared code for this (e.g. a struct unreliable_fence or so).

But this shouldn't be mandatory since not all gpus are broken like that.
And if we force other drivers to care for this special case, that imo
leaks the abstraction out of radeon (or any other driver with unreliable
interrupts).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
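[A rough sketch of the driver-internal pattern Daniel describes here: arm a polling fallback from ->enable_signaling and keep the framework unaware of it. The helper names and the 10ms period are illustrative assumptions patterned on radeon code of the time, not the actual implementation:]

/* Sketch of the suggested pattern: the polling fallback stays entirely
 * inside the driver's ->enable_signaling implementation. */
static bool radeon_fence_enable_signaling(struct fence *f)
{
	struct radeon_fence *fence = to_radeon_fence(f);
	struct radeon_device *rdev = fence->rdev;

	if (radeon_fence_seq_signaled(rdev, fence->seq, fence->ring))
		return false;		/* already signaled, no callback needed */

	radeon_irq_kms_sw_irq_get(rdev, fence->ring);
	/* fallback in case the fence irq never fires: poll periodically */
	schedule_delayed_work(&rdev->fence_drv[fence->ring].work,
			      msecs_to_jiffies(10));
	return true;
}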
Maarten Lankhorst
2014-Jul-23 13:16 UTC
[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences
op 23-07-14 14:36, Christian König schreef:
> Am 23.07.2014 12:52, schrieb Daniel Vetter:
>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>> <christian.koenig at amd.com> wrote:
>>>> And the dma-buf would still have fences belonging to both drivers, and it
>>>> would still call from outside the driver.
>>>
>>> Calling from outside the driver is fine as long as the driver can do
>>> everything necessary to complete its work and isn't forced into any ugly
>>> hacks and things that are not 100% reliable.
>>>
>>> So I don't see much other approach than integrating recovery code for
>>> non-firing interrupts and some kind of lockup handling into the fence
>>> code as well.
>> That approach doesn't really work all that well since every driver has
>> its own reset semantics. And we're trying to move away from global
>> reset to fine-grained reset. So stop-the-world reset is out of
>> fashion, at least for i915. As you said, reset is normal in gpus and
>> we're trying to make reset less invasive. I really don't see a point
>> in imposing a reset scheme upon all drivers and I think you have about
>> as much motivation to convert radeon to the scheme used by i915 as
>> I'll have for converting to the one used by radeon. If it would fit at
>> all.
> Oh my! No, I didn't want to suggest any global reset infrastructure.
>
> My idea was more that the fence framework provides a
> fence->process_signaling callback that is periodically called after
> enable_signaling is called, to trigger manual signal processing in the
> driver.
>
> This would be suitable both as a fallback in case of non-working
> interrupts and as a chance for any driver to do necessary lockup
> handling.

I managed to do it without needing it to be part of the interface. I'm
not sure whether radeon_fence_driver_recheck needs exclusive_lock, but
if so it's a small change.

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7fbfd41479f1..51b646b9c8bb 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -345,6 +345,9 @@ struct radeon_fence_driver {
 	uint64_t			sync_seq[RADEON_NUM_RINGS];
 	atomic64_t			last_seq;
 	bool				initialized;
+	struct delayed_work		work;
+	struct radeon_device		*rdev;
+	unsigned			ring;
 };
 
 struct radeon_fence_cb {
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index da83f36dd708..955c825946ad 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
 		}
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
+	if (!wake && last_seq < last_emitted)
+		schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));
+
 	return wake;
 }
 
@@ -815,6 +818,14 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	return 0;
 }
 
+static void radeon_fence_driver_recheck(struct work_struct *work)
+{
+	struct radeon_fence_driver *drv = container_of(work, struct radeon_fence_driver, work.work);
+
+	DRM_ERROR("omg, working!\n");
+	radeon_fence_process(drv->rdev, drv->ring);
+}
+
 /**
  * radeon_fence_driver_init_ring - init the fence driver
  * for the requested ring.
@@ -836,6 +847,10 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 		rdev->fence_drv[ring].sync_seq[i] = 0;
 	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
 	rdev->fence_drv[ring].initialized = false;
+
+	rdev->fence_drv[ring].ring = ring;
+	rdev->fence_drv[ring].rdev = rdev;
+	INIT_DELAYED_WORK(&rdev->fence_drv[ring].work, radeon_fence_driver_recheck);
 }
 
 /**
@@ -880,6 +895,7 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
 	for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
 		if (!rdev->fence_drv[ring].initialized)
 			continue;
+		cancel_delayed_work_sync(&rdev->fence_drv[ring].work);
 		r = radeon_fence_wait_empty(rdev, ring);
 		if (r) {
 			/* no need to trigger GPU reset as we are unloading */
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 16807afab362..85391ddd3ce9 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -331,7 +331,7 @@ void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring)
 {
 	unsigned long irqflags;
 
-	if (!rdev->ddev->irq_enabled)
+//	if (!rdev->ddev->irq_enabled)
 		return;
 
 	if (atomic_inc_return(&rdev->irq.ring_int[ring]) == 1) {
@@ -355,7 +355,7 @@ void radeon_irq_kms_sw_irq_put(struct radeon_device *rdev, int ring)
 {
 	unsigned long irqflags;
 
-	if (!rdev->ddev->irq_enabled)
+//	if (!rdev->ddev->irq_enabled)
 		return;
 
 	if (atomic_dec_and_test(&rdev->irq.ring_int[ring])) {
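[On the exclusive_lock question left open above: radeon's exclusive_lock is an rw_semaphore taken exclusively during GPU reset. If the recheck does turn out to need it, the work handler could take it shared around the fence processing. A sketch under that assumption only; whether the lock is actually required is exactly what the mail leaves undecided:]

/* Sketch only: variant of the work handler if exclusive_lock is needed,
 * taking it as a reader so normal operation isn't serialized while a
 * concurrent GPU reset (which takes it as a writer) is kept out. */
static void radeon_fence_driver_recheck(struct work_struct *work)
{
	struct radeon_fence_driver *drv =
		container_of(work, struct radeon_fence_driver, work.work);
	struct radeon_device *rdev = drv->rdev;

	down_read(&rdev->exclusive_lock);
	radeon_fence_process(rdev, drv->ring);
	up_read(&rdev->exclusive_lock);
}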
Maarten Lankhorst
2014-Jul-23 14:05 UTC
[Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences
op 23-07-14 15:16, Maarten Lankhorst schreef:
> op 23-07-14 14:36, Christian König schreef:
>> Am 23.07.2014 12:52, schrieb Daniel Vetter:
>>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>>> <christian.koenig at amd.com> wrote:
>>>>> And the dma-buf would still have fences belonging to both drivers, and it
>>>>> would still call from outside the driver.
>>>> Calling from outside the driver is fine as long as the driver can do
>>>> everything necessary to complete its work and isn't forced into any ugly
>>>> hacks and things that are not 100% reliable.
>>>>
>>>> So I don't see much other approach than integrating recovery code for
>>>> non-firing interrupts and some kind of lockup handling into the fence
>>>> code as well.
>>> That approach doesn't really work all that well since every driver has
>>> its own reset semantics. And we're trying to move away from global
>>> reset to fine-grained reset. So stop-the-world reset is out of
>>> fashion, at least for i915. As you said, reset is normal in gpus and
>>> we're trying to make reset less invasive. I really don't see a point
>>> in imposing a reset scheme upon all drivers and I think you have about
>>> as much motivation to convert radeon to the scheme used by i915 as
>>> I'll have for converting to the one used by radeon. If it would fit at
>>> all.
>> Oh my! No, I didn't want to suggest any global reset infrastructure.
>>
>> My idea was more that the fence framework provides a
>> fence->process_signaling callback that is periodically called after
>> enable_signaling is called, to trigger manual signal processing in the
>> driver.
>>
>> This would be suitable both as a fallback in case of non-working
>> interrupts and as a chance for any driver to do necessary lockup
>> handling.
> I managed to do it without needing it to be part of the interface. I'm
> not sure whether radeon_fence_driver_recheck needs exclusive_lock, but
> if so it's a small change.
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 7fbfd41479f1..51b646b9c8bb 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -345,6 +345,9 @@ struct radeon_fence_driver {
>  	uint64_t			sync_seq[RADEON_NUM_RINGS];
>  	atomic64_t			last_seq;
>  	bool				initialized;
> +	struct delayed_work		work;
> +	struct radeon_device		*rdev;
> +	unsigned			ring;
>  };
>
>  struct radeon_fence_cb {
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index da83f36dd708..955c825946ad 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>  		}
>  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>
> +	if (!wake && last_seq < last_emitted)
> +		schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));
> +
>
When trying this: if (seq < last_emitted) is probably a better check.

~Maarten
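[Applied to the hunk quoted above, the amended condition would read as follows. Sketch only: after the xchg loop, seq holds the last sequence number seen as completed, so seq < last_emitted keeps the poll armed whenever emitted fences are still outstanding, even on a pass that did wake waiters (which the !wake test would wrongly skip):]

	/* Amended rearm condition per the note above: drop the !wake test
	 * so the poll keeps running as long as any emitted fence has not
	 * yet been observed as signaled. */
	if (seq < last_emitted)
		schedule_delayed_work(&rdev->fence_drv[ring].work,
				      msecs_to_jiffies(10));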