skeggsb at gmail.com
2009-Aug-21 09:56 UTC
[Nouveau] [PATCH] drm/nouveau: rewrite nouveau_dma_wait()
From: Ben Skeggs <bskeggs at redhat.com> There was at least one situation we could get into in the old code where we'd end up overrunning the push buffer with the jumps back to the start of the buffer in an attempt to get more space. In addition, the new code prevents misdetecting a GPU hang by resetting the timeout counter if we see the GPU GET pointer advancing, and simplifies the handling of a confusing corner-case. There's also a significant amount of commenting added to the code, as some of the interactions are quite complex and hard to grasp on first looking at the code. --- drivers/gpu/drm/nouveau/nouveau_dma.c | 114 +++++++++++++++++++++------------ 1 files changed, 72 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c b/drivers/gpu/drm/nouveau/nouveau_dma.c index e1a0adb..8930420 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dma.c +++ b/drivers/gpu/drm/nouveau/nouveau_dma.c @@ -113,8 +113,13 @@ READ_GET(struct nouveau_channel *chan, uint32_t *get) val = nvchan_rd32(chan->user_get); if (val < chan->pushbuf_base || - val >= chan->pushbuf_base + chan->pushbuf_bo->bo.mem.size) + val >= chan->pushbuf_base + chan->pushbuf_bo->bo.mem.size) { + /* meaningless to dma_wait() except to know whether the + * GPU has stalled or not + */ + *get = val; return false; + } *get = (val - chan->pushbuf_base) >> 2; return true; @@ -123,54 +128,79 @@ READ_GET(struct nouveau_channel *chan, uint32_t *get) int nouveau_dma_wait(struct nouveau_channel *chan, int size) { - const int us_timeout = 100000; - uint32_t get; - int ret = -EBUSY, i; + uint32_t get, prev_get = 0, cnt = 0; + bool get_valid; + + while (chan->dma.free < size) { + /* reset counter as long as GET is still advancing, this is + * to avoid misdetecting a GPU lockup if the GPU happens to + * just be processing an operation that takes a long time + */ + get_valid = READ_GET(chan, &get); + if (get != prev_get) { + prev_get = get; + cnt = 0; + } - for (i = 0; i < us_timeout; i++) { - if (!READ_GET(chan, &get)) { + if ((++cnt & 0xff) == 0) { DRM_UDELAY(1); - continue; + if (cnt > 10000) + return -EBUSY; } - if (chan->dma.put >= get) { - chan->dma.free = chan->dma.max - chan->dma.cur; - - if (chan->dma.free < size) { - OUT_RING(chan, 0x20000000|chan->pushbuf_base); - if (get <= NOUVEAU_DMA_SKIPS) { - /*corner case - will be idle*/ - if (chan->dma.put <= NOUVEAU_DMA_SKIPS) - WRITE_PUT(NOUVEAU_DMA_SKIPS + 1); - - for (; i < us_timeout; i++) { - if (READ_GET(chan, &get) && - get > NOUVEAU_DMA_SKIPS) - break; - - DRM_UDELAY(1); - } - - if (i >= us_timeout) - break; - } - - WRITE_PUT(NOUVEAU_DMA_SKIPS); - chan->dma.cur - chan->dma.put = NOUVEAU_DMA_SKIPS; - chan->dma.free = get - (NOUVEAU_DMA_SKIPS + 1); - } - } else { - chan->dma.free = get - chan->dma.cur - 1; - } + /* loop until we have a usable GET pointer. the value + * we read from the GPU may be outside the main ring if + * PFIFO is processing a buffer called from the main ring, + * discard these values until something sensible is seen. + * + * the other case we discard GET is while the GPU is fetching + * from the SKIPS area, so the code below doesn't have to deal + * with some fun corner cases. + */ + if (!get_valid || get < NOUVEAU_DMA_SKIPS) + continue; - if (chan->dma.free >= size) { - ret = 0; - break; + if (get <= chan->dma.cur) { + /* engine is fetching behind us, or is completely + * idle (GET == PUT) so we have free space up until + * the end of the push buffer + * + * we can only hit that path once per call due to + * looping back to the beginning of the push buffer, + * we'll hit the fetching-ahead-of-us path from that + * point on. + * + * the *one* exception to that rule is if we read + * GET==PUT, in which case the below conditional will + * always succeed and break us out of the wait loop. + */ + chan->dma.free = chan->dma.max - chan->dma.cur; + if (chan->dma.free >= size) + break; + + /* not enough space left at the end of the push buffer, + * instruct the GPU to jump back to the start right + * after processing the currently pending commands. + */ + OUT_RING (chan, chan->pushbuf_base | 0x20000000); + WRITE_PUT (NOUVEAU_DMA_SKIPS); + + /* we're now submitting commands at the start of + * the push buffer. + */ + chan->dma.cur + chan->dma.put = NOUVEAU_DMA_SKIPS; } - DRM_UDELAY(1); + /* engine fetching ahead of us, we have space up until the + * current GET pointer. the "- 1" is to ensure there's + * space left to emit a jump back to the beginning of the + * push buffer if we require it. we can never get GET == PUT + * here, so this is safe. + */ + chan->dma.free = get - chan->dma.cur - 1; } - return ret; + return 0; } + -- 1.6.4
Maarten Maathuis
2009-Aug-21 19:30 UTC
[Nouveau] [PATCH] drm/nouveau: rewrite nouveau_dma_wait()
The timeout is too short, the gallium driver will easily trigger an assert. Maarten.
Ben Skeggs
2009-Aug-21 23:26 UTC
[Nouveau] [PATCH] drm/nouveau: rewrite nouveau_dma_wait()
On Fri, 2009-08-21 at 21:30 +0200, Maarten Maathuis wrote:> The timeout is too short, the gallium driver will easily trigger an assert.Oops, that was a typo. Thanks, fixed. Any other comments?> > Maarten.
Pekka Paalanen
2009-Aug-22 10:25 UTC
[Nouveau] [PATCH] drm/nouveau: rewrite nouveau_dma_wait()
On Fri, 21 Aug 2009 19:56:51 +1000 skeggsb at gmail.com wrote:> From: Ben Skeggs <bskeggs at redhat.com> > > There was at least one situation we could get into in the old code where > we'd end up overrunning the push buffer with the jumps back to the start > of the buffer in an attempt to get more space. > > In addition, the new code prevents misdetecting a GPU hang by resetting > the timeout counter if we see the GPU GET pointer advancing, and > simplifies the handling of a confusing corner-case. > > There's also a significant amount of commenting added to the code, as some > of the interactions are quite complex and hard to grasp on first looking > at the code.Excellent, now even I may understand it. :-)> --- > drivers/gpu/drm/nouveau/nouveau_dma.c | 114 +++++++++++++++++++++------------ > 1 files changed, 72 insertions(+), 42 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c b/drivers/gpu/drm/nouveau/nouveau_dma.c > index e1a0adb..8930420 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dma.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dma.c > @@ -113,8 +113,13 @@ READ_GET(struct nouveau_channel *chan, uint32_t *get) > > val = nvchan_rd32(chan->user_get); > if (val < chan->pushbuf_base || > - val >= chan->pushbuf_base + chan->pushbuf_bo->bo.mem.size) > + val >= chan->pushbuf_base + chan->pushbuf_bo->bo.mem.size) { > + /* meaningless to dma_wait() except to know whether the > + * GPU has stalled or not > + */ > + *get = val;*get = val? Why not *get = (val - chan->pushbuf_base) >> 2 since it is used for comparing with an old value in the caller? Not that it likely matters much, but it just strikes as odd.> return false; > + } > > *get = (val - chan->pushbuf_base) >> 2; > return true; > @@ -123,54 +128,79 @@ READ_GET(struct nouveau_channel *chan, uint32_t *get) > int > nouveau_dma_wait(struct nouveau_channel *chan, int size) > { > - const int us_timeout = 100000; > - uint32_t get; > - int ret = -EBUSY, i; > + uint32_t get, prev_get = 0, cnt = 0; > + bool get_valid; > + > + while (chan->dma.free < size) { > + /* reset counter as long as GET is still advancing, this is > + * to avoid misdetecting a GPU lockup if the GPU happens to > + * just be processing an operation that takes a long time > + */ > + get_valid = READ_GET(chan, &get); > + if (get != prev_get) { > + prev_get = get; > + cnt = 0; > + } > > - for (i = 0; i < us_timeout; i++) { > - if (!READ_GET(chan, &get)) { > + if ((++cnt & 0xff) == 0) { > DRM_UDELAY(1); > - continue; > + if (cnt > 10000) > + return -EBUSY; > }DRM_UDELAY is a busy-wait. Since we are by default busy-waiting anyway, why bother with udelay? Is it to give the PCI bus a rest once in a while? How about adding a third level of waiting: a sleeping wait. Something like: if (++cnt > CNT_MAX) return -EBUSY; if (cnt & 0xfff == 0) msleep(5); else if (cnt & 0xf == 0) udelay(1); In pseudocode: repeat until timeout: - repeat 256 times: * repeat 16 times READ_GET * 1 us busywait - at least 5 ms sleep (schedules) The total expected timeout is roughly cnt * READ_GET + (cnt / 16) * 1us + (cnt / 4096) * 5ms In other words: CNT_MAX = TIMEOUT / (READ_GET + 1us / 16 + 5000us / 4096) Is 1 us busywait enough considering PCI latency? Might not. Also the 5 ms might be too long. We probably have to adjust those based on e.g. EXA and OpenGL benchmarks. The big question here is, do we need nouveau_dma_wait() in a context where sleeping is not allowed? We probably want might_sleep() in the beginning of nouveau_dma_wait(). (See include/linux/kernel.h) <clip>> + /* loop until we have a usable GET pointer. the value > + * we read from the GPU may be outside the main ring if > + * PFIFO is processing a buffer called from the main ring, > + * discard these values until something sensible is seen. > + * > + * the other case we discard GET is while the GPU is fetching > + * from the SKIPS area, so the code below doesn't have to deal > + * with some fun corner cases. > + */ > + if (!get_valid || get < NOUVEAU_DMA_SKIPS) > + continue; > > - if (chan->dma.free >= size) { > - ret = 0; > - break; > + if (get <= chan->dma.cur) { > + /* engine is fetching behind us, or is completely > + * idle (GET == PUT) so we have free space up until > + * the end of the push buffer > + * > + * we can only hit that path once per call due to > + * looping back to the beginning of the push buffer, > + * we'll hit the fetching-ahead-of-us path from that > + * point on. > + * > + * the *one* exception to that rule is if we read > + * GET==PUT, in which case the below conditional will > + * always succeed and break us out of the wait loop. > + */ > + chan->dma.free = chan->dma.max - chan->dma.cur; > + if (chan->dma.free >= size) > + break;If dma.free == size, doesn't the jump command emitted on the next call here get written past the end of the buffer? dma.cur will equal dma.max.> + > + /* not enough space left at the end of the push buffer, > + * instruct the GPU to jump back to the start right > + * after processing the currently pending commands. > + */ > + OUT_RING (chan, chan->pushbuf_base | 0x20000000); > + WRITE_PUT (NOUVEAU_DMA_SKIPS); > + > + /* we're now submitting commands at the start of > + * the push buffer. > + */ > + chan->dma.cur > + chan->dma.put = NOUVEAU_DMA_SKIPS; > } > > - DRM_UDELAY(1); > + /* engine fetching ahead of us, we have space up until the > + * current GET pointer. the "- 1" is to ensure there's > + * space left to emit a jump back to the beginning of the > + * push buffer if we require it. we can never get GET == PUT > + * here, so this is safe. > + */ > + chan->dma.free = get - chan->dma.cur - 1; > } > > - return ret; > + return 0; > } > + > -- > 1.6.4-- Pekka Paalanen http://www.iki.fi/pq/
Apparently Analagous Threads
- [PATCH 1/4] drm/nouveau: refactor nouveau_dma_wait()
- [PATCH drm-misc-next v2] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly
- [PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly
- [PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly
- [PATCH 1/3] drm/nouveau: change channel regs mapping to ioremap