Konsta Hölttä
2015-Aug-05 11:27 UTC
[Nouveau] [RFC PATCH 0/5] More explicit pushbuf error handling
Hi there,

These patches are a starting point for more explicit error mechanisms and better robustness. At the moment, when a job hangs or faults, nouveau doesn't quite know how to handle the situation, and the result is often a hang. Some of these situations would require either resetting the GPU completely, or a more complex path that recovers only the faulted channel. To start, I worked on letting userspace know what exactly happened; proper recovery would come later.

The "error notifier" in the first patch is a simple buffer shared between kernel and userspace. Its error codes match nvgpu's. Alternatively, the status could be queried with an ioctl, but that would be considerably more heavyweight. I'd like to know whether the event mechanism is meant for these kinds of events at all (engines notifying errors upwards to the drm layer). Another alternative would probably be to register the same buffer with each of the necessary engines separately via method calls, or to register it with just one engine (e.g., fifo) and somehow reach that engine when errors happen in the others. Please comment on this; I wrote it before understanding the mthd mechanism.

Additionally, two patches add priority and timeout management for separate channels in flight. Neither does exactly what its name says, but the effect should be the same, and this is what nvgpu currently does. Those two patches call the fifo channel object's methods directly from userspace, so a hack is added in the nvif path to accept that: the objects are NEW'd from kernel space, so calling them from userspace isn't normally allowed, as far as I can tell. How should this be handled?

Also, since nouveau often hangs on errors, userspace hangs too (waiting on a fence). The final patch attempts to fix this in a couple of specific error paths by forcibly marking all fences finished. I'd like to hear how that should be handled properly - consider the patch just a proof of concept and a sample of what would be necessary, supporting the question above. I don't expect the patches to be accepted as-is; as a newbie, I'd appreciate any high-level comments on whether I've understood things correctly, especially the event and nvif/method mechanisms (I use the latter from userspace with a hack, built from the perfmon branch seen here earlier, into NVIDIA's internal libdrm equivalent). The fence-forcing patch is necessary together with the error notifiers, at least with our userspace, which waits very long or infinitely on fences. I'm working specifically on Tegra and don't know much about the desktop userspace details, so I may be biased in some areas. I'd be happy to write sample tests on e.g. libdrm for the new methods once the kernel patches get into good shape, if that's required for accepting new features.

I tested these as a proof of concept on Jetson TK1; the code is adapted from the latest nvgpu. The patches can also be found in http://github.com/sooda/nouveau and are based on a version of gnurou/staging.

Thanks!
Konsta (sooda in IRC)

Konsta Hölttä (5):
  notify channel errors to userspace
  HACK don't verify route == owner in nvkm ioctl
  gk104: channel priority/timeslice support
  channel timeout using repeated sched timeouts
  force fences updated in error conditions

 drm/nouveau/include/nvif/class.h       |  20 +++++
 drm/nouveau/include/nvif/event.h       |  12 +++
 drm/nouveau/include/nvkm/engine/fifo.h |   5 +-
 drm/nouveau/nouveau_chan.c             |  54 ++++++++++++
 drm/nouveau/nouveau_chan.h             |   5 ++
 drm/nouveau/nouveau_drm.c              |   1 +
 drm/nouveau/nouveau_fence.c            |  13 +--
 drm/nouveau/nouveau_gem.c              |  50 +++++++++++
 drm/nouveau/nouveau_gem.h              |   2 +
 drm/nouveau/nvkm/core/ioctl.c          |   5 +-
 drm/nouveau/nvkm/engine/fifo/base.c    |  57 ++++++++++++-
 drm/nouveau/nvkm/engine/fifo/gf100.c   |   2 +-
 drm/nouveau/nvkm/engine/fifo/gk104.c   | 150 +++++++++++++++++++++++++++++++--
 drm/nouveau/nvkm/engine/fifo/nv04.c    |   2 +-
 drm/nouveau/nvkm/engine/gr/gf100.c     |   4 +
 drm/nouveau/uapi/drm/nouveau_drm.h     |  12 +++
 16 files changed, 374 insertions(+), 20 deletions(-)

--
2.1.4
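To make the proposed interface concrete, here is a minimal userspace sketch
of registering the patch-1 error notifier. The ioctl and struct are from the
series below; the use of drmCommandWriteRead() and the fd/handle plumbing are
illustrative assumptions, not part of the patches.

    #include <stdint.h>
    #include <xf86drm.h>          /* drmCommandWriteRead() */
    #include "nouveau_drm.h"      /* the uapi header as patched in 1/5 */

    /* Register a previously allocated, CPU-mappable GEM buffer as the
     * channel's error notifier. fd, chan_id and bo_handle are assumed
     * to come from the usual nouveau channel/bo setup. */
    static int set_error_notifier(int fd, uint32_t chan_id, uint32_t bo_handle)
    {
            struct drm_nouveau_gem_set_error_notifier req = {
                    .channel = chan_id,
                    .buffer  = bo_handle,
                    .offset  = 0,  /* byte offset into the bo, u32-aligned */
            };

            /* returns -ENOENT if the channel or buffer lookup fails */
            return drmCommandWriteRead(fd, DRM_NOUVEAU_GEM_SET_ERROR_NOTIFIER,
                                       &req, sizeof(req));
    }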
Konsta Hölttä
2015-Aug-05 11:27 UTC
[Nouveau] [RFC PATCH 1/5] notify channel errors to userspace
Let userspace know about some specific types of errors via a shared buffer
object registered with a new ioctl, NOUVEAU_GEM_SET_ERROR_NOTIFIER. Once set,
the notifier buffer is used until the channel is destroyed. The exceptional
per-channel error conditions are signaled upwards from the nvkm layer via the
event/notify mechanism and written to the buffer, which holds the latest
error that occurred.

Some error situations leave a channel completely stuck, which may require
discarding and reinitializing it; that is not yet fully supported. Once a
specific type of error happens, no others are expected until recovery.

Passing the error to userspace in a shared buffer enables the graphics driver
to check cheaply and periodically whether anything went wrong (typically
after a submit); e.g., the GL robustness extension requires detecting how the
context died.

The notifier currently signals the idle timeout, sw notify, mmu fault and
illegal pbdma error conditions.

Signed-off-by: Konsta Hölttä <kholtta at nvidia.com>
---
 drm/nouveau/include/nvif/class.h       |  2 ++
 drm/nouveau/include/nvif/event.h       | 11 +++++++
 drm/nouveau/include/nvkm/engine/fifo.h |  3 ++
 drm/nouveau/nouveau_chan.c             | 54 ++++++++++++++++++++++++++++++++++
 drm/nouveau/nouveau_chan.h             |  5 ++++
 drm/nouveau/nouveau_drm.c              |  1 +
 drm/nouveau/nouveau_gem.c              | 50 +++++++++++++++++++++++++++++++
 drm/nouveau/nouveau_gem.h              |  2 ++
 drm/nouveau/nvkm/engine/fifo/base.c    | 54 ++++++++++++++++++++++++++++++++++
 drm/nouveau/nvkm/engine/fifo/gk104.c   | 13 ++++++++
 drm/nouveau/nvkm/engine/gr/gf100.c     |  4 +++
 drm/nouveau/uapi/drm/nouveau_drm.h     | 12 ++++++++
 12 files changed, 211 insertions(+)

diff --git a/drm/nouveau/include/nvif/class.h b/drm/nouveau/include/nvif/class.h
index d90e207..28176e6 100644
--- a/drm/nouveau/include/nvif/class.h
+++ b/drm/nouveau/include/nvif/class.h
@@ -410,16 +410,18 @@ struct kepler_channel_gpfifo_a_v0 {
 	__u8  engine;
 	__u16 chid;
 	__u8  pad04[4];
 	__u32 pushbuf;
 	__u32 ilength;
 	__u64 ioffset;
 };
 
+#define CHANNEL_GPFIFO_ERROR_NOTIFIER_EEVENT 0x01
+
 /*******************************************************************************
  * legacy display
  ******************************************************************************/
 
 #define NV04_DISP_NTFY_VBLANK                                              0x00
 #define NV04_DISP_NTFY_CONN                                                0x01
 
 struct nv04_disp_mthd_v0 {
diff --git a/drm/nouveau/include/nvif/event.h b/drm/nouveau/include/nvif/event.h
index 2176449..d148b85 100644
--- a/drm/nouveau/include/nvif/event.h
+++ b/drm/nouveau/include/nvif/event.h
@@ -54,9 +54,20 @@ struct nvif_notify_conn_rep_v0 {
 struct nvif_notify_uevent_req {
 	/* nvif_notify_req ... */
 };
 
 struct nvif_notify_uevent_rep {
 	/* nvif_notify_rep ... */
 };
 
+struct nvif_notify_eevent_req {
+	/* nvif_notify_req ... */
+	u32 chid;
+};
+
+struct nvif_notify_eevent_rep {
+	/* nvif_notify_rep ... */
+	u32 error;
+	u32 chid;
+};
+
 #endif
diff --git a/drm/nouveau/include/nvkm/engine/fifo.h b/drm/nouveau/include/nvkm/engine/fifo.h
index 9100b80..cbca477 100644
--- a/drm/nouveau/include/nvkm/engine/fifo.h
+++ b/drm/nouveau/include/nvkm/engine/fifo.h
@@ -67,16 +67,17 @@ struct nvkm_fifo_base {
 #include <core/engine.h>
 #include <core/event.h>
 
 struct nvkm_fifo {
 	struct nvkm_engine base;
 
 	struct nvkm_event cevent; /* channel creation event */
 	struct nvkm_event uevent; /* async user trigger */
+	struct nvkm_event eevent; /* error notifier */
 
 	struct nvkm_object **channel;
 	spinlock_t lock;
 	u16 min;
 	u16 max;
 
 	int  (*chid)(struct nvkm_fifo *, struct nvkm_object *);
 	void (*pause)(struct nvkm_fifo *, unsigned long *);
@@ -118,11 +119,13 @@ extern struct nvkm_oclass *gk20a_fifo_oclass;
 extern struct nvkm_oclass *gk208_fifo_oclass;
 extern struct nvkm_oclass *gm204_fifo_oclass;
 extern struct nvkm_oclass *gm20b_fifo_oclass;
 
 int nvkm_fifo_uevent_ctor(struct nvkm_object *, void *, u32,
			   struct nvkm_notify *);
 void nvkm_fifo_uevent(struct nvkm_fifo *);
 
+void nvkm_fifo_eevent(struct nvkm_fifo *, u32 chid, u32 error);
+
 void nv04_fifo_intr(struct nvkm_subdev *);
 int  nv04_fifo_context_attach(struct nvkm_object *, struct nvkm_object *);
 #endif
diff --git a/drm/nouveau/nouveau_chan.c b/drm/nouveau/nouveau_chan.c
index 0589bab..2bcd9ff 100644
--- a/drm/nouveau/nouveau_chan.c
+++ b/drm/nouveau/nouveau_chan.c
@@ -19,16 +19,18 @@
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 *
 * Authors: Ben Skeggs
 */
 
 #include <nvif/os.h>
 #include <nvif/class.h>
+#include <nvif/notify.h>
+#include <nvif/event.h>
 
 /*XXX*/
 #include <core/client.h>
 
 #include "nouveau_drm.h"
 #include "nouveau_dma.h"
 #include "nouveau_bo.h"
 #include "nouveau_chan.h"
@@ -54,16 +56,40 @@ nouveau_channel_idle(struct nouveau_channel *chan)
 	if (ret)
 		NV_PRINTK(error, cli, "failed to idle channel 0x%08x [%s]\n",
			  chan->object->handle, nvxx_client(&cli->base)->name);
 	return ret;
 }
 
 void
+nouveau_channel_set_error_notifier(struct nouveau_channel *chan, u32 error)
+{
+	struct nouveau_cli *cli = (void *)nvif_client(chan->object);
+	struct nouveau_bo *nvbo = chan->error_notifier.buffer;
+	u32 off = chan->error_notifier.offset / sizeof(u32);
+	struct timespec time_data;
+	u64 nsec;
+
+	if (!nvbo)
+		return;
+
+	getnstimeofday(&time_data);
+	nsec = ((u64)time_data.tv_sec) * 1000000000u +
+		(u64)time_data.tv_nsec;
+
+	nouveau_bo_wr32(nvbo, off + 0, nsec);
+	nouveau_bo_wr32(nvbo, off + 1, nsec >> 32);
+	nouveau_bo_wr32(nvbo, off + 2, error);
+	nouveau_bo_wr32(nvbo, off + 3, 0xffffffff);
+	NV_PRINTK(error, cli, "error notifier set to %d for ch %d\n",
+		  error, chan->chid);
+}
+
+void
 nouveau_channel_del(struct nouveau_channel **pchan)
 {
 	struct nouveau_channel *chan = *pchan;
 	if (chan) {
 		if (chan->fence) {
 			nouveau_channel_idle(chan);
 			nouveau_fence(chan->drm)->context_del(chan);
 		}
@@ -72,16 +98,19 @@ nouveau_channel_del(struct nouveau_channel **pchan)
 		nvif_object_fini(&chan->vram);
 		nvif_object_ref(NULL, &chan->object);
 		nvif_object_fini(&chan->push.ctxdma);
 		nouveau_bo_vma_del(chan->push.buffer, &chan->push.vma);
 		nouveau_bo_unmap(chan->push.buffer);
 		if (chan->push.buffer && chan->push.buffer->pin_refcnt)
 			nouveau_bo_unpin(chan->push.buffer);
 		nouveau_bo_ref(NULL, &chan->push.buffer);
+		if (chan->error_notifier.buffer)
+			nouveau_bo_unmap(chan->error_notifier.buffer);
+		nvif_notify_fini(&chan->error_notifier.notify);
 		nvif_device_ref(NULL, &chan->device);
 		kfree(chan);
 	}
 	*pchan = NULL;
 }
 
 static int
 nouveau_channel_prep(struct nouveau_drm *drm, struct nvif_device *device,
@@ -273,16 +302,27 @@ nouveau_channel_dma(struct nouveau_drm *drm, struct nvif_device *device,
 		}
 	} while (ret && *oclass);
 
 	nouveau_channel_del(pchan);
 	return ret;
 }
 
 static int
+nouveau_chan_eevent_handler(struct nvif_notify *notify)
+{
+	struct nouveau_channel *chan =
+		container_of(notify, typeof(*chan), error_notifier.notify);
+	const struct nvif_notify_eevent_rep *rep = notify->data;
+
+	WARN_ON(rep->chid != chan->chid);
+	nouveau_channel_set_error_notifier(chan, rep->error);
+	return NVIF_NOTIFY_KEEP;
+}
+
+static int
 nouveau_channel_init(struct nouveau_channel *chan, u32 vram, u32 gart)
 {
 	struct nvif_device *device = chan->device;
 	struct nouveau_cli *cli = (void *)nvif_client(&device->base);
 	struct nvkm_mmu *mmu = nvxx_mmu(device);
 	struct nvkm_sw_chan *swch;
 	struct nv_dma_v0 args = {};
 	int ret, i;
@@ -381,16 +421,30 @@ nouveau_channel_init(struct nouveau_channel *chan, u32 vram, u32 gart)
 		if (ret)
 			return ret;
 
 		BEGIN_NV04(chan, NvSubSw, 0x0000, 1);
 		OUT_RING  (chan, chan->nvsw.handle);
 		FIRE_RING (chan);
 	}
 
+	/* error code events on why we're stuck/broken */
+	ret = nvif_notify_init(chan->object, NULL,
+			       nouveau_chan_eevent_handler, false,
+			       CHANNEL_GPFIFO_ERROR_NOTIFIER_EEVENT,
+			       &(struct nvif_notify_eevent_req) { .chid = chan->chid },
+			       sizeof(struct nvif_notify_eevent_req),
+			       sizeof(struct nvif_notify_eevent_rep),
+			       &chan->error_notifier.notify);
+	WARN_ON(ret);
+	if (ret)
+		return ret;
+	/* fini() does one put() */
+	nvif_notify_get(&chan->error_notifier.notify);
+
 	/* initialise synchronisation */
 	return nouveau_fence(chan->drm)->context_new(chan);
 }
 
 int
 nouveau_channel_new(struct nouveau_drm *drm, struct nvif_device *device,
		    u32 handle, u32 arg0, u32 arg1,
		    struct nouveau_channel **pchan)
diff --git a/drm/nouveau/nouveau_chan.h b/drm/nouveau/nouveau_chan.h
index 8b3640f..36f4f62 100644
--- a/drm/nouveau/nouveau_chan.h
+++ b/drm/nouveau/nouveau_chan.h
@@ -33,16 +33,21 @@ struct nouveau_channel {
 		int ib_free;
 		int ib_put;
 	} dma;
 	u32 user_get_hi;
 	u32 user_get;
 	u32 user_put;
 
 	struct nvif_object *object;
+	struct {
+		struct nouveau_bo *buffer;
+		u32 offset;
+		struct nvif_notify notify;
+	} error_notifier;
 };
 
 int  nouveau_channel_new(struct nouveau_drm *, struct nvif_device *,
			 u32 handle, u32 arg0, u32 arg1,
			 struct nouveau_channel **);
 void nouveau_channel_del(struct nouveau_channel **);
 int  nouveau_channel_idle(struct nouveau_channel *);
diff --git a/drm/nouveau/nouveau_drm.c b/drm/nouveau/nouveau_drm.c
index 7d1297e..e3e6441 100644
--- a/drm/nouveau/nouveau_drm.c
+++ b/drm/nouveau/nouveau_drm.c
@@ -898,16 +898,17 @@ nouveau_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_NEW, nouveau_gem_ioctl_new, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_PUSHBUF, nouveau_gem_ioctl_pushbuf, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_PREP, nouveau_gem_ioctl_cpu_prep, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_FINI, nouveau_gem_ioctl_cpu_fini, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 	/* Staging ioctls */
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_SET_TILING, nouveau_gem_ioctl_set_tiling, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_PUSHBUF_2, nouveau_gem_ioctl_pushbuf_2, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_SET_ERROR_NOTIFIER, nouveau_gem_ioctl_set_error_notifier, DRM_UNLOCKED|DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 long
 nouveau_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct drm_file *filp = file->private_data;
 	struct drm_device *dev = filp->minor->dev;
 	long ret;
diff --git a/drm/nouveau/nouveau_gem.c b/drm/nouveau/nouveau_gem.c
index 884ab83..48abe16 100644
--- a/drm/nouveau/nouveau_gem.c
+++ b/drm/nouveau/nouveau_gem.c
@@ -1032,16 +1032,66 @@ out_next:
			(chan->push.vma.offset + ((chan->dma.cur + 2) << 2));
		req->suffix1 = 0x00000000;
 	}
 
 	return nouveau_abi16_put(abi16, ret);
 }
 
 int
+nouveau_gem_ioctl_set_error_notifier(struct drm_device *dev, void *data,
+				     struct drm_file *file_priv)
+{
+	struct nouveau_abi16 *abi16 = nouveau_abi16_get(file_priv, dev);
+	struct drm_nouveau_gem_set_error_notifier *req = data;
+	struct nouveau_abi16_chan *abi16_ch;
+	struct nouveau_channel *chan = NULL;
+	struct drm_gem_object *gem;
+	struct nouveau_bo *nvbo;
+	int ret = 0;
+
+	if (unlikely(!abi16))
+		return -ENOMEM;
+
+	list_for_each_entry(abi16_ch, &abi16->channels, head) {
+		if (abi16_ch->chan->object->handle
+		    == (NVDRM_CHAN | req->channel)) {
+			chan = abi16_ch->chan;
+			break;
+		}
+	}
+	if (!chan)
+		return nouveau_abi16_put(abi16, -ENOENT);
+
+	gem = drm_gem_object_lookup(dev, file_priv, req->buffer);
+	if (!gem)
+		return nouveau_abi16_put(abi16, -ENOENT);
+
+	nvbo = nouveau_gem_object(gem);
+
+	ret = nouveau_bo_map(nvbo);
+	if (ret)
+		goto out;
+
+	/* userspace should keep this buf alive */
+	chan->error_notifier.buffer = nvbo;
+	chan->error_notifier.offset = req->offset;
+
+	/* zero any old data */
+	nouveau_bo_wr32(nvbo, req->offset / sizeof(u32) + 0, 0);
+	nouveau_bo_wr32(nvbo, req->offset / sizeof(u32) + 1, 0);
+	nouveau_bo_wr32(nvbo, req->offset / sizeof(u32) + 2, 0);
+	nouveau_bo_wr32(nvbo, req->offset / sizeof(u32) + 3, 0);
+
+out:
+	drm_gem_object_unreference_unlocked(gem);
+	return nouveau_abi16_put(abi16, ret);
+}
+
+int
 nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
			   struct drm_file *file_priv)
 {
 	struct drm_nouveau_gem_cpu_prep *req = data;
 	struct drm_gem_object *gem;
 	struct nouveau_bo *nvbo;
 	bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
 	bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
diff --git a/drm/nouveau/nouveau_gem.h b/drm/nouveau/nouveau_gem.h
index 9e4323f..880c99f 100644
--- a/drm/nouveau/nouveau_gem.h
+++ b/drm/nouveau/nouveau_gem.h
@@ -24,16 +24,18 @@ extern int nouveau_gem_object_open(struct drm_gem_object *, struct drm_file *);
 extern void nouveau_gem_object_close(struct drm_gem_object *,
				     struct drm_file *);
 extern int nouveau_gem_ioctl_set_tiling(struct drm_device *, void *,
					struct drm_file *);
 extern int nouveau_gem_ioctl_new(struct drm_device *, void *,
				 struct drm_file *);
 extern int nouveau_gem_ioctl_pushbuf(struct drm_device *, void *,
				     struct drm_file *);
+extern int nouveau_gem_ioctl_set_error_notifier(struct drm_device *, void *,
+						struct drm_file *);
 extern int nouveau_gem_ioctl_pushbuf_2(struct drm_device *, void *,
				       struct drm_file *);
 extern int nouveau_gem_ioctl_cpu_prep(struct drm_device *, void *,
				      struct drm_file *);
 extern int nouveau_gem_ioctl_cpu_fini(struct drm_device *, void *,
				      struct drm_file *);
 extern int nouveau_gem_ioctl_info(struct drm_device *, void *,
				  struct drm_file *);
diff --git a/drm/nouveau/nvkm/engine/fifo/base.c b/drm/nouveau/nvkm/engine/fifo/base.c
index fa223f8..df9ee37 100644
--- a/drm/nouveau/nvkm/engine/fifo/base.c
+++ b/drm/nouveau/nvkm/engine/fifo/base.c
@@ -191,28 +191,77 @@ nvkm_fifo_uevent_ctor(struct nvkm_object *object, void *data, u32 size,
 
 void
 nvkm_fifo_uevent(struct nvkm_fifo *fifo)
 {
 	struct nvif_notify_uevent_rep rep = {
 	};
 	nvkm_event_send(&fifo->uevent, 1, 0, &rep, sizeof(rep));
 }
 
+static int
+nvkm_fifo_eevent_ctor(struct nvkm_object *object, void *data, u32 size,
+		      struct nvkm_notify *notify)
+{
+	union {
+		struct nvif_notify_eevent_req req;
+	} *req = data;
+	int ret;
+
+	if (nvif_unvers(req->req)) {
+		notify->size = sizeof(struct nvif_notify_eevent_rep);
+		notify->types = 1;
+		notify->index = req->req.chid;
+	}
+
+	return ret;
+}
+
+static void
+gk104_fifo_eevent_init(struct nvkm_event *event, int type, int index)
+{
+}
+
+static void
+gk104_fifo_eevent_fini(struct nvkm_event *event, int type, int index)
+{
+}
+
+static const struct nvkm_event_func
+nvkm_fifo_eevent_func = {
+	.ctor = nvkm_fifo_eevent_ctor,
+	.init = gk104_fifo_eevent_init,
+	.fini = gk104_fifo_eevent_fini,
+};
+
+void
+nvkm_fifo_eevent(struct nvkm_fifo *fifo, u32 chid, u32 error)
+{
+	struct nvif_notify_eevent_rep rep = {
+		.error = error,
+		.chid = chid
+	};
+	nvkm_event_send(&fifo->eevent, 1, chid, &rep, sizeof(rep));
+}
+
 int
 _nvkm_fifo_channel_ntfy(struct nvkm_object *object, u32 type,
			struct nvkm_event **event)
 {
 	struct nvkm_fifo *fifo = (void *)object->engine;
 	switch (type) {
 	case G82_CHANNEL_DMA_V0_NTFY_UEVENT:
 		if (nv_mclass(object) >= G82_CHANNEL_DMA) {
 			*event = &fifo->uevent;
 			return 0;
 		}
 		break;
+	case CHANNEL_GPFIFO_ERROR_NOTIFIER_EEVENT:
+		*event = &fifo->eevent;
+		return 0;
+		break;
 	default:
 		break;
 	}
 	return -EINVAL;
 }
 
 static int
 nvkm_fifo_chid(struct nvkm_fifo *priv, struct nvkm_object *object)
@@ -242,16 +291,17 @@ nvkm_client_name_for_fifo_chid(struct nvkm_fifo *fifo, u32 chid)
 	return nvkm_client_name(chan);
 }
 
 void
 nvkm_fifo_destroy(struct nvkm_fifo *priv)
 {
 	kfree(priv->channel);
+	nvkm_event_fini(&priv->eevent);
 	nvkm_event_fini(&priv->uevent);
 	nvkm_event_fini(&priv->cevent);
 	nvkm_engine_destroy(&priv->base);
 }
 
 int
 nvkm_fifo_create_(struct nvkm_object *parent, struct nvkm_object *engine,
		  struct nvkm_oclass *oclass,
@@ -271,12 +321,16 @@ nvkm_fifo_create_(struct nvkm_object *parent, struct nvkm_object *engine,
 	priv->channel = kzalloc(sizeof(*priv->channel) * (max + 1), GFP_KERNEL);
 	if (!priv->channel)
 		return -ENOMEM;
 
 	ret = nvkm_event_init(&nvkm_fifo_event_func, 1, 1, &priv->cevent);
 	if (ret)
 		return ret;
 
+	ret = nvkm_event_init(&nvkm_fifo_eevent_func, 1, max + 1, &priv->eevent);
+	if (ret)
+		return ret;
+
 	priv->chid = nvkm_fifo_chid;
 	spin_lock_init(&priv->lock);
 	return 0;
 }
diff --git a/drm/nouveau/nvkm/engine/fifo/gk104.c b/drm/nouveau/nvkm/engine/fifo/gk104.c
index 52c22b0..659c05f 100644
--- a/drm/nouveau/nvkm/engine/fifo/gk104.c
+++ b/drm/nouveau/nvkm/engine/fifo/gk104.c
@@ -30,16 +30,18 @@
 #include <subdev/bar.h>
 #include <subdev/fb.h>
 #include <subdev/mmu.h>
 #include <subdev/timer.h>
 
 #include <nvif/class.h>
 #include <nvif/unpack.h>
 
+#include "nouveau_drm.h"
+
 #define _(a,b) { (a), ((1ULL << (a)) | (b)) }
 static const struct {
 	u64 subdev;
 	u64 mask;
 } fifo_engine[] = {
 	_(NVDEV_ENGINE_GR      , (1ULL << NVDEV_ENGINE_SW) |
				 (1ULL << NVDEV_ENGINE_CE2)),
 	_(NVDEV_ENGINE_MSPDEC  , 0),
@@ -567,16 +569,20 @@ gk104_fifo_intr_sched_ctxsw(struct gk104_fifo_priv *priv)
 		u32 chid = load ? next : prev;
 		(void)save;
 
 		if (busy && chsw) {
 			if (!(chan = (void *)priv->base.channel[chid]))
 				continue;
 			if (!(engine = gk104_fifo_engine(priv, engn)))
 				continue;
+
+			nvkm_fifo_eevent(&priv->base, chid,
+					NOUVEAU_GEM_CHANNEL_FIFO_ERROR_IDLE_TIMEOUT);
+
 			gk104_fifo_recover(priv, engine, chan);
 		}
 	}
 }
 
 static void
 gk104_fifo_intr_sched(struct gk104_fifo_priv *priv)
 {
@@ -783,16 +789,19 @@ gk104_fifo_intr_fault(struct gk104_fifo_priv *priv, int unit)
		 ec ? ec->name : ecunk, (u64)inst << 12,
		 nvkm_client_name(engctx));
 
 	object = engctx;
 	while (object) {
 		switch (nv_mclass(object)) {
 		case KEPLER_CHANNEL_GPFIFO_A:
 		case MAXWELL_CHANNEL_GPFIFO_A:
+			nvkm_fifo_eevent(&priv->base,
+					((struct nvkm_fifo_chan*)object)->chid,
+					NOUVEAU_GEM_CHANNEL_FIFO_ERROR_MMU_ERR_FLT);
 			gk104_fifo_recover(priv, engine, (void *)object);
 			break;
 		}
 		object = object->parent;
 	}
 
 	nvkm_engctx_put(engctx);
 }
@@ -853,16 +862,18 @@ gk104_fifo_intr_pbdma_0(struct gk104_fifo_priv *priv, int unit)
 		nv_error(priv, "PBDMA%d:", unit);
 		nvkm_bitfield_print(gk104_fifo_pbdma_intr_0, show);
 		pr_cont("\n");
 		nv_error(priv,
			 "PBDMA%d: ch %d [%s] subc %d mthd 0x%04x data 0x%08x\n",
			 unit, chid,
			 nvkm_client_name_for_fifo_chid(&priv->base, chid),
			 subc, mthd, data);
+		nvkm_fifo_eevent(&priv->base, chid,
+				NOUVEAU_GEM_CHANNEL_PBDMA_ERROR);
 	}
 
 	nv_wr32(priv, 0x040108 + (unit * 0x2000), stat);
 }
 
 static const struct nvkm_bitfield gk104_fifo_pbdma_intr_1[] = {
 	{ 0x00000001, "HCE_RE_ILLEGAL_OP" },
 	{ 0x00000002, "HCE_RE_ALIGNB" },
@@ -881,16 +892,18 @@ gk104_fifo_intr_pbdma_1(struct gk104_fifo_priv *priv, int unit)
 	if (stat) {
 		nv_error(priv, "PBDMA%d:", unit);
 		nvkm_bitfield_print(gk104_fifo_pbdma_intr_1, stat);
 		pr_cont("\n");
 		nv_error(priv, "PBDMA%d: ch %d %08x %08x\n", unit, chid,
			 nv_rd32(priv, 0x040150 + (unit * 0x2000)),
			 nv_rd32(priv, 0x040154 + (unit * 0x2000)));
+		nvkm_fifo_eevent(&priv->base, chid,
+				NOUVEAU_GEM_CHANNEL_PBDMA_ERROR);
 	}
 
 	nv_wr32(priv, 0x040148 + (unit * 0x2000), stat);
 }
 
 static void
 gk104_fifo_intr_runlist(struct gk104_fifo_priv *priv)
 {
diff --git a/drm/nouveau/nvkm/engine/gr/gf100.c b/drm/nouveau/nvkm/engine/gr/gf100.c
index e7c3e9e..9d08c80 100644
--- a/drm/nouveau/nvkm/engine/gr/gf100.c
+++ b/drm/nouveau/nvkm/engine/gr/gf100.c
@@ -32,16 +32,18 @@
 #include <engine/fifo.h>
 
 #include <subdev/fb.h>
 #include <subdev/mc.h>
 #include <subdev/timer.h>
 
 #include <nvif/class.h>
 #include <nvif/unpack.h>
 
+#include "nouveau_drm.h"
+
 /*******************************************************************************
 * Zero Bandwidth Clear
 ******************************************************************************/
 
 static void
 gf100_gr_zbc_clear_color(struct gf100_gr_priv *priv, int zbc)
 {
 	if (priv->zbc_color[zbc].format) {
@@ -1169,26 +1171,28 @@ gf100_gr_intr(struct nvkm_subdev *subdev)
 
 	if (stat & 0x00000020) {
 		nv_error(priv,
			 "ILLEGAL_CLASS ch %d [0x%010llx %s] subc %d class 0x%04x mthd 0x%04x data 0x%08x\n",
			 chid, inst << 12, nvkm_client_name(engctx), subc,
			 class, mthd, data);
 		nv_wr32(priv, 0x400100, 0x00000020);
 		stat &= ~0x00000020;
+		nvkm_fifo_eevent(pfifo, chid, NOUVEAU_GEM_CHANNEL_GR_ERROR_SW_NOTIFY);
 	}
 
 	if (stat & 0x00100000) {
 		nv_error(priv, "DATA_ERROR [");
 		nvkm_enum_print(nv50_data_error_names, code);
 		pr_cont("] ch %d [0x%010llx %s] subc %d class 0x%04x mthd 0x%04x data 0x%08x\n",
			chid, inst << 12, nvkm_client_name(engctx), subc,
			class, mthd, data);
 		nv_wr32(priv, 0x400100, 0x00100000);
 		stat &= ~0x00100000;
+		nvkm_fifo_eevent(pfifo, chid, NOUVEAU_GEM_CHANNEL_GR_ERROR_SW_NOTIFY);
 	}
 
 	if (stat & 0x00200000) {
 		nv_error(priv, "TRAP ch %d [0x%010llx %s]\n", chid, inst << 12,
			 nvkm_client_name(engctx));
 		gf100_gr_trap_intr(priv);
 		nv_wr32(priv, 0x400100, 0x00200000);
 		stat &= ~0x00200000;
diff --git a/drm/nouveau/uapi/drm/nouveau_drm.h b/drm/nouveau/uapi/drm/nouveau_drm.h
index 1331c28..259efcc 100644
--- a/drm/nouveau/uapi/drm/nouveau_drm.h
+++ b/drm/nouveau/uapi/drm/nouveau_drm.h
@@ -143,16 +143,26 @@ struct drm_nouveau_gem_cpu_prep {
 	uint32_t handle;
 	uint32_t flags;
 };
 
 struct drm_nouveau_gem_cpu_fini {
 	uint32_t handle;
 };
 
+#define NOUVEAU_GEM_CHANNEL_FIFO_ERROR_IDLE_TIMEOUT 8
+#define NOUVEAU_GEM_CHANNEL_GR_ERROR_SW_NOTIFY 13
+#define NOUVEAU_GEM_CHANNEL_FIFO_ERROR_MMU_ERR_FLT 31
+#define NOUVEAU_GEM_CHANNEL_PBDMA_ERROR 32
+struct drm_nouveau_gem_set_error_notifier {
+	uint32_t channel;
+	uint32_t buffer;
+	uint32_t offset; /* in bytes, u32-aligned */
+};
+
 #define DRM_NOUVEAU_GETPARAM           0x00 /* deprecated */
 #define DRM_NOUVEAU_SETPARAM           0x01 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_ALLOC      0x02 /* deprecated */
 #define DRM_NOUVEAU_CHANNEL_FREE       0x03 /* deprecated */
 #define DRM_NOUVEAU_GROBJ_ALLOC        0x04 /* deprecated */
 #define DRM_NOUVEAU_NOTIFIEROBJ_ALLOC  0x05 /* deprecated */
 #define DRM_NOUVEAU_GPUOBJ_FREE        0x06 /* deprecated */
 #define DRM_NOUVEAU_NVIF               0x07
@@ -160,19 +170,21 @@ struct drm_nouveau_gem_cpu_fini {
 #define DRM_NOUVEAU_GEM_PUSHBUF        0x41
 #define DRM_NOUVEAU_GEM_CPU_PREP       0x42
 #define DRM_NOUVEAU_GEM_CPU_FINI       0x43
 #define DRM_NOUVEAU_GEM_INFO           0x44
 /* range 0x98..DRM_COMMAND_END (8 entries) is reserved for staging, unstable
 * ioctls */
 #define DRM_NOUVEAU_STAGING_IOCTL      0x58
 #define DRM_NOUVEAU_GEM_SET_TILING     (DRM_NOUVEAU_STAGING_IOCTL + 0x0)
 #define DRM_NOUVEAU_GEM_PUSHBUF_2      (DRM_NOUVEAU_STAGING_IOCTL + 0x1)
+#define DRM_NOUVEAU_GEM_SET_ERROR_NOTIFIER (DRM_NOUVEAU_STAGING_IOCTL + 0x2)
 
 #define DRM_IOCTL_NOUVEAU_GEM_NEW            DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_NEW, struct drm_nouveau_gem_new)
 #define DRM_IOCTL_NOUVEAU_GEM_PUSHBUF        DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_PUSHBUF, struct drm_nouveau_gem_pushbuf)
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_PREP       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_PREP, struct drm_nouveau_gem_cpu_prep)
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI       DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
 #define DRM_IOCTL_NOUVEAU_GEM_INFO           DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
 /* staging ioctls */
 #define DRM_IOCTL_NOUVEAU_GEM_SET_TILING     DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_SET_TILING, struct drm_nouveau_gem_set_tiling)
 #define DRM_IOCTL_NOUVEAU_GEM_PUSHBUF_2      DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_PUSHBUF_2, struct drm_nouveau_gem_pushbuf_2)
+#define DRM_IOCTL_NOUVEAU_GEM_SET_ERROR_NOTIFIER DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_SET_ERROR_NOTIFIER, struct drm_nouveau_gem_set_error_notifier)
 
 #endif /* __NOUVEAU_DRM_H__ */
--
2.1.4
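The four u32s written by nouveau_channel_set_error_notifier() above form a
small record in the shared buffer. A hypothetical userspace view of it (the
struct and helper names are illustrative, not part of the patch) could look
like this:

    #include <stdint.h>

    /* Mirrors the layout written by nouveau_channel_set_error_notifier():
     * a CLOCK_REALTIME timestamp in nanoseconds, the error code, and a
     * status word that becomes 0xffffffff once an error is recorded. */
    struct nouveau_error_notifier {
            uint32_t nsec_lo;  /* timestamp, low 32 bits */
            uint32_t nsec_hi;  /* timestamp, high 32 bits */
            uint32_t error;    /* NOUVEAU_GEM_CHANNEL_* code */
            uint32_t status;   /* 0xffffffff == an error is recorded */
    };

    /* Light-weight check after a submit; returns 0 if the channel is fine,
     * or the code, e.g. NOUVEAU_GEM_CHANNEL_FIFO_ERROR_MMU_ERR_FLT. */
    static uint32_t
    check_channel_error(const volatile struct nouveau_error_notifier *n)
    {
            if (n->status != 0xffffffff)
                    return 0;
            return n->error;
    }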
Konsta Hölttä
2015-Aug-05 11:27 UTC
[Nouveau] [RFC PATCH 2/5] HACK don't verify route == owner in nvkm ioctl
FIXME!! Some objects that we need to access from userspace are created in the
kernel. Only the ..V0_NEW ioctl on kernel objects appears to be freely usable
from userspace at this point, plus access to objects that were created from
userspace in the first place. The channel object is created in the kernel in
nouveau_chan.c, I suppose.

Signed-off-by: Konsta Hölttä <kholtta at nvidia.com>
---
 drm/nouveau/nvkm/core/ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drm/nouveau/nvkm/core/ioctl.c b/drm/nouveau/nvkm/core/ioctl.c
index 4459ff5..44b0f1d 100644
--- a/drm/nouveau/nvkm/core/ioctl.c
+++ b/drm/nouveau/nvkm/core/ioctl.c
@@ -474,18 +474,19 @@ nvkm_ioctl_path(struct nvkm_handle *parent, u32 type, u32 nr, u32 *path,
			nv_debug(object, "handle 0x%08x not found\n", path[nr]);
 			return -ENOENT;
 		}
 		nvkm_namedb_put(handle);
 		parent = handle;
 	}
 
 	if (owner != NVIF_IOCTL_V0_OWNER_ANY && owner != handle->route) {
-		nv_ioctl(object, "object route != owner\n");
-		return -EACCES;
+		nv_ioctl(object, "object route != owner: rou %x ow %x\n", handle->route, owner);
+		nv_ioctl(object, "HACK!! still continuing\n");
+		//return -EACCES;
 	}
 	*route = handle->route;
 	*token = handle->token;
 
 	if (ret = -EINVAL, type < ARRAY_SIZE(nvkm_ioctl_v0)) {
 		if (nvkm_ioctl_v0[type].version == 0)
 			ret = nvkm_ioctl_v0[type].func(handle, data, size);
 	}
--
2.1.4
Konsta Hölttä
2015-Aug-05 11:27 UTC
[Nouveau] [RFC PATCH 3/5] gk104: channel priority/timeslice support
Change channel "priorities" by modifying the runlist timeslice value, thus
giving more important jobs more time before they are switched out. A new
KEPLER_SET_CHANNEL_PRIORITY mthd on fifo channel objects sets the priority to
low, medium or high: the channel is disabled and preempted out, the timeslice
is changed, and the channel is then enabled again. The shift bits in the
timeslice register are left untouched (3), as is the enable bit.

Signed-off-by: Konsta Hölttä <kholtta at nvidia.com>
---
 drm/nouveau/include/nvif/class.h     | 10 +++++++
 drm/nouveau/nvkm/engine/fifo/gk104.c | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/drm/nouveau/include/nvif/class.h b/drm/nouveau/include/nvif/class.h
index 28176e6..381c72d 100644
--- a/drm/nouveau/include/nvif/class.h
+++ b/drm/nouveau/include/nvif/class.h
@@ -619,9 +619,19 @@ struct fermi_a_zbc_depth_v0 {
 #define FERMI_A_ZBC_DEPTH_V0_FMT_FP32                                      0x01
 	__u8  format;
 	__u8  index;
 	__u8  pad03[5];
 	__u32 ds;
 	__u32 l2;
 };
 
+#define KEPLER_SET_CHANNEL_PRIORITY 0x42 // XXX
+struct kepler_set_channel_priority_v0 {
+	__u8  version;
+#define KEPLER_SET_CHANNEL_PRIORITY_LOW 0x00
+#define KEPLER_SET_CHANNEL_PRIORITY_MEDIUM 0x01
+#define KEPLER_SET_CHANNEL_PRIORITY_HIGH 0x02
+	__u8  priority;
+	__u8  pad03[6];
+};
+
 #endif
diff --git a/drm/nouveau/nvkm/engine/fifo/gk104.c b/drm/nouveau/nvkm/engine/fifo/gk104.c
index 659c05f..2bab45e 100644
--- a/drm/nouveau/nvkm/engine/fifo/gk104.c
+++ b/drm/nouveau/nvkm/engine/fifo/gk104.c
@@ -333,22 +333,80 @@ gk104_fifo_chan_fini(struct nvkm_object *object, bool suspend)
 		gk104_fifo_runlist_update(priv, chan->engine);
 	}
 
 	gk104_fifo_chan_kick(chan);
 	nv_wr32(priv, 0x800000 + (chid * 8), 0x00000000);
 	return nvkm_fifo_channel_fini(&chan->base, suspend);
 }
 
+static int
+gk104_fifo_set_runlist_timeslice(struct gk104_fifo_priv *priv,
+				 struct gk104_fifo_chan *chan, u8 slice)
+{
+	struct nvkm_gpuobj *base = nv_gpuobj(nv_object(chan)->parent);
+	u32 chid = chan->base.chid;
+
+	nv_mask(priv, 0x800004 + (chid * 8), 0x00000800, 0x00000800);
+	WARN_ON(gk104_fifo_chan_kick(chan));
+	nv_wo32(base, 0xf8, slice | 0x10003000);
+	nv_mask(priv, 0x800004 + (chid * 8), 0x00000400, 0x00000400);
+	nv_info(chan, "timeslice set to %d for %d\n", slice, chid);
+
+	return 0;
+}
+
+int
+gk104_fifo_chan_set_priority(struct nvkm_object *object, void *data, u32 size)
+{
+	struct gk104_fifo_priv *priv = (void *)object->engine;
+	struct gk104_fifo_chan *chan = (void *)object;
+	union {
+		struct kepler_set_channel_priority_v0 v0;
+	} *args = data;
+	int ret;
+
+	if (nvif_unpack(args->v0, 0, 0, false)) {
+		switch (args->v0.priority) {
+		case KEPLER_SET_CHANNEL_PRIORITY_LOW:
+			/* 64 << 3 == 512 us */
+			return gk104_fifo_set_runlist_timeslice(priv, chan, 64);
+		case KEPLER_SET_CHANNEL_PRIORITY_MEDIUM:
+			/* 128 << 3 == 1024 us */
+			return gk104_fifo_set_runlist_timeslice(priv, chan, 128);
+		case KEPLER_SET_CHANNEL_PRIORITY_HIGH:
+			/* 255 << 3 == 2040 us */
+			return gk104_fifo_set_runlist_timeslice(priv, chan, 255);
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return ret;
+}
+
+int
+gk104_fifo_chan_mthd(struct nvkm_object *object, u32 mthd, void *data, u32 size)
+{
+	switch (mthd) {
+	case KEPLER_SET_CHANNEL_PRIORITY:
+		return gk104_fifo_chan_set_priority(object, data, size);
+	default:
+		break;
+	}
+	return -EINVAL;
+}
+
 struct nvkm_ofuncs
 gk104_fifo_chan_ofuncs = {
 	.ctor = gk104_fifo_chan_ctor,
 	.dtor = _nvkm_fifo_channel_dtor,
 	.init = gk104_fifo_chan_init,
 	.fini = gk104_fifo_chan_fini,
+	.mthd = gk104_fifo_chan_mthd,
 	.map  = _nvkm_fifo_channel_map,
 	.rd32 = _nvkm_fifo_channel_rd32,
 	.wr32 = _nvkm_fifo_channel_wr32,
 	.ntfy = _nvkm_fifo_channel_ntfy
 };
 
 static struct nvkm_oclass
 gk104_fifo_sclass[] = {
--
2.1.4
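As a reading aid, the encoding above packs a u8 timeout value into the runlist
entry while keeping the scale field at 3; a sketch of the resulting wall-clock
budget, assuming (as the patch comments do) a microsecond base unit:

    /* Value programmed at 0xf8 is (slice | 0x10003000): timeout = slice,
     * scale = 3, enable bit set. Effective budget is slice << scale. */
    static inline unsigned int timeslice_us(unsigned char slice)
    {
            return (unsigned int)slice << 3;
    }

    /* LOW:    timeslice_us(64)  ==  512 us
     * MEDIUM: timeslice_us(128) == 1024 us
     * HIGH:   timeslice_us(255) == 2040 us (255 is the largest value that
     *         fits the u8 field, slightly short of 256 << 3 == 2048 us) */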
Konsta Hölttä
2015-Aug-05 11:27 UTC
[Nouveau] [RFC PATCH 4/5] channel timeout using repeated sched timeouts
Enable the scheduling timeout error interrupt and set it to a low value so
that it fires periodically, since the interrupt can be missed in HW under
certain conditions. If the current channel hasn't advanced, increment a
channel-specific counter in software; abort the channel once the timeout
limit is hit (at the periodic granularity). The error notifier is set to
NOUVEAU_GEM_CHANNEL_FIFO_ERROR_IDLE_TIMEOUT when this occurs. A new
KEPLER_SET_CHANNEL_TIMEOUT mthd sets the timeout limit in milliseconds. The
interrupt granularity is set to 100 ms.

Signed-off-by: Konsta Hölttä <kholtta at nvidia.com>
---
 drm/nouveau/include/nvif/class.h     |  8 ++++
 drm/nouveau/nvkm/engine/fifo/gk104.c | 78 +++++++++++++++++++++++++++++++-----
 2 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/drm/nouveau/include/nvif/class.h b/drm/nouveau/include/nvif/class.h
index 381c72d..f9a3647 100644
--- a/drm/nouveau/include/nvif/class.h
+++ b/drm/nouveau/include/nvif/class.h
@@ -620,18 +620,26 @@ struct fermi_a_zbc_depth_v0 {
 	__u8  format;
 	__u8  index;
 	__u8  pad03[5];
 	__u32 ds;
 	__u32 l2;
 };
 
 #define KEPLER_SET_CHANNEL_PRIORITY 0x42 // XXX
+#define KEPLER_SET_CHANNEL_TIMEOUT 0x43 // XXX
+
 struct kepler_set_channel_priority_v0 {
 	__u8  version;
 #define KEPLER_SET_CHANNEL_PRIORITY_LOW 0x00
 #define KEPLER_SET_CHANNEL_PRIORITY_MEDIUM 0x01
 #define KEPLER_SET_CHANNEL_PRIORITY_HIGH 0x02
 	__u8  priority;
 	__u8  pad03[6];
 };
 
+struct kepler_set_channel_timeout_v0 {
+	__u8  version;
+	__u8  pad03[3];
+	__u32 timeout_ms;
+};
+
 #endif
diff --git a/drm/nouveau/nvkm/engine/fifo/gk104.c b/drm/nouveau/nvkm/engine/fifo/gk104.c
index 2bab45e..15360a6 100644
--- a/drm/nouveau/nvkm/engine/fifo/gk104.c
+++ b/drm/nouveau/nvkm/engine/fifo/gk104.c
@@ -83,18 +83,25 @@ struct gk104_fifo_base {
 struct gk104_fifo_chan {
 	struct nvkm_fifo_chan base;
 	u32 engine;
 	enum {
 		STOPPED,
 		RUNNING,
 		KILLED
 	} state;
+	struct {
+		u32 sum_ms;
+		u32 limit_ms;
+		u32 gpfifo_get;
+	} timeout;
 };
 
+#define GRFIFO_TIMEOUT_CHECK_PERIOD_MS 100
+
 /*******************************************************************************
 * FIFO channel objects
 ******************************************************************************/
 
 static void
 gk104_fifo_runlist_update(struct gk104_fifo_priv *priv, u32 engine)
 {
 	struct nvkm_bar *bar = nvkm_bar(priv);
@@ -288,16 +295,21 @@ gk104_fifo_chan_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 	nv_wo32(base, 0x94, 0x30000001);
 	nv_wo32(base, 0x9c, 0x00000100);
 	nv_wo32(base, 0xac, 0x0000001f);
 	nv_wo32(base, 0xe8, chan->base.chid);
 	nv_wo32(base, 0xb8, 0xf8000000);
 	nv_wo32(base, 0xf8, 0x10003080); /* 0x002310 */
 	nv_wo32(base, 0xfc, 0x10000010); /* 0x002350 */
 	bar->flush(bar);
+
+	chan->timeout.sum_ms = 0;
+	chan->timeout.limit_ms = -1;
+	chan->timeout.gpfifo_get = 0;
+
 	return 0;
 }
 
 static int
 gk104_fifo_chan_init(struct nvkm_object *object)
 {
 	struct nvkm_gpuobj *base = nv_gpuobj(object->parent);
 	struct gk104_fifo_priv *priv = (void *)object->engine;
@@ -379,21 +391,39 @@ gk104_fifo_chan_set_priority(struct nvkm_object *object, void *data, u32 size)
 			return -EINVAL;
 		}
 	}
 
 	return ret;
 }
 
 int
+gk104_fifo_chan_set_timeout(struct nvkm_object *object, void *data, u32 size)
+{
+	struct gk104_fifo_chan *chan = (void *)object;
+	union {
+		struct kepler_set_channel_timeout_v0 v0;
+	} *args = data;
+	int ret;
+
+	if (nvif_unpack(args->v0, 0, 0, false)) {
+		chan->timeout.limit_ms = args->v0.timeout_ms;
+	}
+
+	return ret;
+}
+
+int
 gk104_fifo_chan_mthd(struct nvkm_object *object, u32 mthd, void *data, u32 size)
 {
 	switch (mthd) {
 	case KEPLER_SET_CHANNEL_PRIORITY:
 		return gk104_fifo_chan_set_priority(object, data, size);
+	case KEPLER_SET_CHANNEL_TIMEOUT:
+		return gk104_fifo_chan_set_timeout(object, data, size);
 	default:
 		break;
 	}
 	return -EINVAL;
 }
 
 struct nvkm_ofuncs
 gk104_fifo_chan_ofuncs = {
@@ -604,61 +634,83 @@ gk104_fifo_intr_bind(struct gk104_fifo_priv *priv)
 }
 
 static const struct nvkm_enum
 gk104_fifo_sched_reason[] = {
 	{ 0x0a, "CTXSW_TIMEOUT" },
 	{}
 };
 
+static bool
+gk104_fifo_update_timeout(struct gk104_fifo_priv *priv,
+			  struct gk104_fifo_chan *chan, u32 dt)
+{
+	u32 gpfifo_get = nv_rd32(priv, 34);
+	if (gpfifo_get == chan->timeout.gpfifo_get) {
+		chan->timeout.sum_ms += dt;
+	} else {
+		chan->timeout.sum_ms = dt;
+	}
+
+	chan->timeout.gpfifo_get = gpfifo_get;
+
+	return chan->timeout.sum_ms > chan->timeout.limit_ms;
+}
+
 static void
 gk104_fifo_intr_sched_ctxsw(struct gk104_fifo_priv *priv)
 {
 	struct nvkm_engine *engine;
 	struct gk104_fifo_chan *chan;
 	u32 engn;
 
 	for (engn = 0; engn < ARRAY_SIZE(fifo_engine); engn++) {
 		u32 stat = nv_rd32(priv, 0x002640 + (engn * 0x04));
 		u32 busy = (stat & 0x80000000);
 		u32 next = (stat & 0x07ff0000) >> 16;
-		u32 chsw = (stat & 0x00008000);
-		u32 save = (stat & 0x00004000);
-		u32 load = (stat & 0x00002000);
+		u32 cxsw = (stat & 0x0000e000) >> 13;
 		u32 prev = (stat & 0x000007ff);
-		u32 chid = load ? next : prev;
-		(void)save;
+		/* if loading context, take next id */
+		u32 chid = cxsw == 5 ? next : prev;
 
-		if (busy && chsw) {
+		nv_error(priv, "ctxsw eng stat: %08x\n", stat);
+		/* doing context switch? */
+		if (busy && (cxsw >= 5 && cxsw <= 7)) {
 			if (!(chan = (void *)priv->base.channel[chid]))
 				continue;
 			if (!(engine = gk104_fifo_engine(priv, engn)))
 				continue;
 
-			nvkm_fifo_eevent(&priv->base, chid,
-					NOUVEAU_GEM_CHANNEL_FIFO_ERROR_IDLE_TIMEOUT);
-
-			gk104_fifo_recover(priv, engine, chan);
+			if (gk104_fifo_update_timeout(priv, chan,
+					GRFIFO_TIMEOUT_CHECK_PERIOD_MS)) {
+				nvkm_fifo_eevent(&priv->base, chid,
+						NOUVEAU_GEM_CHANNEL_FIFO_ERROR_IDLE_TIMEOUT);
+				gk104_fifo_recover(priv, engine, chan);
+			} else {
+				nv_debug(priv, "fifo waiting for ctxsw %d ms on ch %d\n",
+					 chan->timeout.sum_ms, chid);
+			}
 		}
 	}
 }
 
 static void
 gk104_fifo_intr_sched(struct gk104_fifo_priv *priv)
 {
 	u32 intr = nv_rd32(priv, 0x00254c);
 	u32 code = intr & 0x000000ff;
 	const struct nvkm_enum *en;
 	char enunk[6] = "";
 
 	en = nvkm_enum_find(gk104_fifo_sched_reason, code);
 	if (!en)
 		snprintf(enunk, sizeof(enunk), "UNK%02x", code);
 
-	nv_error(priv, "SCHED_ERROR [ %s ]\n", en ? en->name : enunk);
+	/* this is a normal situation, not so loud */
+	nv_debug(priv, "SCHED_ERROR [ %s ]\n", en ? en->name : enunk);
 
 	switch (code) {
 	case 0x0a:
 		gk104_fifo_intr_sched_ctxsw(priv);
 		break;
 	default:
 		break;
 	}
@@ -1131,18 +1183,22 @@ gk104_fifo_init(struct nvkm_object *object)
 	/* PBDMA[n].HCE */
 	for (i = 0; i < priv->spoon_nr; i++) {
 		nv_wr32(priv, 0x040148 + (i * 0x2000), 0xffffffff); /* INTR */
 		nv_wr32(priv, 0x04014c + (i * 0x2000), 0xffffffff); /* INTREN */
 	}
 
 	nv_wr32(priv, 0x002254, 0x10000000 | priv->user.bar.offset >> 12);
 
+	/* enable interrupts */
 	nv_wr32(priv, 0x002100, 0xffffffff);
 	nv_wr32(priv, 0x002140, 0x7fffffff);
+
+	/* engine timeout */
+	nv_wr32(priv, 0x002a0c, 0x80000000 | (1000 * GRFIFO_TIMEOUT_CHECK_PERIOD_MS));
 	return 0;
 }
 
 void
 gk104_fifo_dtor(struct nvkm_object *object)
 {
 	struct gk104_fifo_priv *priv = (void *)object;
 	int i;
--
2.1.4
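As a usage illustration, a client could arm the per-channel watchdog roughly
as follows. The nvif_object_mthd() call and the struct are from the series,
but reaching a kernel-created channel object this way currently depends on
the route/owner hack in patch 2, and handle_error() is a hypothetical caller
helper:

    /* Arm a 2-second watchdog on the channel; the kernel checks it every
     * GRFIFO_TIMEOUT_CHECK_PERIOD_MS (100 ms), so the effective timeout
     * is rounded up to that granularity. */
    struct kepler_set_channel_timeout_v0 args = {
            .version    = 0,
            .timeout_ms = 2000,
    };
    int ret = nvif_object_mthd(chan->object, KEPLER_SET_CHANNEL_TIMEOUT,
                               &args, sizeof(args));
    if (ret)
            /* e.g. -EACCES without the patch-2 hack */
            handle_error(ret);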
Konsta Hölttä
2015-Aug-05 11:27 UTC
[Nouveau] [RFC PATCH 5/5] force fences updated in error conditions
Some error conditions just stop a channel, and fences get stuck, so the
fences either need to be kicked ready by overwriting the hw seq numbers (as
nvgpu does) or faked with a sw flag like this. This is just a hack, as an
example of what would be needed. Here, the channel id whose fences should be
force-updated is passed upwards with the uevent response; normally this is
-1, which matches no channel id, but some error paths fake an update event
with an explicit channel id.

Note: if userspace has some meaningful timeouts on the fences, then they do
finish, but without any notification that the channel is now broken (how do
you distinguish a long gpu job from a stuck one?). In many cases, a channel
needs to be shut down completely when it breaks (e.g., mmu fault).

Signed-off-by: Konsta Hölttä <kholtta at nvidia.com>
---
 drm/nouveau/include/nvif/event.h       |  1 +
 drm/nouveau/include/nvkm/engine/fifo.h |  2 +-
 drm/nouveau/nouveau_fence.c            | 13 ++++++++-----
 drm/nouveau/nvkm/engine/fifo/base.c    |  3 ++-
 drm/nouveau/nvkm/engine/fifo/gf100.c   |  2 +-
 drm/nouveau/nvkm/engine/fifo/gk104.c   |  7 ++++++-
 drm/nouveau/nvkm/engine/fifo/nv04.c    |  2 +-
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drm/nouveau/include/nvif/event.h b/drm/nouveau/include/nvif/event.h
index d148b85..a9ff4ee 100644
--- a/drm/nouveau/include/nvif/event.h
+++ b/drm/nouveau/include/nvif/event.h
@@ -52,16 +52,17 @@ struct nvif_notify_conn_rep_v0 {
 };
 
 struct nvif_notify_uevent_req {
 	/* nvif_notify_req ... */
 };
 
 struct nvif_notify_uevent_rep {
 	/* nvif_notify_rep ... */
+	__u32 force_chid;
 };
 
 struct nvif_notify_eevent_req {
 	/* nvif_notify_req ... */
 	u32 chid;
 };
 
 struct nvif_notify_eevent_rep {
diff --git a/drm/nouveau/include/nvkm/engine/fifo.h b/drm/nouveau/include/nvkm/engine/fifo.h
index cbca477..946eb68 100644
--- a/drm/nouveau/include/nvkm/engine/fifo.h
+++ b/drm/nouveau/include/nvkm/engine/fifo.h
@@ -117,15 +117,15 @@ extern struct nvkm_oclass *gf100_fifo_oclass;
 extern struct nvkm_oclass *gk104_fifo_oclass;
 extern struct nvkm_oclass *gk20a_fifo_oclass;
 extern struct nvkm_oclass *gk208_fifo_oclass;
 extern struct nvkm_oclass *gm204_fifo_oclass;
 extern struct nvkm_oclass *gm20b_fifo_oclass;
 
 int nvkm_fifo_uevent_ctor(struct nvkm_object *, void *, u32,
			   struct nvkm_notify *);
-void nvkm_fifo_uevent(struct nvkm_fifo *);
+void nvkm_fifo_uevent(struct nvkm_fifo *, u32 force_chid);
 
 void nvkm_fifo_eevent(struct nvkm_fifo *, u32 chid, u32 error);
 
 void nv04_fifo_intr(struct nvkm_subdev *);
 int  nv04_fifo_context_attach(struct nvkm_object *, struct nvkm_object *);
 #endif
diff --git a/drm/nouveau/nouveau_fence.c b/drm/nouveau/nouveau_fence.c
index 38bccb0..b7d9987 100644
--- a/drm/nouveau/nouveau_fence.c
+++ b/drm/nouveau/nouveau_fence.c
@@ -123,50 +123,53 @@ nouveau_fence_context_put(struct kref *fence_ref)
 void
 nouveau_fence_context_free(struct nouveau_fence_chan *fctx)
 {
 	kref_put(&fctx->fence_ref, nouveau_fence_context_put);
 }
 
 static int
-nouveau_fence_update(struct nouveau_channel *chan, struct nouveau_fence_chan *fctx)
+nouveau_fence_update(struct nouveau_channel *chan,
+		     struct nouveau_fence_chan *fctx, u32 force_chid)
 {
 	struct nouveau_fence *fence;
 	int drop = 0;
 	u32 seq = fctx->read(chan);
+	bool force = force_chid == chan->chid;
 
 	while (!list_empty(&fctx->pending)) {
 		fence = list_entry(fctx->pending.next, typeof(*fence), head);
 
-		if ((int)(seq - fence->base.seqno) < 0)
+		if ((int)(seq - fence->base.seqno) < 0 && !force)
 			break;
 
 		drop |= nouveau_fence_signal(fence);
 	}
 
 	return drop;
 }
 
 static int
 nouveau_fence_wait_uevent_handler(struct nvif_notify *notify)
 {
 	struct nouveau_fence_chan *fctx =
		container_of(notify, typeof(*fctx), notify);
+	const struct nvif_notify_uevent_rep *rep = notify->data;
 	unsigned long flags;
 	int ret = NVIF_NOTIFY_KEEP;
 
 	spin_lock_irqsave(&fctx->lock, flags);
 	if (!list_empty(&fctx->pending)) {
 		struct nouveau_fence *fence;
 		struct nouveau_channel *chan;
 
 		fence = list_entry(fctx->pending.next, typeof(*fence), head);
 		chan = rcu_dereference_protected(fence->channel,
						 lockdep_is_held(&fctx->lock));
-		if (nouveau_fence_update(fence->channel, fctx))
+		if (nouveau_fence_update(fence->channel, fctx, rep->force_chid))
 			ret = NVIF_NOTIFY_DROP;
 	}
 	spin_unlock_irqrestore(&fctx->lock, flags);
 
 	return ret;
 }
 
 void
@@ -278,17 +281,17 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 	kref_get(&fctx->fence_ref);
 
 	trace_fence_emit(&fence->base);
 	ret = fctx->emit(fence);
 	if (!ret) {
 		fence_get(&fence->base);
 		spin_lock_irq(&fctx->lock);
 
-		if (nouveau_fence_update(chan, fctx))
+		if (nouveau_fence_update(chan, fctx, -1))
 			nvif_notify_put(&fctx->notify);
 
 		list_add_tail(&fence->head, &fctx->pending);
 		spin_unlock_irq(&fctx->lock);
 	}
 
 	return ret;
 }
 
@@ -302,17 +305,17 @@ nouveau_fence_done(struct nouveau_fence *fence)
 		struct nouveau_channel *chan;
 		unsigned long flags;
 
 		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
 			return true;
 
 		spin_lock_irqsave(&fctx->lock, flags);
 		chan = rcu_dereference_protected(fence->channel,
						 lockdep_is_held(&fctx->lock));
-		if (chan && nouveau_fence_update(chan, fctx))
+		if (chan && nouveau_fence_update(chan, fctx, -1))
 			nvif_notify_put(&fctx->notify);
 		spin_unlock_irqrestore(&fctx->lock, flags);
 	}
 	return fence_is_signaled(&fence->base);
 }
 
 static long
 nouveau_fence_wait_legacy(struct fence *f, bool intr, long wait)
diff --git a/drm/nouveau/nvkm/engine/fifo/base.c b/drm/nouveau/nvkm/engine/fifo/base.c
index df9ee37..535cc87 100644
--- a/drm/nouveau/nvkm/engine/fifo/base.c
+++ b/drm/nouveau/nvkm/engine/fifo/base.c
@@ -184,19 +184,20 @@ nvkm_fifo_uevent_ctor(struct nvkm_object *object, void *data, u32 size,
 		notify->types = 1;
 		notify->index = 0;
 	}
 
 	return ret;
 }
 
 void
-nvkm_fifo_uevent(struct nvkm_fifo *fifo)
+nvkm_fifo_uevent(struct nvkm_fifo *fifo, u32 force_chid)
 {
 	struct nvif_notify_uevent_rep rep = {
+		.force_chid = force_chid
 	};
 	nvkm_event_send(&fifo->uevent, 1, 0, &rep, sizeof(rep));
 }
 
 static int
 nvkm_fifo_eevent_ctor(struct nvkm_object *object, void *data, u32 size,
		      struct nvkm_notify *notify)
 {
diff --git a/drm/nouveau/nvkm/engine/fifo/gf100.c b/drm/nouveau/nvkm/engine/fifo/gf100.c
index b745252..ca86dfe 100644
--- a/drm/nouveau/nvkm/engine/fifo/gf100.c
+++ b/drm/nouveau/nvkm/engine/fifo/gf100.c
@@ -732,17 +732,17 @@ gf100_fifo_intr_engine_unit(struct gf100_fifo_priv *priv, int engn)
 	u32 inte = nv_rd32(priv, 0x002628);
 	u32 unkn;
 
 	nv_wr32(priv, 0x0025a8 + (engn * 0x04), intr);
 
 	for (unkn = 0; unkn < 8; unkn++) {
 		u32 ints = (intr >> (unkn * 0x04)) & inte;
 		if (ints & 0x1) {
-			nvkm_fifo_uevent(&priv->base);
+			nvkm_fifo_uevent(&priv->base, -1);
 			ints &= ~1;
 		}
 		if (ints) {
 			nv_error(priv, "ENGINE %d %d %01x", engn, unkn, ints);
 			nv_mask(priv, 0x002628, ints, 0);
 		}
 	}
 }
diff --git a/drm/nouveau/nvkm/engine/fifo/gk104.c b/drm/nouveau/nvkm/engine/fifo/gk104.c
index 15360a6..2ad5486 100644
--- a/drm/nouveau/nvkm/engine/fifo/gk104.c
+++ b/drm/nouveau/nvkm/engine/fifo/gk104.c
@@ -902,16 +902,18 @@ gk104_fifo_intr_fault(struct gk104_fifo_priv *priv, int unit)
 	object = engctx;
 	while (object) {
 		switch (nv_mclass(object)) {
 		case KEPLER_CHANNEL_GPFIFO_A:
 		case MAXWELL_CHANNEL_GPFIFO_A:
 			nvkm_fifo_eevent(&priv->base,
					((struct nvkm_fifo_chan*)object)->chid,
					NOUVEAU_GEM_CHANNEL_FIFO_ERROR_MMU_ERR_FLT);
+			nvkm_fifo_uevent(&priv->base,
+					((struct nvkm_fifo_chan*)object)->chid);
 			gk104_fifo_recover(priv, engine, (void *)object);
 			break;
 		}
 		object = object->parent;
 	}
 
 	nvkm_engctx_put(engctx);
 }
@@ -972,18 +974,21 @@ gk104_fifo_intr_pbdma_0(struct gk104_fifo_priv *priv, int unit)
 		nv_error(priv, "PBDMA%d:", unit);
 		nvkm_bitfield_print(gk104_fifo_pbdma_intr_0, show);
 		pr_cont("\n");
 		nv_error(priv,
			 "PBDMA%d: ch %d [%s] subc %d mthd 0x%04x data 0x%08x\n",
			 unit, chid,
			 nvkm_client_name_for_fifo_chid(&priv->base, chid),
			 subc, mthd, data);
+
 		nvkm_fifo_eevent(&priv->base, chid,
				NOUVEAU_GEM_CHANNEL_PBDMA_ERROR);
+
+		nvkm_fifo_uevent(&priv->base, chid);
 	}
 
 	nv_wr32(priv, 0x040108 + (unit * 0x2000), stat);
 }
 
 static const struct nvkm_bitfield gk104_fifo_pbdma_intr_1[] = {
 	{ 0x00000001, "HCE_RE_ILLEGAL_OP" },
 	{ 0x00000002, "HCE_RE_ALIGNB" },
@@ -1024,17 +1029,17 @@ gk104_fifo_intr_runlist(struct gk104_fifo_priv *priv)
 		nv_wr32(priv, 0x002a00, 1 << engn);
 		mask &= ~(1 << engn);
 	}
 }
 
 static void
 gk104_fifo_intr_engine(struct gk104_fifo_priv *priv)
 {
-	nvkm_fifo_uevent(&priv->base);
+	nvkm_fifo_uevent(&priv->base, -1);
 }
 
 static void
 gk104_fifo_intr(struct nvkm_subdev *subdev)
 {
 	struct gk104_fifo_priv *priv = (void *)subdev;
 	u32 mask = nv_rd32(priv, 0x002140);
 	u32 stat = nv_rd32(priv, 0x002100) & mask;
diff --git a/drm/nouveau/nvkm/engine/fifo/nv04.c b/drm/nouveau/nvkm/engine/fifo/nv04.c
index 043e429..1749614 100644
--- a/drm/nouveau/nvkm/engine/fifo/nv04.c
+++ b/drm/nouveau/nvkm/engine/fifo/nv04.c
@@ -536,17 +536,17 @@ nv04_fifo_intr(struct nvkm_subdev *subdev)
 	if (device->card_type == NV_50) {
 		if (stat & 0x00000010) {
 			stat &= ~0x00000010;
 			nv_wr32(priv, 0x002100, 0x00000010);
 		}
 
 		if (stat & 0x40000000) {
 			nv_wr32(priv, 0x002100, 0x40000000);
-			nvkm_fifo_uevent(&priv->base);
+			nvkm_fifo_uevent(&priv->base, -1);
 			stat &= ~0x40000000;
 		}
 	}
 
 	if (stat) {
 		nv_warn(priv, "unknown intr 0x%08x\n", stat);
 		nv_mask(priv, NV03_PFIFO_INTR_EN_0, stat, 0x00000000);
 		nv_wr32(priv, NV03_PFIFO_INTR_0, stat);
--
2.1.4
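One consequence of the forced update, echoing the note above: a fence
signaling no longer guarantees the work executed, so a waiter would pair it
with the patch-1 notifier. A sketch, reusing the illustrative
nouveau_error_notifier struct from the patch-1 example; wait_fence() is a
hypothetical client wrapper:

    /* After a forced update the fence wait returns as if the job finished,
     * so consult the error notifier to tell success from a dead channel. */
    static int wait_job(struct nouveau_fence *fence,
                        const volatile struct nouveau_error_notifier *n)
    {
            int ret = wait_fence(fence);  /* hypothetical wrapper */

            if (ret == 0 && n->status == 0xffffffff)
                    return -EIO;  /* channel broke; recreate the context */
            return ret;
    }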