Thierry Reding
2018-Jan-11 22:15 UTC
[Nouveau] [PATCH 0/3] drm/nouveau: Add support for fence FDs
From: Thierry Reding <treding at nvidia.com>

This small series of patches implements support for waiting on and
emitting fence FDs on kickoff. This enables explicit fencing and can be
used for example to synchronize buffer accesses between the display
engine and the GPU on Tegra.

The first patch lays the groundwork by splitting up nouveau_fence_sync()
to allow reuse. Patch 2 is where the interesting stuff happens. It adds
a new IOCTL (PUSHBUF2) that is a superset of the existing PUSHBUF IOCTL
and reuses most of the code while adding support for pre- and post-
fences. Finally, the third patch teaches Nouveau how to deal with fence
arrays, which are usually a result of chaining together multiple
dependent jobs.

I have corresponding userspace support for these in libdrm and Mesa:

	https://cgit.freedesktop.org/~tagr/drm/log/?h=nouveau-sync-fd
	https://cgit.freedesktop.org/~tagr/mesa/log/?h=nouveau-sync-fd

I'll send those patches out shortly.

There's some more work depending on these patches that I plan to send
out in the coming days or weeks. The final result allows Nouveau and
Tegra DRM to negotiate for a framebuffer modifier and then go into a
render/scanout loop using fences for synchronization. All of this was
tested using a slightly modified version of kmscube.

Thierry

Thierry Reding (3):
  drm/nouveau: Split nouveau_fence_sync()
  drm/nouveau: Support fence FDs at kickoff
  drm/nouveau: Support DMA fence arrays

 drivers/gpu/drm/nouveau/nouveau_bo.c      | 38 ++++++++++++++-
 drivers/gpu/drm/nouveau/nouveau_bo.h      |  2 +
 drivers/gpu/drm/nouveau/nouveau_display.c |  4 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c     |  1 +
 drivers/gpu/drm/nouveau/nouveau_fence.c   | 81 +++++++++++++------------------
 drivers/gpu/drm/nouveau/nouveau_fence.h   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c     | 80 ++++++++++++++++++++++++++++--
 drivers/gpu/drm/nouveau/nouveau_gem.h     |  2 +
 include/uapi/drm/nouveau_drm.h            | 14 ++++++
 9 files changed, 167 insertions(+), 57 deletions(-)

--
2.15.1
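For readers who want a feel for the userspace side, here is a minimal sketch of how a client might fill in the fence-related fields of a PUSHBUF2 request. It assumes the flag values and the flags/fence/reserved layout from patch 2 of this series. `struct pushbuf2_ext` and `pushbuf2_ext_prepare()` are hypothetical names made up for illustration; they are not part of libdrm, Mesa, or the kernel, and the embedded legacy request is left out to keep things short.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in patch 2 of this series. */
#define NOUVEAU_GEM_PUSHBUF_FENCE_WAIT (1 << 0)
#define NOUVEAU_GEM_PUSHBUF_FENCE_EMIT (1 << 1)

/*
 * Hypothetical stand-in for the extension fields of
 * struct drm_nouveau_gem_pushbuf2 (the legacy "base" member is omitted).
 */
struct pushbuf2_ext {
	uint32_t flags;
	int32_t fence;		/* in: fence FD to wait on, out: emitted fence FD */
	uint64_t reserved;
};

/*
 * Hypothetical helper: ask the kernel to wait on in_fd (if it is >= 0)
 * and, if want_out is non-zero, to return a new fence FD for this job.
 */
static void pushbuf2_ext_prepare(struct pushbuf2_ext *ext, int in_fd,
				 int want_out)
{
	assert(ext);

	ext->flags = 0;
	ext->fence = -1;
	ext->reserved = 0;

	if (in_fd >= 0) {
		ext->flags |= NOUVEAU_GEM_PUSHBUF_FENCE_WAIT;
		ext->fence = in_fd;
	}

	if (want_out)
		ext->flags |= NOUVEAU_GEM_PUSHBUF_FENCE_EMIT;
}
```

Because the same `fence` field carries the pre-fence in and the post-fence out, a caller that sets both flags would need to keep its own copy of the input FD if it still wants to close it after the IOCTL returns.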
Thierry Reding
2018-Jan-11 22:15 UTC
[Nouveau] [PATCH 1/3] drm/nouveau: Split nouveau_fence_sync()
From: Thierry Reding <treding at nvidia.com>

Turn nouveau_fence_sync() into a low-level helper that adds fence waits
to the channel command stream. The new nouveau_bo_sync() helper replaces
the previous nouveau_fence_sync() implementation. It passes each of the
buffer object's fences to nouveau_fence_sync() in turn.

This provides more fine-grained control over fences which is needed by
subsequent patches for sync fd support.

Heavily based on work by Lauri Peltonen <lpeltonen at nvidia.com>.

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c      | 38 ++++++++++++++++-
 drivers/gpu/drm/nouveau/nouveau_bo.h      |  2 +
 drivers/gpu/drm/nouveau/nouveau_display.c |  4 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c   | 68 +++++++------------------------
 drivers/gpu/drm/nouveau/nouveau_fence.h   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c     |  2 +-
 6 files changed, 57 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 41e7f2927443..0285ca4c6235 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -545,6 +545,42 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
 					       PAGE_SIZE, DMA_FROM_DEVICE);
 }

+int
+nouveau_bo_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
+		bool exclusive, bool intr)
+{
+	struct reservation_object *resv = nvbo->bo.resv;
+	struct reservation_object_list *fobj;
+	struct dma_fence *fence;
+	int ret = 0, i;
+
+	if (!exclusive) {
+		ret = reservation_object_reserve_shared(resv);
+		if (ret < 0)
+			return ret;
+	}
+
+	fobj = reservation_object_get_list(resv);
+	fence = reservation_object_get_excl(resv);
+
+	if (fence && (!exclusive || !fobj || !fobj->shared_count))
+		return nouveau_fence_sync(fence, chan, intr);
+
+	if (!exclusive || !fobj)
+		return ret;
+
+	for (i = 0; i < fobj->shared_count && !ret; ++i) {
+		fence = rcu_dereference_protected(fobj->shared[i],
+						  reservation_object_held(resv));
+
+		ret = nouveau_fence_sync(fence, chan, intr);
+		if (ret < 0)
+			break;
+	}
+
+	return ret;
+}
+
 int
 nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 		    bool no_wait_gpu)
@@ -1114,7 +1150,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
 	}

 	mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
-	ret = nouveau_fence_sync(nouveau_bo(bo), chan, true, intr);
+	ret = nouveau_bo_sync(nouveau_bo(bo), chan, true, intr);
 	if (ret == 0) {
 		ret = drm->ttm.move(chan, bo, &bo->mem, new_reg);
 		if (ret == 0) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index 7b5cc5c73d20..d2ef12c0e39a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -93,6 +93,8 @@ int nouveau_bo_validate(struct nouveau_bo *, bool interruptible,
 			 bool no_wait_gpu);
 void nouveau_bo_sync_for_device(struct nouveau_bo *nvbo);
 void nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo);
+int nouveau_bo_sync(struct nouveau_bo *nvbo, struct nouveau_channel *channel,
+		    bool exclusive, bool intr);

 /* TODO: submit equivalent to TTM generic API upstream? */
 static inline void __iomem *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 009713404cc4..526280e9677a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -755,7 +755,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
 	spin_unlock_irqrestore(&dev->event_lock, flags);

 	/* Synchronize with the old framebuffer */
-	ret = nouveau_fence_sync(old_bo, chan, false, false);
+	ret = nouveau_bo_sync(old_bo, chan, false, false);
 	if (ret)
 		goto fail;

@@ -819,7 +819,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
 		goto fail_unpin;

 	/* synchronise rendering channel with the kernel's channel */
-	ret = nouveau_fence_sync(new_bo, chan, false, true);
+	ret = nouveau_bo_sync(new_bo, chan, false, true);
 	if (ret) {
 		ttm_bo_unreserve(&new_bo->bo);
 		goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 9c8f3a154d55..d61fcfb97b09 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -332,66 +332,26 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
 }

 int
-nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool exclusive, bool intr)
+nouveau_fence_sync(struct dma_fence *fence, struct nouveau_channel *chan,
+		   bool intr)
 {
 	struct nouveau_fence_chan *fctx = chan->fence;
-	struct dma_fence *fence;
-	struct reservation_object *resv = nvbo->bo.resv;
-	struct reservation_object_list *fobj;
+	struct nouveau_channel *prev = NULL;
 	struct nouveau_fence *f;
-	int ret = 0, i;
-
-	if (!exclusive) {
-		ret = reservation_object_reserve_shared(resv);
+	bool must_wait = true;
+	int ret = 0;

-		if (ret)
-			return ret;
+	f = nouveau_local_fence(fence, chan->drm);
+	if (f) {
+		rcu_read_lock();
+		prev = rcu_dereference(f->channel);
+		if (prev && (prev == chan || fctx->sync(f, prev, chan) == 0))
+			must_wait = false;
+		rcu_read_unlock();
 	}

-	fobj = reservation_object_get_list(resv);
-	fence = reservation_object_get_excl(resv);
-
-	if (fence && (!exclusive || !fobj || !fobj->shared_count)) {
-		struct nouveau_channel *prev = NULL;
-		bool must_wait = true;
-
-		f = nouveau_local_fence(fence, chan->drm);
-		if (f) {
-			rcu_read_lock();
-			prev = rcu_dereference(f->channel);
-			if (prev && (prev == chan || fctx->sync(f, prev, chan) == 0))
-				must_wait = false;
-			rcu_read_unlock();
-		}
-
-		if (must_wait)
-			ret = dma_fence_wait(fence, intr);
-
-		return ret;
-	}
-
-	if (!exclusive || !fobj)
-		return ret;
-
-	for (i = 0; i < fobj->shared_count && !ret; ++i) {
-		struct nouveau_channel *prev = NULL;
-		bool must_wait = true;
-
-		fence = rcu_dereference_protected(fobj->shared[i],
-						reservation_object_held(resv));
-
-		f = nouveau_local_fence(fence, chan->drm);
-		if (f) {
-			rcu_read_lock();
-			prev = rcu_dereference(f->channel);
-			if (prev && (prev == chan || fctx->sync(f, prev, chan) == 0))
-				must_wait = false;
-			rcu_read_unlock();
-		}
-
-		if (must_wait)
-			ret = dma_fence_wait(fence, intr);
-	}
+	if (must_wait)
+		ret = dma_fence_wait(fence, intr);

 	return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 5bd8d30d1657..2c46d9e767ab 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -24,7 +24,7 @@ void nouveau_fence_unref(struct nouveau_fence **);
 int nouveau_fence_emit(struct nouveau_fence *, struct nouveau_channel *);
 bool nouveau_fence_done(struct nouveau_fence *);
 int nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
-int nouveau_fence_sync(struct nouveau_bo *, struct nouveau_channel *, bool exclusive, bool intr);
+int nouveau_fence_sync(struct dma_fence *, struct nouveau_channel *, bool intr);

 struct nouveau_fence_chan {
 	spinlock_t lock;
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index e72a7e37eb0a..ea5e55551cbd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -494,7 +494,7 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 			return ret;
 		}

-		ret = nouveau_fence_sync(nvbo, chan, !!b->write_domains, true);
+		ret = nouveau_bo_sync(nvbo, chan, !!b->write_domains, true);
 		if (unlikely(ret)) {
 			if (ret != -ERESTARTSYS)
 				NV_PRINTK(err, cli, "fail post-validate sync\n");
--
2.15.1
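The branch structure of nouveau_bo_sync() in the patch above decides which of a buffer's fences a submission has to sync against. The following is only a toy model of that decision, written to make the three cases easier to see. `struct toy_resv` and `toy_fences_to_sync()` are hypothetical names and do not exist in the kernel; the model counts fences instead of actually waiting on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer's reservation object. */
struct toy_resv {
	bool has_excl;		/* an exclusive (write) fence is attached */
	size_t shared_count;	/* number of shared (read) fences attached */
};

/*
 * Returns how many fences a submission would sync against, following the
 * same order of checks as nouveau_bo_sync() in the patch above.
 */
static size_t toy_fences_to_sync(const struct toy_resv *resv, bool exclusive)
{
	assert(resv);

	/*
	 * Readers, and writers that find no shared fences, only sync
	 * against the exclusive fence.
	 */
	if (resv->has_excl && (!exclusive || resv->shared_count == 0))
		return 1;

	/* Readers do not wait for other readers. */
	if (!exclusive)
		return 0;

	/* Writers sync against every shared fence in turn. */
	return resv->shared_count;
}
```

With this model, a read access to a buffer that has one writer and three readers pending syncs against one fence, while a write access to the same buffer syncs against three.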
Thierry Reding
2018-Jan-11 22:15 UTC
[Nouveau] [PATCH 2/3] drm/nouveau: Support fence FDs at kickoff
From: Thierry Reding <treding at nvidia.com>

Add a new NOUVEAU_GEM_PUSHBUF2 IOCTL that accepts and emits a sync fence
FD from/to userspace if requested by the corresponding flags.

Based heavily on work by Lauri Peltonen <lpeltonen at nvidia.com>

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c |  1 +
 drivers/gpu/drm/nouveau/nouveau_gem.c | 78 +++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/nouveau/nouveau_gem.h |  2 +
 include/uapi/drm/nouveau_drm.h        | 14 +++++++
 4 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 3ce2f02e9e58..b38500a64236 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1012,6 +1012,7 @@ nouveau_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_PREP, nouveau_gem_ioctl_cpu_prep, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_CPU_FINI, nouveau_gem_ioctl_cpu_fini, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_INFO, nouveau_gem_ioctl_info, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(NOUVEAU_GEM_PUSHBUF2, nouveau_gem_ioctl_pushbuf2, DRM_AUTH|DRM_RENDER_ALLOW),
 };

 long
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index ea5e55551cbd..ad5c939b4f33 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -24,6 +24,8 @@
  *
  */

+#include <linux/sync_file.h>
+
 #include "nouveau_drv.h"
 #include "nouveau_dma.h"
 #include "nouveau_fence.h"
@@ -666,22 +668,28 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 	return ret;
 }

-int
-nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
-			  struct drm_file *file_priv)
+static int
+__nouveau_gem_ioctl_pushbuf(struct drm_device *dev,
+			    struct drm_nouveau_gem_pushbuf2 *request,
+			    struct drm_file *file_priv)
 {
 	struct nouveau_abi16 *abi16 = nouveau_abi16_get(file_priv);
 	struct nouveau_cli *cli = nouveau_cli(file_priv);
 	struct nouveau_abi16_chan *temp;
 	struct nouveau_drm *drm = nouveau_drm(dev);
-	struct drm_nouveau_gem_pushbuf *req = data;
+	struct drm_nouveau_gem_pushbuf *req = &request->base;
 	struct drm_nouveau_gem_pushbuf_push *push;
 	struct drm_nouveau_gem_pushbuf_bo *bo;
 	struct nouveau_channel *chan = NULL;
 	struct validate_op op;
 	struct nouveau_fence *fence = NULL;
+	struct dma_fence *prefence = NULL;
 	int i, j, ret = 0, do_reloc = 0;

+	/* check for unrecognized flags */
+	if (request->flags & ~NOUVEAU_GEM_PUSHBUF_FLAGS)
+		return -EINVAL;
+
 	if (unlikely(!abi16))
 		return -ENOMEM;

@@ -746,6 +754,15 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 		goto out_prevalid;
 	}

+	if (request->flags & NOUVEAU_GEM_PUSHBUF_FENCE_WAIT) {
+		prefence = sync_file_get_fence(request->fence);
+		if (prefence) {
+			ret = nouveau_fence_sync(prefence, chan, true);
+			if (ret < 0)
+				goto out;
+		}
+	}
+
 	/* Apply any relocations that are required */
 	if (do_reloc) {
 		ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo);
@@ -830,7 +847,30 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 		goto out;
 	}

+	if (request->flags & NOUVEAU_GEM_PUSHBUF_FENCE_EMIT) {
+		struct sync_file *file;
+		int fd;
+
+		fd = get_unused_fd_flags(O_CLOEXEC);
+		if (fd < 0) {
+			ret = fd;
+			goto out;
+		}
+
+		file = sync_file_create(&fence->base);
+		if (!file) {
+			put_unused_fd(fd);
+			goto out;
+		}
+
+		fd_install(fd, file->file);
+		request->fence = fd;
+	}
+
 out:
+	if (prefence)
+		dma_fence_put(prefence);
+
 	validate_fini(&op, fence, bo);
 	nouveau_fence_unref(&fence);

@@ -855,6 +895,27 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 	return nouveau_abi16_put(abi16, ret);
 }

+int
+nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct drm_nouveau_gem_pushbuf *request = data;
+	struct drm_nouveau_gem_pushbuf2 req;
+	int ret;
+
+	memset(&req, 0, sizeof(req));
+	memcpy(&req.base, request, sizeof(*request));
+
+	ret = __nouveau_gem_ioctl_pushbuf(dev, &req, file_priv);
+
+	request->gart_available = req.base.gart_available;
+	request->vram_available = req.base.vram_available;
+	request->suffix1 = req.base.suffix1;
+	request->suffix0 = req.base.suffix0;
+
+	return ret;
+}
+
 int
 nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 			   struct drm_file *file_priv)
@@ -922,3 +983,12 @@ nouveau_gem_ioctl_info(struct drm_device *dev, void *data,
 	return ret;
 }

+int
+nouveau_gem_ioctl_pushbuf2(struct drm_device *dev, void *data,
+			   struct drm_file *file_priv)
+{
+	struct drm_nouveau_gem_pushbuf2 *req = data;
+
+	return __nouveau_gem_ioctl_pushbuf(dev, req, file_priv);
+}
+
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.h b/drivers/gpu/drm/nouveau/nouveau_gem.h
index eb55c1eb1d9f..3c98b0f87c4b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.h
@@ -31,6 +31,8 @@ extern int nouveau_gem_ioctl_cpu_fini(struct drm_device *, void *,
 				      struct drm_file *);
 extern int nouveau_gem_ioctl_info(struct drm_device *, void *,
 				  struct drm_file *);
+extern int nouveau_gem_ioctl_pushbuf2(struct drm_device *, void *,
+				      struct drm_file *);

 struct dma_buf *nouveau_gem_prime_export(struct drm_device *dev,
 					 struct drm_gem_object *obj,
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 259588a4b61b..e50e65515303 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -114,6 +114,18 @@ struct drm_nouveau_gem_pushbuf {
 	__u64 gart_available;
 };

+#define NOUVEAU_GEM_PUSHBUF_FENCE_WAIT (1 << 0)
+#define NOUVEAU_GEM_PUSHBUF_FENCE_EMIT (1 << 1)
+#define NOUVEAU_GEM_PUSHBUF_FLAGS (NOUVEAU_GEM_PUSHBUF_FENCE_WAIT | \
+				   NOUVEAU_GEM_PUSHBUF_FENCE_EMIT)
+
+struct drm_nouveau_gem_pushbuf2 {
+	struct drm_nouveau_gem_pushbuf base;
+	__u32 flags;
+	__s32 fence;
+	__u64 reserved;
+};
+
 #define NOUVEAU_GEM_CPU_PREP_NOWAIT 0x00000001
 #define NOUVEAU_GEM_CPU_PREP_WRITE 0x00000004
 struct drm_nouveau_gem_cpu_prep {
@@ -138,12 +150,14 @@ struct drm_nouveau_gem_cpu_fini {
 #define DRM_NOUVEAU_GEM_CPU_PREP 0x42
 #define DRM_NOUVEAU_GEM_CPU_FINI 0x43
 #define DRM_NOUVEAU_GEM_INFO 0x44
+#define DRM_NOUVEAU_GEM_PUSHBUF2 0x45

 #define DRM_IOCTL_NOUVEAU_GEM_NEW DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_NEW, struct drm_nouveau_gem_new)
 #define DRM_IOCTL_NOUVEAU_GEM_PUSHBUF DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_PUSHBUF, struct drm_nouveau_gem_pushbuf)
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_PREP DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_PREP, struct drm_nouveau_gem_cpu_prep)
 #define DRM_IOCTL_NOUVEAU_GEM_CPU_FINI DRM_IOW (DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_CPU_FINI, struct drm_nouveau_gem_cpu_fini)
 #define DRM_IOCTL_NOUVEAU_GEM_INFO DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_INFO, struct drm_nouveau_gem_info)
+#define DRM_IOCTL_NOUVEAU_GEM_PUSHBUF2 DRM_IOWR(DRM_COMMAND_BASE + DRM_NOUVEAU_GEM_PUSHBUF2, struct drm_nouveau_gem_pushbuf2)

 #if defined(__cplusplus)
 }
--
2.15.1
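The patch above keeps the legacy PUSHBUF IOCTL working by embedding the old request in the new one, zeroing the extension fields, and copying the driver's outputs back afterwards. Here is a stripped-down sketch of that pattern, assuming nothing beyond standard C. All `toy_` names are hypothetical, the structures are far smaller than the real ones, and the value written to `vram_available` is a made-up placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_FLAG_WAIT	(1u << 0)
#define TOY_FLAG_EMIT	(1u << 1)
#define TOY_FLAGS_MASK	(TOY_FLAG_WAIT | TOY_FLAG_EMIT)

/* Hypothetical legacy request: one input, one output field. */
struct toy_req {
	uint32_t channel;
	uint64_t vram_available;
};

/* Hypothetical extended request: the legacy one plus new fields. */
struct toy_req2 {
	struct toy_req base;
	uint32_t flags;
	int32_t fence;
	uint64_t reserved;
};

/* Shared implementation, always working on the extended layout. */
static int toy_submit_common(struct toy_req2 *req)
{
	assert(req);

	/* Reject flags this version does not know about. */
	if (req->flags & ~TOY_FLAGS_MASK)
		return -EINVAL;

	req->base.vram_available = 1024; /* placeholder output value */
	return 0;
}

/* Legacy entry point: wrap, call the shared code, copy outputs back. */
static int toy_submit_legacy(struct toy_req *legacy)
{
	struct toy_req2 req;
	int ret;

	assert(legacy);

	memset(&req, 0, sizeof(req));	/* flags = 0: no fences requested */
	memcpy(&req.base, legacy, sizeof(*legacy));

	ret = toy_submit_common(&req);

	legacy->vram_available = req.base.vram_available;
	return ret;
}
```

Zeroing the whole extended structure before copying in the legacy part means old callers can never accidentally request a fence wait or emit.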
Thierry Reding
2018-Jan-11 22:15 UTC
[Nouveau] [PATCH 3/3] drm/nouveau: Support DMA fence arrays
From: Thierry Reding <treding at nvidia.com>

Signed-off-by: Thierry Reding <treding at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index d61fcfb97b09..53178b1471e3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -28,6 +28,7 @@

 #include <linux/ktime.h>
 #include <linux/hrtimer.h>
+#include <linux/dma-fence-array.h>
 #include <trace/events/dma_fence.h>

 #include <nvif/cl826e.h>
@@ -331,9 +332,9 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
 	return 0;
 }

-int
-nouveau_fence_sync(struct dma_fence *fence, struct nouveau_channel *chan,
-		   bool intr)
+static int
+__nouveau_fence_sync(struct dma_fence *fence, struct nouveau_channel *chan,
+		     bool intr)
 {
 	struct nouveau_fence_chan *fctx = chan->fence;
 	struct nouveau_channel *prev = NULL;
@@ -356,6 +357,30 @@ nouveau_fence_sync(struct dma_fence *fence, struct nouveau_channel *chan,
 	return ret;
 }

+int
+nouveau_fence_sync(struct dma_fence *fence, struct nouveau_channel *chan,
+		   bool intr)
+{
+	int ret = 0;
+
+	if (dma_fence_is_array(fence)) {
+		struct dma_fence_array *array = to_dma_fence_array(fence);
+		unsigned int i;
+
+		for (i = 0; i < array->num_fences; i++) {
+			struct dma_fence *f = array->fences[i];
+
+			ret = __nouveau_fence_sync(f, chan, intr);
+			if (ret < 0)
+				break;
+		}
+	} else {
+		ret = __nouveau_fence_sync(fence, chan, intr);
+	}
+
+	return ret;
+}
+
 void
 nouveau_fence_unref(struct nouveau_fence **pfence)
 {
--
2.15.1
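Since this patch comes without a commit message, a small model of what the new nouveau_fence_sync() does may help: a fence array is unpacked one level and each component is synced individually, stopping at the first error. The sketch below assumes nothing about the real dma_fence API. `struct toy_fence`, `toy_sync_one()` and `toy_sync()` are hypothetical, and "waiting" is reduced to returning a preset result and bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fence: either a plain fence or an array of components. */
struct toy_fence {
	size_t num_fences;		/* 0 for a plain fence */
	struct toy_fence **fences;	/* components if this is an array */
	int wait_result;		/* preset result of waiting on a plain fence */
};

/* Stand-in for the per-fence helper: records the wait, returns its result. */
static int toy_sync_one(const struct toy_fence *f, size_t *waited)
{
	(*waited)++;
	return f->wait_result;
}

/*
 * Sync against a fence, unpacking one level of array like the patched
 * nouveau_fence_sync() does. Nested arrays are not handled here.
 */
static int toy_sync(const struct toy_fence *fence, size_t *waited)
{
	int ret = 0;
	size_t i;

	assert(fence && waited);

	if (fence->num_fences > 0) {
		for (i = 0; i < fence->num_fences; i++) {
			ret = toy_sync_one(fence->fences[i], waited);
			if (ret < 0)
				break;	/* stop at the first failing component */
		}
	} else {
		ret = toy_sync_one(fence, waited);
	}

	return ret;
}
```

Handling each component separately lets the per-fence helper still take its fast path for components that belong to the local device, instead of falling back to a CPU wait on the whole array.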