Dave Airlie
2025-Jan-09 00:55 UTC
[PATCH] nouveau/fence: handle cross device fences properly. (v3)
From: Dave Airlie <airlied at redhat.com> This is the 3rd iteration of this after talking to Ben and Danilo, I think this makes sense now. The fence sync logic doesn't handle a fence sync across devices as it tries to write to a channel offset from one device into the fence bo from a different device, which won't work so well. This patch fixes that to avoid using the sync path in the case where the fences come from different nouveau drm devices. This works fine on a single device as the fence bo is shared across the devices, and mapped into each channels vma space, the channel offsets are therefore okay to pass between sides, so one channel can sync on the seqnos from the other by using the offset into it's vma. Signed-off-by: Dave Airlie <airlied at redhat.com> Cc: stable at vger.kernel.org --- drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index ee5e9d40c166..a3eb1f447a29 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, if (f) { struct nouveau_channel *prev; bool must_wait = true; + bool local; rcu_read_lock(); prev = rcu_dereference(f->channel); - if (prev && (prev == chan || - fctx->sync(f, prev, chan) == 0)) + local = prev && prev->drm == chan->drm; + if (local && (prev == chan || + fctx->sync(f, prev, chan) == 0)) must_wait = false; rcu_read_unlock(); if (!must_wait) -- 2.43.0
Ben Skeggs
2025-Jan-09 03:42 UTC
[PATCH] nouveau/fence: handle cross device fences properly. (v3)
On 9/1/25 10:55, Dave Airlie wrote:> From: Dave Airlie <airlied at redhat.com> > > This is the 3rd iteration of this after talking to Ben and > Danilo, I think this makes sense now. > > The fence sync logic doesn't handle a fence sync across devices > as it tries to write to a channel offset from one device into > the fence bo from a different device, which won't work so well. > > This patch fixes that to avoid using the sync path in the case > where the fences come from different nouveau drm devices. > > This works fine on a single device as the fence bo is shared > across the devices, and mapped into each channels vma space, > the channel offsets are therefore okay to pass between sides, > so one channel can sync on the seqnos from the other by using > the offset into it's vma. > > Signed-off-by: Dave Airlie <airlied at redhat.com>Signed-off-by: Ben Skeggs <bskeggs at nvidia.com>> Cc: stable at vger.kernel.org > --- > drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c > index ee5e9d40c166..a3eb1f447a29 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c > @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > if (f) { > struct nouveau_channel *prev; > bool must_wait = true; > + bool local; > > rcu_read_lock(); > prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || > - fctx->sync(f, prev, chan) == 0)) > + local = prev && prev->drm == chan->drm; > + if (local && (prev == chan || > + fctx->sync(f, prev, chan) == 0)) > must_wait = false; > rcu_read_unlock(); > if (!must_wait)
Danilo Krummrich
2025-Jan-09 16:58 UTC
[PATCH] nouveau/fence: handle cross device fences properly. (v3)
On Thu, Jan 09, 2025 at 10:55:53AM +1000, Dave Airlie wrote:> From: Dave Airlie <airlied at redhat.com> > > This is the 3rd iteration of this after talking to Ben and > Danilo, I think this makes sense now. > > The fence sync logic doesn't handle a fence sync across devices > as it tries to write to a channel offset from one device into > the fence bo from a different device, which won't work so well. > > This patch fixes that to avoid using the sync path in the case > where the fences come from different nouveau drm devices. > > This works fine on a single device as the fence bo is shared > across the devices, and mapped into each channels vma space, > the channel offsets are therefore okay to pass between sides, > so one channel can sync on the seqnos from the other by using > the offset into it's vma.Huh, they indeed all share and map drm->fence->bo, good catch.> > Signed-off-by: Dave Airlie <airlied at redhat.com> > Cc: stable at vger.kernel.org > --- > drivers/gpu/drm/nouveau/nouveau_fence.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c > index ee5e9d40c166..a3eb1f447a29 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c > @@ -367,11 +367,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > if (f) { > struct nouveau_channel *prev; > bool must_wait = true; > + bool local; > > rcu_read_lock(); > prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || > - fctx->sync(f, prev, chan) == 0)) > + local = prev && prev->drm == chan->drm;struct nouveau_channel has no pointer to a struct nouveau_drm, this should be prev->cli->drm and chan->cli->drm instead. No need to resend, I can fix it when applying the patch if you want.> + if (local && (prev == chan || > + fctx->sync(f, prev, chan) == 0)) > must_wait = false; > rcu_read_unlock(); > if (!must_wait) > -- > 2.43.0 >
kernel test robot
2025-Jan-10 02:28 UTC
[PATCH] nouveau/fence: handle cross device fences properly. (v3)
Hi Dave, kernel test robot noticed the following build errors: [auto build test ERROR on linus/master] [also build test ERROR on drm-misc/drm-misc-next drm-tip/drm-tip v6.13-rc6 next-20250109] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Dave-Airlie/nouveau-fence-handle-cross-device-fences-properly-v3/20250109-085805 base: linus/master patch link: https://lore.kernel.org/r/20250109005553.623947-1-airlied%40gmail.com patch subject: [PATCH] nouveau/fence: handle cross device fences properly. (v3) config: loongarch-randconfig-002-20250110 (https://download.01.org/0day-ci/archive/20250110/202501101033.wlEjeZwK-lkp at intel.com/config) compiler: loongarch64-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250110/202501101033.wlEjeZwK-lkp at intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp at intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202501101033.wlEjeZwK-lkp at intel.com/ All errors (new ones prefixed by >>): drivers/gpu/drm/nouveau/nouveau_fence.c: In function 'nouveau_fence_sync':>> drivers/gpu/drm/nouveau/nouveau_fence.c:394:53: error: 'struct nouveau_channel' has no member named 'drm'394 | local = prev && prev->drm == chan->drm; | ^~ drivers/gpu/drm/nouveau/nouveau_fence.c:394:66: error: 'struct nouveau_channel' has no member named 'drm' 394 | local = prev && prev->drm == chan->drm; | ^~ vim +394 drivers/gpu/drm/nouveau/nouveau_fence.c 356 357 int 358 nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, 359 bool exclusive, bool intr) 360 { 361 struct nouveau_fence_chan *fctx = chan->fence; 362 struct dma_resv *resv = nvbo->bo.base.resv; 363 int i, ret; 364 365 ret = dma_resv_reserve_fences(resv, 1); 366 if (ret) 367 return ret; 368 369 /* Waiting for the writes first causes performance regressions 370 * under some circumstances. So manually wait for the reads first. 371 */ 372 for (i = 0; i < 2; ++i) { 373 struct dma_resv_iter cursor; 374 struct dma_fence *fence; 375 376 dma_resv_for_each_fence(&cursor, resv, 377 dma_resv_usage_rw(exclusive), 378 fence) { 379 enum dma_resv_usage usage; 380 struct nouveau_fence *f; 381 382 usage = dma_resv_iter_usage(&cursor); 383 if (i == 0 && usage == DMA_RESV_USAGE_WRITE) 384 continue; 385 386 f = nouveau_local_fence(fence, chan->cli->drm); 387 if (f) { 388 struct nouveau_channel *prev; 389 bool must_wait = true; 390 bool local; 391 392 rcu_read_lock(); 393 prev = rcu_dereference(f->channel); > 394 local = prev && prev->drm == chan->drm; 395 if (local && (prev == chan || 396 fctx->sync(f, prev, chan) == 0)) 397 must_wait = false; 398 rcu_read_unlock(); 399 if (!must_wait) 400 continue; 401 } 402 403 ret = dma_fence_wait(fence, intr); 404 if (ret) 405 return ret; 406 } 407 } 408 409 return 0; 410 } 411 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki