Christian König
2021-Dec-14 09:19 UTC
[Nouveau] [PATCH] drm/nouveau: wait for the exclusive fence after the shared ones v2
On 11.12.21 at 10:59, Stefan Fritsch wrote:
> On 09.12.21 11:23, Christian König wrote:
>> Always waiting for the exclusive fence resulted in some performance
>> regressions. So try to wait for the shared fences first, then the
>> exclusive fence should always be signaled already.
>>
>> v2: fix incorrectly placed "(", add some comment why we do this.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>
> Tested-by: Stefan Fritsch <sf at sfritsch.de>

Thanks.

> Please also add a cc for linux-stable, so that this is fixed in 5.15.x

Sure, but I still need some acked-by or rb from one of the Nouveau guys.
So gentle ping on that.

Regards,
Christian.

> Cheers,
> Stefan
>
>> ---
>>  drivers/gpu/drm/nouveau/nouveau_fence.c | 28 +++++++++++++------------
>>  1 file changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> index 05d0b3eb3690..0ae416aa76dc 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> @@ -353,15 +353,22 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>  		if (ret)
>>  			return ret;
>> -	}
>>
>> -	fobj = dma_resv_shared_list(resv);
>> -	fence = dma_resv_excl_fence(resv);
>> +		fobj = NULL;
>> +	} else {
>> +		fobj = dma_resv_shared_list(resv);
>> +	}
>>
>> -	if (fence) {
>> +	/* Waiting for the exclusive fence first causes performance regressions
>> +	 * under some circumstances. So manually wait for the shared ones first.
>> +	 */
>> +	for (i = 0; i < (fobj ? fobj->shared_count : 0) && !ret; ++i) {
>>  		struct nouveau_channel *prev = NULL;
>>  		bool must_wait = true;
>>
>> +		fence = rcu_dereference_protected(fobj->shared[i],
>> +						dma_resv_held(resv));
>> +
>>  		f = nouveau_local_fence(fence, chan->drm);
>>  		if (f) {
>>  			rcu_read_lock();
>> @@ -373,20 +380,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>
>>  		if (must_wait)
>>  			ret = dma_fence_wait(fence, intr);
>> -
>> -		return ret;
>>  	}
>>
>> -	if (!exclusive || !fobj)
>> -		return ret;
>> -
>> -	for (i = 0; i < fobj->shared_count && !ret; ++i) {
>> +	fence = dma_resv_excl_fence(resv);
>> +	if (fence) {
>>  		struct nouveau_channel *prev = NULL;
>>  		bool must_wait = true;
>>
>> -		fence = rcu_dereference_protected(fobj->shared[i],
>> -						dma_resv_held(resv));
>> -
>>  		f = nouveau_local_fence(fence, chan->drm);
>>  		if (f) {
>>  			rcu_read_lock();
>> @@ -398,6 +398,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>
>>  		if (must_wait)
>>  			ret = dma_fence_wait(fence, intr);
>> +
>> +		return ret;
>>  	}
>>
>>  	return ret;
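The reordering in the patch can be modeled outside the kernel. The following is a toy userspace sketch of the new control flow: wait on the shared (read) fences first, then on the exclusive (write) fence. The `toy_*` types and the "waiting signals the fence" model are invented for illustration and are not the real dma_fence/dma_resv API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for dma_fence / dma_resv; names are illustrative only. */
struct toy_fence { bool signaled; };

struct toy_resv {
	struct toy_fence *excl;       /* exclusive (write) fence */
	struct toy_fence *shared[8];  /* shared (read) fences */
	size_t shared_count;
};

/* Model: waiting always succeeds and marks the fence signaled. */
static int toy_fence_wait(struct toy_fence *f)
{
	f->signaled = true;
	return 0;
}

/* Patched order: shared fences first, exclusive fence last. */
static int toy_sync(struct toy_resv *resv, bool exclusive)
{
	/* Mirrors the patch: for an exclusive sync the shared list is
	 * treated as empty (fobj = NULL in the real code). */
	size_t nshared = exclusive ? 0 : resv->shared_count;
	int ret = 0;
	size_t i;

	for (i = 0; i < nshared && !ret; ++i)
		ret = toy_fence_wait(resv->shared[i]);

	/* By the time the shared fences have signaled, the exclusive
	 * fence should usually have signaled as well, so this wait is
	 * expected to be cheap. */
	if (!ret && resv->excl)
		ret = toy_fence_wait(resv->excl);

	return ret;
}
```

The point of the sketch is only the ordering: in the pre-patch code the exclusive fence was waited on first, which the commit message identifies as the source of the performance regressions.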
Ben Skeggs
2021-Dec-15 22:32 UTC
[Nouveau] [PATCH] drm/nouveau: wait for the exclusive fence after the shared ones v2
On Tue, 14 Dec 2021 at 19:19, Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> On 11.12.21 at 10:59, Stefan Fritsch wrote:
> > On 09.12.21 11:23, Christian König wrote:
> >> Always waiting for the exclusive fence resulted in some performance
> >> regressions. So try to wait for the shared fences first, then the
> >> exclusive fence should always be signaled already.
> >>
> >> v2: fix incorrectly placed "(", add some comment why we do this.
> >>
> >> Signed-off-by: Christian König <christian.koenig at amd.com>
> >
> > Tested-by: Stefan Fritsch <sf at sfritsch.de>
>
> Thanks.
>
> > Please also add a cc for linux-stable, so that this is fixed in 5.15.x
>
> Sure, but I still need some acked-by or rb from one of the Nouveau guys.
> So gentle ping on that.
Acked-by: Ben Skeggs <bskeggs at redhat.com>

> Regards,
> Christian.