Stefan Fritsch
2021-Dec-11 09:59 UTC
[Nouveau] [PATCH] drm/nouveau: wait for the exclusive fence after the shared ones v2
On 09.12.21 11:23, Christian König wrote:
> Always waiting for the exclusive fence resulted in some performance
> regressions. So try to wait for the shared fences first, then the
> exclusive fence should always be signaled already.
>
> v2: fix incorrectly placed "(", add some comment why we do this.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>

Tested-by: Stefan Fritsch <sf at sfritsch.de>

Please also add a cc for linux-stable, so that this is fixed in 5.15.x

Cheers,
Stefan

> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 28 +++++++++++++------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 05d0b3eb3690..0ae416aa76dc 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -353,15 +353,22 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>  
>  		if (ret)
>  			return ret;
> -	}
>  
> -	fobj = dma_resv_shared_list(resv);
> -	fence = dma_resv_excl_fence(resv);
> +		fobj = NULL;
> +	} else {
> +		fobj = dma_resv_shared_list(resv);
> +	}
>  
> -	if (fence) {
> +	/* Waiting for the exclusive fence first causes performance regressions
> +	 * under some circumstances. So manually wait for the shared ones first.
> +	 */
> +	for (i = 0; i < (fobj ? fobj->shared_count : 0) && !ret; ++i) {
>  		struct nouveau_channel *prev = NULL;
>  		bool must_wait = true;
>  
> +		fence = rcu_dereference_protected(fobj->shared[i],
> +						  dma_resv_held(resv));
> +
>  		f = nouveau_local_fence(fence, chan->drm);
>  		if (f) {
>  			rcu_read_lock();
> @@ -373,20 +380,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>  
>  		if (must_wait)
>  			ret = dma_fence_wait(fence, intr);
> -
> -		return ret;
>  	}
>  
> -	if (!exclusive || !fobj)
> -		return ret;
> -
> -	for (i = 0; i < fobj->shared_count && !ret; ++i) {
> +	fence = dma_resv_excl_fence(resv);
> +	if (fence) {
>  		struct nouveau_channel *prev = NULL;
>  		bool must_wait = true;
>  
> -		fence = rcu_dereference_protected(fobj->shared[i],
> -						  dma_resv_held(resv));
> -
>  		f = nouveau_local_fence(fence, chan->drm);
>  		if (f) {
>  			rcu_read_lock();
> @@ -398,6 +398,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>  
>  		if (must_wait)
>  			ret = dma_fence_wait(fence, intr);
> +
> +		return ret;
>  	}
>  
>  	return ret;
Christian König
2021-Dec-14 09:19 UTC
[Nouveau] [PATCH] drm/nouveau: wait for the exclusive fence after the shared ones v2
On 11.12.21 at 10:59, Stefan Fritsch wrote:
> On 09.12.21 11:23, Christian König wrote:
>> Always waiting for the exclusive fence resulted in some performance
>> regressions. So try to wait for the shared fences first, then the
>> exclusive fence should always be signaled already.
>>
>> v2: fix incorrectly placed "(", add some comment why we do this.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>
> Tested-by: Stefan Fritsch <sf at sfritsch.de>

Thanks.

> Please also add a cc for linux-stable, so that this is fixed in 5.15.x

Sure, but I still need some acked-by or rb from one of the Nouveau guys.

So gentle ping on that.

Regards,
Christian.

> Cheers,
> Stefan
>
>> ---
>>  drivers/gpu/drm/nouveau/nouveau_fence.c | 28 +++++++++++++------------
>>  1 file changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> index 05d0b3eb3690..0ae416aa76dc 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>> @@ -353,15 +353,22 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>  
>>  		if (ret)
>>  			return ret;
>> -	}
>>  
>> -	fobj = dma_resv_shared_list(resv);
>> -	fence = dma_resv_excl_fence(resv);
>> +		fobj = NULL;
>> +	} else {
>> +		fobj = dma_resv_shared_list(resv);
>> +	}
>>  
>> -	if (fence) {
>> +	/* Waiting for the exclusive fence first causes performance regressions
>> +	 * under some circumstances. So manually wait for the shared ones first.
>> +	 */
>> +	for (i = 0; i < (fobj ? fobj->shared_count : 0) && !ret; ++i) {
>>  		struct nouveau_channel *prev = NULL;
>>  		bool must_wait = true;
>>  
>> +		fence = rcu_dereference_protected(fobj->shared[i],
>> +						  dma_resv_held(resv));
>> +
>>  		f = nouveau_local_fence(fence, chan->drm);
>>  		if (f) {
>>  			rcu_read_lock();
>> @@ -373,20 +380,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>  
>>  		if (must_wait)
>>  			ret = dma_fence_wait(fence, intr);
>> -
>> -		return ret;
>>  	}
>>  
>> -	if (!exclusive || !fobj)
>> -		return ret;
>> -
>> -	for (i = 0; i < fobj->shared_count && !ret; ++i) {
>> +	fence = dma_resv_excl_fence(resv);
>> +	if (fence) {
>>  		struct nouveau_channel *prev = NULL;
>>  		bool must_wait = true;
>>  
>> -		fence = rcu_dereference_protected(fobj->shared[i],
>> -						  dma_resv_held(resv));
>> -
>>  		f = nouveau_local_fence(fence, chan->drm);
>>  		if (f) {
>>  			rcu_read_lock();
>> @@ -398,6 +398,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>>  
>>  		if (must_wait)
>>  			ret = dma_fence_wait(fence, intr);
>> +
>> +		return ret;
>>  	}
>>  
>>  	return ret;