John Ogness
2023-Mar-27 08:55 UTC
[Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
On 2023-01-19, Tanmay Bhushan <007047221b at gmail.com> wrote:> dma_resv_wait_timeout returns greater than zero on success > as opposed to ttm_bo_wait_ctx. As a result of that relocs > will fail and give failure even when it was a success.Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing lots of new kernel messages: [ 642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500 [ 642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500 [ 646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500 [ 646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500 The graphics seemed to go slower or hang a bit when these messages would appear. I then found your patch! However, I have some comments about it. First, it should include a fixes tag: Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")> Signed-off-by: Tanmay Bhushan <007047221b at gmail.com> > --- > drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c > index f77e44958037..0e3690459144 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c > @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, > ret = dma_resv_wait_timeout(nvbo->bo.base.resv, > DMA_RESV_USAGE_BOOKKEEP, > false, 15 * HZ); > - if (ret == 0) > + if (ret <= 0) { > ret = -EBUSY;This is incorrect for 2 reasons: * it treats restarts as timeouts * this function now returns >0 on success> - if (ret) { > NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", > ret); > break;I rearranged things to basically correctly translate the return code of dma_resv_wait_timeout() to match the previous ttm_bo_wait(): ret = dma_resv_wait_timeout(nvbo->bo.base.resv, DMA_RESV_USAGE_BOOKKEEP, false, 15 * HZ); if (ret == 0) ret = -EBUSY; if (ret > 0) ret = 0; if (ret) { NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", ret); break; } So the patch just becomes: @@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n false, 15 * HZ); if (ret == 0) ret = -EBUSY; + if (ret > 0) + ret = 0; if (ret) { NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", ret); With this variant, everything runs correctly on my workstation again. It probably deserves a comment about why @ret is being translated. Or perhaps a new variable should be introduced to separate the return value of dma_resv_wait_timeout() from the return value of this function. Either way, this is an important fix for 6.3-rc! John Ogness
Christian König
2023-Mar-27 19:53 UTC
[Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
Am 27.03.23 um 10:42 schrieb John Ogness:> On 2023-01-19, Tanmay Bhushan <007047221b at gmail.com> wrote: >> dma_resv_wait_timeout returns greater than zero on success >> as opposed to ttm_bo_wait_ctx. As a result of that relocs >> will fail and give failure even when it was a success. > Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing > lots of new kernel messages: > > [ 642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500 > [ 642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500 > [ 646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500 > [ 646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500 > > The graphics seemed to go slower or hang a bit when these messages would > appear. I then found your patch! However, I have some comments about it. > > First, it should include a fixes tag: > > Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait") > >> Signed-off-by: Tanmay Bhushan <007047221b at gmail.com> >> --- >> drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c >> index f77e44958037..0e3690459144 100644 >> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c >> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c >> @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, >> ret = dma_resv_wait_timeout(nvbo->bo.base.resv, >> DMA_RESV_USAGE_BOOKKEEP, >> false, 15 * HZ); >> - if (ret == 0) >> + if (ret <= 0) { >> ret = -EBUSY; > This is incorrect for 2 reasons: > > * it treats restarts as timeouts > > * this function now returns >0 on success > >> - if (ret) { >> NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", >> ret); >> break; > I rearranged things to basically correctly translate the return code of > dma_resv_wait_timeout() to match the previous ttm_bo_wait(): > > ret = dma_resv_wait_timeout(nvbo->bo.base.resv, > DMA_RESV_USAGE_BOOKKEEP, > false, 15 * HZ); > if (ret == 0) > ret = -EBUSY; > if (ret > 0) > ret = 0; > if (ret) { > NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", > ret); > break; > } > > So the patch just becomes: > > @@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n > false, 15 * HZ); > if (ret == 0) > ret = -EBUSY; > + if (ret > 0) > + ret = 0; > if (ret) { > NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", > ret); > > With this variant, everything runs correctly on my workstation again. > > It probably deserves a comment about why @ret is being translated. Or > perhaps a new variable should be introduced to separate the return value > of dma_resv_wait_timeout() from the return value of this function.I'm going to take a look tomorrow, but your code already looks pretty correct to me. And sorry for the noise, missed the different in the conversion. Thanks, Christian.> > Either way, this is an important fix for 6.3-rc! > > John Ogness