Tanmay Bhushan
2023-Jan-19 22:54 UTC
[Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
dma_resv_wait_timeout returns greater than zero on success as opposed to ttm_bo_wait_ctx. As a result of that relocs will fail and give failure even when it was a success. Signed-off-by: Tanmay Bhushan <007047221b at gmail.com> --- drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index f77e44958037..0e3690459144 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, ret = dma_resv_wait_timeout(nvbo->bo.base.resv, DMA_RESV_USAGE_BOOKKEEP, false, 15 * HZ); - if (ret == 0) + if (ret <= 0) { ret = -EBUSY; - if (ret) { NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", ret); break; -- 2.34.1
John Ogness
2023-Mar-27 08:55 UTC
[Nouveau] [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
On 2023-01-19, Tanmay Bhushan <007047221b at gmail.com> wrote:> dma_resv_wait_timeout returns greater than zero on success > as opposed to ttm_bo_wait_ctx. As a result of that relocs > will fail and give failure even when it was a success.Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing lots of new kernel messages: [ 642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500 [ 642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500 [ 646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500 [ 646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500 The graphics seemed to go slower or hang a bit when these messages would appear. I then found your patch! However, I have some comments about it. First, it should include a fixes tag: Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")> Signed-off-by: Tanmay Bhushan <007047221b at gmail.com> > --- > drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c > index f77e44958037..0e3690459144 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c > @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, > ret = dma_resv_wait_timeout(nvbo->bo.base.resv, > DMA_RESV_USAGE_BOOKKEEP, > false, 15 * HZ); > - if (ret == 0) > + if (ret <= 0) { > ret = -EBUSY;This is incorrect for 2 reasons: * it treats restarts as timeouts * this function now returns >0 on success> - if (ret) { > NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", > ret); > break;I rearranged things to basically correctly translate the return code of dma_resv_wait_timeout() to match the previous ttm_bo_wait(): ret = dma_resv_wait_timeout(nvbo->bo.base.resv, DMA_RESV_USAGE_BOOKKEEP, false, 15 * HZ); if (ret == 0) ret = -EBUSY; if (ret > 0) ret = 0; if (ret) { NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", ret); break; } So the patch just becomes: @@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n false, 15 * HZ); if (ret == 0) ret = -EBUSY; + if (ret > 0) + ret = 0; if (ret) { NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n", ret); With this variant, everything runs correctly on my workstation again. It probably deserves a comment about why @ret is being translated. Or perhaps a new variable should be introduced to separate the return value of dma_resv_wait_timeout() from the return value of this function. Either way, this is an important fix for 6.3-rc! John Ogness