Karol Herbst
2022-Sep-20 11:36 UTC
[Nouveau] [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf
On Tue, Sep 20, 2022 at 12:42 PM Salvatore Bonaccorso <carnil at debian.org> wrote:
>
> Hi,
>
> On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> > It is a bit unclear to us why that's helping, but it does and unbreaks
> > suspend/resume on a lot of GPUs without any known drawbacks.
> >
> > Cc: stable at vger.kernel.org # v5.15+
> > Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> > Signed-off-by: Karol Herbst <kherbst at redhat.com>
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index 35bb0bb3fe61..126b3c6e12f9 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> >  	if (ret == 0) {
> >  		ret = nouveau_fence_new(chan, false, &fence);
> >  		if (ret == 0) {
> > +			/* TODO: figure out a better solution here
> > +			 *
> > +			 * wait on the fence here explicitly as going through
> > +			 * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> > +			 *
> > +			 * Without this the operation can timeout and we'll fallback to a
> > +			 * software copy, which might take several minutes to finish.
> > +			 */
> > +			nouveau_fence_wait(fence, false, false);
> >  			ret = ttm_bo_move_accel_cleanup(bo,
> >  							&fence->base,
> >  							evict, false,
> > --
> > 2.37.1
> >
>
> While this is marked for 5.15+ only, a user in Debian was also seeing the
> suspend issue on 5.10.y and confirmed that the commit fixes the issue in
> the 5.10.y series too:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69
>
> Karol, Lyude, should that also be picked for 5.10.y?
>

Hmm, from the original report 5.10 was fine, but maybe something got backported and broke it?
I'll try to do some testing on my machine and see what I can figure out, but it could also be a Debian-only issue at this point.

> Regards,
> Salvatore
>
Salvatore Bonaccorso
2022-Sep-20 11:59 UTC
Hi,

On Tue, Sep 20, 2022 at 01:36:32PM +0200, Karol Herbst wrote:
> On Tue, Sep 20, 2022 at 12:42 PM Salvatore Bonaccorso <carnil at debian.org> wrote:
> >
> > Hi,
> >
> > On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> > > It is a bit unclear to us why that's helping, but it does and unbreaks
> > > suspend/resume on a lot of GPUs without any known drawbacks.
> > >
> > > [...]
> >
> > While this is marked for 5.15+ only, a user in Debian was also seeing the
> > suspend issue on 5.10.y and confirmed that the commit fixes the issue in
> > the 5.10.y series too:
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69
> >
> > Karol, Lyude, should that also be picked for 5.10.y?
> >
>
> Hmm, from the original report 5.10 was fine, but maybe something got
> backported and broke it? I'll try to do some testing on my machine
> and see what I can figure out, but it could also be a Debian-only
> issue at this point.

Right, this is a possibility, thanks for looking into it!

Computer Enthusiastic, can you also reproduce the problem with an upstream
kernel from the 5.10.y series (latest is 5.10.144), without the Debian
patches applied, and verify the fix there?

Regards,
Salvatore