Karol Herbst
2021-Nov-03 21:25 UTC
[Nouveau] [PATCH 5.10 32/77] drm/ttm: fix memleak in ttm_transfered_destroy
On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim <svenjoac at gmx.de> wrote:
>
> On 2021-11-03 21:32 +0100, Karol Herbst wrote:
>
> > On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst <kherbst at redhat.com> wrote:
> >>
> >> On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim <svenjoac at gmx.de> wrote:
> >> >
> >> > On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:
> >> >
> >> > > From: Christian König <christian.koenig at amd.com>
> >> > >
> >> > > commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.
> >> > >
> >> > > We need to cleanup the fences for ghost objects as well.
> >> > >
> >> > > Signed-off-by: Christian König <christian.koenig at amd.com>
> >> > > Reported-by: Erhard F. <erhard_f at mailbox.org>
> >> > > Tested-by: Erhard F. <erhard_f at mailbox.org>
> >> > > Reviewed-by: Huang Rui <ray.huang at amd.com>
> >> > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214029
> >> > > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214447
> >> > > CC: <stable at vger.kernel.org>
> >> > > Link: https://patchwork.freedesktop.org/patch/msgid/20211020173211.2247-1-christian.koenig at amd.com
> >> > > Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
> >> > > ---
> >> > >  drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
> >> > >  1 file changed, 1 insertion(+)
> >> > >
> >> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> >> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> >> > > @@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
> >> > >      struct ttm_transfer_obj *fbo;
> >> > >
> >> > >      fbo = container_of(bo, struct ttm_transfer_obj, base);
> >> > > +    dma_resv_fini(&fbo->base.base._resv);
> >> > >      ttm_bo_put(fbo->bo);
> >> > >      kfree(fbo);
> >> > >  }
> >> >
> >> > Alas, this innocuous looking commit causes one of my systems to lock up
> >> > as soon as I run startx. This happens with the nouveau driver, two other
> >> > systems with radeon and intel graphics are not affected. Also I only
> >> > noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I
> >> > do not use 5.4 anymore.
> >> >
> >> > I am not familiar with nouveau's ttm management and what has changed
> >> > there between 5.10 and 5.14, but maybe one of their developers can shed
> >> > some light on this.
> >> >
> >> > Cheers,
> >> > Sven
> >> >
> >>
> >> could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1
> >
> > maybe not.. but I did remember there being a few ttm related patches
> > which only hurt nouveau :/ I guess one could do a git bisect to
> > figure out what change "fixes" it.
>
> Maybe, but since the memory leaks reported by Erhard only started to
> show up in 5.14 (if I read the bugzilla reports correctly), perhaps the
> patch should simply be reverted on earlier kernels?
>

Yeah, I think this is probably the right approach.

> > On which GPU do you see this problem?
>
> On an old GeForce 8500 GT, the whole PC is rather ancient.
>
> Cheers,
> Sven
>
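For readers unfamiliar with the `container_of()` call in the quoted patch, the following is a minimal user-space sketch of the idea: recovering a wrapper structure from a pointer to a member embedded inside it. The macro here is a simplified stand-in for the kernel's version, and `demo_bo`, `demo_transfer_obj`, `to_transfer_obj` and `demo_round_trip` are hypothetical names made up for illustration; they are not the real TTM structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for an embedded buffer object. */
struct demo_bo {
	int resv_initialized;
};

/* Hypothetical wrapper that embeds demo_bo as a member named "base". */
struct demo_transfer_obj {
	int id;
	struct demo_bo base;
};

/* Recover the wrapper from a pointer to its embedded member. */
static struct demo_transfer_obj *to_transfer_obj(struct demo_bo *bo)
{
	return container_of(bo, struct demo_transfer_obj, base);
}

/* Returns 1 if the round trip yields the original wrapper, else 0. */
static int demo_round_trip(void)
{
	struct demo_transfer_obj fbo = { .id = 42, .base = { 1 } };
	struct demo_transfer_obj *back = to_transfer_obj(&fbo.base);

	return back == &fbo && back->id == 42;
}
```

Because the destroy callback in the patch only receives the embedded object, this pointer arithmetic is what lets it reach the surrounding wrapper before cleaning it up.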
Christian König
2021-Nov-04 07:39 UTC
[Nouveau] [PATCH 5.10 32/77] drm/ttm: fix memleak in ttm_transfered_destroy
On 03.11.21 at 22:25, Karol Herbst wrote:
> On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim <svenjoac at gmx.de> wrote:
>> On 2021-11-03 21:32 +0100, Karol Herbst wrote:
>>
>>> On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst <kherbst at redhat.com> wrote:
>>>> On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim <svenjoac at gmx.de> wrote:
>>>>> On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:
>>>>>
>>>>>> From: Christian König <christian.koenig at amd.com>
>>>>>>
>>>>>> commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.
>>>>>>
>>>>>> We need to cleanup the fences for ghost objects as well.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>> Reported-by: Erhard F. <erhard_f at mailbox.org>
>>>>>> Tested-by: Erhard F. <erhard_f at mailbox.org>
>>>>>> Reviewed-by: Huang Rui <ray.huang at amd.com>
>>>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214029
>>>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214447
>>>>>> CC: <stable at vger.kernel.org>
>>>>>> Link: https://patchwork.freedesktop.org/patch/msgid/20211020173211.2247-1-christian.koenig at amd.com
>>>>>> Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
>>>>>> ---
>>>>>>  drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>>>>>>  1 file changed, 1 insertion(+)
>>>>>>
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>> @@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
>>>>>>      struct ttm_transfer_obj *fbo;
>>>>>>
>>>>>>      fbo = container_of(bo, struct ttm_transfer_obj, base);
>>>>>> +    dma_resv_fini(&fbo->base.base._resv);
>>>>>>      ttm_bo_put(fbo->bo);
>>>>>>      kfree(fbo);
>>>>>>  }
>>>>> Alas, this innocuous looking commit causes one of my systems to lock up
>>>>> as soon as I run startx. This happens with the nouveau driver, two other
>>>>> systems with radeon and intel graphics are not affected. Also I only
>>>>> noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I
>>>>> do not use 5.4 anymore.
>>>>>
>>>>> I am not familiar with nouveau's ttm management and what has changed
>>>>> there between 5.10 and 5.14, but maybe one of their developers can shed
>>>>> some light on this.
>>>>>
>>>>> Cheers,
>>>>> Sven
>>>>>
>>>> could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1
>>> maybe not.. but I did remember there being a few ttm related patches
>>> which only hurt nouveau :/ I guess one could do a git bisect to
>>> figure out what change "fixes" it.
>> Maybe, but since the memory leaks reported by Erhard only started to
>> show up in 5.14 (if I read the bugzilla reports correctly), perhaps the
>> patch should simply be reverted on earlier kernels?
>>
> Yeah, I think this is probably the right approach.

I agree.

The problem is that this memory leak could potentially happen with 5.10 as
well, just much, much less likely.

But my guess is that 5.10 is so buggy that when the leak does NOT happen
we double free, which obviously causes a crash.

So for the sake of stability please don't apply this patch to 5.10. I'm
going to comment on the original bug report as well.

Thanks,
Christian.

>
>>> On which GPU do you see this problem?
>> On an old GeForce 8500 GT, the whole PC is rather ancient.
>>
>> Cheers,
>> Sven
>>
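The trade-off Christian describes (a leak if nobody releases the fences, a double free if two paths both release them) can be modelled in user space. The sketch below is only an illustration of that guess, not the TTM or nouveau code: `demo_resv`, `demo_resv_init`, `demo_resv_fini` and `demo_destroy` are hypothetical names, and the release is counted rather than actually freed so that the double-release case can be observed safely.

```c
#include <assert.h>

/* Hypothetical stand-in for a reservation object that owns a fence list. */
struct demo_resv {
	int allocated;	/* 1 while the fence list is still owned */
	int releases;	/* number of times the release routine ran */
};

static void demo_resv_init(struct demo_resv *r)
{
	r->allocated = 1;
	r->releases = 0;
}

/* Unguarded release, loosely analogous to a *_fini() call. */
static void demo_resv_fini(struct demo_resv *r)
{
	r->releases++;
	r->allocated = 0;
}

/*
 * Model one object lifetime and return how often the release ran:
 *   0 means the fence list leaked,
 *   1 means it was released exactly once,
 *   2 means it was released twice (a double free in real code).
 */
static int demo_destroy(int other_path_releases, int destroy_calls_fini)
{
	struct demo_resv r;

	demo_resv_init(&r);
	if (other_path_releases)	/* assumed: some older path already cleans up */
		demo_resv_fini(&r);
	if (destroy_calls_fini)		/* the line the backported patch adds */
		demo_resv_fini(&r);
	return r.releases;
}
```

Under this model, `demo_destroy(0, 1)` is the case the patch fixes (a leak becomes a single release), while `demo_destroy(1, 1)` is the suspected 5.10 case, where adding the release to the destroy path turns a correct single release into a double one. Whether 5.10 really has such a second release path is Christian's guess in this thread, not something the thread confirms.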