Ondrej Zary
2021-Jun-09 20:00 UTC
[Nouveau] nouveau broken on Riva TNT2 in 5.13.0-rc4: NULL pointer dereference in nouveau_bo_sync_for_device
On Wednesday 09 June 2021 11:21:05 Christian König wrote:
> On 09.06.21 at 09:10, Ondrej Zary wrote:
> > On Wednesday 09 June 2021, Christian König wrote:
> >> On 09.06.21 at 08:57, Ondrej Zary wrote:
> >>> [SNIP]
> >>>> Thanks for the heads up. So the problem with my patch is already fixed,
> >>>> isn't it?
> >>> The NULL pointer dereference in nouveau_bo_wr16 introduced in
> >>> 141b15e59175aa174ca1f7596188bd15a7ca17ba was fixed by
> >>> aea656b0d05ec5b8ed5beb2f94c4dd42ea834e9d.
> >>>
> >>> That's the bug I hit when bisecting the original problem:
> >>> NULL pointer dereference in nouveau_bo_sync_for_device
> >>> It's caused by:
> >>> # first bad commit: [e34b8feeaa4b65725b25f49c9b08a0f8707e8e86] drm/ttm: merge ttm_dma_tt back into ttm_tt
> >> Good that I've asked :)
> >>
> >> Ok that's a bit strange. e34b8feeaa4b65725b25f49c9b08a0f8707e8e86 was
> >> created mostly automated.
> >>
> >> Do you have the original backtrace of that NULL pointer deref once more?
> > The original backtrace is here: https://lkml.org/lkml/2021/6/5/350
>
> And the problem is that ttm_dma->dma_address is NULL, right? Mhm, I
> don't see how that can happen since nouveau is using ttm_sg_tt_init().
>
> Apart from that what nouveau does here is rather questionable since you
> need a coherent architecture for most things anyway, but that's not what
> we are trying to fix here.
>
> Can you try to narrow down if ttm_sg_tt_init is called before calling
> this function for the tt object in question?

ttm_sg_tt_init is not called:
[ 12.150124] nouveau 0000:01:00.0: DRM: VRAM: 31 MiB
[ 12.150133] nouveau 0000:01:00.0: DRM: GART: 128 MiB
[ 12.150143] nouveau 0000:01:00.0: DRM: BMP version 5.6
[ 12.150151] nouveau 0000:01:00.0: DRM: No DCB data found in VBIOS
[ 12.151362] ttm_tt_init
[ 12.151370] ttm_tt_init_fields
[ 12.151374] ttm_tt_alloc_page_directory
[ 12.151615] BUG: kernel NULL pointer dereference, address: 00000000

-- 
Ondrej Zary
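
The failure mode discussed in the exchange above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's code: `struct model_tt` and every `model_*` function are hypothetical stand-ins, and the only behaviour taken from the thread is that the plain init path leaves `dma_address` NULL while the sync code expects it to be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ttm_tt: only the two arrays that matter here. */
struct model_tt {
    size_t num_pages;
    void **pages;           /* page directory */
    uint64_t *dma_address;  /* per-page DMA addresses, may stay NULL */
};

/* Models the plain init path seen in the log: page directory only. */
int model_tt_init(struct model_tt *tt, size_t num_pages)
{
    tt->num_pages = num_pages;
    tt->dma_address = NULL;
    tt->pages = calloc(num_pages, sizeof(*tt->pages));
    return tt->pages ? 0 : -1;
}

/* Models a DMA-aware init path (assumption): also allocates dma_address. */
int model_sg_tt_init(struct model_tt *tt, size_t num_pages)
{
    if (model_tt_init(tt, num_pages) != 0)
        return -1;
    tt->dma_address = calloc(num_pages, sizeof(*tt->dma_address));
    if (!tt->dma_address) {
        free(tt->pages);
        tt->pages = NULL;
        return -1;
    }
    return 0;
}

/* Returns false instead of dereferencing a missing dma_address array. */
bool model_sync_for_device(const struct model_tt *tt)
{
    if (!tt || !tt->dma_address)
        return false;  /* an unguarded loop would fault on NULL here */
    for (size_t i = 0; i < tt->num_pages; i++)
        (void)tt->dma_address[i];  /* a real driver would sync each page */
    return true;
}

/* Releases both arrays; safe to call after either init path. */
void model_tt_fini(struct model_tt *tt)
{
    free(tt->pages);
    free(tt->dma_address);
    tt->pages = NULL;
    tt->dma_address = NULL;
}
```

In this model an object set up with `model_tt_init()` makes `model_sync_for_device()` return false, which corresponds to the point where the reported kernel code dereferences NULL. Whether a guard like this is the right fix for nouveau is not settled in the thread.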
Christian König
2021-Jun-10 06:43 UTC
[Nouveau] nouveau broken on Riva TNT2 in 5.13.0-rc4: NULL pointer dereference in nouveau_bo_sync_for_device
On 09.06.21 at 22:00, Ondrej Zary wrote:
> On Wednesday 09 June 2021 11:21:05 Christian König wrote:
>> On 09.06.21 at 09:10, Ondrej Zary wrote:
>>> On Wednesday 09 June 2021, Christian König wrote:
>>>> On 09.06.21 at 08:57, Ondrej Zary wrote:
>>>>> [SNIP]
>>>>>> Thanks for the heads up. So the problem with my patch is already fixed,
>>>>>> isn't it?
>>>>> The NULL pointer dereference in nouveau_bo_wr16 introduced in
>>>>> 141b15e59175aa174ca1f7596188bd15a7ca17ba was fixed by
>>>>> aea656b0d05ec5b8ed5beb2f94c4dd42ea834e9d.
>>>>>
>>>>> That's the bug I hit when bisecting the original problem:
>>>>> NULL pointer dereference in nouveau_bo_sync_for_device
>>>>> It's caused by:
>>>>> # first bad commit: [e34b8feeaa4b65725b25f49c9b08a0f8707e8e86] drm/ttm: merge ttm_dma_tt back into ttm_tt
>>>> Good that I've asked :)
>>>>
>>>> Ok that's a bit strange. e34b8feeaa4b65725b25f49c9b08a0f8707e8e86 was
>>>> created mostly automated.
>>>>
>>>> Do you have the original backtrace of that NULL pointer deref once more?
>>> The original backtrace is here: https://lkml.org/lkml/2021/6/5/350
>>
>> And the problem is that ttm_dma->dma_address is NULL, right? Mhm, I
>> don't see how that can happen since nouveau is using ttm_sg_tt_init().
>>
>> Apart from that what nouveau does here is rather questionable since you
>> need a coherent architecture for most things anyway, but that's not what
>> we are trying to fix here.
>>
>> Can you try to narrow down if ttm_sg_tt_init is called before calling
>> this function for the tt object in question?
>
> ttm_sg_tt_init is not called:
> [ 12.150124] nouveau 0000:01:00.0: DRM: VRAM: 31 MiB
> [ 12.150133] nouveau 0000:01:00.0: DRM: GART: 128 MiB
> [ 12.150143] nouveau 0000:01:00.0: DRM: BMP version 5.6
> [ 12.150151] nouveau 0000:01:00.0: DRM: No DCB data found in VBIOS
> [ 12.151362] ttm_tt_init
> [ 12.151370] ttm_tt_init_fields
> [ 12.151374] ttm_tt_alloc_page_directory
> [ 12.151615] BUG: kernel NULL pointer dereference, address: 00000000

Please add dump_stack(); to ttm_tt_init() and report back with the backtrace.

I can't see how this is called from the nouveau code; the only possibility I see is that it is maybe called through the AGP code somehow.

Christian.
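
The request above is to find out who calls the init function. `dump_stack()` exists only inside the kernel, so the following is merely a userspace sketch of the same idea under stated assumptions: it records the immediate caller through a macro rather than printing a full backtrace, and `traced_init`, `driver_path` and `agp_path` are hypothetical names that do not come from nouveau or TTM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Name of the function that most recently called traced_init(). */
static const char *last_init_caller;

/* Logs and remembers the call site; stands in for an instrumented init function. */
void traced_init_impl(const char *caller, const char *file, int line)
{
    last_init_caller = caller;
    fprintf(stderr, "traced_init called from %s (%s:%d)\n", caller, file, line);
}

/* The macro captures the caller's identity at each call site. */
#define traced_init() traced_init_impl(__func__, __FILE__, __LINE__)

/* Two hypothetical code paths that might reach the init function. */
void driver_path(void) { traced_init(); }
void agp_path(void)    { traced_init(); }

/* Accessor so callers can check which path ran last. */
const char *traced_init_last_caller(void)
{
    return last_init_caller;
}
```

After calling `agp_path()`, `traced_init_last_caller()` returns the string `agp_path`, which answers the same question the kernel backtrace is meant to answer: which path reached the init function.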