thr3ads.net - Nouveau - [Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected [Jan 2023]


Ben Skeggs

2023-Jan-30 01:09 UTC

[Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

On Sat, 28 Jan 2023 at 21:29, Chris Clayton <chris2553 at googlemail.com> wrote:
>
> On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> > On 27.01.23 20:46, Chris Clayton wrote:
> >> [Resend because the mail client on my phone decided to turn HTML on
> >> behind my back, so my reply got bounced.]
> >>
> >> Thanks Thorsten.
> >>
> >> I did try to revert but it didn't revert cleanly and I don't have the
> >> knowledge to fix it up.
> >>
> >> The patch was part of a merge that included a number of related patches.
> >> Tomorrow, I'll try to revert the lot and report back.
> >
> > You are free to do so, but there is no need for that from my side. I
> > only wanted to know if a simple revert would do the trick; if it
> > doesn't, it in my experience often is best to leave things to the
> > developers of the code in question,
>
> Sound advice, Thorsten. Way too many conflicts for me to resolve.

Hey,

This is a complete shot-in-the-dark, as I don't see this behaviour on
*any* of my boards.  Could you try the attached patch please?

Thanks,
Ben.
>
> > as they know it best and thus have a
> > better idea which hidden side effect a more complex revert might have.
> >
> > Ciao, Thorsten
> >
> >> On 27/01/2023 11:20, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> >>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> >>> to make this easily accessible to everyone.
> >>>
> >>> @nouveau-maintainers, did anyone take a look at this? The report is
> >>> already 8 days old and I don't see a single reply. Sure, we'll likely
> >>> get a -rc8, but still it would be good to not fix this on the finish line.
> >>>
> >>> Chris, btw, did you try if you can revert the commit on top of latest
> >>> mainline? And if so, does it fix the problem?
> >>>
> >>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>> --
> >>> Everything you wanna know about Linux kernel regression tracking:
> >>> https://linux-regtracking.leemhuis.info/about/#tldr
> >>> If I did something stupid, please tell me, as explained on that page.
> >>>
> >>> #regzbot poke
> >>>
> >>> On 19.01.23 15:33, Linux kernel regression tracking (Thorsten Leemhuis)
> >>> wrote:
> >>>> [adding various lists and the two other nouveau maintainers to the list
> >>>> of recipients]
> >>>
> >>>> On 18.01.23 21:59, Chris Clayton wrote:
> >>>>> Hi.
> >>>>>
> >>>>> I built and installed the latest development kernel earlier this week.
> >>>>> I've found that when I try to shut the laptop down (or reboot it), it
> >>>>> hangs right at the end of closing the current session. The last line I
> >>>>> see on the screen when rebooting is:
> >>>>>
> >>>>>   sd 4:0:0:0: [sda] Synchronising SCSI cache
> >>>>>
> >>>>> When closing down I see one additional line:
> >>>>>
> >>>>>   sd 4:0:0:0 [sda]Stopping disk
> >>>>>
> >>>>> In both cases the machine then hangs and I have to hold down the power
> >>>>> button for a few seconds to switch it off.
> >>>>>
> >>>>> Linux 6.1 is OK but 6.2-rc1 hangs, so I bisected between these two and
> >>>>> landed on:
> >>>>>
> >>>>>   # first bad commit: [0e44c21708761977dcbea9b846b51a6fb684907a]
> >>>>>   drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
> >>>>>
> >>>>> I built and installed a kernel with
> >>>>> f15cde64b66161bfa74fb58f4e5697d8265b802e (the parent of the bad commit)
> >>>>> checked out and that shuts down and reboots fine. I then did the same
> >>>>> with the bad commit checked out and that does indeed hang, so I'm
> >>>>> confident the bisect outcome is OK.
> >>>>>
> >>>>> Kernels 6.1.6 and 5.15.88 are also OK.
> >>>>>
> >>>>> My system has dual GPUs - one Intel and one NVidia. Related extracts
> >>>>> from 'lspci -v' are:
> >>>>>
> >>>>> 00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05) (prog-if 00 [VGA controller])
> >>>>>         Subsystem: CLEVO/KAPOK Computer CometLake-H GT2 [UHD Graphics]
> >>>>>         Flags: bus master, fast devsel, latency 0, IRQ 142
> >>>>>         Memory at c2000000 (64-bit, non-prefetchable) [size=16M]
> >>>>>         Memory at a0000000 (64-bit, prefetchable) [size=256M]
> >>>>>         I/O ports at 5000 [size=64]
> >>>>>         Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
> >>>>>         Capabilities: [40] Vendor Specific Information: Len=0c <?>
> >>>>>         Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
> >>>>>         Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >>>>>         Capabilities: [d0] Power Management version 2
> >>>>>         Kernel driver in use: i915
> >>>>>         Kernel modules: i915
> >>>>>
> >>>>> 01:00.0 VGA compatible controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1) (prog-if 00 [VGA controller])
> >>>>>         Subsystem: CLEVO/KAPOK Computer TU117M [GeForce GTX 1650 Ti Mobile]
> >>>>>         Flags: bus master, fast devsel, latency 0, IRQ 141
> >>>>>         Memory at c4000000 (32-bit, non-prefetchable) [size=16M]
> >>>>>         Memory at b0000000 (64-bit, prefetchable) [size=256M]
> >>>>>         Memory at c0000000 (64-bit, prefetchable) [size=32M]
> >>>>>         I/O ports at 4000 [size=128]
> >>>>>         Expansion ROM at c3000000 [disabled] [size=512K]
> >>>>>         Capabilities: [60] Power Management version 3
> >>>>>         Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> >>>>>         Capabilities: [78] Express Legacy Endpoint, MSI 00
> >>>>>         Kernel driver in use: nouveau
> >>>>>         Kernel modules: nouveau
> >>>>>
> >>>>> DRI_PRIME=1 is exported in one of my init scripts (yes, I am still
> >>>>> using sysvinit).
> >>>>>
> >>>>> I've attached the bisect.log, but please let me know if I can provide
> >>>>> any other diagnostics. Please cc me as I'm not subscribed.
> >>>>
> >>>> Thanks for the report. To be sure the issue doesn't fall through the
> >>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> >>>> tracking bot:
> >>>>
> >>>> #regzbot ^introduced 0e44c21708761977
> >>>> #regzbot title drm: nouveau: hangs on poweroff/reboot
> >>>> #regzbot ignore-activity
> >>>>
> >>>> This isn't a regression? This issue or a fix for it are already
> >>>> discussed somewhere else? It was fixed already? You want to clarify when
> >>>> the regression started to happen? Or point out I got the title or
> >>>> something else totally wrong? Then just reply and tell me -- ideally
> >>>> while also telling regzbot about it, as explained by the page listed in
> >>>> the footer of this mail.
> >>>>
> >>>> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> >>>> to the report (the parent of this mail). See page linked in footer for
> >>>> details.
> >>>>
> >>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>>> --
> >>>> Everything you wanna know about Linux kernel regression tracking:
> >>>> https://linux-regtracking.leemhuis.info/about/#tldr
> >>>> That page also explains what to do if mails like this annoy you.
> >>
> >>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nvdec0-reset.diff
Type: text/x-patch
Size: 849 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20230130/a52ad70d/attachment.bin>

Chris Clayton

2023-Jan-30 20:19 UTC


[Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

Thanks, Ben.

On 30/01/2023 01:09, Ben Skeggs wrote:
> On Sat, 28 Jan 2023 at 21:29, Chris Clayton <chris2553 at googlemail.com> wrote:
>>
>> On 28/01/2023 05:42, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
>>> On 27.01.23 20:46, Chris Clayton wrote:
>>>> [Resend because the mail client on my phone decided to turn HTML on
>>>> behind my back, so my reply got bounced.]
>>>>
>>>> Thanks Thorsten.
>>>>
>>>> I did try to revert but it didn't revert cleanly and I don't have the
>>>> knowledge to fix it up.
>>>>
>>>> The patch was part of a merge that included a number of related patches.
>>>> Tomorrow, I'll try to revert the lot and report back.
>>>
>>> You are free to do so, but there is no need for that from my side. I
>>> only wanted to know if a simple revert would do the trick; if it
>>> doesn't, it in my experience often is best to leave things to the
>>> developers of the code in question,
>>
>> Sound advice, Thorsten. Way too many conflicts for me to resolve.
>
> Hey,
>
> This is a complete shot-in-the-dark, as I don't see this behaviour on
> *any* of my boards.  Could you try the attached patch please?

Unfortunately, the patch made no difference.

I've been looking at how the graphics on my laptop are set up, and have a bit
of a worry about whether the firmware might be playing a part in this problem.
In order to offload video decoding to the NVidia TU117 GPU, it seems the
scrubber firmware must be available but, as far as I know, that has not been
released by NVidia. To get it to work, I followed what Ubuntu have done, and
the scrubber in /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
../../tu116/nvdec/scrubber.bin. That, of course, means that some of the
firmware being loaded is for a different card. I note that processing related
to firmware is being changed in the patch. Might my setup be at the root of my
problem?
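
For anyone wanting to check whether their own machine is in the same
situation, here is a minimal sketch (not something that was posted in this
thread) that lists the entries under one chip's firmware directory which are
symlinks resolving outside that directory. The function name cross_chip_links
is made up for illustration, and the /lib/firmware/nvidia root and tu117
directory name are simply taken from the description above; adjust them for
your system.

```python
import os
from pathlib import Path


def cross_chip_links(fw_root, chip):
    """Return sorted (link, target) pairs for symlinks under fw_root/chip
    whose resolved target lies outside fw_root/chip.

    Hypothetical helper for illustration; not part of nouveau or any tool.
    """
    chip_dir = (Path(fw_root) / chip).resolve()
    found = []
    # os.walk does not follow symlinked directories by default, so check
    # both directory and file entries at each level.
    for dirpath, dirnames, filenames in os.walk(chip_dir):
        for name in dirnames + filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue
            target = link.resolve()
            if chip_dir not in target.parents:
                found.append((str(link.relative_to(chip_dir)), str(target)))
    return sorted(found)


if __name__ == "__main__":
    # Assumed location, as described in the message above.
    for link, target in cross_chip_links("/lib/firmware/nvidia", "tu117"):
        print(f"{link} -> {target}")
```

On a setup like the one described, the output would be expected to include
the nvdec scrubber entry pointing into the tu116 tree, but what is actually
listed depends on how the distribution packages the firmware.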

I'll have a fiddle and see what I can work out.

Chris

Chris Clayton

2023-Jan-30 23:09 UTC


[Nouveau] linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

Hi again.

On 30/01/2023 20:19, Chris Clayton wrote:
> Thanks, Ben.

<snip>

>> Hey,
>>
>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>> *any* of my boards.  Could you try the attached patch please?
>
> Unfortunately, the patch made no difference.
>
> I've been looking at how the graphics on my laptop are set up, and have a
> bit of a worry about whether the firmware might be playing a part in this
> problem. In order to offload video decoding to the NVidia TU117 GPU, it
> seems the scrubber firmware must be available but, as far as I know, that
> has not been released by NVidia. To get it to work, I followed what Ubuntu
> have done, and the scrubber in /lib/firmware/nvidia/tu117/nvdec/ is a
> symlink to ../../tu116/nvdec/scrubber.bin. That, of course, means that some
> of the firmware being loaded is for a different card. I note that
> processing related to firmware is being changed in the patch. Might my
> setup be at the root of my problem?
>
> I'll have a fiddle and see what I can work out.
>
> Chris
>
>>
>> Thanks,
>> Ben.

Well, my fiddling has got my system rebooting and shutting down successfully
again. I found that if I delete the symlink to the scrubber firmware, reboot
and shutdown work again. There are, however, a number of other files in the
tu117 firmware directory tree that are symlinks to actual files in its tu116
counterpart, so I deleted all of those too. Unfortunately, the absence of one
or more of those symlinks causes Xorg to fail to start. I've reinstated all
the links except scrubber and I now have a system that works as it did until
I tried to run a kernel that includes the bad commit I identified in my
bisection. That includes offloading video decoding to the NVidia card, so
whatever I read that said the scrubber firmware was needed seems to have been
wrong. I get a new message (nouveau 0000:01:00.0: fb: VPR locked, but no
scrubber binary!), but, hey, we can't have everything.
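
As a rough sketch of the workaround described above, the snippet below sets a
firmware symlink aside instead of deleting it, so it can be put back easily.
Renaming rather than deleting is a substitution made here for safety; the
message above describes deleting the link. The helper name set_aside_symlink
and the ".disabled" suffix are invented for illustration, and the path in the
usage note is only the one mentioned in this thread.

```python
from pathlib import Path


def set_aside_symlink(link, dry_run=True):
    """Rename a symlink to '<name>.disabled' and return the new path.

    Returns None and changes nothing if 'link' is not a symlink, so real
    firmware files are never touched. With dry_run=True (the default)
    nothing is renamed; the would-be new path is only reported.
    Hypothetical helper for illustration only.
    """
    link = Path(link)
    if not link.is_symlink():
        return None
    parked = link.with_name(link.name + ".disabled")
    if not dry_run:
        # rename() acts on the link itself, not on the file it points to.
        link.rename(parked)
    return parked
```

For example, set_aside_symlink("/lib/firmware/nvidia/tu117/nvdec/scrubber.bin")
only reports what would be renamed; passing dry_run=False performs the rename
and would normally need root. Renaming the parked link back to its original
name restores the previous state.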

If you still want to get to the bottom of this, let me know what you need me
to provide and I'll do my best. I suspect you might want to because there will
be an awful lot of Ubuntu-based systems out there with that scrubber.bin
symlink in place. On the other hand, it could be quite a while before Ubuntu
are deploying 6.2 or later kernels.

Thanks,

Chris

<snip>
