Michael S. Tsirkin
2018-Nov-29 01:29 UTC
[Nouveau] 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups
On Thu, Nov 29, 2018 at 12:21:31AM +0100, Karol Herbst wrote:
> This was already debugged and there is no point in searching inside
> the firmware. It's not a firmware bug or anything.
>
> The proper fix is to do something inside Nouveau so that we don't
> upset the device and are able to runtime resume it again.
>
> The initial thing we do inside Nouveau that causes those issues is to run
> the so-called "DEVINIT" script inside the vbios to initialize the
> GPU. The problem is, it changes something in the PCIe configuration so
> that the GPU isn't able to runtime resume anymore. I am in contact
> with Nvidia about that issue and hopefully we get the proper answers.
> When I was digging into that myself I was able to make the situation
> more stable by setting the PCIe link speed to the boot defaults, but
> that was still pretty unstable.
>
> Anyway, because the binary driver fails here as well (through
> bumblebee and so on), there isn't much reverse engineering we can do
> besides guessing and trying it on literally every piece of hardware
> until it works.
>
> We also have an upstream bug for this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=156341

If you like, I can probably dump the PCIe registers of the card
and/or the PCIe port under Windows. The card works there :)
Let me know.

--
MST
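For reference, here is a minimal Python sketch of what such a register dump can reveal about the link that Karol mentions resetting to its boot-default speed. It walks the standard capability list of a raw config-space image, finds the PCI Express capability (ID 0x10) and decodes the Link Capabilities, Link Status and Link Control 2 fields, which hold the maximum, current and target link speed. The function names are illustrative and not part of any existing tool, and the speed table covers only the common encodings.

```python
import struct
from typing import Optional

# Capability ID of the PCI Express capability in the standard list.
PCIE_CAP_ID = 0x10
# Common encodings of the link speed fields (assumption: Gen1..Gen4 only).
LINK_SPEEDS = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def find_pcie_cap(cfg: bytes) -> Optional[int]:
    """Return the offset of the PCI Express capability, or None."""
    # Status register (offset 0x06) bit 4: a capability list is present.
    status = struct.unpack_from("<H", cfg, 0x06)[0]
    if not status & 0x10:
        return None
    ptr = cfg[0x34] & 0xFC  # capabilities pointer, dword aligned
    seen = set()
    # Stop on a null pointer, a loop, or a pointer past the image end
    # (e.g. an unprivileged 64-byte read of the sysfs config file).
    while ptr and ptr not in seen and ptr + 1 < len(cfg):
        seen.add(ptr)
        if cfg[ptr] == PCIE_CAP_ID:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC  # "next capability" byte
    return None


def link_state(cfg: bytes) -> dict:
    """Decode link speed/width from a config-space image."""
    cap = find_pcie_cap(cfg)
    if cap is None:
        raise ValueError("no PCI Express capability found")
    lnkcap = struct.unpack_from("<I", cfg, cap + 0x0C)[0]   # Link Capabilities
    lnksta = struct.unpack_from("<H", cfg, cap + 0x12)[0]   # Link Status
    lnkctl2 = struct.unpack_from("<H", cfg, cap + 0x30)[0]  # Link Control 2
    return {
        "max_speed": LINK_SPEEDS.get(lnkcap & 0xF, "unknown"),
        "max_width": (lnkcap >> 4) & 0x3F,
        "cur_speed": LINK_SPEEDS.get(lnksta & 0xF, "unknown"),
        "cur_width": (lnksta >> 4) & 0x3F,
        "target_speed": LINK_SPEEDS.get(lnkctl2 & 0xF, "unknown"),
    }
```

Feeding it the bytes of `/sys/bus/pci/devices/<BDF>/config` (read as root, since unprivileged reads return only the first 64 bytes; `<BDF>` is the device's bus/device/function address) gives the same fields that `lspci -vv` prints as LnkCap, LnkSta and LnkCtl2.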
Karol Herbst
2018-Nov-29 10:53 UTC
On Thu, Nov 29, 2018 at 2:29 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> If you like, I can probably dump the PCIe registers of the card
> and/or the PCIe port under Windows. The card works there :)
> Let me know.

The problem is, we would need to know the registers right before
suspending the GPU. If someone were able to trace all PCIe
register reads and writes for the entire suspend/resume process,
that would be very helpful.
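Short of a full access trace, a cheaper approximation is to diff config-space snapshots, for example one taken with nouveau not loaded and one taken right before runtime suspend. The sketch below (hypothetical helper names, plain Python) compares two images dword by dword. Note that it only shows the end state, not the ordered sequence of reads and writes that Karol is asking for.

```python
import struct
from typing import List, Tuple


def diff_config(before: bytes, after: bytes) -> List[Tuple[int, int, int]]:
    """Compare two config-space snapshots dword by dword.

    Returns (offset, old_value, new_value) for every aligned 32-bit
    register that differs. Only the common prefix is compared, so a
    64-byte unprivileged read can still be compared with a full one.
    """
    n = min(len(before), len(after)) & ~3  # whole dwords only
    changes = []
    for off in range(0, n, 4):
        old = struct.unpack_from("<I", before, off)[0]
        new = struct.unpack_from("<I", after, off)[0]
        if old != new:
            changes.append((off, old, new))
    return changes


def snapshot(path: str) -> bytes:
    """Read a config-space image, e.g. a sysfs 'config' file.

    Run as root to get more than the first 64 bytes.
    """
    with open(path, "rb") as f:
        return f.read()
```

Offsets reported inside the PCI Express capability (for instance the Link Control/Link Status dword) would be the first candidates to compare against what the boot defaults were.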
Michael S. Tsirkin
2018-Nov-29 17:12 UTC
On Thu, Nov 29, 2018 at 11:53:53AM +0100, Karol Herbst wrote:
> The problem is, we would need to know the registers right before
> suspending the GPU. If someone were able to trace all PCIe
> register reads and writes for the entire suspend/resume process,
> that would be very helpful.

Well, I can pass the card to a VM and trace it on the hypervisor;
that isn't a problem.
The tricky thing is the ACPI tables: I would need to somehow know
which ones are relevant in order to pass them to the guest...
Any ideas on that?

--
MST
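If a hypervisor-side trace of config-space accesses were available, a first pass could be to keep only the accesses that touch the link registers of the PCI Express capability. In this sketch the `ConfigAccess` record layout is hypothetical (a real hypervisor trace would first need to be parsed into it), and the offsets are the Link Control, Link Status and Link Control 2 positions relative to the start of the capability.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ConfigAccess(NamedTuple):
    """One config-space access (hypothetical record layout)."""
    op: str      # "read" or "write"
    offset: int  # byte offset into config space
    size: int    # access width in bytes (1, 2 or 4)
    value: int


# Link registers inside the PCI Express capability: (relative offset, width).
LINK_REGS = {"LnkCtl": (0x10, 2), "LnkSta": (0x12, 2), "LnkCtl2": (0x30, 2)}


def link_accesses(trace: Iterable[ConfigAccess],
                  cap: int) -> List[Tuple[str, ConfigAccess]]:
    """Return (register_name, access) for each access that overlaps a
    link register of the capability located at offset `cap`."""
    hits = []
    for acc in trace:
        for name, (rel, width) in LINK_REGS.items():
            start = cap + rel
            # Two byte ranges overlap if each starts before the other ends.
            if acc.offset < start + width and start < acc.offset + acc.size:
                hits.append((name, acc))
    return hits
```

A single 4-byte access at the Link Control offset is reported twice, once for Link Control and once for Link Status, because it spans both 16-bit registers.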