Hans de Goede
2015-Jul-27 15:52 UTC
[Nouveau] [PATCH] nouveau: nv46: Change mc subdev oclass from nv44 to nv4c
Hi,

On 24-07-15 04:32, Ben Skeggs wrote:
> On 24 July 2015 at 01:20, Hans de Goede <hdegoede at redhat.com> wrote:
>> MSI interrupts appear to not work for nv46 based cards. Change the mc
>> subdev oclass for these cards from nv44 to nv4c, the nv4c mc code is
>> identical to the nv44 mc code except that it does not use msi
>> (it does not define a msi_rearm callback).
>
> I'm fine with this, but it'd be nice to check that the binary driver
> doesn't/can't use MSI on these too (there might be an alternate method
> we need to use).
>
> Would you be able to grab the latest proprietary driver that works on
> nv4x, and do a mmiotrace of it?

I've grabbed 304.125.

> You *might* need to use "modprobe
> nvidia NVreg_EnableMSI=1", because at some point NVIDIA didn't use it
> by default anywhere.

You're right, I needed to specify NVreg_EnableMSI=1; with that set,
/proc/interrupts shows that MSI is used.

Here is an mmiotrace of running glxgears with the binary driver using
MSI interrupts:

https://fedorapeople.org/~jwrdegoede/nvidia-bin-nv46-msi-on-glxgears.mmiotrace.gz

AFAIK there are some nouveau tools to parse this a bit, right? I'm going
to call it a day for today; if you can give me some pointers on what to do
with the mmiotrace to find a potential fix for the msi issues, that would
be appreciated.

BTW I had to build my own kernel with mmiotrace enabled in Kconfig, as this
is disabled in the Fedora kernels by default. Do you know if there is a good
reason to have this disabled by default, or should I ask the Fedora
kernel maintainers to enable it by default?

Slightly offtopic:

I decided to be bold and try gnome-shell on the nv46 with msi disabled,
which so far was a guaranteed way to freeze the system, and it now works
somewhat (latest kernel, ddx and mesa). I see something which shows
horizontal lines which are small parts of my desktop background, and
things change significantly when I switch to the overview mode.

But other than that the display is completely wrong. It looks a bit
like a framebuffer pitch problem, but then different. I think it
is likely some tiling problem or some such.

Note that metacity + glxgears works, this only shows with
gnome-shell. Any hints where to start looking wrt debugging this?

Or should I first try to run piglit and see if some tests there
point out the culprit?

Regards,

Hans

>
> Thanks,
> Ben.
>
>> BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=90435
>> Signed-off-by: Hans de Goede <hdegoede at redhat.com>
>> ---
>>  drivers/gpu/drm/nouveau/nvkm/engine/device/nv40.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/nv40.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/nv40.c
>> index c630136..b4ad791 100644
>> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/nv40.c
>> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/nv40.c
>> @@ -265,7 +265,7 @@ nv40_identify(struct nvkm_device *device)
>>   device->oclass[NVDEV_SUBDEV_CLK ] = &nv40_clk_oclass;
>>   device->oclass[NVDEV_SUBDEV_THERM ] = &nv40_therm_oclass;
>>   device->oclass[NVDEV_SUBDEV_DEVINIT] = nv1a_devinit_oclass;
>> - device->oclass[NVDEV_SUBDEV_MC ] = nv44_mc_oclass;
>> + device->oclass[NVDEV_SUBDEV_MC ] = nv4c_mc_oclass;
>>   device->oclass[NVDEV_SUBDEV_BUS ] = nv31_bus_oclass;
>>   device->oclass[NVDEV_SUBDEV_TIMER ] = &nv04_timer_oclass;
>>   device->oclass[NVDEV_SUBDEV_FB ] = nv46_fb_oclass;
>> --
>> 2.4.3
>>
>> _______________________________________________
>> Nouveau mailing list
>> Nouveau at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/nouveau
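The patch description above says the nv4c mc code avoids MSI simply by
not defining a msi_rearm callback. A minimal sketch of that idea follows.
It is not the nouveau code: struct mc_class, mc_should_use_msi() and the
two example classes are hypothetical names made up for illustration, and
the real driver may gate MSI on further conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-chipset "mc" class (not the nouveau struct). */
struct mc_class {
    const char *name;
    void (*msi_rearm)(void *priv); /* NULL: this class cannot re-arm MSI */
};

/* Placeholder callback; a real one would write a chipset-specific register. */
static void example_msi_rearm(void *priv)
{
    (void)priv;
}

/* Two classes that differ only in whether the callback is provided. */
static const struct mc_class nv44_like = { "nv44-like", example_msi_rearm };
static const struct mc_class nv4c_like = { "nv4c-like", NULL };

/* MSI is used only when it is wanted and the class can re-arm it. */
static bool mc_should_use_msi(const struct mc_class *cls, bool msi_wanted)
{
    return msi_wanted && cls != NULL && cls->msi_rearm != NULL;
}

/* Usage example: switching the class is enough to fall back to legacy IRQs. */
static void mc_example(void)
{
    assert(mc_should_use_msi(&nv44_like, true));
    assert(!mc_should_use_msi(&nv4c_like, true));
}
```

Under this model, changing which class a chipset is assigned (as the
one-line patch does) changes the interrupt mode without touching any
other code path.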
Ilia Mirkin
2015-Jul-27 16:25 UTC
[Nouveau] [PATCH] nouveau: nv46: Change mc subdev oclass from nv44 to nv4c
On Mon, Jul 27, 2015 at 11:52 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> https://fedorapeople.org/~jwrdegoede/nvidia-bin-nv46-msi-on-glxgears.mmiotrace.gz
>
> AFAIK there are some nouveau tools to parse this a bit, right ? I'm going
> to call it a day for today, if you can give me some pointers what to do with
> the
> mmiotrace to find a potential fix for the msi issues, that would be
> appreciated.

rnn/demmio -l foo-mmiotrace.gz

Enjoy :)
Ben Skeggs
2015-Jul-28 07:26 UTC
[Nouveau] [PATCH] nouveau: nv46: Change mc subdev oclass from nv44 to nv4c
On 28 July 2015 at 01:52, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi,
>
> On 24-07-15 04:32, Ben Skeggs wrote:
>>
>> On 24 July 2015 at 01:20, Hans de Goede <hdegoede at redhat.com> wrote:
>>>
>>> MSI interrupts appear to not work for nv46 based cards. Change the mc
>>> subdev oclass for these cards from nv44 to nv4c, the nv4c mc code is
>>> identical to the nv44 mc code except that it does not use msi
>>> (it does not define a msi_rearm callback).
>>
>> I'm fine with this, but it'd be nice to check that the binary driver
>> doesn't/can't use MSI on these too (there might be an alternate method
>> we need to use).
>>
>> Would you be able to grab the latest proprietary driver that works on
>> nv4x, and do a mmiotrace of it?
>
> I've grabbed 304.125
>
>> You *might* need to use "modprobe
>> nvidia NVreg_EnableMSI=1", because at some point NVIDIA didn't use it
>> by default anywhere.
>
> You're right I needed to specify NVreg_EnableMSI=1, with that set
> /proc/interrupts shows that MSI is used.
>
> Here is an mmiotrace of running glxgears with the binary driver using
> msi interrupts:
>
> https://fedorapeople.org/~jwrdegoede/nvidia-bin-nv46-msi-on-glxgears.mmiotrace.gz
>
> AFAIK there are some nouveau tools to parse this a bit, right ? I'm going
> to call it a day for today, if you can give me some pointers what to do with
> the
> mmiotrace to find a potential fix for the msi issues, that would be
> appreciated.
>
> BTW I had to build my own kernel with mmiotrace enabled in Kconfig, as this
> is disabled in the Fedora kernels by default. Do you know if there is a good
> reason to have this disabled by default, or should I ask the Fedora
> kernel maintainers to enable it by default ?

The -debug kernel has it enabled already. However, it's also
problematic in that it enables various lockdep debugging stuff that
causes the binary driver kernel module to end up depending on GPL-only
symbols, which you have to hack around by changing the
MODULE_LICENSE() for the binary driver to "GPL"... Which is clearly a
pain :) So, I guess if you want a slightly more straightforward
approach, it'd be good to enable in the non-debug kernels too.

> Slightly offtopic:
>
> I decided to be bold and try gnome-shell on the nv46 with msi disabled,
> which so far was a guaranteed way to freeze the system, and it now works
> somewhat (latest kernel, ddx and mesa). I see something which shows
> horizontal lines which are small parts of my desktop background, and
> things change significantly when I switch to the overview mode.
>
> But other than that the display is completely wrong, it looks a bit
> like a framebuffer pitch problem, but then different. I think it
> is likely some tiling problem or some such.
>
> Note that metacity + glxgears works, this only shows with
> gnome-shell, any hints where to start looking wrt debugging this?

These are the main issues that I'd like to see resolved :)

> Or should I first try to run piglit and see if some tests there
> point out the culprit?

I think this is a good place to start.

Thanks,
Ben.

> Regards,
>
> Hans
>
>>
>> Thanks,
>> Ben.
Hans de Goede
2015-Jul-29 15:36 UTC
[Nouveau] [PATCH] nouveau: nv46: Change mc subdev oclass from nv44 to nv4c
Hi,

On 28-07-15 09:26, Ben Skeggs wrote:
> On 28 July 2015 at 01:52, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi,
>>
>> On 24-07-15 04:32, Ben Skeggs wrote:
>>>
>>> On 24 July 2015 at 01:20, Hans de Goede <hdegoede at redhat.com> wrote:
>>>>
>>>> MSI interrupts appear to not work for nv46 based cards. Change the mc
>>>> subdev oclass for these cards from nv44 to nv4c, the nv4c mc code is
>>>> identical to the nv44 mc code except that it does not use msi
>>>> (it does not define a msi_rearm callback).
>>>
>>> I'm fine with this, but it'd be nice to check that the binary driver
>>> doesn't/can't use MSI on these too (there might be an alternate method
>>> we need to use).
>>>
>>> Would you be able to grab the latest proprietary driver that works on
>>> nv4x, and do a mmiotrace of it?
>>
>> I've grabbed 304.125
>>
>>> You *might* need to use "modprobe
>>> nvidia NVreg_EnableMSI=1", because at some point NVIDIA didn't use it
>>> by default anywhere.
>>
>> You're right I needed to specify NVreg_EnableMSI=1, with that set
>> /proc/interrupts shows that MSI is used.
>>
>> Here is an mmiotrace of running glxgears with the binary driver using
>> msi interrupts:
>>
>> https://fedorapeople.org/~jwrdegoede/nvidia-bin-nv46-msi-on-glxgears.mmiotrace.gz
>>
>> AFAIK there are some nouveau tools to parse this a bit, right ? I'm going
>> to call it a day for today, if you can give me some pointers what to do with
>> the
>> mmiotrace to find a potential fix for the msi issues, that would be
>> appreciated.

I've run demmio on this as suggested by Ilia. I've checked all the writes
to the pmc, pbus and pci ranges, and I've been unable to find anything
which helps, I'm afraid. I've also checked the interrupt regs of the crt
block, and those are correct, and the interrupt flag for vblank is set.
So I'm all out of clues, I'm afraid.
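The manual check described above (looking only at writes that land in
particular register ranges) can be sketched as a small filter. This is
a minimal sketch and not a replacement for demmio: it assumes the text
format emitted by the kernel mmiotrace tracer, "R|W width timestamp
map_id physical value pc pid", and the BAR base, range bounds and sample
trace lines are hypothetical values chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One decoded mmiotrace access line. */
struct mmio_access {
    char kind;      /* 'R' for a read, 'W' for a write */
    int width;      /* access width in bytes */
    uint64_t phys;  /* physical address of the access */
    uint64_t value; /* value read or written */
};

/* Parse an R/W line; returns 1 on success, 0 for any other line
 * (MAP, UNMAP, MARK, VERSION, ...). */
static int parse_mmio_line(const char *line, struct mmio_access *out)
{
    char kind;
    int width, map_id;
    double timestamp;
    unsigned long long phys, value;

    assert(line != NULL && out != NULL);
    if (sscanf(line, " %c %d %lf %d %llx %llx",
               &kind, &width, &timestamp, &map_id, &phys, &value) != 6)
        return 0;
    if (kind != 'R' && kind != 'W')
        return 0;
    out->kind = kind;
    out->width = width;
    out->phys = phys;
    out->value = value;
    return 1;
}

/* True when the access is a write landing in [base + lo, base + hi). */
static int is_write_in_range(const struct mmio_access *a,
                             uint64_t base, uint64_t lo, uint64_t hi)
{
    return a->kind == 'W' && a->phys >= base + lo && a->phys < base + hi;
}

/* Usage example with a made-up trace line and a made-up 4 KiB window. */
static void mmio_example(void)
{
    struct mmio_access a;

    assert(parse_mmio_line("W 4 386.199973 1 0xfd000140 0x1 0x0 0", &a));
    assert(is_write_in_range(&a, 0xfd000000ULL, 0x0, 0x1000));
}
```

Feeding each line of an uncompressed trace through parse_mmio_line() and
printing only those for which is_write_in_range() holds gives a much
shorter list to compare between a working and a non-working run.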
One thing which does stand out is that lspci -vvv shows the following
differences between nouveau vs nvidia output:

@@ -361,23 +361,23 @@
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Ste
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
  Latency: 0
- Interrupt: pin A routed to IRQ 28
+ Interrupt: pin A routed to IRQ 29
  Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
  Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
  Region 3: Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
- Expansion ROM at fe9e0000 [disabled] [size=128K]
+ [virtual] Expansion ROM at fe9e0000 [disabled] [size=128K]
  Capabilities: [60] Power Management version 2
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3ho
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
  Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
- Address: 00000000fee0300c Data: 41a2
+ Address: 00000000fee0300c Data: 41c2
  Capabilities: [78] Express (v1) Endpoint, MSI 00
  DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <256ns,
  ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupport
  RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
  MaxPayload 128 bytes, MaxReadReq 512 bytes
- DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransP
+ DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransP
  LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit La
  ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
  LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk+
@@ -393,7 +393,7 @@
  Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
  Status: NegoPending- InProgress-
  Capabilities: [128 v1] Power Budgeting <?>
- Kernel driver in use: nouveau
+ Kernel driver in use: nvidia
  Kernel modules: nouveau, nvidia

The DevSta shows that we are sending some commands the device does not like.
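To read the two MSI "Data" values and the DevSta flags in the lspci diff
above, a small decoder helps. The sketch below assumes the usual x86 MSI
message data layout (vector in bits 7:0, delivery mode in bits 10:8,
level in bit 14, trigger mode in bit 15) and the PCIe Device Status bit
order that lspci prints; the struct and function names are made up for
illustration. Decoded this way, 41a2 and 41c2 differ only in the vector
field, which fits the IRQ 28 vs IRQ 29 difference rather than pointing
at a different MSI setup. The DevSta error bits are write-1-to-clear, so
they record that an error was detected at some point since they were
last cleared, not when it happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the 16-bit x86 MSI message data word. */
struct msi_data {
    uint8_t vector;        /* bits 7:0, interrupt vector */
    uint8_t delivery_mode; /* bits 10:8 */
    bool level_assert;     /* bit 14 */
    bool level_triggered;  /* bit 15, false means edge-triggered */
};

static struct msi_data decode_msi_data(uint16_t data)
{
    struct msi_data d;

    d.vector = (uint8_t)(data & 0xff);
    d.delivery_mode = (uint8_t)((data >> 8) & 0x7);
    d.level_assert = ((data >> 14) & 0x1) != 0;
    d.level_triggered = ((data >> 15) & 0x1) != 0;
    return d;
}

/* True when two data words differ in nothing but the vector field. */
static bool differ_only_in_vector(uint16_t a, uint16_t b)
{
    return ((a ^ b) & 0xff00) == 0 && (a & 0xff) != (b & 0xff);
}

/* PCIe Device Status bits, in the order lspci prints them. */
enum devsta_bits {
    DEVSTA_CORR_ERR   = 1 << 0, /* CorrErr */
    DEVSTA_UNCORR_ERR = 1 << 1, /* UncorrErr (non-fatal error detected) */
    DEVSTA_FATAL_ERR  = 1 << 2, /* FatalErr */
    DEVSTA_UNSUPP_REQ = 1 << 3, /* UnsuppReq */
    DEVSTA_AUX_PWR    = 1 << 4, /* AuxPwr */
    DEVSTA_TRANS_PEND = 1 << 5  /* TransPend */
};

/* True when any of the four error-detected bits is set. */
static bool devsta_has_error(uint16_t devsta)
{
    return (devsta & (DEVSTA_CORR_ERR | DEVSTA_UNCORR_ERR |
                      DEVSTA_FATAL_ERR | DEVSTA_UNSUPP_REQ)) != 0;
}

/* Usage example with the values from the lspci diff. */
static void msi_example(void)
{
    assert(decode_msi_data(0x41a2).vector == 0xa2); /* nouveau boot */
    assert(decode_msi_data(0x41c2).vector == 0xc2); /* nvidia boot */
    assert(differ_only_in_vector(0x41a2, 0x41c2));
    /* nouveau side of the diff: UncorrErr+ and UnsuppReq+ */
    assert(devsta_has_error(DEVSTA_UNCORR_ERR | DEVSTA_UNSUPP_REQ));
}
```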
At first I thought this would be the culprit, but as discussed before, on
some boots things just work and on others they do not (when using nouveau).
I've checked a boot with nouveau where things just happen to work, and
there UncorrErr+ and UnsuppReq+ are still set.

So I'm officially giving up on this, and I'm going to continue to work
on the nv46 with msi disabled.

Note that when things do not work, we do get some interrupts; they just
stop coming at some point shortly after boot.

>> BTW I had to build my own kernel with mmiotrace enabled in Kconfig, as this
>> is disabled in the Fedora kernels by default. Do you know if there is a good
>> reason to have this disabled by default, or should I ask the Fedora
>> kernel maintainers to enable it by default ?
>
> The -debug kernel has it enabled already. However, it's also
> problematic in that it enables various lockdep debugging stuff that
> causes the binary driver kernel module to end up depending on GPL-only
> symbols, which you have to hack around by changing the
> MODULE_LICENSE() for the binary driver to "GPL"... Which is clearly a
> pain :) So, I guess if you want a slightly more straight-forward
> approach, it'd be good to enable in the non-debug kernels too.

Ok. Before I submit a patch to the Fedora kernel devs for this: mmiotrace
uses live patching like the other ftrace stuff, so there is no performance
impact unless it is actually used, right?

>> Slightly offtopic:
>>
>> I decided to be bold and try gnome-shell on the nv46 with msi disabled,
>> which so far was a guaranteed way to freeze the system, and it now works
>> somewhat (latest kernel, ddx and mesa). I see something which shows
>> horizontal lines which are small parts of my desktop background, and
>> things change significantly when I switch to the overview mode.
>>
>> But other than that the display is completely wrong, it looks a bit
>> like a framebuffer pitch problem, but then different. I think it
>> is likely some tiling problem or some such.
>>
>> Note that metacity + glxgears works, this only shows with
>> gnome-shell, any hints where to start looking wrt debugging this?
>
> These are the main issues that I'd like to see resolved :)

Agreed, getting gnome-shell running is really the minimum level we should
support cards at.

>> Or should I first try to run piglit and see if some tests there
>> point out the culprit?
>
> I think this is a good place to start.

Ok, will do.

Regards,

Hans
Hans de Goede
2015-Jul-30 12:42 UTC
[Nouveau] [PATCH] nouveau: nv46: Change mc subdev oclass from nv44 to nv4c
Hi,

On 27-07-15 17:52, Hans de Goede wrote:
> Slightly offtopic:
>
> I decided to be bold and try gnome-shell on the nv46 with msi disabled,
> which so far was a guaranteed way to freeze the system, and it now works
> somewhat (latest kernel, ddx and mesa). I see something which shows
> horizontal lines which are small parts of my desktop background, and
> things change significantly when I switch to the overview mode.
>
> But other than that the display is completely wrong, it looks a bit
> like a framebuffer pitch problem, but then different. I think it
> is likely some tiling problem or some such.
>
> Note that metacity + glxgears works, this only shows with
> gnome-shell, any hints where to start looking wrt debugging this?
>
> Or should I first try to run piglit and see if some tests there
> point out the culprit?

I've been working on this today. I decided to first make sure that the
latest ddx + mesa did not have a regression on nv4x in general, so I
plugged in my nv43 card, which used to run gnome-shell fine, and that
shows the same problem.

Some debugging with that card shows that things break with this ddx commit:

http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=241e7289f25a342a457952b9b0e539c2f0b81d99
"enable dri3 support without glamor"

Using an older ddx + latest mesa master, gnome-shell runs fine on my nv43
card. And adding my patch to disable msi interrupts on nv46 makes
gnome-shell run fine on my nv46 card too :)

So unless someone has a good idea to fix msi interrupts on nv46, I suggest
we merge my patch to disable them (with a Cc: stable at vger.kernel.org),
which should fix most problems nv46 users have been seeing.

Regards,

Hans