Hi All,

So I've been banging my head against the nv46 vblank bug again and I've been finding out some interesting things. This is all using the latest kernel + ddx + mesa code.

All of this was tested with *cold* boots (power removed from the wall outlet for 20 seconds) in between the scenarios, because this is really weird. All boots were into text mode.

Scenario 1:

a) startx /usr/bin/glxgears -info

   glxgears works, is using the nouveau mesa driver and is synced at 60 fps.

b) startx /usr/bin/xterm -title foo
   And then in the xterm:
   metacity&
   glxgears -info

   glxgears works, is using the nouveau mesa driver and is synced at 60 fps.

Scenario 2:

a) startx /usr/bin/xterm -title foo
   And then in the xterm:
   metacity&
   glxgears -info

   glxgears does not work; /proc/interrupts shows the interrupt for nvkm is not firing.

b) startx /usr/bin/glxgears -info

   glxgears does not work; /proc/interrupts shows the interrupt for nvkm is not firing.

Weird huh, but it gets even weirder: if after scenario 2 I reload the nouveau kernel module, using the exact same module as loaded during boot, then I can run scenario 1 and it works the same as after a cold boot (iow things work as they should).

Ok, so reloading the module sets things back to a pristine state? Well, not quite, because after a module reload scenario 2 also works, whereas after a cold boot scenario 2 does not work ...

Also, once things are in a working state I can pretty much do whatever I want and they stay working...

My theory so far is that plymouth does something which causes problems when followed by starting X + xterm + metacity, whereas firing up X + glxgears directly after boot undoes the something plymouth has done, and from there on everything is good.

So any hints how to move forward with this are appreciated.

Regards,

Hans

p.s.

Possibly related, likely unrelated: during nouveau module (re)load I get these 2 errors:

[ 240.837471] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x6833c8
[ 240.837945] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x4f4f5c4e FAULT at 0x6833c8

Whereby the addresses listed as being written to (0x00000000 and 0x4f4f5c4e) are different each module load, so they seem to be taken from uninitialized memory. Hints on how to debug this are welcome too.
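The /proc/interrupts check used in the scenarios above can be scripted so that "firing" versus "not firing" becomes a simple count comparison. The following is only a sketch: it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ with per-CPU counts followed by a description), and the helper names `irq_count` and `is_firing` are made up for illustration.

```python
import time


def irq_count(interrupts_text, name="nvkm"):
    """Sum the per-CPU counts of every IRQ row whose description mentions `name`.

    Assumes the usual layout: a header row naming the CPUs, then rows of
    "<irq>: <count per CPU> ... <description>".
    """
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= 1 + ncpus:
            continue  # summary rows such as ERR/MIS carry no description
        description = " ".join(fields[1 + ncpus:])
        if name in description:
            total += sum(int(f) for f in fields[1:1 + ncpus])
    return total


def is_firing(name="nvkm", interval=1.0, path="/proc/interrupts"):
    """Return True if the count for `name` grows over `interval` seconds."""
    with open(path) as f:
        before = irq_count(f.read(), name)
    time.sleep(interval)
    with open(path) as f:
        after = irq_count(f.read(), name)
    return after > before
```

Calling `is_firing()` while glxgears is running should return True in the working case and False in the broken one, assuming the interrupt line is labelled nvkm as in the report above.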
On Tue, Jun 16, 2015 at 5:34 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> So any hints how to move forward with this are appreciated.

I can only say what I would do... forget about trying to quantify which cases work and which don't, just take the case that you can reliably reproduce the problem with. Start up a second xterm (or ssh in, that might be simpler), and start poking at stuff with nvapeek/nvapoke. You can look in the driver for what it does when enabling/disabling vblanks, and you can verify the values of various registers to see if they're what you expect or not. My bet is that the vblanks are somehow masked off. The dispnv04 code is pretty convoluted, and probably an odd call sequence causes it to mess things up.

Also adding a drm.debug=0xf and comparing the successful and failure cases may prove interesting. [and nouveau.debug=debug for good measure as well, can't hurt]

> Possibly related, likely unrelated: during nouveau module (re)load I get
> these 2 errors:
>
> [ 240.837471] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x6833c8
> [ 240.837945] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x4f4f5c4e FAULT at 0x6833c8
>
> Whereby the addresses listed as being written to (0x00000000 and
> 0x4f4f5c4e) are different each module load, so they seem to be taken
> from uninitialized memory.

How different? This example appears to decode to:

$ ~/src/envytools/rnn/lookup -a 46 6833c8
PRMDIO2.PAL_INDEX => 0

Which is definitely video-related. Perhaps it's something executed by a VBIOS script? (Or wait, you were thinking that 0x4f4f5c4e is the address? No, that is the value being written. And it occurs to me that that is 'N\OO' in fourcc. Probably irrelevant.) You may find 'nvbios' a useful tool for decoding the bios scripts. Also the 3c8 is suspiciously similar to the VGA I/O 0x3c4 (?) register? Probably coincidence.

Good luck,

-ilia
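The remark that 0x4f4f5c4e reads as 'N\OO' can be reproduced by viewing the 32-bit value as four little-endian bytes. This is a small sketch; `as_fourcc` is a made-up helper name, and the little-endian byte order is an assumption that happens to match the string quoted above.

```python
import struct


def as_fourcc(value):
    """Render a 32-bit value as four characters, lowest byte first.

    Non-printable bytes are shown as '.' so garbage values stay readable.
    """
    raw = struct.pack("<I", value)  # little-endian: lowest byte comes first
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


print(as_fourcc(0x4F4F5C4E))  # prints N\OO
```

Running the same helper over the values seen on other module loads would show whether they also look like text, which could hint at where the data comes from.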
Hi,

On 16-06-15 16:02, Ilia Mirkin wrote:
> On Tue, Jun 16, 2015 at 5:34 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> So any hints how to move forward with this are appreciated.
>
> I can only say what I would do... forget about trying to quantify
> which cases work and which don't, just take the case that you can
> reliably reproduce the problem with. Start up a second xterm (or ssh
> in, that might be simpler), and start poking at stuff with
> nvapeek/nvapoke. You can look in the driver for what it does when
> enabling/disabling vblanks, and you can verify the values of various
> registers to see if they're what you expect or not. My bet is that the
> vblanks are somehow masked off. The dispnv04 code is pretty
> convoluted, and probably an odd call sequence causes it to mess things
> up.

Ok, so I've been poking at registers for a couple of hours yesterday and today, but I've not gotten anywhere.

In the meantime I've learned something about my 2 scenarios: I was wrong that one works and one does not work. They both work and do not work some of the time ...

It seems that we're not initializing some register, and sometimes this comes out of reset with a good value and sometimes with a bad value ...

Running nvapeek on interesting register ranges also shows that quite a few registers contain different values between boots. Some of these seem to be counters for the current line / scanout address, but others are not. Is this normal for nouveau hardware? I'm used to most hardware having everything in a consistent state after a reset.

> Also adding a drm.debug=0xf and comparing the successful and failure
> cases may prove interesting. [and nouveau.debug=debug for good measure
> as well, can't hurt]
>
>> Possibly related, likely unrelated: during nouveau module (re)load I get
>> these 2 errors:
>>
>> [ 240.837471] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x6833c8
>> [ 240.837945] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x4f4f5c4e FAULT at 0x6833c8
>>
>> Whereby the addresses listed as being written to (0x00000000 and
>> 0x4f4f5c4e) are different each module load, so they seem to be taken
>> from uninitialized memory.
>
> How different? This example appears to decode to:
>
> $ ~/src/envytools/rnn/lookup -a 46 6833c8
> PRMDIO2.PAL_INDEX => 0
>
> Which is definitely video-related. Perhaps it's something executed by
> a VBIOS script? (Or wait, you were thinking that 0x4f4f5c4e is the
> address? No, that is the value being written. And it occurs to me that
> that is 'N\OO' in fourcc. Probably irrelevant.) You may find 'nvbios'
> a useful tool for decoding the bios scripts. Also the 3c8 is
> suspiciously similar to the VGA I/O 0x3c4 (?) register? Probably
> coincidence.

Ah yes, I had the value and address swapped. You're right, it is writing to 0x6833c8, usually 2 times, but sometimes it is writing to that register a lot of times in a row, usually on a nouveau module reload.

Regards,

Hans
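Comparing register dumps from a working and a non-working boot, as discussed above, is easier with a small diff helper than by eye. The sketch below assumes each dump line has the form "address: value value ..." in hex with consecutive 32-bit registers; that layout is an assumption about the nvapeek output, and the function names and the sample addresses in the usage note are made up.

```python
import re

# Assumed dump line layout: "<hex address>: <hex word> <hex word> ..."
DUMP_LINE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(.*)$")


def parse_dump(text):
    """Map register address -> value for every parsable line of a dump."""
    regs = {}
    for line in text.splitlines():
        m = DUMP_LINE.match(line)
        if not m:
            continue  # skip blank lines and anything that is not a dump row
        base = int(m.group(1), 16)
        for i, word in enumerate(m.group(2).split()):
            try:
                regs[base + 4 * i] = int(word, 16)  # 32-bit registers, 4 bytes apart
            except ValueError:
                break  # stop at the first non-hex token on the line
    return regs


def diff_dumps(good_text, bad_text):
    """Return {address: (good_value, bad_value)} for registers that differ."""
    good, bad = parse_dump(good_text), parse_dump(bad_text)
    return {
        addr: (good[addr], bad[addr])
        for addr in sorted(good.keys() & bad.keys())
        if good[addr] != bad[addr]
    }
```

For example, `diff_dumps("00600100: 00000000 00000001", "00600100: 00000000 00000000")` reports only address 0x600104. Registers that act as counters (current line, scanout address) will show up in every diff, so taking several dumps per boot and ignoring addresses that change within a single boot would narrow the list.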