Maarten Maathuis
2010-Jan-02 15:36 UTC
[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+
Many people using NV50+ hardware are aware of GPU lockups when a FIFO closes under certain conditions. Based on an mmio-trace and some trial-and-error testing, I've come up with a patch that improves the situation on my NV96.

This patch needs testing on NV50+ hardware and regression testing on older hardware, since I did change some of the common code paths. This is very much a work in progress, so if you have anything to add or correct, please share it.

I've also attached two test apps. One is bitscan-fail from mwk; run it as ./bitscan-fail 0x200 to trigger PGRAPH errors. The other is a modified version that only emits NOPs (method 0x100) and represents the no-error situation.

For me, the NOP program runs in loops of 10000 iterations with no problems (I've done so several times). bitscan-fail sometimes survives 10000 iterations, but can also fail after a few thousand. In comparison, a single run of bitscan-fail could cause a GPU lockup for me in the past.

Please try the gallium driver, the test apps, and suspend to RAM. Suspend to RAM isn't 100% reliable for me yet (this was always the case after strange experiments, hammering, etc.), but it should not regress. The same goes for older hardware: whatever worked should still work, but I wouldn't expect serious improvements there.

As always, feedback is appreciated, especially since this is a touchy subject.

Maarten.

Attachments:
- test_better_context_handling_v4.patch (text/x-patch, 9042 bytes): http://lists.freedesktop.org/archives/nouveau/attachments/20100102/b38828c4/attachment.bin
- nop.c (application/octet-stream, 1236 bytes): http://lists.freedesktop.org/archives/nouveau/attachments/20100102/b38828c4/attachment.obj
- bitscan-fail.c (application/octet-stream, 1264 bytes): http://lists.freedesktop.org/archives/nouveau/attachments/20100102/b38828c4/attachment-0001.obj
- Makefile (application/octet-stream, 340 bytes): http://lists.freedesktop.org/archives/nouveau/attachments/20100102/b38828c4/attachment-0002.obj
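For anyone who wants to repeat the looped runs described above, here is a minimal sketch of a loop harness. It is not part of the attached files; the function name run_loop, the script name loop.py and the 30-second timeout are assumptions made for illustration. It also assumes a failed run shows up as a non-zero exit status or a hang, which may not match how the test apps actually behave, and a hard GPU lockup can of course take the whole machine down with it.

```python
import subprocess
import sys


def run_loop(cmd, iterations, timeout_s=30.0):
    """Run cmd up to `iterations` times; return how many runs completed cleanly.

    Stops at the first run that exits non-zero or exceeds timeout_s.
    Both the name and the timeout default are illustrative assumptions.
    """
    completed = 0
    for _ in range(iterations):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            break  # treat a hung run as a failure
        if result.returncode != 0:
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # Hypothetical usage: python3 loop.py 10000 ./bitscan-fail 0x200
    count = int(sys.argv[1])
    command = sys.argv[2:]
    survived = run_loop(command, count)
    print(f"{survived}/{count} iterations completed")
```

Reporting the number of completed iterations along with the chipset makes it easier to compare results between testers.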
Maarten Maathuis
2010-Jan-02 15:39 UTC
[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+
Please do report your successes, and not only failures.
Maarten Maathuis
2010-Jan-04 19:29 UTC
[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+
I've narrowed it down further. The "pgraph->fifo_access" bit is still just cleanup (register 0x400500 represents PGRAPH FIFO access); the rest appears to be needed for the desired effect. The reordering of the PFIFO and PGRAPH destroy is needed.

As usual, feedback is appreciated.

Maarten.

Attachment:
- test_better_context_handling_v5.patch (text/x-patch, 4446 bytes): http://lists.freedesktop.org/archives/nouveau/attachments/20100104/c75b1942/attachment-0001.bin
Xavier
2010-Jan-04 22:58 UTC
[Nouveau] [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+
I tried patch v5 on nv84. I get similar results for bitscan-fail and nop, i.e. a huge improvement compared to before (with or without X). No apparent regressions on gallium and suspend/resume.