Ilia Mirkin
2015-Dec-15 19:04 UTC
[Nouveau] Debugging INVALID_OPCODE / MULTIPLE_WARP_ERRORS ?
Also, where's the exit op? Perhaps what's happening is that you don't
have an exit and it just goes off executing into the ether?

On Tue, Dec 15, 2015 at 12:00 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> A few things that stand out:
>
> 0: ld u32 %r219 c0[0x0000000000000000+0x0] (0)
>
> wtf is that 0x0000000000000 thing doing there? Was it a %rX which got
> constant-folded into 0? That indirectness should have then been
> removed... that said, the final encoding looks fine.
>
> I believe that Kepler has this launch descriptor thing too... is that
> being set correctly? Please generate an mmt trace, and we can see if
> anything stands out compared to a blob trace that also does compute.
>
> Cheers,
>
> -ilia
>
> On Tue, Dec 15, 2015 at 9:15 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi all,
>>
>> As part of my compute work I'm trying to get some TGSI compute
>> code to work. The code from mesa/src/gallium/tests/trivial.c
>> works.
>>
>> So now I'm trying to get a "native" TGSI kernel to run via
>> clover. I'm using Francisco's nbody.c example for this:
>>
>> https://fedorapeople.org/~jwrdegoede/nbody.c
>>
>> This does not work. At first I thought there was an issue
>> with the setup of the input / output buffers, but that seems to
>> work fine, and moreover I finally got the smart idea to look
>> in dmesg, which says:
>>
>> [ 9920.802435] nouveau 0000:01:00.0: gr: TRAP ch 6 [007f7fa000 nbody[31881]]
>> [ 9920.802449] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000 [] warp 10009 [INVALID_OPCODE]
>> [ 9920.802456] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20009 [INVALID_OPCODE]
>>
>> and it repeats that for every "step" in the nbody simulation. This is
>> on a gk107 card.
>>
>> So that seems to be the real problem. Since the error says
>> "INVALID_OPCODE", I've put the TGSI code from nbody.c through
>> "nouveau_compiler -a e4" and then run "nvdisasm -b SM30" on it, but
>> the output looks OK. There is an 8-byte sequence every 64 bytes which
>> does not get decoded, but AFAIK that is the scheduling info, so that
>> should be fine.
>>
>> One thing which does stand out is that this:
>>
>> 0: ld u32 %r219 c0[0x0000000000000000+0x0] (0)
>> 1: ld u32 %r222 c0[0x4] (0)
>> 2: ld u64 { %r225 %r228 } c0[0x8] (0)
>> 3: ld u32 %r234 c0[0x10] (0)
>>
>> gets translated into (nvdisasm output):
>>
>> /*0008*/ LDC R4, c[0x0][0x0];     /* 0x1400000003f11c86 */
>> /*0010*/ MOV R2, c[0x0][0x4];     /* 0x2800400010009de4 */
>> /*0018*/ LDC.64 R0, c[0x0][0x8];  /* 0x1400000023f01ca6 */
>> /*0020*/ MOV R3, c[0x0][0x10];    /* 0x280040004000dde4 */
>>
>> where I would expect four LDC instructions. Could that be the problem?
>>
>> If that is not the problem, then hints on how to debug this further
>> would be greatly appreciated.
>>
>> Regards,
>>
>> Hans
>> _______________________________________________
>> Nouveau mailing list
>> Nouveau at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/nouveau
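The remark above about an 8-byte sequence every 64 bytes being scheduling info can be checked against the raw binary. Below is a minimal Python sketch, not part of the original thread, that splits an SM30 code blob into 64-byte groups. It assumes that the scheduling word is the first 8 bytes of each group (consistent with the first decoded instruction sitting at offset 0x0008 in the quoted disassembly) and that words are little-endian. The function name split_sm30_groups is hypothetical.

```python
import struct

GROUP_BYTES = 64  # one scheduling word plus seven instructions (assumed layout)
WORD_BYTES = 8    # every word in the blob, instruction or not, is 8 bytes


def split_sm30_groups(code: bytes):
    """Yield (group_offset, sched_word, [(insn_offset, insn_word), ...]).

    Assumption: the first 8-byte word of every 64-byte group is the
    scheduling word that the disassembler leaves undecoded.
    """
    if len(code) % WORD_BYTES:
        raise ValueError("code length is not a multiple of 8 bytes")
    for base in range(0, len(code), GROUP_BYTES):
        chunk = code[base:base + GROUP_BYTES]
        # Little-endian 64-bit words (assumption about the blob's byte order).
        words = struct.unpack("<%dQ" % (len(chunk) // WORD_BYTES), chunk)
        insns = [(base + WORD_BYTES * (i + 1), w)
                 for i, w in enumerate(words[1:])]
        yield base, words[0], insns


if __name__ == "__main__":
    # The four instruction words come from the quoted nvdisasm output;
    # the leading 0 is only a placeholder for the scheduling word.
    demo = struct.pack("<8Q", 0,
                       0x1400000003f11c86, 0x2800400010009de4,
                       0x1400000023f01ca6, 0x280040004000dde4,
                       0, 0, 0)
    for base, sched, insns in split_sm30_groups(demo):
        print("group @%04x sched=%016x" % (base, sched))
        for off, word in insns:
            print("  /*%04x*/ %016x" % (off, word))
```

If the offsets printed for a real .bin file line up with the offsets in the nvdisasm listing, the undecoded words are most likely just the per-group scheduling data and not stray opcodes.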
Hans de Goede
2015-Dec-16 17:06 UTC
[Nouveau] Debugging INVALID_OPCODE / MULTIPLE_WARP_ERRORS ?
Hi,

On 15-12-15 20:04, Ilia Mirkin wrote:
> Also, where's the exit op? Perhaps what's happening is that you don't
> have an exit and it just goes off executing into the ether?

Sorry, I only included a small bit of the program in my original mail
because I found the use of "MOV" instructions to load constants
suspicious. Is that normal?

I've put a log with NV50_PROG_DEBUG=1 output here:

https://fedorapeople.org/~jwrdegoede/nbody.log

The nvdisasm -b SM30 output for the generated binary code is here:

https://fedorapeople.org/~jwrdegoede/nbody.disasm

There are already .tgsi, .hex and .bin files there if you find those
easier to use than the NV50_PROG_DEBUG=1 output.

> On Tue, Dec 15, 2015 at 12:00 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> A few things that stand out:
>>
>> 0: ld u32 %r219 c0[0x0000000000000000+0x0] (0)
>>
>> wtf is that 0x0000000000000 thing doing there? Was it a %rX which got
>> constant-folded into 0? That indirectness should have then been
>> removed... that said, the final encoding looks fine.

I don't know, maybe there is a hint in the log file?

Regards,

Hans
Ilia Mirkin
2015-Dec-16 17:24 UTC
[Nouveau] Debugging INVALID_OPCODE / MULTIPLE_WARP_ERRORS ?
I believe that your problem is this:

/*01a0*/ LD R8, [R8];     /* 0x8000000000821c85 */

That needs to be LD.E (and your STs need to be ST.E). You're using a
32-bit gmem address, but you need to be using a 64-bit one. I believe
the 32-bit ones work on Fermi, but AFAIK not on Kepler.

Cheers,

-ilia
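To find every instance of the LD / ST problem described above, the disassembly can be scanned mechanically. Below is a rough Python sketch, not part of the original thread, that flags LD and ST instructions lacking the .E suffix in nvdisasm-style text. It assumes the listing uses the "/*offset*/ OPCODE operands;" layout quoted in this thread, optionally preceded by a predicate such as @P0. The helper name find_32bit_global_accesses is hypothetical.

```python
import re

# Matches an instruction line such as "/*01a0*/ LD R8, [R8];".
# Encoding comments like "/* 0x8000000000821c85 */" do not match, because
# they have a space between "/*" and the hex digits.
INSN_RE = re.compile(
    r"/\*(?P<off>[0-9a-fA-F]{4,})\*/\s+"   # instruction offset
    r"(?:@!?P\d\s+)?"                      # optional predicate (assumed syntax)
    r"(?P<op>[A-Z][A-Z0-9_.]*)"            # opcode with dotted modifiers
)


def find_32bit_global_accesses(disasm: str):
    """Return [(offset, opcode)] for LD/ST instructions without the .E modifier."""
    hits = []
    for line in disasm.splitlines():
        m = INSN_RE.search(line)
        if m is None:
            continue
        base, *mods = m.group("op").split(".")
        # Only plain LD/ST are global accesses; LDC, LDL, STL etc. are skipped.
        if base in ("LD", "ST") and "E" not in mods:
            hits.append((int(m.group("off"), 16), m.group("op")))
    return hits


if __name__ == "__main__":
    sample = """\
/*0008*/ LDC R4, c[0x0][0x0];     /* 0x1400000003f11c86 */
/*0010*/ MOV R2, c[0x0][0x4];     /* 0x2800400010009de4 */
/*01a0*/ LD R8, [R8];     /* 0x8000000000821c85 */
"""
    for off, op in find_32bit_global_accesses(sample):
        print("/*%04x*/ %s uses a 32-bit address" % (off, op))
```

Running this over the full nbody.disasm listing would list each load or store that would need the 64-bit .E form on Kepler, according to the diagnosis above.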