Hi,

On 02-12-15 19:33, Samuel Pitoiset wrote:
> On 12/02/2015 04:34 PM, Hans de Goede wrote:
>> On 01-12-15, Samuel Pitoiset wrote:
>>
>> >>> Ok, here is a MMT trace of vectorAdd:
>> >>>
>> >>> https://fedorapeople.org/~jwrdegoede/vectorAdd.log.gz
>> >>
>> >> Hi Hans,
>> >>
>> >> Thanks a lot.
>> >
>> > Well, I didn't know but Martin has a GK208...
>> > I just tested the compute support on his card and ... it works without
>> > any changes. :-)
>> >
>> > I'm sorry, I was sure the compute support didn't work on this chipset.
>>
>> No need to be sorry because, ...
>>
>> > Feel free to test on your GK208 and report back if you have problems.
>>
>> I've done that, and for me it does not work, if I try to enable compute
>> support like this:
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index 461fcaa..ab4ea85 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -187,7 +187,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>     case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
>>        return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
>>     case PIPE_CAP_COMPUTE:
>> -      return (class_3d <= NVE4_3D_CLASS) ? 1 : 0;
>> +      return 1;
>>     case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
>>        return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ? 1 : 0;
>>
>> @@ -246,8 +246,6 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
>>           return 0;
>>        break;
>>     case PIPE_SHADER_COMPUTE:
>> -      if (class_3d > NVE4_3D_CLASS)
>> -         return 0;
>>        break;
>>     default:
>>        return 0;
>> @@ -574,11 +572,10 @@ nvc0_screen_init_compute(struct nvc0_screen *screen)
>>     case 0xd0:
>>        return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
>>     case 0xe0:
>> -      return nve4_screen_compute_setup(screen, screen->base.pushbuf);
>>     case 0xf0:
>>     case 0x100:
>>     case 0x110:
>> -      return 0;
>> +      return nve4_screen_compute_setup(screen, screen->base.pushbuf);
>>     default:
>>        return -1;
>>     }
>>
>> Then as soon as I do startx (which starts gnome-shell) the machine
>> freezes. This is with mesa-master with the above changes on top.
>>
>> X / gnome-shell will happily work if I do not call
>> nve4_screen_compute_setup()
>> but then test/trivial/compute fails with a null-ptr exception.
>>
>> Do you perhaps have some extra patches in your tree, or am I just unlucky?
>>
>> I've tested this on both a 4.2 and a 4.4-rc3 kernel.
>
> Hi,
>
> My bad... I used the wrong card on reator (which is the REing machine
> of Martin). The primary card is a GK106 and the second one is the
> GK208. That doesn't explain why I did something wrong but heh? :-)
>
> You are right. With those bits added locally, the compute support
> totally hangs the GPU on my GK208 (NV108), and a reboot is needed.
>
> Please give a shot at this branch :
> http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute
>
> It fixes the initialization of the compute state and allows me to
> launch 'test_input_global' (ie. ./compute 8) on my GK208 without
> any dmesg fails. That's a good start but more patches are coming. :-)

This branch indeed works somewhat better, but things still hang on the
test_system_values compute test for me (this is the first test executed;
I did not try the others). So this seems to need more work.

I've ordered a GTX740 (GK107) card, which should arrive soon, and
I'll be using that so I can (hopefully) focus on the llvm tgsi bits
again.

> Btw, according to the trace you sent me, you have a GK208b (NV106).

Right, sorry, I thought the differences between GK208 and GK208b would
not matter.

Thanks for all the input / help!

Regards,

Hans
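For readers following the patch, here is a minimal standalone sketch of how its last hunk routes chipset families to a compute setup path. It is not the Mesa code: the enum and `pick_compute_path()` are hypothetical names, the `0xc0` case is not visible in the diff context, and masking off the low nibble of the chipset id is an assumption about the switch expression, which the diff does not show.

```c
/* Hypothetical labels for the setup paths named in the patch. */
enum compute_path {
    COMPUTE_PATH_NVC0,        /* stands for nvc0_screen_compute_setup() */
    COMPUTE_PATH_NVE4,        /* stands for nve4_screen_compute_setup() */
    COMPUTE_PATH_UNSUPPORTED  /* stands for "default: return -1" */
};

/*
 * Sketch of the switch in nvc0_screen_init_compute() after the patch.
 * Assumption: the switch works on the chipset id with the low nibble
 * masked off, so that e.g. 0x106 and 0x108 both select "case 0x100".
 */
static enum compute_path pick_compute_path(unsigned chipset)
{
    switch (chipset & ~0xfu) {
    case 0xc0:  /* assumed; not visible in the diff context */
    case 0xd0:
        return COMPUTE_PATH_NVC0;
    case 0xe0:
    case 0xf0:  /* returned 0 (no compute setup) before the patch */
    case 0x100: /* returned 0 (no compute setup) before the patch */
    case 0x110: /* returned 0 (no compute setup) before the patch */
        return COMPUTE_PATH_NVE4;
    default:
        return COMPUTE_PATH_UNSUPPORTED;
    }
}
```

Under that assumption, `pick_compute_path(0x108)` (GK208, NV108) and `pick_compute_path(0x106)` (GK208b, NV106) both select the NVE4 path, which matches the observation that the patch alone hangs both boards.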
On 12/04/2015 09:45 AM, Hans de Goede wrote:
> Hi,
>
> On 02-12-15 19:33, Samuel Pitoiset wrote:

<snip>

>> Please give a shot at this branch :
>> http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute
>>
>> It fixes the initialization of the compute state and allows me to
>> launch 'test_input_global' (ie. ./compute 8) on my GK208 without
>> any dmesg fails. That's a good start but more patches are coming. :-)
>
> This branch indeed works somewhat better, but things still hang on the
> test_system_values compute test for me (this is the first test executed
> I did not try the others). So this seems to need more work.

What about test_input_global? test_system_values doesn't work on my side
but it doesn't hang the GPU.

Could you please provide dmesg log?

> I've ordered a GTX740 (GK107) card, which should arrive soon, and
> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
> again.

Yeah, GK107 will do the job. :-)

>> Btw, according to the trace you sent me, you have a GK208b (NV106).
>
> Right, sorry I thought the differences between GK208 and GK208b would
> not matter.

I don't know exactly what the differences between these two chipsets
are, but since test_system_values hangs your GPU and not mine, I think
there are some.

> Thanks for all the input / help!
>
> Regards,
>
> Hans

--
-Samuel
Hi,

On 04-12-15 09:54, Samuel Pitoiset wrote:
> On 12/04/2015 09:45 AM, Hans de Goede wrote:

<snip>

>>> Please give a shot at this branch :
>>> http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute
>>>
>>> It fixes the initialization of the compute state and allows me to
>>> launch 'test_input_global' (ie. ./compute 8) on my GK208 without
>>> any dmesg fails. That's a good start but more patches are coming. :-)
>>
>> This branch indeed works somewhat better, but things still hang on the
>> test_system_values compute test for me (this is the first test executed
>> I did not try the others). So this seems to need more work.
>
> What about test_input_global? test_system_values doesn't work on my side
> but it doesn't hang the GPU.

Yes, that one works.

> Could you please provide dmesg log?

[ 2.786631] nouveau 0000:01:00.0: NVIDIA GK208B (b06070b1)
[ 2.914291] nouveau 0000:01:00.0: bios: version 80.28.79.00.0b
[ 2.937909] nouveau 0000:01:00.0: priv: HUB0: 086014 ffffffff (1f70820c)
[ 2.937953] nouveau 0000:01:00.0: fb: 1024 MiB DDR3
[ 3.623202] [TTM] Zone kernel: Available graphics memory: 2010556 kiB
[ 3.623205] [TTM] Initializing pool allocator
[ 3.623241] [TTM] Initializing DMA pool allocator
[ 3.623440] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
[ 3.623442] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 3.623447] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[ 3.623449] nouveau 0000:01:00.0: DRM: DCB version 4.0
[ 3.623451] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
[ 3.623454] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
[ 3.623456] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022f10 00000000
[ 3.623458] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
[ 3.623460] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[ 3.623462] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
[ 3.627283] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 3.627285] [drm] Driver supports precise vblank timestamp query.
[ 3.671871] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[ 3.889940] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff880119050000
[ 3.890952] fbcon: nouveaufb (fb0) is primary device
[ 4.132343] Console: switching to colour frame buffer device 240x67
[ 4.134930] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[ 4.141094] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
<snip>
[ 1713.421460] nouveau 0000:01:00.0: gr: TRAP ch 6 [003fa32000 compute[21117]]
[ 1713.421471] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3000e [MEM_OUT_OF_BOUNDS]
[ 1713.441248] nouveau 0000:01:00.0: gr: TRAP ch 6 [003fa32000 compute[21117]]
[ 1713.441260] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]
[ 1713.441265] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]
[ 1717.773839] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1717.773848] nouveau 0000:01:00.0: fifo: sw engine fault on channel 2, recovering...
[ 1719.776529] nouveau 0000:01:00.0: fifo: runlist 0 update timeout
[ 1722.068923] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1726.363660] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1730.658395] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1734.951720] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1739.241861] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1743.532005] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1747.826728] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1752.121462] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1756.416200] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1760.710930] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1765.005663] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1769.300396] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1773.595135] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1777.889863] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1782.184598] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1786.479328] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1789.730020] nouveau 0000:01:00.0: compute[21117]: failed to idle channel 6 [compute[21117]]
[ 1790.774060] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1791.729963] nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c:47/gk104_fifo_gpfifo_kick()!
[ 1791.729966] nouveau 0000:01:00.0: fifo: channel 6 [compute[21117]] kick timeout
[ 1791.729973] nouveau: compute[21117]:00000000:0000a06f: detach gr failed, -16
[ 1791.731401] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0d []
[ 1793.731275] nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c:47/gk104_fifo_gpfifo_kick()!
[ 1793.731279] nouveau 0000:01:00.0: fifo: channel 6 [compute[21117]] kick timeout
[ 1793.731281] nouveau: compute[21117]:00000000:0000a06f: detach sw failed, -16
[ 1796.026118] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1800.320809] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1804.615446] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1808.731016] nouveau 0000:01:00.0: compute[21117]: failed to idle channel 6 [compute[21117]]
[ 1808.738716] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0d []
[ 1810.738093] nouveau 0000:01:00.0: fifo: runlist 0 update timeout
[ 1810.738106] nouveau 0000:01:00.0: fifo: BIND_ERROR 03 [UNBIND_WHILE_RUNNING]
[ 1815.032747] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1819.327395] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1823.622036] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
<last line keeps repeating at approx. 4 sec intervals>

>> I've ordered a GTX740 (GK107) card, which should arrive soon, and
>> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
>> again.
>
> Yeah, GK107 will do the job. :-)

Good, as I said it should arrive soon.

>>> Btw, according to the trace you sent me, you have a GK208b (NV106).
>>
>> Right, sorry I thought the differences between GK208 and GK208b would
>> not matter.
>
> I don't know exactly the differences between these two chipsets but
> since test_system_values hangs your GPU and not mine, I think there
> are some.

Ok.

Regards,

Hans
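The log above is long but has a simple shape: a MEM_OUT_OF_BOUNDS warp trap, then MISALIGNED_PC traps, then CTXSW_TIMEOUT scheduler errors starting about four seconds later. As a rough aid for triaging a capture like this, the following sketch counts how many lines mention a given trap or error name. `count_lines_with()` and `sample_log` are hypothetical names used only here; the sample lines are taken from the log above with the timestamps dropped.

```c
#include <stddef.h>
#include <string.h>

/* A few lines from the dmesg output above, timestamps removed. */
static const char sample_log[] =
    "nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3000e [MEM_OUT_OF_BOUNDS]\n"
    "nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]\n"
    "nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]\n"
    "nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]\n";

/* Count the lines of `log` that contain `needle` at least once. */
static size_t count_lines_with(const char *log, const char *needle)
{
    size_t hits = 0;
    size_t nlen = strlen(needle);

    if (nlen == 0)
        return 0;

    while (*log != '\0') {
        /* Length of the current line, excluding the newline. */
        const char *eol = strchr(log, '\n');
        size_t len = eol ? (size_t)(eol - log) : strlen(log);

        /* Search within this line only, so a match never spans lines. */
        for (size_t i = 0; i + nlen <= len; i++) {
            if (memcmp(log + i, needle, nlen) == 0) {
                hits++;
                break;
            }
        }

        log += len;
        if (*log == '\n')
            log++;
    }
    return hits;
}
```

Calling `count_lines_with(sample_log, "MISALIGNED_PC")` on the excerpt counts the two MISALIGNED_PC trap lines. This only summarizes the log; it does not say which trap is the root cause of the hang.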
Hi,

On 04-12-15 09:45, Hans de Goede wrote:

> I've ordered a GTX740 (GK107) card, which should arrive soon, and
> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
> again.

So the card arrived today and I've plugged it in; tests/trivial/compute
looks much better with this. But there does seem to be one issue
(other than the atomic bits not working):

- test_resource_indirect
(1, 0)[0]: got 0x2/0.000000, expected 0x3/0.000000
(3, 0)[0]: got 0x6/0.000000, expected 0x7/0.000000
(5, 0)[0]: got 0xa/0.000000, expected 0xb/0.000000
(7, 0)[0]: got 0xe/0.000000, expected 0xf/0.000000
(9, 0)[0]: got 0x12/0.000000, expected 0x13/0.000000
(11, 0)[0]: got 0x16/0.000000, expected 0x17/0.000000
(13, 0)[0]: got 0x1a/0.000000, expected 0x1b/0.000000
(15, 0)[0]: got 0x1e/0.000000, expected 0x1f/0.000000
(17, 0)[0]: got 0x22/0.000000, expected 0x23/0.000000
(19, 0)[0]: got 0x26/0.000000, expected 0x27/0.000000
(21, 0)[0]: got 0x2a/0.000000, expected 0x2b/0.000000
(23, 0)[0]: got 0x2e/0.000000, expected 0x2f/0.000000
(25, 0)[0]: got 0x32/0.000000, expected 0x33/0.000000
(27, 0)[0]: got 0x36/0.000000, expected 0x37/0.000000
(29, 0)[0]: got 0x3a/0.000000, expected 0x3b/0.000000
(31, 0)[0]: got 0x3e/0.000000, expected 0x3f/0.000000
(33, 0)[0]: got 0x42/0.000000, expected 0x43/0.000000
(35, 0)[0]: got 0x46/0.000000, expected 0x47/0.000000
(37, 0)[0]: got 0x4a/0.000000, expected 0x4b/0.000000
(39, 0)[0]: got 0x4e/0.000000, expected 0x4f/0.000000
(64, 1): FAIL (32)

Regards,

Hans
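The mismatches listed above are regular: every reported column index is odd, the value read back is twice the index, and the expected value is exactly one more. The sketch below simply checks that observation against the twenty quoted lines. `struct mismatch`, `reported` and `all_follow_pattern()` are hypothetical names for this illustration, and the check restates what the output shows without explaining why the driver produces it.

```c
#include <stddef.h>

/* One mismatch line from the test output: column index, value read
 * back, and the value the test expected. */
struct mismatch {
    unsigned x;
    unsigned got;
    unsigned expected;
};

/* The twenty mismatches quoted in the message above. */
static const struct mismatch reported[] = {
    { 1, 0x02, 0x03}, { 3, 0x06, 0x07}, { 5, 0x0a, 0x0b}, { 7, 0x0e, 0x0f},
    { 9, 0x12, 0x13}, {11, 0x16, 0x17}, {13, 0x1a, 0x1b}, {15, 0x1e, 0x1f},
    {17, 0x22, 0x23}, {19, 0x26, 0x27}, {21, 0x2a, 0x2b}, {23, 0x2e, 0x2f},
    {25, 0x32, 0x33}, {27, 0x36, 0x37}, {29, 0x3a, 0x3b}, {31, 0x3e, 0x3f},
    {33, 0x42, 0x43}, {35, 0x46, 0x47}, {37, 0x4a, 0x4b}, {39, 0x4e, 0x4f},
};
static const size_t reported_count = sizeof(reported) / sizeof(reported[0]);

/* Return 1 when every entry has an odd x, got == 2 * x and
 * expected == got + 1; return 0 as soon as one entry deviates. */
static int all_follow_pattern(const struct mismatch *m, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((m[i].x & 1u) == 0)
            return 0;
        if (m[i].got != 2u * m[i].x)
            return 0;
        if (m[i].expected != m[i].got + 1u)
            return 0;
    }
    return 1;
}
```

`all_follow_pattern(reported, reported_count)` holds for the quoted lines, which is a compact way to describe the symptom when reporting it.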
On 12/07/2015 04:10 PM, Hans de Goede wrote:
> Hi

Hi,

> On 04-12-15 09:45, Hans de Goede wrote:
>
>> I've ordered a GTX740 (GK107) card, which should arrive soon, and
>> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
>> again.
>
> So the card arrived today and I've plugged it in tests/trivial/compute
> looks much better with this. But there does seem to be one issue
> (other then the atomic bits not working) :
>
> - test_resource_indirect

Exactly, two or three tests don't work on Kepler < GK110. It's on my
todo list, but with a low priority. :-)

Thanks for reporting this anyway.

> (1, 0)[0]: got 0x2/0.000000, expected 0x3/0.000000
> (3, 0)[0]: got 0x6/0.000000, expected 0x7/0.000000

<snip>

> (39, 0)[0]: got 0x4e/0.000000, expected 0x4f/0.000000
> (64, 1): FAIL (32)
>
> Regards,
>
> Hans

--
-Samuel