thr3ads.net - Nouveau - [Nouveau] NV50 compute support questions [Dec 2015]

Hans de Goede

2015-Dec-04 08:45 UTC

[Nouveau] NV50 compute support questions

Hi,

On 02-12-15 19:33, Samuel Pitoiset wrote:
>
> On 12/02/2015 04:34 PM, Hans de Goede wrote:
>> On 01-12-15, Samuel Pitoiset wrote:
>>
>>  >>> Ok, here is a MMT trace of vectorAdd:
>>  >>>
>>  >>> https://fedorapeople.org/~jwrdegoede/vectorAdd.log.gz
>>  >>
>>  >> Hi Hans,
>>  >>
>>  >> Thanks a lot.
>>  >
>>  > Well, I didn't know but Martin has a GK208...
>>  > I just tested the compute support on his card and ... it works without
>>  > any changes. :-)
>>  >
>>  > I'm sorry, I was sure the compute support didn't work on this chipset.
>>
>> No need to be sorry because, ...
>>
>>  > Feel free to test on your GK208 and report back if you have problems.
>>
>> I've done that, and for me it does not work, if I try to enable compute
>> support like this:
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index 461fcaa..ab4ea85 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -187,7 +187,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>>      case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
>>         return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
>>      case PIPE_CAP_COMPUTE:
>> -      return (class_3d <= NVE4_3D_CLASS) ? 1 : 0;
>> +      return 1;
>>      case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
>>         return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ? 1 : 0;
>>
>> @@ -246,8 +246,6 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
>>            return 0;
>>         break;
>>      case PIPE_SHADER_COMPUTE:
>> -      if (class_3d > NVE4_3D_CLASS)
>> -         return 0;
>>         break;
>>      default:
>>         return 0;
>> @@ -574,11 +572,10 @@ nvc0_screen_init_compute(struct nvc0_screen *screen)
>>      case 0xd0:
>>         return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
>>      case 0xe0:
>> -      return nve4_screen_compute_setup(screen, screen->base.pushbuf);
>>      case 0xf0:
>>      case 0x100:
>>      case 0x110:
>> -      return 0;
>> +      return nve4_screen_compute_setup(screen, screen->base.pushbuf);
>>      default:
>>         return -1;
>>      }
>>
>> Then as soon as I do startx (which starts gnome-shell) the machine
>> freezes. This is with mesa-master with the above changes on top.
>>
>> X / gnome-shell will happily work if I do not call
>> nve4_screen_compute_setup()
>> but then test/trivial/compute fails with a null-ptr exception.
>>
>> Do you perhaps have some extra patches in your tree, or am I just unlucky ?
>>
>> I've tested this on both a 4.2 and a 4.4-rc3 kernel.
>
> Hi,
>
> My bad... I used the wrong card on reator (which is the REing machine of
> Martin). The primary card is a GK106 and the second one is the GK208. That
> doesn't explain why I did something wrong but heh? :-)
>
> You are right. With those bits added locally, the compute support totally
> hangs the GPU on my GK208 (NV108), and a reboot is needed.
>
> Please give a shot at this branch :
> http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute
>
> It fixes the initialization of the compute state and allows me to
> launch 'test_input_global' (ie. ./compute 8) on my GK208 without
> any dmesg fails. That's a good start but more patches are coming. :-)

This branch indeed works somewhat better, but things still hang on the
test_system_values compute test for me (this is the first test executed;
I did not try the others). So this seems to need more work.

I've ordered a GTX740 (GK107) card, which should arrive soon, and
I'll be using that so I can (hopefully) focus on the llvm tgsi bits
again.

> Btw, according to the trace you sent me, you have a GK208b (NV106).

Right, sorry, I thought the differences between GK208 and GK208b would not
matter.

Thanks for all the input / help!

Regards,

Hans

Samuel Pitoiset

2015-Dec-04 08:54 UTC

On 12/04/2015 09:45 AM, Hans de Goede wrote:
> Hi,
>
> On 02-12-15 19:33, Samuel Pitoiset wrote:
<snip>
>> Please give a shot at this branch :
>> http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute
>>
>> It fixes the initialization of the compute state and allows me to
>> launch 'test_input_global' (ie. ./compute 8) on my GK208 without
>> any dmesg fails. That's a good start but more patches are coming. :-)
>
> This branch indeed works somewhat better, but things still hang on the
> test_system_values compute test for me (this is the first test executed;
> I did not try the others). So this seems to need more work.

What about test_input_global? test_system_values doesn't work on my side
but it doesn't hang the GPU. Could you please provide dmesg log?

> I've ordered a GTX740 (GK107) card, which should arrive soon, and
> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
> again.

Yeah, GK107 will do the job. :-)

>> Btw, according to the trace you sent me, you have a GK208b (NV106).
>
> Right, sorry I thought the differences between GK208 and GK208b would
> not matter.

I don't know exactly the differences between these two chipsets but
since test_system_values hangs your GPU and not mine, I think there are some.

>
> Thanks for all the input / help!
>
> Regards,
>
> Hans
>
>
-- 
-Samuel

Hans de Goede

2015-Dec-04 09:12 UTC

Hi,

On 04-12-15 09:54, Samuel Pitoiset wrote:
>
> On 12/04/2015 09:45 AM, Hans de Goede wrote:
<snip>
>>> Please give a shot at this branch :
>>> http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nvf0_compute
>>>
>>> It fixes the initialization of the compute state and allows me to
>>> launch 'test_input_global' (ie. ./compute 8) on my GK208 without
>>> any dmesg fails. That's a good start but more patches are coming. :-)
>>
>> This branch indeed works somewhat better, but things still hang on the
>> test_system_values compute test for me (this is the first test executed;
>> I did not try the others). So this seems to need more work.
>
> What about test_input_global? test_system_values doesn't work on my
> side but it doesn't hang the GPU.

Yes, that one works.

> Could you please provide dmesg log?

[    2.786631] nouveau 0000:01:00.0: NVIDIA GK208B (b06070b1)
[    2.914291] nouveau 0000:01:00.0: bios: version 80.28.79.00.0b
[    2.937909] nouveau 0000:01:00.0: priv: HUB0: 086014 ffffffff (1f70820c)
[    2.937953] nouveau 0000:01:00.0: fb: 1024 MiB DDR3
[    3.623202] [TTM] Zone  kernel: Available graphics memory: 2010556 kiB
[    3.623205] [TTM] Initializing pool allocator
[    3.623241] [TTM] Initializing DMA pool allocator
[    3.623440] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
[    3.623442] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[    3.623447] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    3.623449] nouveau 0000:01:00.0: DRM: DCB version 4.0
[    3.623451] nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
[    3.623454] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f62 00020010
[    3.623456] nouveau 0000:01:00.0: DRM: DCB outp 02: 02022f10 00000000
[    3.623458] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001031
[    3.623460] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002161
[    3.623462] nouveau 0000:01:00.0: DRM: DCB conn 02: 00000200
[    3.627283] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.627285] [drm] Driver supports precise vblank timestamp query.
[    3.671871] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[    3.889940] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff880119050000
[    3.890952] fbcon: nouveaufb (fb0) is primary device
[    4.132343] Console: switching to colour frame buffer device 240x67
[    4.134930] nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
[    4.141094] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0

<snip>

[ 1713.421460] nouveau 0000:01:00.0: gr: TRAP ch 6 [003fa32000 compute[21117]]
[ 1713.421471] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000 [] warp 3000e [MEM_OUT_OF_BOUNDS]
[ 1713.441248] nouveau 0000:01:00.0: gr: TRAP ch 6 [003fa32000 compute[21117]]
[ 1713.441260] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]
[ 1713.441265] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 20005 [MISALIGNED_PC]
[ 1717.773839] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1717.773848] nouveau 0000:01:00.0: fifo: sw engine fault on channel 2, recovering...
[ 1719.776529] nouveau 0000:01:00.0: fifo: runlist 0 update timeout
[ 1722.068923] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1726.363660] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1730.658395] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1734.951720] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1739.241861] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1743.532005] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1747.826728] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1752.121462] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1756.416200] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1760.710930] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1765.005663] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1769.300396] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1773.595135] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1777.889863] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1782.184598] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1786.479328] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1789.730020] nouveau 0000:01:00.0: compute[21117]: failed to idle channel 6 [compute[21117]]
[ 1790.774060] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1791.729963] nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c:47/gk104_fifo_gpfifo_kick()!
[ 1791.729966] nouveau 0000:01:00.0: fifo: channel 6 [compute[21117]] kick timeout
[ 1791.729973] nouveau: compute[21117]:00000000:0000a06f: detach gr failed, -16
[ 1791.731401] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0d []
[ 1793.731275] nouveau 0000:01:00.0: timeout at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c:47/gk104_fifo_gpfifo_kick()!
[ 1793.731279] nouveau 0000:01:00.0: fifo: channel 6 [compute[21117]] kick timeout
[ 1793.731281] nouveau: compute[21117]:00000000:0000a06f: detach sw failed, -16
[ 1796.026118] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1800.320809] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1804.615446] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1808.731016] nouveau 0000:01:00.0: compute[21117]: failed to idle channel 6 [compute[21117]]
[ 1808.738716] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0d []
[ 1810.738093] nouveau 0000:01:00.0: fifo: runlist 0 update timeout
[ 1810.738106] nouveau 0000:01:00.0: fifo: BIND_ERROR 03 [UNBIND_WHILE_RUNNING]
[ 1815.032747] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1819.327395] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1823.622036] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]

<last line keeps repeating at approx. 4 sec interval>
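
The repeat interval can be read straight off the kernel timestamps. Below is a minimal Python sketch, assuming the log is available as plain text in the dmesg format shown above; the helper name `ctxsw_timeout_intervals` and the `LOG` sample (five lines copied from the log) are illustrative only and not part of nouveau or any tool.

```python
import re

# Five consecutive SCHED_ERROR lines copied from the dmesg output above.
LOG = """\
[ 1717.773839] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1722.068923] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1726.363660] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1730.658395] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[ 1734.951720] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
"""

# Matches "[ <seconds>] ... SCHED_ERROR 0a [CTXSW_TIMEOUT]" and captures the timestamp.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*SCHED_ERROR 0a \[CTXSW_TIMEOUT\]")


def ctxsw_timeout_intervals(log_text):
    """Return the gaps (in seconds) between consecutive CTXSW_TIMEOUT lines."""
    stamps = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamps.append(float(m.group(1)))
    return [round(b - a, 6) for a, b in zip(stamps, stamps[1:])]


intervals = ctxsw_timeout_intervals(LOG)
print(intervals)
```

For the sample lines the gaps all fall between 4.29 and 4.30 seconds, which matches the "approx. 4 sec" remark. The sketch only measures the spacing; it says nothing about what causes the timeout.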

>> I've ordered a GTX740 (GK107) card, which should arrive soon, and
>> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
>> again.
>
> Yeah, GK107 will do the job. :-)

Good, as I said it should arrive soon.

>>> Btw, according to the trace you sent me, you have a GK208b (NV106).
>>
>> Right, sorry I thought the differences between GK208 and GK208b would
>> not matter.
>
> I don't know exactly the differences between these two chipsets but
> since test_system_values hangs your GPU and not mine, I think there are some.

Ok.

Regards,

Hans

Hans de Goede

2015-Dec-07 15:10 UTC

Hi

On 04-12-15 09:45, Hans de Goede wrote:
> I've ordered a GTX740 (GK107) card, which should arrive soon, and
> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
> again.

So the card arrived today and I've plugged it in. tests/trivial/compute
looks much better with this. But there does seem to be one issue
(other than the atomic bits not working):

- test_resource_indirect
(1, 0)[0]: got 0x2/0.000000, expected 0x3/0.000000
(3, 0)[0]: got 0x6/0.000000, expected 0x7/0.000000
(5, 0)[0]: got 0xa/0.000000, expected 0xb/0.000000
(7, 0)[0]: got 0xe/0.000000, expected 0xf/0.000000
(9, 0)[0]: got 0x12/0.000000, expected 0x13/0.000000
(11, 0)[0]: got 0x16/0.000000, expected 0x17/0.000000
(13, 0)[0]: got 0x1a/0.000000, expected 0x1b/0.000000
(15, 0)[0]: got 0x1e/0.000000, expected 0x1f/0.000000
(17, 0)[0]: got 0x22/0.000000, expected 0x23/0.000000
(19, 0)[0]: got 0x26/0.000000, expected 0x27/0.000000
(21, 0)[0]: got 0x2a/0.000000, expected 0x2b/0.000000
(23, 0)[0]: got 0x2e/0.000000, expected 0x2f/0.000000
(25, 0)[0]: got 0x32/0.000000, expected 0x33/0.000000
(27, 0)[0]: got 0x36/0.000000, expected 0x37/0.000000
(29, 0)[0]: got 0x3a/0.000000, expected 0x3b/0.000000
(31, 0)[0]: got 0x3e/0.000000, expected 0x3f/0.000000
(33, 0)[0]: got 0x42/0.000000, expected 0x43/0.000000
(35, 0)[0]: got 0x46/0.000000, expected 0x47/0.000000
(37, 0)[0]: got 0x4a/0.000000, expected 0x4b/0.000000
(39, 0)[0]: got 0x4e/0.000000, expected 0x4f/0.000000
(64, 1): FAIL (32)
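
The printed mismatches follow a regular pattern, which a short script makes explicit. This is a minimal Python sketch that only parses the rows listed above; `parse_failures` and `OUTPUT` are illustrative names, and the row format is assumed to be exactly as printed here.

```python
import re

# Mismatch rows copied from the test_resource_indirect output above.
OUTPUT = """\
(1, 0)[0]: got 0x2/0.000000, expected 0x3/0.000000
(3, 0)[0]: got 0x6/0.000000, expected 0x7/0.000000
(5, 0)[0]: got 0xa/0.000000, expected 0xb/0.000000
(7, 0)[0]: got 0xe/0.000000, expected 0xf/0.000000
(9, 0)[0]: got 0x12/0.000000, expected 0x13/0.000000
(11, 0)[0]: got 0x16/0.000000, expected 0x17/0.000000
(13, 0)[0]: got 0x1a/0.000000, expected 0x1b/0.000000
(15, 0)[0]: got 0x1e/0.000000, expected 0x1f/0.000000
(17, 0)[0]: got 0x22/0.000000, expected 0x23/0.000000
(19, 0)[0]: got 0x26/0.000000, expected 0x27/0.000000
(21, 0)[0]: got 0x2a/0.000000, expected 0x2b/0.000000
(23, 0)[0]: got 0x2e/0.000000, expected 0x2f/0.000000
(25, 0)[0]: got 0x32/0.000000, expected 0x33/0.000000
(27, 0)[0]: got 0x36/0.000000, expected 0x37/0.000000
(29, 0)[0]: got 0x3a/0.000000, expected 0x3b/0.000000
(31, 0)[0]: got 0x3e/0.000000, expected 0x3f/0.000000
(33, 0)[0]: got 0x42/0.000000, expected 0x43/0.000000
(35, 0)[0]: got 0x46/0.000000, expected 0x47/0.000000
(37, 0)[0]: got 0x4a/0.000000, expected 0x4b/0.000000
(39, 0)[0]: got 0x4e/0.000000, expected 0x4f/0.000000
"""

# "(x, y)[i]: got 0x.../float, expected 0x.../float"
ROW_RE = re.compile(
    r"^\((\d+), (\d+)\)\[(\d+)\]: got (0x[0-9a-f]+)/\S+ expected (0x[0-9a-f]+)/"
)


def parse_failures(text):
    """Return (x, got, expected) as integers for each printed mismatch row."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(4), 16), int(m.group(5), 16)))
    return rows


rows = parse_failures(OUTPUT)
# Every printed mismatch sits at an odd x index.
all_odd = all(x % 2 == 1 for x, _, _ in rows)
# In every row, got == 2 * x and expected == 2 * x + 1 (expected with bit 0 cleared).
low_bit_cleared = all(got == 2 * x and exp == got + 1 for x, got, exp in rows)
print(len(rows), all_odd, low_bit_cleared)
```

For the printed rows, only odd x indices fail and each result equals the expected value with its lowest bit cleared. This describes the listed rows only; the cause is not established in this thread.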

Regards,

Hans

Samuel Pitoiset

2015-Dec-07 16:18 UTC

On 12/07/2015 04:10 PM, Hans de Goede wrote:
> Hi
>

Hi,

> On 04-12-15 09:45, Hans de Goede wrote:
>
>> I've ordered a GTX740 (GK107) card, which should arrive soon, and
>> I'll be using that so I can (hopefully) focus on the llvm tgsi bits
>> again.
>
> So the card arrived today and I've plugged it in. tests/trivial/compute
> looks much better with this. But there does seem to be one issue
> (other than the atomic bits not working):
>
> - test_resource_indirect

Exactly, two or three tests don't work on Kepler < GK110.
It's on my todolist, but with a low priority. :-)

Thanks for reporting this anyway.

> (1, 0)[0]: got 0x2/0.000000, expected 0x3/0.000000
> (3, 0)[0]: got 0x6/0.000000, expected 0x7/0.000000
> (5, 0)[0]: got 0xa/0.000000, expected 0xb/0.000000
> (7, 0)[0]: got 0xe/0.000000, expected 0xf/0.000000
> (9, 0)[0]: got 0x12/0.000000, expected 0x13/0.000000
> (11, 0)[0]: got 0x16/0.000000, expected 0x17/0.000000
> (13, 0)[0]: got 0x1a/0.000000, expected 0x1b/0.000000
> (15, 0)[0]: got 0x1e/0.000000, expected 0x1f/0.000000
> (17, 0)[0]: got 0x22/0.000000, expected 0x23/0.000000
> (19, 0)[0]: got 0x26/0.000000, expected 0x27/0.000000
> (21, 0)[0]: got 0x2a/0.000000, expected 0x2b/0.000000
> (23, 0)[0]: got 0x2e/0.000000, expected 0x2f/0.000000
> (25, 0)[0]: got 0x32/0.000000, expected 0x33/0.000000
> (27, 0)[0]: got 0x36/0.000000, expected 0x37/0.000000
> (29, 0)[0]: got 0x3a/0.000000, expected 0x3b/0.000000
> (31, 0)[0]: got 0x3e/0.000000, expected 0x3f/0.000000
> (33, 0)[0]: got 0x42/0.000000, expected 0x43/0.000000
> (35, 0)[0]: got 0x46/0.000000, expected 0x47/0.000000
> (37, 0)[0]: got 0x4a/0.000000, expected 0x4b/0.000000
> (39, 0)[0]: got 0x4e/0.000000, expected 0x4f/0.000000
> (64, 1): FAIL (32)
>
> Regards,
>
> Hans
-- 
-Samuel
