thr3ads.net - Nouveau - [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone [Feb 2016]

Ilia Mirkin

2016-Feb-22 16:00 UTC

[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> But assuming I'm right, what I'm proposing is that instead of passing
>> the input in as a global buffer, to instead pass it in as a const
>> buffer. As such instead of sticking it into ->set_global_binding,
>> you'd stick it into ->set_constant_buffer, and then you'll be able to
>> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
>> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
>> it's 0.) You don't even have to load these, you can use them as args
>> directly anywhere you like (except as indirect addresses).
>>
>> The old code would actually take the supplied inputs, stick them into
>> a constbuf, and then lower RINPUT accesses to load from that constbuf.
>> I'm suggesting we cut out the middleman.
>>
>> By the way, another term for "constant buffer" is "uniform buffer", on
>> the off chance it helps. Basically it's super-cached by the shader for
>> values that never change across shader invocations. [And there's
>> special stuff in the hw to allow running multiple sets of shader
>> invocations with different "constant" values... or so we think.]
>
>
> I'm fine with using constant buffers for the input, it is not the
> mechanism I'm worried about it is the tgsi syntax to express things,
> I think it would be beneficial for the tgsi syntax to be abstract, and
> not worry about the underlying mechanism, this will i.e. allow us
> to use shared memory for input on tesla and const bufs on later generations
> without the part generating the tgsi code needing to worry about this.

Yeah, I think you're right. I didn't realize that tesla had a special
form of input for user params, I assumed it was just the usual thing.
So forget about constbufs, go with the INPUT thing. Which is great,
since we had one value left over in that (future) 2-bit field :)

>
> ###
>
> Somewhat unrelated to the input problem, I'm also somewhat worried
> about the addressing method for MEMORY type registers.
>
> Looking at the old RES stuff then the "index" passed into say a LOAD
> was not as much an index as it was simply a 32 bit GPU virtual memory
> address, which fits well with the OpenCL ways of doing things (the
> register number as in the 55 in RES[55] was more or less ignored).
>
> Whereas, e.g. the new BUFFER style "registers" the index really
> is an index, e.g. doing:
> LOAD TEMP[0].x, BUFFER[0], IMM[0]
> resp.
> LOAD TEMP[0].x, BUFFER[1], IMM[0]
>
> Will read from a different memory address, correct ?

Correct -- BUFFER[0] refers to the buffer at binding point 0, and
BUFFER[1] refers to the buffer at binding point 1. They might, in
fact, overlap, or even be the same buffer. But the code doesn't know
about that.

>
> So how will this work for MEMORY type registers ? For OpenCL having the
> 1-dimensional behavior of RES really is quite useful, and having the
> address be composed of a hidden base address which gets determined under
> the hood from the register number, and then adding an index on top of
> it does not fit so well.

Not sure what the question is... you have code like

int *foo = [pointer value from user input];
*foo = *(foo + 5);

right?

So that'd just become

MOV TEMP[0].x, <val from user input, wherever it is>
ADD TEMP[0].y, TEMP[0].x, 5 * 4
LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
STORE MEMORY[0], TEMP[0].x, TEMP[1].x

or perhaps I'm misunderstanding something?

MEMORY, GLOBAL == the global virtual memory address space, not some
specific buffer. Trying to load address 0 from it will likely lead to
sadness, unless you happen to have something mapped there. BUFFER has
an implied base address, based on the binding point, but MEMORY has no
such thing.

  -ilia
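
[Editor's illustration] The four-instruction sequence above can be modelled in plain C. This is only a sketch under stated assumptions, not nouveau or Mesa code: `global_mem`, `memory_load32`, `memory_store32` and `copy_fifth_element` are hypothetical names, a small byte array stands in for the global virtual address space, and `int` is taken to be 4 bytes as in OpenCL C. It shows that the operand handed to LOAD/STORE on MEMORY is an absolute byte address, which is why `foo + 5` turns into an add of `5 * 4`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the GPU's global virtual address space:
 * a flat byte array indexed directly by a 32-bit address. */
#define GLOBAL_MEM_SIZE 256u
static uint8_t global_mem[GLOBAL_MEM_SIZE];

/* Models "LOAD dst, MEMORY[0], addr": read 32 bits at an absolute address. */
static uint32_t memory_load32(uint32_t addr)
{
    uint32_t value;
    assert(addr <= GLOBAL_MEM_SIZE - sizeof value);
    memcpy(&value, &global_mem[addr], sizeof value);
    return value;
}

/* Models "STORE MEMORY[0], addr, src": write 32 bits at an absolute address. */
static void memory_store32(uint32_t addr, uint32_t value)
{
    assert(addr <= GLOBAL_MEM_SIZE - sizeof value);
    memcpy(&global_mem[addr], &value, sizeof value);
}

/* Mirrors the sequence for "*foo = *(foo + 5)", where foo is a byte address. */
static void copy_fifth_element(uint32_t foo)
{
    uint32_t temp0_x = foo;                     /* MOV   TEMP[0].x, <user input>      */
    uint32_t temp0_y = temp0_x + 5u * 4u;       /* ADD   TEMP[0].y, TEMP[0].x, 5 * 4  */
    uint32_t temp1_x = memory_load32(temp0_y);  /* LOAD  TEMP[1].x, MEMORY[0], TEMP[0].y */
    memory_store32(temp0_x, temp1_x);           /* STORE MEMORY[0], TEMP[0].x, TEMP[1].x */
}
```

Calling `copy_fifth_element(64)` copies the 32-bit word at byte address 84 to byte address 64; no binding point or base address is involved anywhere.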

Pierre Moreau

2016-Feb-22 16:07 UTC


----- Original Message -----
> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdegoede at redhat.com>
> wrote:
> Yeah, I think you're right. I didn't realize that tesla had a special
> form of input for user params, I assumed it was just the usual thing.
> So forget about constbufs, go with the INPUT thing. Which is great,
> since we had one value left over in that (future) 2-bit field :)

I can have a try at using constbufs for user inputs on Tesla. It's just
that the blob was using shared for them, so we kept shared.

Pierre

Ilia Mirkin

2016-Feb-22 16:11 UTC


On Mon, Feb 22, 2016 at 11:07 AM, Pierre Moreau <pierre.morrow at free.fr> wrote:
>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdegoede at redhat.com>
>> wrote:
>> Yeah, I think you're right. I didn't realize that tesla had a special
>> form of input for user params, I assumed it was just the usual thing.
>> So forget about constbufs, go with the INPUT thing. Which is great,
>> since we had one value left over in that (future) 2-bit field :)
>
> I can have a try at using constbufs for user inputs on Tesla. It's just
> that the blob was using shared for them, so we kept shared.

I suspect there are hw advantages to doing it the way it was being
done. We should keep doing that.
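
[Editor's illustration] The split discussed here, where the TGSI generator sees one abstract INPUT while the backend decides where the parameters physically live, could look roughly like the following C sketch. Everything in it is hypothetical: the enum names, `input_storage_for`, `pack_arg`, and in particular the natural-alignment packing rule are assumptions made for illustration, not nouveau's or clover's actual layout. The only facts taken from the thread are that Tesla uses shared memory for user params and later generations use const bufs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hardware families and parameter storage choices. */
enum gpu_family { GPU_TESLA, GPU_POST_TESLA };
enum input_storage { INPUT_IN_SHARED, INPUT_IN_CONSTBUF };

/* The backend, not the TGSI generator, picks where kernel inputs live. */
static enum input_storage input_storage_for(enum gpu_family family)
{
    return family == GPU_TESLA ? INPUT_IN_SHARED : INPUT_IN_CONSTBUF;
}

/* Append one kernel argument to a parameter blob.
 * Assumed rule (illustration only): each argument is aligned to its own size.
 * Returns the byte offset at which the argument was stored. */
static size_t pack_arg(uint8_t *blob, size_t capacity, size_t *used,
                       const void *arg, size_t size)
{
    size_t offset;

    assert(size > 0);
    offset = (*used + size - 1) / size * size;   /* round up to a multiple of size */
    assert(offset + size <= capacity);
    memset(blob + *used, 0, offset - *used);     /* zero any padding bytes */
    memcpy(blob + offset, arg, size);
    *used = offset + size;
    return offset;
}
```

The same blob would then be uploaded to shared memory or to a constant buffer depending on `input_storage_for()`, and an INPUT access in the shader would be lowered to a read at the returned offset in whichever storage was chosen.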

Ilia Mirkin

2016-Feb-22 16:13 UTC


On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> MEMORY, GLOBAL == the global virtual memory address space, not some
> specific buffer. Trying to load address 0 from it will likely lead to
> sadness, unless you happen to have something mapped there. BUFFER has
> an implied base address, based on the binding point, but MEMORY has no
> such thing.

Another way of looking at it is that instead of having the hacky
RES[12345] being hardcoded to mean something special, you now have a
dedicated file called 'MEMORY', which has identical semantics.

  -ilia
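
[Editor's illustration] The difference between the two register files can be put in a few lines of C. This is a sketch under assumptions, not driver code: `struct buffer_binding`, `MAX_BINDINGS` and both helper functions are hypothetical, and the bounds check is added only to make the model safe. It shows that a BUFFER access resolves to "base of the binding point plus offset", while a MEMORY (global) access uses its operand as the final address, just as the old RES hack did.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BINDINGS 4u

/* Hypothetical binding table entry: where a bound buffer starts and how big it is. */
struct buffer_binding {
    uint32_t base;
    uint32_t size;
};

/* BUFFER[slot] with an offset operand: the base is implied by the binding point. */
static uint32_t buffer_effective_address(const struct buffer_binding *bindings,
                                         uint32_t slot, uint32_t offset)
{
    assert(slot < MAX_BINDINGS);
    assert(offset < bindings[slot].size);
    return bindings[slot].base + offset;
}

/* MEMORY (global) with an address operand: no implied base at all. */
static uint32_t memory_effective_address(uint32_t address)
{
    return address;
}
```

Two binding points may carry the same base, in which case `BUFFER[0]` and `BUFFER[2]` with equal offsets resolve to the same address, but the shader code itself cannot tell.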

Hans de Goede

2016-Feb-22 16:50 UTC


Hi,

On 22-02-16 17:13, Ilia Mirkin wrote:
> On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>>> Somewhat unrelated to the input problem, I'm also somewhat worried
>>> about the addressing method for MEMORY type registers.
>>>
>>> Looking at the old RES stuff then the "index" passed into say a LOAD
>>> was not as much an index as it was simply a 32 bit GPU virtual memory
>>> address, which fits well with the OpenCL ways of doing things (the
>>> register number as in the 55 in RES[55] was more or less ignored).
>>>
>>> Whereas, e.g. the new BUFFER style "registers" the index really
>>> is an index, e.g. doing:
>>> LOAD TEMP[0].x, BUFFER[0], IMM[0]
>>> resp.
>>> LOAD TEMP[0].x, BUFFER[1], IMM[0]
>>>
>>> Will read from a different memory address, correct ?
>>
>> Correct -- BUFFER[0] refers to the buffer at binding point 0, and
>> BUFFER[1] refers to the buffer at binding point 1. They might, in
>> fact, overlap, or even be the same buffer. But the code doesn't know
>> about that.

Ack.

>>> So how will this work for MEMORY type registers ? For OpenCL having the
>>> 1-dimensional behavior of RES really is quite useful, and having the
>>> address be composed of a hidden base address which gets determined under
>>> the hood from the register number, and then adding an index on top of
>>> it does not fit so well.
>>
>> Not sure what the question is... you have code like
>>
>> int *foo = [pointer value from user input];
>> *foo = *(foo + 5);
>>
>> right?
>>
>> So that'd just become
>>
>> MOV TEMP[0].x, <val from user input, wherever it is>
>> ADD TEMP[0].y, TEMP[0].x, 5 * 4
>> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
>> STORE MEMORY[0], TEMP[0].x, TEMP[1].x
>>
>> or perhaps I'm misunderstanding something?
>>
>> MEMORY, GLOBAL == the global virtual memory address space, not some
>> specific buffer. Trying to load address 0 from it will likely lead to
>> sadness, unless you happen to have something mapped there. BUFFER has
>> an implied base address, based on the binding point, but MEMORY has no
>> such thing.

OK, that answers my questions / worries, I was worried that MEMORY
too would have an implied base address, which would more or less only
get in the way with opencl, but if the memory register file takes
a virtual memory address as second operand to LOAD then I'm happy.

So I guess that if we mix in say TGSI-shared / OpenCL-local memory
then I would do:

DCL MEMORY[0], GLOBAL
DCL MEMORY[1], SHARED

And then to load something from global mem at offset TEMP[0].y:

LOAD TEMP[0].x, MEMORY[0], TEMP[0].yyyy

And to load something from the shared mem at offset TEMP[0].y:

LOAD TEMP[0].x, MEMORY[1], TEMP[0].yyyy

Correct ?  And the shared mem too will take shared virtual memory
addresses, just like global takes global virtual memory
addresses ?

> Another way of looking at it is that instead of having the hacky
> RES[12345] being hardcoded to mean something special, you now have a
> dedicated file called 'MEMORY', which has identical semantics.

I'm all for getting rid of the RES[12345] hack :)

I guess where you write "you now have a dedicated file called 'MEMORY'"
you mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
and LOCAL at least, and probably as discussed one for INPUT ?

This all sounds good to me, as said my worry was that MEMORY would have
an implied base address like BUFFER has, now that you've
made clear that MEMORY does not have this I'm happy :)

Regards,

Hans
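
[Editor's illustration] Hans's reading of the `DCL MEMORY[0], GLOBAL` / `DCL MEMORY[1], SHARED` declarations is not confirmed within this thread. Assuming it is right, each declared MEMORY file would behave as a separate address space, which the following C sketch models. All names here (`enum memory_kind`, `struct memory_file`, the load/store helpers) are hypothetical, and plain byte arrays stand in for the real global and shared memories.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: each declared MEMORY[n] file is its own address space. */
enum memory_kind { MEMORY_GLOBAL, MEMORY_SHARED };

struct memory_file {
    enum memory_kind kind;
    uint8_t *storage;   /* backing bytes for this address space */
    uint32_t size;      /* size of the address space in bytes */
};

/* Models "LOAD dst, MEMORY[n], addr": addr is interpreted within file n only. */
static uint32_t memory_file_load32(const struct memory_file *file, uint32_t addr)
{
    uint32_t value;
    assert(file->size >= sizeof value && addr <= file->size - sizeof value);
    memcpy(&value, file->storage + addr, sizeof value);
    return value;
}

/* Models "STORE MEMORY[n], addr, src" for the same file. */
static void memory_file_store32(struct memory_file *file, uint32_t addr,
                                uint32_t value)
{
    assert(file->size >= sizeof value && addr <= file->size - sizeof value);
    memcpy(file->storage + addr, &value, sizeof value);
}
```

Under this model the same numeric address, say 16, names different locations in `MEMORY[0]` and `MEMORY[1]`; neither file adds a hidden base derived from its register number.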
