thr3ads.net - llvm dev - [LLVMdev] PTX builtin functions. [Dec 2011]

Alberto Magni

2011-Dec-04 18:10 UTC

[LLVMdev] PTX builtin functions.

Hi Justin,

sorry for the delay, I have been busy.

Micah's proposal requires moving the definitions of the intrinsics
from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td,
thus allowing the generation of the file PTXGenIntrinsics.inc, which
will be included by PTXIntrinsicInfo.cpp.
This is quite a big modification; do you agree with it?
Or do you have a better solution?

Also, I don't know yet how to make llvm recognize the intrinsics
defined in lib/Target/PTX/PTXIntrinsics.td; the only other
backend that does so is MBlaze.

A tentative patch is attached.

Bye,
Alberto

On Wed, Nov 23, 2011 at 2:36 PM, Justin Holewinski
<justin.holewinski at gmail.com> wrote:
>
> On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com>
> wrote:
>>
>>
>> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com>
>> wrote:
>> >
>> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com>
>> > wrote:
>> > > Alberto,
>> > >  The AMDIL backend solves your problem with intrinsic overloading this
>> > > way:
>> > > def int_AMDIL_mad     : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>> > >
>> > > Where TernaryIntFloat is defined as:
>> > > class TernaryIntFloat :
>> > >          Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>> > >          LLVMMatchType<0>, LLVMMatchType<0>], []>;
>> > >
>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>> > > defm MAD  : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>> > >
>> > > Where TernaryIntrinsicFloat is defined as:
>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>> > > {
>> > >  def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>> > >      (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >      [(set GPRF32:$dst,
>> > >          (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>> > >  def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>> > >      (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >      [(set GPRV2F32:$dst,
>> > >          (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>> > > ...
>> > > }
>> > >
>> > > Now, this doesn't completely work, because LLVM does not allow
>> > > overloading of intrinsics values, so there needs to be a little
>> > > coding in *IntrinsicInfo class.
>> > > AMD always encodes builtin names as __amdil_mad_f32,
>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc....
>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>> > > find out what intrinsic the function maps to, the AMDIL backend
>> > > strips off the type, and then looks up for just '__amdil_mad'.
>> > >
>> > > This is how you can do intrinsic overloading in LLVM.
>> > >
>> > > Hope this helps,
>> > > Micah
>> >
>> > Thank you Micah, it really does.
>> >
>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class,
>> > the only backend which does so is MBlaze.
>> > If Justin agrees with the approach I will look on how to generate the
>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>> > required by PTXIntrinsicInfo and write the lookUp method.
>>
>> Looks good to me.  For OpenCL support in clang, we definitely need the
>> built-in function support.  And the total number of intrinsics like this
>> should be relatively minimal.
>
> One thing I forgot to mention:  once these are implemented, it may be worth
> implementing some instruction selection patterns to collapse icmp/fcmp and
> select pairs into Max/min whenever it makes sense.
>
>>
>> >
>> > Cheers,
>> >
>> > Alberto
>> >
>> > >
>> > >> -----Original Message-----
>> > >> From: llvmdev-bounces at cs.uiuc.edu
>> > >> [mailto:llvmdev-bounces at cs.uiuc.edu]
>> > >> On Behalf Of Alberto Magni
>> > >> Sent: Tuesday, November 22, 2011 8:41 AM
>> > >> To: Justin Holewinski
>> > >> Cc: LLVM Developers Mailing List
>> > >> Subject: Re: [LLVMdev] PTX builtin functions.
>> > >>
>> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
>> > >> <justin.holewinski at gmail.com> wrote:
>> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
>> > >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >>
>> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>> > >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>> > >> >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >> >>
>> > >> >> >> Hi Justin,
>> > >> >> >>
>> > >> >> >> attached you find the patch for the integer max instruction.
>> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
>> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td, maybe
>> > >> >> >> a modification of this class can be defined in a separate file.
>> > >> >> >
>> > >> >> >
>> > >> >> > I'm copying llvmdev.  We should keep discussions like this on
>> > >> >> > the list for the benefit of others.
>> > >> >>
>> > >> >> I always forget "Reply to All".
>> > >> >>
>> > >> >> > We can probably factor out a generic description, or even just
>> > >> >> > use the PTX_INT3 multiclass directly.  The
>> > >> >> > PTXIntrinsicInstrInfo.td file is included by PTXInstrInfo.td,
>> > >> >> > so anything defined in PTXInstrInfo.td is available in
>> > >> >> > PTXIntrinsicInstrInfo.td.
>> > >> >>
>> > >> >> I agree with you but my class PTX_INTRINSIC_INT3 works with an
>> > >> >> Intrinsic and not with a SDNode, like PTX_INT3.
>> > >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of
>> > >> >> the immediate in the pattern, e.g. (i32 imm:$b).
>> > >> >
>> > >> >
>> > >> > Alright, I'm fine with that.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Do you agree with this approach ?
>> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>> > >> >> >> (a clone of PTX_INT3_SIGNED) is required ?
>> > >> >> >
>> > >> >> >
>> > >> >> > Yes, I believe we should split these into signed and unsigned
>> > >> >> > variants.  The results of max/min operations can definitely be
>> > >> >> > different depending on whether the operands are signed or
>> > >> >> > unsigned.  Since this information is not encoded in LLVM types,
>> > >> >> > we may want to create two versions for each integer type;
>> > >> >> > something like:
>> > >> >> >
>> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
>> > >> >>
>> > >> >> Yes, this is the only way.
>> > >> >
>> > >> >
>> > >> > A couple more comments:
>> > >> >
>> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics
>> > >> > (probably best in the multiclass, see
>> > >> > PTXReadSpecialRegisterIntrinsic_r32)
>> > >>
>> > >> Ok
>> > >>
>> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can
>> > >> > take multiple types, but it's probably worth looking into so we can
>> > >> > expose this intrinsic to Clang.
>> > >>
>> > >> This could be an issue. I looked for something similar in other
>> > >> backends and I found no previous examples. It may be worth asking
>> > >> on the ML explicitly for this.
>> > >> The only fallback that I see is to define explicitly every intrinsic
>> > >> for every data type, but this would prevent the usage of the
>> > >> multiclass for the definition of the patterns.
>> > >>
>> > >>
>> > >> Bye.
>> > >>
>> > >> >
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >
>> > >> >> > Otherwise, the patch looks good.
>> > >> >> >
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >>
>> > >> >> >> Alberto
>> > >> >> >>
>> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
>> > >> >> >> <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
>> > >> >> >> > <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
>> > >> >> >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
>> > >> >> >> >>> <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>> Dear Justin,
>> > >> >> >> >>>>
>> > >> >> >> >>>> I am trying to add the support for some OpenCL builtin
>> > >> >> >> >>>> functions to the PTX backend.
>> > >> >> >> >>>> The attached file represents the first stub of a patch for
>> > >> >> >> >>>> the fmax builtin function.
>> > >> >> >> >>>
>> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>> > >> >> >> >>> There are really two main issues here.  First, OpenCL
>> > >> >> >> >>> built-in functions do not belong in the PTX back-end.  These
>> > >> >> >> >>> will be implemented in the libclc library
>> > >> >> >> >>> (http://www.pcc.me.uk/~peter/libclc).  The back-end will only
>> > >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL
>> > >> >> >> >>> built-in functions in libclc.  However, this particular
>> > >> >> >> >>> function (max) corresponds to a PTX instruction, so it makes
>> > >> >> >> >>> sense to implement it as an intrinsic in the back-end.
>> > >> >> >> >>> Second, intrinsic functions require a bit more work.  You're
>> > >> >> >> >>> off to a great start, but intrinsics are implemented a bit
>> > >> >> >> >>> differently.  It looks like LLVM does not have a max
>> > >> >> >> >>> intrinsic, so we'll need to create one.  Have a look at
>> > >> >> >> >>> include/llvm/IntrinsicsPTX.td.  This file defines the
>> > >> >> >> >>> PTX-specific intrinsics.  You can add an intrinsic for max
>> > >> >> >> >>> here, and then implement a pattern-match in the
>> > >> >> >> >>> PTXInstrInfo.td file.  There is no need to create a new
>> > >> >> >> >>> SDNode type for intrinsics, unless they require some special
>> > >> >> >> >>> handling in the C++ code, which I do not see being the case
>> > >> >> >> >>> here.
>> > >> >> >> >>
>> > >> >> >> >> Sorry, there's a typo here.  The intrinsic pattern matching
>> > >> >> >> >> goes in PTXIntrinsicInstrInfo.td.
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >> > Thank you for the pointers I will let you know when I have
>> > >> >> >> > the first patch.
>> > >> >> >> >
>> > >> >> >> >>>
>> > >> >> >> >>> When you define a new intrinsic, use the following template
>> > >> >> >> >>> as a name: int_ptx_max.  This will define the LLVM intrinsic
>> > >> >> >> >>> as @llvm.ptx.max().  Please follow the same convention when
>> > >> >> >> >>> naming the __builtin_* function.
>> > >> >> >> >>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> The test case I am trying is the following:
>> > >> >> >> >>>>
>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> > >> >> >> >>>> entry:
>> > >> >> >> >>>>  %z = call float @fmax(float %x, float %y)
>> > >> >> >> >>>>  ret float %z
>> > >> >> >> >>>> }
>> > >> >> >> >>>>
>> > >> >> >> >>>> declare float @fmax(float, float)
>> > >> >> >> >>>>
>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not
>> > >> >> >> >>>> supported", this does not happen with llvm builtins like
>> > >> >> >> >>>> llvm.sqrt.f32
>> > >> >> >> >>>
>> > >> >> >> >>> Which version of LLVM are you using?  Calls to PTX device
>> > >> >> >> >>> functions have been implemented for a little while now, so
>> > >> >> >> >>> I'm surprised to see that error.  Perhaps it's because the
>> > >> >> >> >>> fmax function is not defined as ptx_device.
>> > >> >> >> >>>
>> > >> >> >> >
>> > >> >> >> > This is the testcase that I am using to verify that the max
>> > >> >> >> > builtin function I am implementing is actually recognised.
>> > >> >> >> > I took inspiration from the llvm-intrinsic.ll test case.
>> > >> >> >> > The command I am using to compile is:
>> > >> >> >> >
>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> > >> >> >> >
>> > >> >> >> > The option -mattr does not seem to have any effect.
>> > >> >> >> > I tried also with the ptx_device qualifier with the same
>> > >> >> >> > outcome.
>> > >> >> >> > I am using llvm from the svn repository.
>> > >> >> >> >
>> > >> >> >> > Bye,
>> > >> >> >> >
>> > >> >> >> > Alberto
>> > >> >> >> >
>> > >> >> >> >>>>
>> > >> >> >> >>>> Can you please give me a hint on what I am missing, or
>> > >> >> >> >>>> some general advice on how to add builtin functions.
>> > >> >> >> >>>>
>> > >> >> >> >>>> Thank you in advance,
>> > >> >> >> >>>>
>> > >> >> >> >>>> Alberto.
>> > >> >> >> >>>>
>> > >> >> >> >>>> _______________________________________________
>> > >> >> >> >>>> LLVM Developers mailing list
>> > >> >> >> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> > >> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>> --
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Justin Holewinski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: max_builtin.patch
Type: text/x-patch
Size: 21573 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111204/6eaa9ab4/attachment.bin>

Justin Holewinski

2011-Dec-05 14:12 UTC

[LLVMdev] PTX builtin functions.

On Sun, Dec 4, 2011 at 1:10 PM, Alberto Magni <alberto.magni86 at gmail.com>
wrote:
> Hi Justin,
>
> sorry for the delay, I have been busy.
>
> Micah's proposal requires to move the definitions of the intrinsics
> from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td
> thus allowing the generation of the file PTXGenIntrinsics.inc which
> will be included by PTXIntrinsicInfo.cpp.
> This is a quite big modification, do you agree with this ?
> Or do you have a better solution.
>
I'm opposed to this, mainly because we need the intrinsic definitions to be
defined during LLVM IR optimization and not just at code-gen time.  This is
particularly important for pure intrinsics, like llvm.ptx.read.tid.x(),
where the optimizers can fold multiple calls to these functions into a
single call.  Without the intrinsic definitions in
include/llvm/IntrinsicsPTX.td, this optimization would be illegal.
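
To make the folding argument concrete, here is a minimal standalone sketch in C++. It is not LLVM code and does not use any LLVM API; the function name and the representation of a call as a bare callee name (zero-argument calls only) are assumptions made purely for illustration. The point it shows is that a repeated call can only be dropped when the callee is known to be pure, which is exactly the information the optimizer loses if the intrinsic definitions are not visible to it.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: each call is represented only by its callee name,
// and all calls take no arguments (as llvm.ptx.read.tid.x does).
// Repeated calls to a callee listed in pureCallees are folded into the first
// one; calls to unknown callees are always kept, because without a purity
// guarantee dropping them would not be safe.
std::vector<std::string>
foldRepeatedPureCalls(const std::vector<std::string> &calls,
                      const std::set<std::string> &pureCallees) {
  std::vector<std::string> kept;
  std::set<std::string> seenPure;
  for (const std::string &callee : calls) {
    const bool isPure = pureCallees.count(callee) != 0;
    // insert() reports whether the callee was newly added; if it was already
    // present, an earlier identical pure call produced the same value.
    if (isPure && !seenPure.insert(callee).second)
      continue;
    kept.push_back(callee);
  }
  return kept;
}
```

Calling it with two reads of llvm.ptx.read.tid.x and an empty pure set keeps both calls; listing the intrinsic as pure keeps only the first. A real optimizer also has to check dominance and arguments, which this sketch leaves out.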

At the moment, I'm not seeing a clean solution to this.  Overloading the
intrinsics by writing custom code in PTXIntrinsicInfo.h/.cpp is only a
partial solution, with the problems mentioned above.  In my mind, the
cleanest solution would be to just write out explicit intrinsics for each
possible type.  We can still use multiclasses to an extent:

multiclass PTXBinaryIntrinsic<string prefix> {
  def _u16 : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty],
                       [IntrNoMem]>,
             GCCBuiltin<!strconcat(prefix, "_u16")>;
  // Repeat for s16, u32, s32, u64, s64, f32, f64
}

defm int_ptx_mad : PTXBinaryIntrinsic<"__builtin_ptx_mad">;

It's not the cleanest, but it gets the job done (unless I'm missing
something).
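
The two naming schemes discussed in this thread can be compared with a small standalone C++ sketch. It does not use LLVM; the function names are hypothetical, and the suffix list is simply the one from the comment in the multiclass (the AMDIL names quoted earlier use suffixes such as f32 and v2f32 instead). The first function expands a prefix into one explicit builtin name per type, which is what the !strconcat in the multiclass would produce. The second does what Micah describes for the lookup: it strips a known type suffix and returns the base name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Type suffixes taken from the comment in the multiclass sketch.
static const char *const kTypeSuffixes[] = {"u16", "s16", "u32", "s32",
                                            "u64", "s64", "f32", "f64"};

// Explicit-per-type scheme: one builtin name for each suffix,
// e.g. "__builtin_ptx_max" -> "__builtin_ptx_max_u16", ...
std::vector<std::string> expandBuiltinNames(const std::string &prefix) {
  std::vector<std::string> names;
  for (const char *suffix : kTypeSuffixes)
    names.push_back(prefix + "_" + suffix);
  return names;
}

// Suffix-stripping scheme: if the name ends in "_<known suffix>", return the
// part before it; otherwise return an empty string to signal "no match".
std::string stripTypeSuffix(const std::string &name) {
  const std::size_t pos = name.rfind('_');
  if (pos == std::string::npos || pos == 0)
    return std::string();
  const std::string suffix = name.substr(pos + 1);
  for (const char *known : kTypeSuffixes)
    if (suffix == known)
      return name.substr(0, pos);
  return std::string();
}
```

With the explicit scheme every name maps to its own intrinsic, so nothing has to be decoded at lookup time; with the stripping scheme the type has to be recovered separately from the call's operand types.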

>
> Also I don't know yet how to make llvm recognize the intrinsics
> defined in lib/Target/PTX/PTXIntrinsics.td, the only other
> backend that does so is MBlaze.
>
> A tentative patch is attached.
>
> Bye,
> Alberto
>


-- 

Thanks,

Justin Holewinski

Villmow, Micah

2011-Dec-08 16:36 UTC

[LLVMdev] PTX builtin functions.

It is my understanding that all you need to do is specify let isTarget = 1 in
your .td file and it will generate target specific intrinsics. This should allow
you to keep the IntrinsicsPTX.td file in the same location.

Micah

From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Monday, December 05, 2011 6:13 AM
To: Alberto Magni
Cc: Villmow, Micah; LLVM Developers Mailing List
Subject: Re: [LLVMdev] PTX builtin functions.

On Sun, Dec 4, 2011 at 1:10 PM, Alberto Magni <alberto.magni86 at gmail.com> wrote:
Hi Justin,

sorry for the delay, I have been busy.

Micah's proposal requires to move the definitions of the intrinsics
from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td
thus allowing the generation of the file PTXGenIntrinsics.inc which
will be included by PTXIntrinsicInfo.cpp.
This is a quite big modification, do you agree with this ?
Or do you have a better solution.

I'm opposed to this, mainly because we need the intrinsic definitions to be
available during LLVM IR optimization and not just at code-gen time.  This is
particularly important for pure intrinsics, like llvm.ptx.read.tid.x(), where
the optimizers can fold multiple calls to these functions into a single call.
Without the intrinsic definitions in include/llvm/IntrinsicsPTX.td, the
optimizers would have to treat these calls as opaque, and this folding would
be illegal.
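
To make the folding concrete, here is a toy C++ sketch. It is not LLVM's
actual optimizer code: the Call struct, its noMemory flag, and foldPureCalls
are hypothetical names used only for illustration. It models the idea that two
calls to the same no-memory function with the same operands can share one
result, while calls without that guarantee must each be kept.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call instruction (not an LLVM type).
struct Call {
  std::string callee;     // name of the called function
  std::vector<int> args;  // ids of the operand values
  bool noMemory;          // true if the callee neither reads nor writes memory
};

// For each call, returns the index of the call whose result it can reuse.
// A call that cannot be folded maps to its own index.
std::vector<std::size_t> foldPureCalls(const std::vector<Call>& calls) {
  std::map<std::pair<std::string, std::vector<int>>, std::size_t> seen;
  std::vector<std::size_t> reuse;
  for (std::size_t i = 0; i < calls.size(); ++i) {
    const Call& c = calls[i];
    if (!c.noMemory) {
      // Nothing is known about the callee, so the call must stay.
      reuse.push_back(i);
      continue;
    }
    // emplace keeps the first call seen for this (callee, operands) key.
    auto it = seen.emplace(std::make_pair(c.callee, c.args), i).first;
    reuse.push_back(it->second);
  }
  return reuse;
}
```

In this model, two calls to llvm.ptx.read.tid.x flagged noMemory collapse onto
the first one, while two calls to a function without the flag both stay.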

At the moment, I'm not seeing a clean solution to this.  Overloading the
intrinsics by writing custom code in PTXIntrinsicInfo.h/.cpp is only a partial
solution, with the problems mentioned above.  In my mind, the cleanest solution
would be to just write out explicit intrinsics for each possible type.  We can
still use multiclasses to an extent:

multiclass PTXBinaryIntrinsic<string prefix> {
  def _u16 : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty], [IntrNoMem]>,
             GCCBuiltin<!strconcat(prefix, "_u16")>;
  // Repeat for s16, u32, s32, u64, s64, f32, f64
}

defm int_ptx_mad : PTXBinaryIntrinsic<"__builtin_ptx_mad">;

It's not the cleanest, but it gets the job done (unless I'm missing
something).
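
For comparison, the name-stripping lookup that Micah describes for AMDIL (the
kind of custom PTXIntrinsicInfo code that explicit per-type definitions would
avoid) might look roughly like the C++ sketch below. It assumes that builtin
names end in a type suffix such as _f32 or _v2f32, as in the AMDIL examples
quoted in this thread. The function name stripTypeSuffix and the suffix table
are hypothetical and are not taken from any backend.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical suffix table; the thread only names _f32, _v2f32 and _v4f32.
constexpr std::array<std::string_view, 3> kTypeSuffixes = {
    "_v2f32", "_v4f32", "_f32"};

// Returns the builtin name with one trailing type suffix removed, or the
// name unchanged when no known suffix matches.
std::string stripTypeSuffix(std::string_view name) {
  for (std::string_view suffix : kTypeSuffixes) {
    if (name.size() > suffix.size() &&
        name.substr(name.size() - suffix.size()) == suffix) {
      return std::string(name.substr(0, name.size() - suffix.size()));
    }
  }
  return std::string(name);
}
```

With this helper, __amdil_mad_f32 and __amdil_mad_v2f32 both reduce to
__amdil_mad, which is the single name the lookup would then search for.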


Also I don't know yet how to make llvm recognize the intrinsics
defined in lib/Target/PTX/PTXIntrinsics.td, the only other
backend that does so is MBlaze.

A tentative patch is attached.

Bye,
Alberto

On Wed, Nov 23, 2011 at 2:36 PM, Justin Holewinski
<justin.holewinski at gmail.com> wrote:
>
> On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com>
> wrote:
>>
>>
>> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com>
>> wrote:
>> >
>> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com>
>> > wrote:
>> > > Alberto,
>> > >  The AMDIL backend solves your problem with intrinsic overloading this
>> > > way:
>> > > def int_AMDIL_mad     : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>> > >
>> > > Where TernaryIntFloat is defined as:
>> > > class TernaryIntFloat :
>> > >          Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>> > >          LLVMMatchType<0>, LLVMMatchType<0>], []>;
>> > >
>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>> > > defm MAD  : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>> > >
>> > > Where TernaryIntrinsicFloat is defined as:
>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>> > > {
>> > >  def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>> > >      (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >      [(set GPRF32:$dst,
>> > >          (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>> > >  def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>> > >      (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >      [(set GPRV2F32:$dst,
>> > >          (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>> > > ...
>> > > }
>> > >
>> > > Now, this doesn't completely work, because LLVM does not allow
>> > > overloading of intrinsics values, so there needs to be a little coding in
>> > > *IntrinsicInfo class.
>> > > AMD always encodes builtin names as __amdil_mad_f32,
>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc....
>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>> > > find out what intrinsic the function maps to, the AMDIL backend strips off
>> > > the type, and then looks up for just '__amdil_mad'.
>> > >
>> > > This is how you can do intrinsic overloading in LLVM.
>> > >
>> > > Hope this helps,
>> > > Micah
>> >
>> > Thank you Micah, it really does.
>> >
>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class,
>> > the only backend which does so is MBlaze.
>> > If Justin agrees with the approach I will look on how to generate the
>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>> > required by PTXIntrinsicInfo and write the lookUp method.
>>
>> Looks good to me.  For OpenCL support in clang, we definitely need the
>> built-in function support.  And the total number of intrinsics like this
>> should be relatively minimal.
>
> One thing I forgot to mention:  once these are implemented, it may be worth
> implementing some instruction selection patterns to collapse icmp/fcmp and
> select pairs into Max/min whenever it makes sense.
>
>>
>> >
>> > Cheers,
>> >
>> > Alberto
>> >
>> > >
>> > >> -----Original Message-----
>> > >> From: llvmdev-bounces at cs.uiuc.edu
>> > >> [mailto:llvmdev-bounces at cs.uiuc.edu]
>> > >> On Behalf Of Alberto Magni
>> > >> Sent: Tuesday, November 22, 2011 8:41 AM
>> > >> To: Justin Holewinski
>> > >> Cc: LLVM Developers Mailing List
>> > >> Subject: Re: [LLVMdev] PTX builtin functions.
>> > >>
>> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
>> > >> <justin.holewinski at gmail.com> wrote:
>> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
>> > >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >>
>> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>> > >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>> > >> >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >> >>
>> > >> >> >> Hi Justin,
>> > >> >> >>
>> > >> >> >> attached you find the patch for the
integer max instruction.
>> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in
file
>> > >> PTXIntrinsicInstrInfo.td
>> > >> >> >> is almost an exact copy of  PTX_INT3 in
PTXInstrInfo.td, maybe
>> > >> >> >> a modification of this class can be
defined in a separate file.
>> > >> >> >
>> > >> >> >
>> > >> >> > I'm copying llvmdev.  We should keep
discussions like this on
>> > >> >> > the
>> > >> list
>> > >> >> > for
>> > >> >> > the benefit of others.
>> > >> >>
>> > >> >> I always forget "Reply to All".
>> > >> >>
>> > >> >> > We can probably factor out a generic
description, or even just
>> > >> >> > use
>> > >> the
>> > >> >> > PTX_INT3 multiclass directly.  The
PTXIntrinsicInstrInfo.td file
>> > >> is
>> > >> >> > included
>> > >> >> > by PTXInstrInfo.td, so anything defined in
PTXInstrInfo.td is
>> > >> available
>> > >> >> > in
>> > >> >> > PTXIntrinsicInstrInfo.td.
>> > >> >>
>> > >> >> I agree with you but my class PTX_INTRINSIC_INT3
works with an
>> > >> Intrinsic
>> > >> >> and not with a SDNode, like PTX_INT3.
>> > >> >> PTX_INTRINSIC_INT3 also requires the presence of
the type of
>> > >> >> the immediate in the pattern, e.g. (i32 imm:$b).
>> > >> >
>> > >> >
>> > >> > Alright, I'm fine with that.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Do you agree with this approach ?
>> > >> >> >> Also, do you think that a class like
PTX_INTRINSIC_INT3_SIGNED
>> > >> >> >> (a clone of PTX_INT3_SIGNED) is
required ?
>> > >> >> >
>> > >> >> >
>> > >> >> > Yes, I believe we should split these into
>> > >> >> > signed and unsigned variants.  The results of max/min operations
>> > >> >> > can definitely be different depending on whether the operands
>> > >> >> > are signed or unsigned.  Since this information is not encoded
>> > >> >> > in LLVM types, we may want to create two versions for each
>> > >> >> > integer type; something like:
>> > >> >> >
>> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
>> > >> >>
>> > >> >> Yes, this the only way.
>> > >> >
>> > >> >
>> > >> > A couple more comments:
>> > >> >
>> > >> > Please make sure to set TargetPrefix="ptx"
for the intrinsics
>> > >> (probably best
>> > >> > in the multiclass, see
PTXReadSpecialRegisterIntrinsic_r32)]
>> > >>
>> > >> Ok
>> > >>
>> > >> > I'm not sure how to define a GCCBuiltin for an
intrinsic that can
>> > >> take
>> > >> > multiple types, but it's probably worth looking
into so we can
>> > >> > expose
>> > >> this
>> > >> > intrinsic to Clang.
>> > >>
>> > >> This could be an issue. I looked for something similar in
other
>> > >> backends
>> > >> and I found no previous examples. It may be worth to ask
on the ML
>> > >> explicitly for this.
>> > >> The only fallback that I see is to define explicitly
every intrinsic
>> > >> for every data type,
>> > >> but this would prevent the usage of the multiclass for
the definition
>> > >> of the patterns.
>> > >>
>> > >>
>> > >> Bye.
>> > >>
>> > >> >
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >
>> > >> >> > Otherwise, the patch looks good.
>> > >> >> >
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >>
>> > >> >> >> Alberto
>> > >> >> >>
>> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM,
Alberto Magni
>> > >> >> >> <alberto.magni86 at
gmail.com<mailto:alberto.magni86 at gmail.com>> wrote:
>> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM,
Justin Holewinski
>> > >> >> >> > <justin.holewinski at
gmail.com<mailto:justin.holewinski at gmail.com>> wrote:
>> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16
AM, Justin Holewinski
>> > >> >> >> >> <justin.holewinski at
gmail.com<mailto:justin.holewinski at gmail.com>> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, Nov 16, 2011 at
8:05 AM, Alberto Magni
>> > >> >> >> >>> <alberto.magni86 at
gmail.com<mailto:alberto.magni86 at gmail.com>>
>> > >> >> >> >>> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>> Dear Justin,
>> > >> >> >> >>>>
>> > >> >> >> >>>> I am trying to add the
support for some OpenCL builtin
>> > >> functions
>> > >> >> >> >>>> to
>> > >> >> >> >>>> the PTX backend.
>> > >> >> >> >>>> The attached file
represent the first stub of a patch for
>> > >> the fmax
>> > >> >> >> >>>> builtin function.
>> > >> >> >> >>>
>> > >> >> >> >>> First off, thanks for
>> > >> >> >> >>> helping to improve the PTX back-end!
>> > >> >> >> >>> There are really two main issues here.  First, OpenCL built-in
>> > >> >> >> >>> functions do not belong in the PTX back-end.  These will be
>> > >> >> >> >>> implemented in the libclc library
>> > >> >> >> >>> (http://www.pcc.me.uk/~peter/libclc).  The back-end will only
>> > >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL
>> > >> >> >> >>> built-in functions in libclc.  However, this particular
>> > >> >> >> >>> function (max) corresponds to a PTX instruction, so it makes
>> > >> >> >> >>> sense to implement it as an intrinsic in the back-end.
>> > >> >> >> >>> Second, intrinsic functions require a bit more work.  You're
>> > >> >> >> >>> off to a great start, but intrinsics are implemented a bit
>> > >> >> >> >>> differently.  It looks like LLVM does not have a max
>> > >> >> >> >>> intrinsic, so we'll need to create one.  Have a look at
>> > >> >> >> >>> include/llvm/IntrinsicsPTX.td.  This file defines the
>> > >> >> >> >>> PTX-specific intrinsics.  You can add an intrinsic for max
>> > >> >> >> >>> here, and then implement a pattern-match in the
>> > >> >> >> >>> PTXInstrInfo.td file.  There is no need to create a new
>> > >> >> >> >>> SDNode type for intrinsics, unless they require some special
>> > >> >> >> >>> handling in the C++ code, which I do not see being the case
>> > >> >> >> >>> here.
>> > >> >> >> >>
>> > >> >> >> >> Sorry, there's a typo here.  The intrinsic pattern matching
>> > >> >> >> >> goes in PTXIntrinsicInstrInfo.td.
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >> > Thank you for the pointers I will
let you know when I have
>> > >> >> >> > the
>> > >> first
>> > >> >> >> > patch.
>> > >> >> >> >
>> > >> >> >> >>>
>> > >> >> >> >>> When you define a new
intrinsic, use the following template
>> > >> as a
>> > >> >> >> >>> name:
>> > >> >> >> >>> int_ptx_max.  This will
define the LLVM intrinsic as
>> > >> >> >> >>> @llvm.ptx.max().
>> > >> >> >> >>>  Please follow the same
convention when naming the
>> > >> __builtin_*
>> > >> >> >> >>> function.
>> > >> >> >> >>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> The test case I am
>> > >> >> >> >>>> trying is the following:
>> > >> >> >> >>>>
>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> > >> >> >> >>>> entry:
>> > >> >> >> >>>>  %z = call float @fmax(float %x, float %y)
>> > >> >> >> >>>>  ret float %z
>> > >> >> >> >>>> }
>> > >> >> >> >>>>
>> > >> >> >> >>>> declare float @fmax(float, float)
>> > >> >> >> >>>>
>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not
>> > >> >> >> >>>> supported"; this does not happen with llvm builtins like
>> > >> >> >> >>>> llvm.sqrt.f32
>> > >> >> >> >>>
>> > >> >> >> >>> Which version of LLVM are
you using?  Calls to PTX device
>> > >> functions
>> > >> >> >> >>> have
>> > >> >> >> >>> been implemented for a
little while now, so I'm surprised
>> > >> >> >> >>> to
>> > >> see
>> > >> >> >> >>> that
>> > >> >> >> >>> error.
>> > >> >> >> >>>  Perhaps it's because
the fmax function is not defined as
>> > >> >> >> >>> ptx_device.
>> > >> >> >> >>>
>> > >> >> >> >
>> > >> >> >> > This is the testcase that I am
>> > >> >> >> > using to verify that the max builtin function I am implementing
>> > >> >> >> > is actually recognised. I took inspiration from the
>> > >> >> >> > llvm-intrinsic.ll test case.
>> > >> >> >> > The command I am using to compile is:
>> > >> >> >> >
>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> > >> >> >> >
>> > >> >> >> > The option -mattr does not seem to have any effect.
>> > >> >> >> > I tried also with the ptx_device qualifier with the same
>> > >> >> >> > outcome.
>> > >> >> >> > I am using llvm from the svn repository.
>> > >> >> >> >
>> > >> >> >> > Bye,
>> > >> >> >> >
>> > >> >> >> > Alberto
>> > >> >> >> >
>> > >> >> >> >>>>
>> > >> >> >> >>>> Can you please give me
a hint on what I am missing, or
>> > >> >> >> >>>> some
>> > >> >> >> >>>> general
>> > >> >> >> >>>> advice on how
>> > >> >> >> >>>> to add builtin
functions.
>> > >> >> >> >>>>
>> > >> >> >> >>>> Thank you in advance,
>> > >> >> >> >>>>
>> > >> >> >> >>>> Alberto.
>> > >> >> >> >>>>
>> > >> >> >> >>>>
_______________________________________________
>> > >> >> >> >>>> LLVM Developers
mailing list
>> > >> >> >> >>>> LLVMdev at
cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu
>> > >> >> >> >>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>> --
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Justin Holewinski
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >>
>> > >> >> >> >> Thanks,
>> > >> >> >> >> Justin Holewinski
>> > >> >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > --
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> >
>> > >> >> > Justin Holewinski
>> > >> >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Justin Holewinski
>> > >> >
>> > >>
>> > >> _______________________________________________
>> > >> LLVM Developers mailing list
>> > >> LLVMdev at cs.uiuc.edu<mailto:LLVMdev at
cs.uiuc.edu>         http://llvm.cs.uiuc.edu
>> > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >
>> > >


--
Thanks,

Justin Holewinski

