On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com> wrote:
>
> On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> > Alberto,
> > The AMDIL backend solves your problem with intrinsic overloading this way:
> > def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
> >
> > Where TernaryIntFloat is defined as:
> > class TernaryIntFloat :
> >   Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
> >   LLVMMatchType<0>, LLVMMatchType<0>], []>;
> >
> > This allows us to write a multi-def for int_AMDIL_mad like so:
> > defm MAD : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
> >
> > Where TernaryIntrinsicFloat is defined as:
> > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
> > {
> >   def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
> >       (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
> >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
> >       [(set GPRF32:$dst,
> >         (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
> >   def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
> >       (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
> >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
> >       [(set GPRV2F32:$dst,
> >         (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
> >   ...
> > }
> >
> > Now, this doesn't completely work, because LLVM does not allow
> > overloading of intrinsics values, so there needs to be a little coding
> > in *IntrinsicInfo class.
> > AMD always encodes builtin names as __amdil_mad_f32, __amdil_mad_v2f32,
> > __amdil_mad_v4f32, etc....
> > So in the function "*IntrinsicInfo::lookup_name", when attempting to
> > find out what intrinsic the function maps to, the AMDIL backend strips
> > off the type, and then looks up for just '__amdil_mad'.
> >
> > This is how you can do intrinsic overloading in LLVM.
> >
> > Hope this helps,
> > Micah
>
> Thank you Micah, it really does.
>
> At the moment the PTX backend does not have a PTXIntrinsicInfo class,
> the only backend which does so is MBlaze.
> If Justin agrees with the approach I will look on how to generate the
> PTXGenIntrinsics.inc file (I am still learning TableGen)
> required by PTXIntrinsicInfo and write the lookUp method.

Looks good to me. For OpenCL support in clang, we definitely need the
built-in function support. And the total number of intrinsics like this
should be relatively minimal.

>
> Cheers,
>
> Alberto
>
> >
> >> -----Original Message-----
> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> >> On Behalf Of Alberto Magni
> >> Sent: Tuesday, November 22, 2011 8:41 AM
> >> To: Justin Holewinski
> >> Cc: LLVM Developers Mailing List
> >> Subject: Re: [LLVMdev] PTX builtin functions.
> >>
> >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
> >> <justin.holewinski at gmail.com> wrote:
> >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
> >> > <alberto.magni86 at gmail.com> wrote:
> >> >>
> >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
> >> >> <justin.holewinski at gmail.com> wrote:
> >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
> >> >> > <alberto.magni86 at gmail.com> wrote:
> >> >> >>
> >> >> >> Hi Justin,
> >> >> >>
> >> >> >> attached you find the patch for the integer max instruction.
> >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
> >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td, maybe
> >> >> >> a modification of this class can be defined in a separate file.
> >> >> >
> >> >> > I'm copying llvmdev. We should keep discussions like this on the
> >> >> > list for the benefit of others.
> >> >>
> >> >> I always forget "Reply to All".
> >> >>
> >> >> > We can probably factor out a generic description, or even just use
> >> >> > the PTX_INT3 multiclass directly.
> >> >> > The PTXIntrinsicInstrInfo.td file is included by PTXInstrInfo.td,
> >> >> > so anything defined in PTXInstrInfo.td is available in
> >> >> > PTXIntrinsicInstrInfo.td.
> >> >>
> >> >> I agree with you but my class PTX_INTRINSIC_INT3 works with an
> >> >> Intrinsic and not with a SDNode, like PTX_INT3.
> >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of
> >> >> the immediate in the pattern, e.g. (i32 imm:$b).
> >> >
> >> > Alright, I'm fine with that.
> >> >
> >> >> >> Do you agree with this approach?
> >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
> >> >> >> (a clone of PTX_INT3_SIGNED) is required?
> >> >> >
> >> >> > Yes, I believe we should split these into signed and unsigned
> >> >> > variants. The results of max/min operations can definitely be
> >> >> > different depending on whether the operands are signed or
> >> >> > unsigned. Since this information is not encoded in LLVM types,
> >> >> > we may want to create two versions for each integer type;
> >> >> > something like:
> >> >> >
> >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
> >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
> >> >>
> >> >> Yes, this is the only way.
> >> >
> >> > A couple more comments:
> >> >
> >> > Please make sure to set TargetPrefix="ptx" for the intrinsics
> >> > (probably best in the multiclass, see
> >> > PTXReadSpecialRegisterIntrinsic_r32).
> >>
> >> Ok
> >>
> >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can
> >> > take multiple types, but it's probably worth looking into so we can
> >> > expose this intrinsic to Clang.
> >>
> >> This could be an issue. I looked for something similar in other
> >> backends and I found no previous examples. It may be worth asking on
> >> the ML explicitly about this.
> >> The only fallback that I see is to define explicitly every intrinsic
> >> for every data type, but this would prevent the usage of the
> >> multiclass for the definition of the patterns.
> >>
> >> Bye.
> >>
> >> >> > Otherwise, the patch looks good.
> >> >> >
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Alberto
> >> >> >>
> >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
> >> >> >> <alberto.magni86 at gmail.com> wrote:
> >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
> >> >> >> > <justin.holewinski at gmail.com> wrote:
> >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
> >> >> >> >> <justin.holewinski at gmail.com> wrote:
> >> >> >> >>>
> >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
> >> >> >> >>> <alberto.magni86 at gmail.com> wrote:
> >> >> >> >>>>
> >> >> >> >>>> Dear Justin,
> >> >> >> >>>>
> >> >> >> >>>> I am trying to add the support for some OpenCL builtin
> >> >> >> >>>> functions to the PTX backend.
> >> >> >> >>>> The attached file represents the first stub of a patch for
> >> >> >> >>>> the fmax builtin function.
> >> >> >> >>>
> >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
> >> >> >> >>> There are really two main issues here. First, OpenCL built-in
> >> >> >> >>> functions do not belong in the PTX back-end. These will be
> >> >> >> >>> implemented in the libclc library
> >> >> >> >>> (http://www.pcc.me.uk/~peter/libclc). The back-end will only
> >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL
> >> >> >> >>> built-in functions in libclc. However, this particular
> >> >> >> >>> function (max) corresponds to a PTX instruction, so it makes
> >> >> >> >>> sense to implement it as an intrinsic in the back-end.
> >> >> >> >>> Second, intrinsic functions require a bit more work. You're
> >> >> >> >>> off to a great start, but intrinsics are implemented a bit
> >> >> >> >>> differently. It looks like LLVM does not have a max
> >> >> >> >>> intrinsic, so we'll need to create one. Have a look at
> >> >> >> >>> include/llvm/IntrinsicsPTX.td. This file defines the
> >> >> >> >>> PTX-specific intrinsics. You can add an intrinsic for max
> >> >> >> >>> here, and then implement a pattern-match in the
> >> >> >> >>> PTXInstrInfo.td file. There is no need to create a new
> >> >> >> >>> SDNode type for intrinsics, unless they require some special
> >> >> >> >>> handling in the C++ code, which I do not see being the case
> >> >> >> >>> here.
> >> >> >> >>
> >> >> >> >> Sorry, there's a typo here. The intrinsic pattern matching
> >> >> >> >> goes in PTXIntrinsicInstrInfo.td.
> >> >> >> >
> >> >> >> > Thank you for the pointers, I will let you know when I have
> >> >> >> > the first patch.
> >> >> >> >
> >> >> >> >>> When you define a new intrinsic, use the following template
> >> >> >> >>> as a name: int_ptx_max. This will define the LLVM intrinsic
> >> >> >> >>> as @llvm.ptx.max(). Please follow the same convention when
> >> >> >> >>> naming the __builtin_* function.
> >> >> >> >>>
> >> >> >> >>>> The test case I am trying is the following:
> >> >> >> >>>>
> >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
> >> >> >> >>>> entry:
> >> >> >> >>>>   %z = call float @fmax(float %x, float %y)
> >> >> >> >>>>   ret float %z
> >> >> >> >>>> }
> >> >> >> >>>>
> >> >> >> >>>> declare float @fmax(float, float)
> >> >> >> >>>>
> >> >> >> >>>> But at the moment llc crashes saying that "calls are not
> >> >> >> >>>> supported"; this does not happen with llvm builtins like
> >> >> >> >>>> llvm.sqrt.f32
> >> >> >> >>>
> >> >> >> >>> Which version of LLVM are you using? Calls to PTX device
> >> >> >> >>> functions have been implemented for a little while now, so
> >> >> >> >>> I'm surprised to see that error.
> >> >> >> >>> Perhaps it's because the fmax function is not defined as
> >> >> >> >>> ptx_device.
> >> >> >> >
> >> >> >> > This is the testcase that I am using to verify that the max
> >> >> >> > builtin function I am implementing is actually recognised. I
> >> >> >> > took inspiration from the llvm-intrinsic.ll test case.
> >> >> >> > The command I am using to compile is:
> >> >> >> >
> >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
> >> >> >> >
> >> >> >> > The option -mattr does not seem to have any effect.
> >> >> >> > I tried also with the ptx_device qualifier with the same
> >> >> >> > outcome.
> >> >> >> > I am using llvm from the svn repository.
> >> >> >> >
> >> >> >> > Bye,
> >> >> >> >
> >> >> >> > Alberto
> >> >> >> >
> >> >> >> >>>> Can you please give me a hint on what I am missing, or some
> >> >> >> >>>> general advice on how to add builtin functions?
> >> >> >> >>>>
> >> >> >> >>>> Thank you in advance,
> >> >> >> >>>>
> >> >> >> >>>> Alberto.
> >> >> >> >>>>
> >> >> >> >>>> _______________________________________________
> >> >> >> >>>> LLVM Developers mailing list
> >> >> >> >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >> >> >>>
> >> >> >> >>> --
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>> Justin Holewinski
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >> Justin Holewinski
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Justin Holewinski
> >> >
> >> > --
> >> >
> >> > Thanks,
> >> >
> >> > Justin Holewinski
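The name-stripping lookup that Micah describes for "*IntrinsicInfo::lookup_name" might be sketched roughly as follows. This is only a standalone illustration in plain C++, not the AMDIL code: the IntrinsicID enum, the stripTypeSuffix and lookupName helpers, and the one-entry table are all hypothetical names made up for this sketch, and a real backend would presumably take its table from the TableGen-generated .inc file instead.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical IDs for illustration only; a real backend would take them
// from its TableGen-generated intrinsics table.
enum class IntrinsicID { NotIntrinsic, Mad };

// Remove one trailing type suffix ("_f32", "_v2f32", "_v4f32") if present.
inline std::string stripTypeSuffix(const std::string &name) {
  static const char *const suffixes[] = {"_v4f32", "_v2f32", "_f32"};
  for (const char *s : suffixes) {
    const std::string suffix(s);
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
      return name.substr(0, name.size() - suffix.size());
  }
  return name;  // no known suffix: look the name up unchanged
}

// Map a (possibly type-suffixed) builtin name to its overloaded intrinsic.
inline IntrinsicID lookupName(const std::string &name) {
  assert(!name.empty() && "builtin name must not be empty");
  // Assumed table with a single entry, keyed by the suffix-free name.
  static const std::unordered_map<std::string, IntrinsicID> table = {
      {"__amdil_mad", IntrinsicID::Mad},
  };
  const auto it = table.find(stripTypeSuffix(name));
  return it == table.end() ? IntrinsicID::NotIntrinsic : it->second;
}
```

With this sketch, __amdil_mad_f32, __amdil_mad_v2f32 and __amdil_mad_v4f32 all resolve to the same ID, while an unrelated name such as fmax falls through to NotIntrinsic.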
On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com> wrote:
>
> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com> wrote:
> >
> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> > > Alberto,
> > > The AMDIL backend solves your problem with intrinsic overloading this way:
> > > def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
> > >
> > > Where TernaryIntFloat is defined as:
> > > class TernaryIntFloat :
> > >   Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
> > >   LLVMMatchType<0>, LLVMMatchType<0>], []>;
> > >
> > > This allows us to write a multi-def for int_AMDIL_mad like so:
> > > defm MAD : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
> > >
> > > Where TernaryIntrinsicFloat is defined as:
> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
> > > {
> > >   def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
> > >       (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
> > >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
> > >       [(set GPRF32:$dst,
> > >         (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
> > >   def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
> > >       (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
> > >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
> > >       [(set GPRV2F32:$dst,
> > >         (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
> > >   ...
> > > }
> > >
> > > Now, this doesn't completely work, because LLVM does not allow
> > > overloading of intrinsics values, so there needs to be a little coding
> > > in *IntrinsicInfo class.
> > > AMD always encodes builtin names as __amdil_mad_f32, __amdil_mad_v2f32,
> > > __amdil_mad_v4f32, etc....
> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
> > > find out what intrinsic the function maps to, the AMDIL backend strips
> > > off the type, and then looks up for just '__amdil_mad'.
> > >
> > > This is how you can do intrinsic overloading in LLVM.
> > >
> > > Hope this helps,
> > > Micah
> >
> > Thank you Micah, it really does.
> >
> > At the moment the PTX backend does not have a PTXIntrinsicInfo class,
> > the only backend which does so is MBlaze.
> > If Justin agrees with the approach I will look on how to generate the
> > PTXGenIntrinsics.inc file (I am still learning TableGen)
> > required by PTXIntrinsicInfo and write the lookUp method.
>
> Looks good to me. For OpenCL support in clang, we definitely need the
> built-in function support. And the total number of intrinsics like this
> should be relatively minimal.

One thing I forgot to mention: once these are implemented, it may be worth
implementing some instruction selection patterns to collapse icmp/fcmp and
select pairs into max/min whenever it makes sense.

> >
> > Cheers,
> >
> > Alberto
> >
> > >
> > >> -----Original Message-----
> > >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > >> On Behalf Of Alberto Magni
> > >> Sent: Tuesday, November 22, 2011 8:41 AM
> > >> To: Justin Holewinski
> > >> Cc: LLVM Developers Mailing List
> > >> Subject: Re: [LLVMdev] PTX builtin functions.
> > >>
> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
> > >> <justin.holewinski at gmail.com> wrote:
> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
> > >> > <alberto.magni86 at gmail.com> wrote:
> > >> >>
> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
> > >> >> <justin.holewinski at gmail.com> wrote:
> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
> > >> >> > <alberto.magni86 at gmail.com> wrote:
> > >> >> >>
> > >> >> >> Hi Justin,
> > >> >> >>
> > >> >> >> attached you find the patch for the integer max instruction.
> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td, maybe
> > >> >> >> a modification of this class can be defined in a separate file.
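Two points from the thread can be illustrated with a small standalone C++ sketch: a compare followed by a select of the same two operands is just a max, and the signed and unsigned variants can disagree on identical bit patterns, which is why separate signed and unsigned intrinsics were proposed. The function names below (maxSigned, maxUnsigned, asUnsigned, checkMaxExamples) are hypothetical and exist only for this sketch; nothing here is taken from the PTX backend.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors select(icmp sgt a, b), a, b: the signed maximum.
inline int32_t maxSigned(int32_t a, int32_t b) { return a > b ? a : b; }

// Mirrors select(icmp ugt a, b), a, b: the unsigned maximum.
inline uint32_t maxUnsigned(uint32_t a, uint32_t b) { return a > b ? a : b; }

// Reinterpret a signed value as the unsigned value with the same 32 bits.
inline uint32_t asUnsigned(int32_t v) { return static_cast<uint32_t>(v); }

// The same operand bits give different answers for the two variants.
inline void checkMaxExamples() {
  // Signed view: -1 is smaller than 1.
  assert(maxSigned(-1, 1) == 1);
  // Unsigned view: the bits of -1 are 0xFFFFFFFF, the largest 32-bit value.
  assert(maxUnsigned(asUnsigned(-1), 1u) == 0xFFFFFFFFu);
}
```

Since the integer type alone does not say which comparison is meant, a selection pattern that collapses a compare/select pair would have to look at the comparison predicate to choose between the signed and unsigned form.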
Hi Justin,

sorry for the delay, I have been busy.

Micah's proposal requires moving the definitions of the intrinsics from
include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td, thus
allowing the generation of the file PTXGenIntrinsics.inc, which will be
included by PTXIntrinsicInfo.cpp.

This is quite a big modification. Do you agree with it, or do you have a
better solution?

Also, I don't know yet how to make llvm recognize the intrinsics defined in
lib/Target/PTX/PTXIntrinsics.td; the only other backend that does so is
MBlaze.

A tentative patch is attached.

Bye,

Alberto

On Wed, Nov 23, 2011 at 2:36 PM, Justin Holewinski
<justin.holewinski at gmail.com> wrote:
>
> On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com>
> wrote:
>>
>> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com>
>> wrote:
>> >
>> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com>
>> > wrote:
>> > > Alberto,
>> > > The AMDIL backend solves your problem with intrinsic overloading this
>> > > way:
>> > > def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>> > >
>> > > Where TernaryIntFloat is defined as:
>> > > class TernaryIntFloat :
>> > >   Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>> > >   LLVMMatchType<0>, LLVMMatchType<0>], []>;
>> > >
>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>> > > defm MAD : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>> > >
>> > > Where TernaryIntrinsicFloat is defined as:
>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>> > > {
>> > >   def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>> > >       (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>> > >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >       [(set GPRF32:$dst,
>> > >         (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>> > >   def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>> > >       (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>> > >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >       [(set GPRV2F32:$dst,
>> > >         (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>> > >   ...
>> > > }
>> > >
>> > > Now, this doesn't completely work, because LLVM does not allow
>> > > overloading of intrinsics values, so there needs to be a little coding in
>> > > *IntrinsicInfo class.
>> > > AMD always encodes builtin names as __amdil_mad_f32,
>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc....
>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>> > > find out what intrinsic the function maps to, the AMDIL backend strips off
>> > > the type, and then looks up for just '__amdil_mad'.
>> > >
>> > > This is how you can do intrinsic overloading in LLVM.
>> > >
>> > > Hope this helps,
>> > > Micah
>> >
>> > Thank you Micah, it really does.
>> >
>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class,
>> > the only backend which does so is MBlaze.
>> > If Justin agrees with the approach I will look on how to generate the
>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>> > required by PTXIntrinsicInfo and write the lookUp method.
>>
>> Looks good to me. For OpenCL support in clang, we definitely need the
>> built-in function support. And the total number of intrinsics like this
>> should be relatively minimal.
>
> One thing I forgot to mention: once these are implemented, it may be worth
> implementing some instruction selection patterns to collapse icmp/fcmp and
> select pairs into max/min whenever it makes sense.
>
>> >
>> > Cheers,
>> >
>> > Alberto
>> > >> >> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski >> > >> <justin.holewinski at gmail.com> wrote: >> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni >> > >> <alberto.magni86 at gmail.com> >> > >> > wrote: >> > >> >> >> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski >> > >> >> <justin.holewinski at gmail.com> wrote: >> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni >> > >> >> > <alberto.magni86 at gmail.com> >> > >> >> > wrote: >> > >> >> >> >> > >> >> >> Hi Justin, >> > >> >> >> >> > >> >> >> attached you find the patch for the integer max instruction. >> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file >> > >> PTXIntrinsicInstrInfo.td >> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td, maybe >> > >> >> >> a modification of this class can be defined in a separate file. >> > >> >> > >> > >> >> > >> > >> >> > I'm copying llvmdev. We should keep discussions like this on >> > >> >> > the >> > >> list >> > >> >> > for >> > >> >> > the benefit of others. >> > >> >> >> > >> >> I always forget "Reply to All". >> > >> >> >> > >> >> > We can probably factor out a generic description, or even just >> > >> >> > use >> > >> the >> > >> >> > PTX_INT3 multiclass directly. The PTXIntrinsicInstrInfo.td file >> > >> is >> > >> >> > included >> > >> >> > by PTXInstrInfo.td, so anything defined in PTXInstrInfo.td is >> > >> available >> > >> >> > in >> > >> >> > PTXIntrinsicInstrInfo.td. >> > >> >> >> > >> >> I agree with you but my class PTX_INTRINSIC_INT3 works with an >> > >> Intrinsic >> > >> >> and not with a SDNode, like PTX_INT3. >> > >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of >> > >> >> the immediate in the pattern, e.g. (i32 imm:$b). >> > >> > >> > >> > >> > >> > Alright, I'm fine with that. >> > >> > >> > >> >> >> > >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> Do you agree with this approach ? 
>> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED >> > >> >> >> (a clone of PTX_INT3_SIGNED) is required ? >> > >> >> > >> > >> >> > >> > >> >> > Yes, I believe we should split these into signed and unsigned >> > >> variants. >> > >> >> > The >> > >> >> > results of max/min operations can definitely be different >> > >> depending on >> > >> >> > whether the operands are signed or unsigned. Since this >> > >> information is >> > >> >> > not >> > >> >> > encoded in LLVM types, we may want to create two versions for >> > >> >> > each >> > >> >> > integer >> > >> >> > type; something like: >> > >> >> > >> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32) >> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32) >> > >> >> >> > >> >> Yes, this the only way. >> > >> > >> > >> > >> > >> > A couple more comments: >> > >> > >> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics >> > >> (probably best >> > >> > in the multiclass, see PTXReadSpecialRegisterIntrinsic_r32)] >> > >> >> > >> Ok >> > >> >> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can >> > >> take >> > >> > multiple types, but it's probably worth looking into so we can >> > >> > expose >> > >> this >> > >> > intrinsic to Clang. >> > >> >> > >> This could be an issue. I looked for something similar in other >> > >> backends >> > >> and I found no previous examples. It may be worth to ask on the ML >> > >> explicitly for this. >> > >> The only fallback that I see is to define explicitly every intrinsic >> > >> for every data type, >> > >> but this would prevent the usage of the multiclass for the definition >> > >> of the patterns. >> > >> >> > >> >> > >> Bye. >> > >> >> > >> > >> > >> > >> > >> >> >> > >> >> >> > >> >> > >> > >> >> > Otherwise, the patch looks good. 
>> > >> >> >
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >>
>> > >> >> >> Alberto
>> > >> >> >>
>> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>> Dear Justin,
>> > >> >> >> >>>>
>> > >> >> >> >>>> I am trying to add support for some OpenCL builtin functions to
>> > >> >> >> >>>> the PTX backend.
>> > >> >> >> >>>> The attached file represents the first stub of a patch for the fmax
>> > >> >> >> >>>> builtin function.
>> > >> >> >> >>>
>> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>> > >> >> >> >>> There are really two main issues here. First, OpenCL built-in functions
>> > >> >> >> >>> do not belong in the PTX back-end. These will be implemented in the libclc
>> > >> >> >> >>> library (http://www.pcc.me.uk/~peter/libclc). The back-end will only
>> > >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL built-in functions
>> > >> >> >> >>> in libclc. However, this particular function (max) corresponds to a PTX
>> > >> >> >> >>> instruction, so it makes sense to implement it as an intrinsic in the
>> > >> >> >> >>> back-end.
>> > >> >> >> >>> Second, intrinsic functions require a bit more work.
>> > >> >> >> >>> You're off to a
>> > >> >> >> >>> great start, but intrinsics are implemented a bit differently. It looks
>> > >> >> >> >>> like LLVM does not have a max intrinsic, so we'll need to create one. Have
>> > >> >> >> >>> a look at include/llvm/IntrinsicsPTX.td. This file defines the PTX-specific
>> > >> >> >> >>> intrinsics. You can add an intrinsic for max here, and then implement a
>> > >> >> >> >>> pattern-match in the PTXInstrInfo.td file. There is no need to create a new
>> > >> >> >> >>> SDNode type for intrinsics, unless they require some special handling in the
>> > >> >> >> >>> C++ code, which I do not see being the case here.
>> > >> >> >> >>
>> > >> >> >> >> Sorry, there's a typo here. The intrinsic pattern matching goes in
>> > >> >> >> >> PTXIntrinsicInstrInfo.td.
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >> > Thank you for the pointers. I will let you know when I have the first
>> > >> >> >> > patch.
>> > >> >> >> >
>> > >> >> >> >>>
>> > >> >> >> >>> When you define a new intrinsic, use the following template as a name:
>> > >> >> >> >>> int_ptx_max. This will define the LLVM intrinsic as @llvm.ptx.max().
>> > >> >> >> >>> Please follow the same convention when naming the __builtin_* function.
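The naming convention described above (a TableGen record called int_ptx_max becoming the IR name llvm.ptx.max) can be sketched as a plain string transformation. This is only an illustration in Python, not LLVM's implementation: intrinsic_ir_name models the default rule of replacing the int_ prefix with llvm. and turning underscores into dots, and builtin_name is an assumption about what "the same convention" would mean for the __builtin_* name. Both function names are hypothetical.

```python
PREFIX = "int_"  # prefix carried by TableGen intrinsic records


def intrinsic_ir_name(record_name):
    """Map a record name such as int_ptx_max to the IR name llvm.ptx.max."""
    if not record_name.startswith(PREFIX):
        raise ValueError("intrinsic records are expected to start with 'int_'")
    return "llvm." + record_name[len(PREFIX):].replace("_", ".")


def builtin_name(record_name):
    """Assumed convention for the matching builtin: __builtin_ plus the rest."""
    if not record_name.startswith(PREFIX):
        raise ValueError("intrinsic records are expected to start with 'int_'")
    return "__builtin_" + record_name[len(PREFIX):]


print(intrinsic_ir_name("int_ptx_max"))             # llvm.ptx.max
print(intrinsic_ir_name("int_ptx_max_signed_i32"))  # llvm.ptx.max.signed.i32
print(builtin_name("int_ptx_max"))                  # __builtin_ptx_max
```

Under the same rule, a record named int_ptx_max_signed_i32 would yield the per-type name llvm.ptx.max.signed.i32 proposed later in the thread.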
>> > >> >> >> >>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> The test case I am trying is the following:
>> > >> >> >> >>>>
>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> > >> >> >> >>>> entry:
>> > >> >> >> >>>>   %z = call float @fmax(float %x, float %y)
>> > >> >> >> >>>>   ret float %z
>> > >> >> >> >>>> }
>> > >> >> >> >>>>
>> > >> >> >> >>>> declare float @fmax(float, float)
>> > >> >> >> >>>>
>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not supported";
>> > >> >> >> >>>> this does not happen with LLVM builtins like llvm.sqrt.f32.
>> > >> >> >> >>>
>> > >> >> >> >>> Which version of LLVM are you using? Calls to PTX device functions have
>> > >> >> >> >>> been implemented for a little while now, so I'm surprised to see that
>> > >> >> >> >>> error.
>> > >> >> >> >>> Perhaps it's because the fmax function is not defined as ptx_device.
>> > >> >> >> >>>
>> > >> >> >> >
>> > >> >> >> > This is the test case that I am using to verify that the max builtin
>> > >> >> >> > function I am implementing is actually recognised. I took inspiration from the
>> > >> >> >> > llvm-intrinsic.ll test case.
>> > >> >> >> > The command I am using to compile is:
>> > >> >> >> >
>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> > >> >> >> >
>> > >> >> >> > The option -mattr does not seem to have any effect.
>> > >> >> >> > I also tried with the ptx_device qualifier, with the same outcome.
>> > >> >> >> > I am using llvm from the svn repository.
>> > >> >> >> >
>> > >> >> >> > Bye,
>> > >> >> >> >
>> > >> >> >> > Alberto
>> > >> >> >> >
>> > >> >> >> >>>>
>> > >> >> >> >>>> Can you please give me a hint on what I am missing, or some general
>> > >> >> >> >>>> advice on how to add builtin functions?
>> > >> >> >> >>>>
>> > >> >> >> >>>> Thank you in advance,
>> > >> >> >> >>>>
>> > >> >> >> >>>> Alberto.
>> > >> >> >> >>>>
>> > >> >> >> >>>> _______________________________________________
>> > >> >> >> >>>> LLVM Developers mailing list
>> > >> >> >> >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> > >> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>> --
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Justin Holewinski
>> > >
>> > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: max_builtin.patch
Type: text/x-patch
Size: 21573 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111204/6eaa9ab4/attachment.bin>