Hi Justin,

sorry for the delay, I have been busy.

Micah's proposal requires moving the definitions of the intrinsics
from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td,
thus allowing the generation of the file PTXGenIntrinsics.inc, which
will be included by PTXIntrinsicInfo.cpp.
This is quite a big modification; do you agree with it?
Or do you have a better solution?

Also, I don't know yet how to make LLVM recognize the intrinsics
defined in lib/Target/PTX/PTXIntrinsics.td; the only other
backend that does so is MBlaze.

A tentative patch is attached.

Bye,
Alberto

On Wed, Nov 23, 2011 at 2:36 PM, Justin Holewinski
<justin.holewinski at gmail.com> wrote:
>
> On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com>
> wrote:
>>
>> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com>
>> wrote:
>> >
>> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com>
>> > wrote:
>> > > Alberto,
>> > > The AMDIL backend solves your problem with intrinsic overloading this
>> > > way:
>> > > def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>> > >
>> > > Where TernaryIntFloat is defined as:
>> > > class TernaryIntFloat :
>> > >     Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>> > >     LLVMMatchType<0>, LLVMMatchType<0>], []>;
>> > >
>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>> > > defm MAD : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>> > >
>> > > Where TernaryIntrinsicFloat is defined as:
>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>> > > {
>> > >   def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>> > >       (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>> > >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >       [(set GPRF32:$dst,
>> > >           (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>> > >   def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>> > >       (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>> > >       !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >       [(set GPRV2F32:$dst,
>> > >           (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>> > >   ...
>> > > }
>> > >
>> > > Now, this doesn't completely work, because LLVM does not allow
>> > > overloading of intrinsics values, so there needs to be a little coding
>> > > in the *IntrinsicInfo class.
>> > > AMD always encodes builtin names as __amdil_mad_f32,
>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc.
>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>> > > find out what intrinsic the function maps to, the AMDIL backend strips
>> > > off the type, and then looks up just '__amdil_mad'.
>> > >
>> > > This is how you can do intrinsic overloading in LLVM.
>> > >
>> > > Hope this helps,
>> > > Micah
>> >
>> > Thank you Micah, it really does.
>> >
>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class;
>> > the only backend that has one is MBlaze.
>> > If Justin agrees with the approach I will look into how to generate the
>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>> > required by PTXIntrinsicInfo and write the lookUp method.
>>
>> Looks good to me.  For OpenCL support in clang, we definitely need the
>> built-in function support.  And the total number of intrinsics like this
>> should be relatively minimal.
>
> One thing I forgot to mention: once these are implemented, it may be worth
> implementing some instruction selection patterns to collapse icmp/fcmp and
> select pairs into max/min whenever it makes sense.
>
>> >
>> > Cheers,
>> >
>> > Alberto
>> >
>> > >> -----Original Message-----
>> > >> From: llvmdev-bounces at cs.uiuc.edu
>> > >> [mailto:llvmdev-bounces at cs.uiuc.edu]
>> > >> On Behalf Of Alberto Magni
>> > >> Sent: Tuesday, November 22, 2011 8:41 AM
>> > >> To: Justin Holewinski
>> > >> Cc: LLVM Developers Mailing List
>> > >> Subject: Re: [LLVMdev] PTX builtin functions.
>> > >>
>> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
>> > >> <justin.holewinski at gmail.com> wrote:
>> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
>> > >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >>
>> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>> > >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>> > >> >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >> >>
>> > >> >> >> Hi Justin,
>> > >> >> >>
>> > >> >> >> attached you find the patch for the integer max instruction.
>> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
>> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td, maybe
>> > >> >> >> a modification of this class can be defined in a separate file.
>> > >> >> >
>> > >> >> > I'm copying llvmdev.  We should keep discussions like this on the
>> > >> >> > list for the benefit of others.
>> > >> >>
>> > >> >> I always forget "Reply to All".
>> > >> >>
>> > >> >> > We can probably factor out a generic description, or even just use
>> > >> >> > the PTX_INT3 multiclass directly.  The PTXIntrinsicInstrInfo.td
>> > >> >> > file is included by PTXInstrInfo.td, so anything defined in
>> > >> >> > PTXInstrInfo.td is available in PTXIntrinsicInstrInfo.td.
>> > >> >>
>> > >> >> I agree with you, but my class PTX_INTRINSIC_INT3 works with an
>> > >> >> Intrinsic and not with an SDNode, like PTX_INT3.
>> > >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of
>> > >> >> the immediate in the pattern, e.g. (i32 imm:$b).
>> > >> >
>> > >> > Alright, I'm fine with that.
>> > >> >
>> > >> >> >> Do you agree with this approach?
>> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>> > >> >> >> (a clone of PTX_INT3_SIGNED) is required?
>> > >> >> >
>> > >> >> > Yes, I believe we should split these into signed and unsigned
>> > >> >> > variants.  The results of max/min operations can definitely be
>> > >> >> > different depending on whether the operands are signed or
>> > >> >> > unsigned.  Since this information is not encoded in LLVM types,
>> > >> >> > we may want to create two versions for each integer type;
>> > >> >> > something like:
>> > >> >> >
>> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
>> > >> >>
>> > >> >> Yes, this is the only way.
>> > >> >
>> > >> > A couple more comments:
>> > >> >
>> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics
>> > >> > (probably best in the multiclass, see
>> > >> > PTXReadSpecialRegisterIntrinsic_r32)
>> > >>
>> > >> Ok
>> > >>
>> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can
>> > >> > take multiple types, but it's probably worth looking into so we can
>> > >> > expose this intrinsic to Clang.
>> > >>
>> > >> This could be an issue. I looked for something similar in other
>> > >> backends and I found no previous examples. It may be worth asking
>> > >> on the ML explicitly for this.
>> > >> The only fallback that I see is to define explicitly every intrinsic
>> > >> for every data type, but this would prevent the usage of the
>> > >> multiclass for the definition of the patterns.
>> > >>
>> > >> Bye.
>> > >>
>> > >> >> > Otherwise, the patch looks good.
>> > >> >> >
>> > >> >> >> Thanks,
>> > >> >> >>
>> > >> >> >> Alberto
>> > >> >> >>
>> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
>> > >> >> >> <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
>> > >> >> >> > <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
>> > >> >> >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
>> > >> >> >> >>> <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>> Dear Justin,
>> > >> >> >> >>>>
>> > >> >> >> >>>> I am trying to add the support for some OpenCL builtin
>> > >> >> >> >>>> functions to the PTX backend.
>> > >> >> >> >>>> The attached file represents the first stub of a patch for
>> > >> >> >> >>>> the fmax builtin function.
>> > >> >> >> >>>
>> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>> > >> >> >> >>> There are really two main issues here.  First, OpenCL built-in
>> > >> >> >> >>> functions do not belong in the PTX back-end.  These will be
>> > >> >> >> >>> implemented in the libclc library
>> > >> >> >> >>> (http://www.pcc.me.uk/~peter/libclc).  The back-end will only
>> > >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL
>> > >> >> >> >>> built-in functions in libclc.  However, this particular
>> > >> >> >> >>> function (max) corresponds to a PTX instruction, so it makes
>> > >> >> >> >>> sense to implement it as an intrinsic in the back-end.
>> > >> >> >> >>> Second, intrinsic functions require a bit more work.  You're
>> > >> >> >> >>> off to a great start, but intrinsics are implemented a bit
>> > >> >> >> >>> differently.  It looks like LLVM does not have a max
>> > >> >> >> >>> intrinsic, so we'll need to create one.  Have a look at
>> > >> >> >> >>> include/llvm/IntrinsicsPTX.td.  This file defines the
>> > >> >> >> >>> PTX-specific intrinsics.  You can add an intrinsic for max
>> > >> >> >> >>> here, and then implement a pattern-match in the
>> > >> >> >> >>> PTXInstrInfo.td file.  There is no need to create a new
>> > >> >> >> >>> SDNode type for intrinsics, unless they require some special
>> > >> >> >> >>> handling in the C++ code, which I do not see being the case
>> > >> >> >> >>> here.
>> > >> >> >> >>
>> > >> >> >> >> Sorry, there's a typo here.  The intrinsic pattern matching
>> > >> >> >> >> goes in PTXIntrinsicInstrInfo.td.
>> > >> >> >> >
>> > >> >> >> > Thank you for the pointers, I will let you know when I have
>> > >> >> >> > the first patch.
>> > >> >> >> >
>> > >> >> >> >>> When you define a new intrinsic, use the following template
>> > >> >> >> >>> as a name: int_ptx_max.  This will define the LLVM intrinsic
>> > >> >> >> >>> as @llvm.ptx.max().  Please follow the same convention when
>> > >> >> >> >>> naming the __builtin_* function.
>> > >> >> >> >>>
>> > >> >> >> >>>> The test case I am trying is the following:
>> > >> >> >> >>>>
>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> > >> >> >> >>>> entry:
>> > >> >> >> >>>>   %z = call float @fmax(float %x, float %y)
>> > >> >> >> >>>>   ret float %z
>> > >> >> >> >>>> }
>> > >> >> >> >>>>
>> > >> >> >> >>>> declare float @fmax(float, float)
>> > >> >> >> >>>>
>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not
>> > >> >> >> >>>> supported"; this does not happen with llvm builtins like
>> > >> >> >> >>>> llvm.sqrt.f32
>> > >> >> >> >>>
>> > >> >> >> >>> Which version of LLVM are you using?  Calls to PTX device
>> > >> >> >> >>> functions have been implemented for a little while now, so
>> > >> >> >> >>> I'm surprised to see that error.
>> > >> >> >> >>> Perhaps it's because the fmax function is not defined as
>> > >> >> >> >>> ptx_device.
>> > >> >> >> >
>> > >> >> >> > This is the testcase that I am using to verify that the max
>> > >> >> >> > builtin function I am implementing is actually recognised.
>> > >> >> >> > I took inspiration from the llvm-intrinsic.ll test case.
>> > >> >> >> > The command I am using to compile is:
>> > >> >> >> >
>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> > >> >> >> >
>> > >> >> >> > The option -mattr does not seem to have any effect.
>> > >> >> >> > I tried also with the ptx_device qualifier with the same
>> > >> >> >> > outcome.
>> > >> >> >> > I am using llvm from the svn repository.
>> > >> >> >> >
>> > >> >> >> > Bye,
>> > >> >> >> >
>> > >> >> >> > Alberto
>> > >> >> >> >
>> > >> >> >> >>>> Can you please give me a hint on what I am missing, or some
>> > >> >> >> >>>> general advice on how to add builtin functions.
>> > >> >> >> >>>>
>> > >> >> >> >>>> Thank you in advance,
>> > >> >> >> >>>>
>> > >> >> >> >>>> Alberto.
>> > >> >> >> >>>>
>> > >> >> >> >>>> _______________________________________________
>> > >> >> >> >>>> LLVM Developers mailing list
>> > >> >> >> >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> > >> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >> >> >> >>>
>> > >> >> >> >>> --
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Justin Holewinski
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >>
>> > >> >> >> >> Thanks,
>> > >> >> >> >> Justin Holewinski
>> > >> >> >
>> > >> >> > --
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> >
>> > >> >> > Justin Holewinski
>> > >> >
>> > >> > --
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Justin Holewinski

-------------- next part --------------
A non-text attachment was scrubbed...
Name: max_builtin.patch
Type: text/x-patch
Size: 21573 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111204/6eaa9ab4/attachment.bin>
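
The suffix-stripping lookup that Micah describes in the quoted message could look roughly like the C++ sketch below. It is only an illustration of the idea, not code from the AMDIL or PTX backends: lookupOverloadedBuiltin, endsWith, kAmdilTypeSuffixes and kOverloadedBuiltins are hypothetical names, and the suffix list contains only the three types named in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical suffix list; the thread only names _f32, _v2f32 and _v4f32.
const std::vector<std::string> kAmdilTypeSuffixes = {"_f32", "_v2f32", "_v4f32"};

// Hypothetical table of overloaded builtins, stored without a type suffix.
const std::vector<std::string> kOverloadedBuiltins = {"__amdil_mad"};

// True if `s` is strictly longer than `suffix` and ends with it.
static bool endsWith(const std::string &s, const std::string &suffix) {
  return s.size() > suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Strips a known type suffix from `name` and returns the base builtin name
// if that base is in the table; returns an empty string otherwise.
std::string lookupOverloadedBuiltin(const std::string &name) {
  for (const std::string &suffix : kAmdilTypeSuffixes) {
    if (!endsWith(name, suffix))
      continue;
    const std::string base = name.substr(0, name.size() - suffix.size());
    for (const std::string &known : kOverloadedBuiltins)
      if (base == known)
        return base;
  }
  return std::string();
}
```

With this sketch, __amdil_mad_f32 and __amdil_mad_v2f32 both resolve to __amdil_mad, while a name without a type suffix is not treated as an overloaded builtin. A real *IntrinsicInfo implementation would map the base name to an intrinsic ID rather than return a string.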
On Sun, Dec 4, 2011 at 1:10 PM, Alberto Magni <alberto.magni86 at gmail.com> wrote:

> Hi Justin,
>
> sorry for the delay, I have been busy.
>
> Micah's proposal requires to move the definitions of the intrinsics
> from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td
> thus allowing the generation of the file PTXGenIntrinsics.inc which
> will be included by PTXIntrinsicInfo.cpp.
> This is a quite big modification, do you agree with this ?
> Or do you have a better solution.
>

I'm opposed to this, mainly because we need the intrinsic definitions to be
available during LLVM IR optimization and not just at code-gen time.  This is
particularly important for pure intrinsics, like llvm.ptx.read.tid.x(), where
the optimizers can fold multiple calls to these functions into a single call.
Without the intrinsic definitions in include/llvm/IntrinsicsPTX.td, the
optimizers would have no way to know that these calls are pure, and this
optimization would be illegal.

At the moment, I'm not seeing a clean solution to this.  Overloading the
intrinsics by writing custom code in PTXIntrinsicInfo.h/.cpp is only a partial
solution, with the problems mentioned above.  In my mind, the cleanest solution
would be to just write out explicit intrinsics for each possible type.  We can
still use multiclasses to an extent:

multiclass PTXBinaryIntrinsic<string prefix> {
  def _u16 : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty],
                       [IntrNoMem]>,
             GCCBuiltin<!strconcat(prefix, "_u16")>;
  // Repeat for s16, u32, s32, u64, s64, f32, f64
}

defm int_ptx_mad : PTXBinaryIntrinsic<"__builtin_ptx_mad">;

It's not the cleanest, but it gets the job done (unless I'm missing
something).

>
> Also I don't know yet how to make llvm recognize the intrinsics
> defined in lib/Target/PTX/PTXIntrinsics.td, the only other
> backend that does so is MBlaze.
>
> A tentative patch is attached.
>
> Bye,
> Alberto

--

Thanks,
Justin Holewinski
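
To make the explicit per-type approach concrete, the C++ sketch below enumerates the names that a multiclass like PTXBinaryIntrinsic would be expected to produce for one operation. The helpers (IntrinsicNames, irNameFromRecord, expandPerType, kPtxTypeSuffixes) are hypothetical and exist only for illustration. The facts taken from the thread are the suffix list, the __builtin_ptx_ prefix, and the rule that a record named int_ptx_max becomes @llvm.ptx.max; applying the same rule to suffixed records (for example llvm.ptx.max.u16) is an assumption.

```cpp
#include <string>
#include <vector>

// Type suffixes listed in the thread for the explicit per-type intrinsics.
const std::vector<std::string> kPtxTypeSuffixes = {
    "u16", "s16", "u32", "s32", "u64", "s64", "f32", "f64"};

// Names associated with one generated intrinsic (illustrative only).
struct IntrinsicNames {
  std::string record;   // TableGen record name, e.g. int_ptx_max_u16
  std::string irName;   // name used in LLVM IR, e.g. llvm.ptx.max.u16
  std::string builtin;  // front-end builtin, e.g. __builtin_ptx_max_u16
};

// Derives the IR name from a record name: drop the leading "int_",
// prepend "llvm.", and turn the remaining underscores into dots.
std::string irNameFromRecord(const std::string &record) {
  const std::string prefix = "int_";
  if (record.compare(0, prefix.size(), prefix) != 0)
    return record;  // not an intrinsic record; leave it unchanged
  std::string name = "llvm." + record.substr(prefix.size());
  for (char &c : name)
    if (c == '_')
      c = '.';
  return name;
}

// Expands one operation (e.g. "max") into one entry per type suffix.
std::vector<IntrinsicNames> expandPerType(const std::string &op) {
  std::vector<IntrinsicNames> result;
  for (const std::string &suffix : kPtxTypeSuffixes) {
    IntrinsicNames n;
    n.record = "int_ptx_" + op + "_" + suffix;
    n.irName = irNameFromRecord(n.record);
    n.builtin = "__builtin_ptx_" + op + "_" + suffix;
    result.push_back(n);
  }
  return result;
}
```

Calling expandPerType("max") yields eight entries, one per suffix. The cost of the explicit approach is visible here: every operation multiplies into eight separately named intrinsics, but each one has a fixed signature and a fixed builtin name, so no custom lookup code is needed.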
It is my understanding that all you need to do is specify let isTarget = 1 in
your .td file and it will generate target specific intrinsics. This should allow
you to keep the IntrinsicsPTX.td file in the same location.
Micah
From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Monday, December 05, 2011 6:13 AM
To: Alberto Magni
Cc: Villmow, Micah; LLVM Developers Mailing List
Subject: Re: [LLVMdev] PTX builtin functions.
On Sun, Dec 4, 2011 at 1:10 PM, Alberto Magni <alberto.magni86 at
gmail.com<mailto:alberto.magni86 at gmail.com>> wrote:
> Hi Justin,
> sorry for the delay, I have been busy.
> Micah's proposal requires moving the definitions of the intrinsics
> from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td,
> thus allowing the generation of the file PTXGenIntrinsics.inc, which
> will be included by PTXIntrinsicInfo.cpp.
> This is quite a big modification; do you agree with it?
> Or do you have a better solution?
I'm opposed to this, mainly because we need the intrinsic definitions to be
available during LLVM IR optimization and not just at code-gen time. This is
particularly important for pure intrinsics, like llvm.ptx.read.tid.x(), where
the optimizers can fold multiple calls to these functions into a single call.
Without the intrinsic definitions in include/llvm/IntrinsicsPTX.td, the
optimizers would not know these calls are free of side effects, so this
optimization would be illegal.
At the moment, I'm not seeing a clean solution to this. Overloading the
intrinsics by writing custom code in PTXIntrinsicInfo.h/.cpp is only a partial
solution, with the problems mentioned above. In my mind, the cleanest solution
would be to just write out explicit intrinsics for each possible type. We can
still use multiclasses to an extent:
  multiclass PTXBinaryIntrinsic<string prefix> {
    def _u16 : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty],
                         [IntrNoMem]>,
               GCCBuiltin<!strconcat(prefix, "_u16")>;
    // Repeat for s16, u32, s32, u64, s64, f32, f64
  }
  defm int_ptx_mad : PTXBinaryIntrinsic<"__builtin_ptx_mad">;
It's not the cleanest, but it gets the job done (unless I'm missing
something).
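To make the naming concrete, here is a sketch of how such a multiclass might
expand. It relies on the TableGen rule that the defm name is prepended to each
inner def name, and on an intrinsic record named int_a_b mapping to @llvm.a.b.
PTXTernaryIntrinsic, the two type suffixes shown, and the three-operand
signature (mad takes three inputs) are illustrative assumptions, not part of
any patch.

```tablegen
// Sketch only: PTXTernaryIntrinsic and the suffix choices are hypothetical.
// Assumes Intrinsic, GCCBuiltin, IntrNoMem and the llvm_*_ty types from
// llvm/Intrinsics.td are already in scope.
multiclass PTXTernaryIntrinsic<string prefix> {
  def _s32 : Intrinsic<[llvm_i32_ty],
                       [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
                       [IntrNoMem]>,
             GCCBuiltin<!strconcat(prefix, "_s32")>;
  def _f32 : Intrinsic<[llvm_float_ty],
                       [llvm_float_ty, llvm_float_ty, llvm_float_ty],
                       [IntrNoMem]>,
             GCCBuiltin<!strconcat(prefix, "_f32")>;
}

let TargetPrefix = "ptx" in {
  defm int_ptx_mad : PTXTernaryIntrinsic<"__builtin_ptx_mad">;
}

// Records this should yield:
//   int_ptx_mad_s32  (@llvm.ptx.mad.s32, builtin __builtin_ptx_mad_s32)
//   int_ptx_mad_f32  (@llvm.ptx.mad.f32, builtin __builtin_ptx_mad_f32)
```

Each type gets its own non-overloaded intrinsic, so the definitions can stay in
include/llvm/IntrinsicsPTX.td and no name stripping is needed at lookup time.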
> Also, I don't yet know how to make LLVM recognize the intrinsics
> defined in lib/Target/PTX/PTXIntrinsics.td; the only backend that
> currently does so is MBlaze.
> A tentative patch is attached.
> Bye,
> Alberto
--
Thanks,
Justin Holewinski