Eric Christopher
2014-Jun-17 21:22 UTC
[LLVMdev] Attaching range metadata to IntrinsicInst
Eh? How do you envision this? -eric On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote:> Hi Nick, > > That makes sense. I think a main issue here is that the ranges of these PTX > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > visible to clang and llc. Would you mind we specify "target cpu" in the IR > similar to what we did for "target triple"? > > Thanks, > Jingyue > > > On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky <nlewycky at google.com> wrote: >> >> On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: >>> >>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: >>>> >>>> Chandler Carruth wrote: >>>>> >>>>> This seems fine to me, but I'd like to make sure it looks OK to Nick as >>>>> well. >>>> >>>> >>>> I strongly prefer baking in knowledge about the intrinsics themselves >>>> into the passes if possible. Metadata will always be secondary. >>> >>> >>> So you're saying that in this particular case you'd prefer LLVM passes to >>> know about the range of these PTX intrinsics, rather than Clang adding them >>> as metadata? >> >> >> Yep. >> >>> ValueTracking.cpp already has some iffy target knowledge (someone sneaked >>> a direct Intrinsic::x86_sse42_crc32_64_64 check in there), but extending it >>> to other intrinsics in other targets seems like too much... >> >> >> That's not iffy. That's exactly how it should work, and we should have >> more of that. There is a major gotcha and that's dealing with the case where >> the intrinsics don't exist because the backend wasn't compiled in. If >> x86_sse42_crc32_64_64 is in there (and also in instcombine btw), presumably >> that problem is solved somehow? Or does llvm actually not build if you don't >> enable the x86 target? I feel like we would've heard about that. >> >> Nick >> >>> So should target info be passed into it in some way? Any suggestions >>> where to put it? TargetLibraryInfo? TargetTransformInfo? In any case this >>> seems like the target interface will have to be augmented, and we'll have to >>> carry an object around into ValueTracking's compute* functions. If this is >>> the right way, then this is the way it will be done - design ideas are >>> appreciated. >>> >>> Eli >>> >>> >>> >>>> >>>> >>>> Separately, should value tracking look use range metadata when it's >>>> available? Absolutely. >>>> >>>> I think it should apply to all CallInst not just IntrinsicInst (which is >>>> derived from CallInst). >>>> >>>> Nick >>>> >>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue Wu <jingyue at google.com >>>>> <mailto:jingyue at google.com>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> The range metadata can only be attached to LoadInst for now. I am >>>>> considering extending its usage to IntrinsicInst so that the >>>>> frontend can annotate the range of the return value of an intrinsic >>>>> call. e.g., >>>>> %a = call i32 @llvm.xxx(), !range !0 >>>>> !0 = metadata !{ i32 0, i23 1024 } >>>>> >>>>> The motivation behind this extension is some optimizations we are >>>>> working on for CUDA programs. Some special registers in CUDA (e.g., >>>>> threadIdx.x) are bounded per CUDA programming guide, and knowing >>>>> their ranges can improve the precision of ValueTracking and benefit >>>>> optimizations such as InstCombine. >>>>> >>>>> To implement this idea, we need ValueTracking to be aware of the >>>>> ranges of these special variables. These special registers are so >>>>> far read-only and accessed using intrinsics. e.g., >>>>> %threadIdx.x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(). >>>>> >>>>> One possible approach is to have ValueTracking compute the known >>>>> bits of these intrinsics as special cases. This approach is already >>>>> taken for the x86_sse42_crc32_64_64 intrinsic. However, this >>>>> approach may not be elegant because the ranges of these CUDA >>>>> special >>>>> registers depend on the GPU compute capability specified by >>>>> -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 >>>>> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is >>>>> probably discouraged. >>>>> >>>>> Therefore, the approach I am considering is to have clang annotate >>>>> the ranges of these CUDA special registers according to the >>>>> -target-cpu flag, and have ValueTracking pick the range metadata >>>>> for >>>>> optimization. By doing so, we hide the target-specific info from >>>>> ValueTracking. >>>>> >>>>> The code change in llvm minus clang won't be large. The core change >>>>> is only a few lines: >>>>> http://reviews.llvm.org/differential/diff/10464/. If this extension >>>>> sounds good to you, I'll definitely add more tests and revise the >>>>> documents on range metadata. >>>>> >>>>> Best, >>>>> Jingyue >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> >>>>> http://llvm.cs.uiuc.edu >>>>> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Hi Eric, In the IR, besides "target datalayout" and "target triple", we have a special "target cpu" string which is set by the Clang front-end according to its -target-cpu flag. We also write a Module::getTargetCPU() method to retrieve this string from the IR. Jingyue On Tue, Jun 17, 2014 at 2:22 PM, Eric Christopher <echristo at gmail.com> wrote:> Eh? How do you envision this? > > -eric > > On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: > > Hi Nick, > > > > That makes sense. I think a main issue here is that the ranges of these > PTX > > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > > visible to clang and llc. Would you mind we specify "target cpu" in the > IR > > similar to what we did for "target triple"? > > > > Thanks, > > Jingyue > > > > > > On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky <nlewycky at google.com> > wrote: > >> > >> On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: > >>> > >>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: > >>>> > >>>> Chandler Carruth wrote: > >>>>> > >>>>> This seems fine to me, but I'd like to make sure it looks OK to Nick > as > >>>>> well. > >>>> > >>>> > >>>> I strongly prefer baking in knowledge about the intrinsics themselves > >>>> into the passes if possible. Metadata will always be secondary. > >>> > >>> > >>> So you're saying that in this particular case you'd prefer LLVM passes > to > >>> know about the range of these PTX intrinsics, rather than Clang adding > them > >>> as metadata? > >> > >> > >> Yep. > >> > >>> ValueTracking.cpp already has some iffy target knowledge (someone > sneaked > >>> a direct Intrinsic::x86_sse42_crc32_64_64 check in there), but > extending it > >>> to other intrinsics in other targets seems like too much... > >> > >> > >> That's not iffy. That's exactly how it should work, and we should have > >> more of that. There is a major gotcha and that's dealing with the case > where > >> the intrinsics don't exist because the backend wasn't compiled in. If > >> x86_sse42_crc32_64_64 is in there (and also in instcombine btw), > presumably > >> that problem is solved somehow? Or does llvm actually not build if you > don't > >> enable the x86 target? I feel like we would've heard about that. > >> > >> Nick > >> > >>> So should target info be passed into it in some way? Any suggestions > >>> where to put it? TargetLibraryInfo? TargetTransformInfo? In any case > this > >>> seems like the target interface will have to be augmented, and we'll > have to > >>> carry an object around into ValueTracking's compute* functions. If > this is > >>> the right way, then this is the way it will be done - design ideas are > >>> appreciated. > >>> > >>> Eli > >>> > >>> > >>> > >>>> > >>>> > >>>> Separately, should value tracking look use range metadata when it's > >>>> available? Absolutely. > >>>> > >>>> I think it should apply to all CallInst not just IntrinsicInst (which > is > >>>> derived from CallInst). > >>>> > >>>> Nick > >>>> > >>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue Wu <jingyue at google.com > >>>>> <mailto:jingyue at google.com>> wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> The range metadata can only be attached to LoadInst for now. I am > >>>>> considering extending its usage to IntrinsicInst so that the > >>>>> frontend can annotate the range of the return value of an > intrinsic > >>>>> call. e.g., > >>>>> %a = call i32 @llvm.xxx(), !range !0 > >>>>> !0 = metadata !{ i32 0, i23 1024 } > >>>>> > >>>>> The motivation behind this extension is some optimizations we are > >>>>> working on for CUDA programs. Some special registers in CUDA > (e.g., > >>>>> threadIdx.x) are bounded per CUDA programming guide, and knowing > >>>>> their ranges can improve the precision of ValueTracking and > benefit > >>>>> optimizations such as InstCombine. > >>>>> > >>>>> To implement this idea, we need ValueTracking to be aware of the > >>>>> ranges of these special variables. These special registers are so > >>>>> far read-only and accessed using intrinsics. e.g., > >>>>> %threadIdx.x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(). > >>>>> > >>>>> One possible approach is to have ValueTracking compute the known > >>>>> bits of these intrinsics as special cases. This approach is > already > >>>>> taken for the x86_sse42_crc32_64_64 intrinsic. However, this > >>>>> approach may not be elegant because the ranges of these CUDA > >>>>> special > >>>>> registers depend on the GPU compute capability specified by > >>>>> -target-cpu. For instance, blockIdx.x is bounded by 65535 in > sm_20 > >>>>> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is > >>>>> probably discouraged. > >>>>> > >>>>> Therefore, the approach I am considering is to have clang > annotate > >>>>> the ranges of these CUDA special registers according to the > >>>>> -target-cpu flag, and have ValueTracking pick the range metadata > >>>>> for > >>>>> optimization. By doing so, we hide the target-specific info from > >>>>> ValueTracking. > >>>>> > >>>>> The code change in llvm minus clang won't be large. The core > change > >>>>> is only a few lines: > >>>>> http://reviews.llvm.org/differential/diff/10464/. If this > extension > >>>>> sounds good to you, I'll definitely add more tests and revise the > >>>>> documents on range metadata. > >>>>> > >>>>> Best, > >>>>> Jingyue > >>>>> > >>>>> _______________________________________________ > >>>>> LLVM Developers mailing list > >>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> > >>>>> http://llvm.cs.uiuc.edu > >>>>> > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> LLVM Developers mailing list > >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>>> > >>>> > >>>> _______________________________________________ > >>>> LLVM Developers mailing list > >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >>> > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140617/816fe91c/attachment.html>
Eric Christopher
2014-Jun-17 21:43 UTC
[LLVMdev] Attaching range metadata to IntrinsicInst
On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote:> Hi Eric, > > In the IR, besides "target datalayout" and "target triple", we have a > special "target cpu" string which is set by the Clang front-end according to > its -target-cpu flag. We also write a Module::getTargetCPU() method to > retrieve this string from the IR. >Not sure that I like this. Each function can have a target cpu though. That each subtarget cares about the value of the target cpu for how the intrinsic works sounds a lot like TargetTransformInfo to me. -eric> Jingyue > > > On Tue, Jun 17, 2014 at 2:22 PM, Eric Christopher <echristo at gmail.com> > wrote: >> >> Eh? How do you envision this? >> >> -eric >> >> On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: >> > Hi Nick, >> > >> > That makes sense. I think a main issue here is that the ranges of these >> > PTX >> > special registers (e.g., threadIdx.x) depend on -target-cpu which is >> > only >> > visible to clang and llc. Would you mind we specify "target cpu" in the >> > IR >> > similar to what we did for "target triple"? >> > >> > Thanks, >> > Jingyue >> > >> > >> > On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky <nlewycky at google.com> >> > wrote: >> >> >> >> On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: >> >>> >> >>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: >> >>>> >> >>>> Chandler Carruth wrote: >> >>>>> >> >>>>> This seems fine to me, but I'd like to make sure it looks OK to Nick >> >>>>> as >> >>>>> well. >> >>>> >> >>>> >> >>>> I strongly prefer baking in knowledge about the intrinsics themselves >> >>>> into the passes if possible. Metadata will always be secondary. >> >>> >> >>> >> >>> So you're saying that in this particular case you'd prefer LLVM passes >> >>> to >> >>> know about the range of these PTX intrinsics, rather than Clang adding >> >>> them >> >>> as metadata? >> >> >> >> >> >> Yep. >> >> >> >>> ValueTracking.cpp already has some iffy target knowledge (someone >> >>> sneaked >> >>> a direct Intrinsic::x86_sse42_crc32_64_64 check in there), but >> >>> extending it >> >>> to other intrinsics in other targets seems like too much... >> >> >> >> >> >> That's not iffy. That's exactly how it should work, and we should have >> >> more of that. There is a major gotcha and that's dealing with the case >> >> where >> >> the intrinsics don't exist because the backend wasn't compiled in. If >> >> x86_sse42_crc32_64_64 is in there (and also in instcombine btw), >> >> presumably >> >> that problem is solved somehow? Or does llvm actually not build if you >> >> don't >> >> enable the x86 target? I feel like we would've heard about that. >> >> >> >> Nick >> >> >> >>> So should target info be passed into it in some way? Any suggestions >> >>> where to put it? TargetLibraryInfo? TargetTransformInfo? In any case >> >>> this >> >>> seems like the target interface will have to be augmented, and we'll >> >>> have to >> >>> carry an object around into ValueTracking's compute* functions. If >> >>> this is >> >>> the right way, then this is the way it will be done - design ideas are >> >>> appreciated. >> >>> >> >>> Eli >> >>> >> >>> >> >>> >> >>>> >> >>>> >> >>>> Separately, should value tracking look use range metadata when it's >> >>>> available? Absolutely. >> >>>> >> >>>> I think it should apply to all CallInst not just IntrinsicInst (which >> >>>> is >> >>>> derived from CallInst). >> >>>> >> >>>> Nick >> >>>> >> >>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue Wu <jingyue at google.com >> >>>>> <mailto:jingyue at google.com>> wrote: >> >>>>> >> >>>>> Hi, >> >>>>> >> >>>>> The range metadata can only be attached to LoadInst for now. I >> >>>>> am >> >>>>> considering extending its usage to IntrinsicInst so that the >> >>>>> frontend can annotate the range of the return value of an >> >>>>> intrinsic >> >>>>> call. e.g., >> >>>>> %a = call i32 @llvm.xxx(), !range !0 >> >>>>> !0 = metadata !{ i32 0, i23 1024 } >> >>>>> >> >>>>> The motivation behind this extension is some optimizations we >> >>>>> are >> >>>>> working on for CUDA programs. Some special registers in CUDA >> >>>>> (e.g., >> >>>>> threadIdx.x) are bounded per CUDA programming guide, and knowing >> >>>>> their ranges can improve the precision of ValueTracking and >> >>>>> benefit >> >>>>> optimizations such as InstCombine. >> >>>>> >> >>>>> To implement this idea, we need ValueTracking to be aware of the >> >>>>> ranges of these special variables. These special registers are >> >>>>> so >> >>>>> far read-only and accessed using intrinsics. e.g., >> >>>>> %threadIdx.x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x(). >> >>>>> >> >>>>> One possible approach is to have ValueTracking compute the known >> >>>>> bits of these intrinsics as special cases. This approach is >> >>>>> already >> >>>>> taken for the x86_sse42_crc32_64_64 intrinsic. However, this >> >>>>> approach may not be elegant because the ranges of these CUDA >> >>>>> special >> >>>>> registers depend on the GPU compute capability specified by >> >>>>> -target-cpu. For instance, blockIdx.x is bounded by 65535 in >> >>>>> sm_20 >> >>>>> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is >> >>>>> probably discouraged. >> >>>>> >> >>>>> Therefore, the approach I am considering is to have clang >> >>>>> annotate >> >>>>> the ranges of these CUDA special registers according to the >> >>>>> -target-cpu flag, and have ValueTracking pick the range metadata >> >>>>> for >> >>>>> optimization. By doing so, we hide the target-specific info from >> >>>>> ValueTracking. >> >>>>> >> >>>>> The code change in llvm minus clang won't be large. The core >> >>>>> change >> >>>>> is only a few lines: >> >>>>> http://reviews.llvm.org/differential/diff/10464/. If this >> >>>>> extension >> >>>>> sounds good to you, I'll definitely add more tests and revise >> >>>>> the >> >>>>> documents on range metadata. >> >>>>> >> >>>>> Best, >> >>>>> Jingyue >> >>>>> >> >>>>> _______________________________________________ >> >>>>> LLVM Developers mailing list >> >>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> >> >>>>> http://llvm.cs.uiuc.edu >> >>>>> >> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> _______________________________________________ >> >>>>> LLVM Developers mailing list >> >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>>> >> >>>> >> >>>> _______________________________________________ >> >>>> LLVM Developers mailing list >> >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >>> >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> > >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > >
Possibly Parallel Threads
- [LLVMdev] Attaching range metadata to IntrinsicInst
- [LLVMdev] Attaching range metadata to IntrinsicInst
- [LLVMdev] Attaching range metadata to IntrinsicInst
- [LLVMdev] Attaching range metadata to IntrinsicInst
- [CUDA/NVPTX] is inlining __syncthreads allowed?