thr3ads.net - llvm dev - [LLVMdev] Attaching range metadata to IntrinsicInst [Jun 2014]

If this information is useful, please help other people find it:
Share via:

Eric Christopher

2014-Jun-17 21:43 UTC

[LLVMdev] Attaching range metadata to IntrinsicInst

On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com>
wrote:> Hi Eric,
>
> In the IR, besides "target datalayout" and "target
triple", we have a
> special "target cpu" string which is set by the Clang front-end
according to
> its -target-cpu flag. We also write a Module::getTargetCPU() method to
> retrieve this string from the IR.
>
Not sure that I like this. Each function can have a target cpu though.
That each subtarget cares about the value of the target cpu for how
the intrinsic works sounds a lot like TargetTransformInfo to me.

-eric
> Jingyue
>
>
> On Tue, Jun 17, 2014 at 2:22 PM, Eric Christopher <echristo at
gmail.com>
> wrote:
>>
>> Eh? How do you envision this?
>>
>> -eric
>>
>> On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at
google.com> wrote:
>> > Hi Nick,
>> >
>> > That makes sense. I think a main issue here is that the ranges of
these
>> > PTX
>> > special registers (e.g., threadIdx.x) depend on -target-cpu which
is
>> > only
>> > visible to clang and llc. Would you mind we specify "target
cpu" in the
>> > IR
>> > similar to what we did for "target triple"?
>> >
>> > Thanks,
>> > Jingyue
>> >
>> >
>> > On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky <nlewycky at
google.com>
>> > wrote:
>> >>
>> >> On 17 June 2014 06:41, Eli Bendersky <eliben at
google.com> wrote:
>> >>>
>> >>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas
at mxc.ca> wrote:
>> >>>>
>> >>>> Chandler Carruth wrote:
>> >>>>>
>> >>>>> This seems fine to me, but I'd like to make
sure it looks OK to Nick
>> >>>>> as
>> >>>>> well.
>> >>>>
>> >>>>
>> >>>> I strongly prefer baking in knowledge about the
intrinsics themselves
>> >>>> into the passes if possible. Metadata will always be
secondary.
>> >>>
>> >>>
>> >>> So you're saying that in this particular case
you'd prefer LLVM passes
>> >>> to
>> >>> know about the range of these PTX intrinsics, rather than
Clang adding
>> >>> them
>> >>> as metadata?
>> >>
>> >>
>> >> Yep.
>> >>
>> >>> ValueTracking.cpp already has some iffy target knowledge
(someone
>> >>> sneaked
>> >>> a direct  Intrinsic::x86_sse42_crc32_64_64 check in
there), but
>> >>> extending it
>> >>> to other intrinsics in other targets seems like too
much...
>> >>
>> >>
>> >> That's not iffy. That's exactly how it should work,
and we should have
>> >> more of that. There is a major gotcha and that's dealing
with the case
>> >> where
>> >> the intrinsics don't exist because the backend wasn't
compiled in. If
>> >> x86_sse42_crc32_64_64 is in there (and also in instcombine
btw),
>> >> presumably
>> >> that problem is solved somehow? Or does llvm actually not
build if you
>> >> don't
>> >> enable the x86 target? I feel like we would've heard about
that.
>> >>
>> >> Nick
>> >>
>> >>> So should target info be passed into it in some way? Any
suggestions
>> >>> where to put it? TargetLibraryInfo? TargetTransformInfo?
In any case
>> >>> this
>> >>> seems like the target interface will have to be augmented,
and we'll
>> >>> have to
>> >>> carry an object around into ValueTracking's compute*
functions. If
>> >>> this is
>> >>> the right way, then this is the way it will be done -
design ideas are
>> >>> appreciated.
>> >>>
>> >>> Eli
>> >>>
>> >>>
>> >>>
>> >>>>
>> >>>>
>> >>>> Separately, should value tracking look use range
metadata when it's
>> >>>> available? Absolutely.
>> >>>>
>> >>>> I think it should apply to all CallInst not just
IntrinsicInst (which
>> >>>> is
>> >>>> derived from CallInst).
>> >>>>
>> >>>> Nick
>> >>>>
>> >>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue Wu
<jingyue at google.com
>> >>>>> <mailto:jingyue at google.com>> wrote:
>> >>>>>
>> >>>>>     Hi,
>> >>>>>
>> >>>>>     The range metadata can only be attached to
LoadInst for now. I
>> >>>>> am
>> >>>>>     considering extending its usage to
IntrinsicInst so that the
>> >>>>>     frontend can annotate the range of the return
value of an
>> >>>>> intrinsic
>> >>>>>     call. e.g.,
>> >>>>>     %a = call i32 @llvm.xxx(), !range !0
>> >>>>>     !0 = metadata !{ i32 0, i23 1024 }
>> >>>>>
>> >>>>>     The motivation behind this extension is some
optimizations we
>> >>>>> are
>> >>>>>     working on for CUDA programs. Some special
registers in CUDA
>> >>>>> (e.g.,
>> >>>>>     threadIdx.x) are bounded per CUDA programming
guide, and knowing
>> >>>>>     their ranges can improve the precision of
ValueTracking and
>> >>>>> benefit
>> >>>>>     optimizations such as InstCombine.
>> >>>>>
>> >>>>>     To implement this idea, we need ValueTracking
to be aware of the
>> >>>>>     ranges of these special variables. These
special registers are
>> >>>>> so
>> >>>>>     far read-only and accessed using intrinsics.
e.g.,
>> >>>>>     %threadIdx.x = call i32
@llvm.nvvm.read.ptx.sreg.tid.x().
>> >>>>>
>> >>>>>     One possible approach is to have ValueTracking
compute the known
>> >>>>>     bits of these intrinsics as special cases.
This approach is
>> >>>>> already
>> >>>>>     taken for the x86_sse42_crc32_64_64 intrinsic.
However, this
>> >>>>>     approach may not be elegant because the ranges
of these CUDA
>> >>>>> special
>> >>>>>     registers depend on the GPU compute capability
specified by
>> >>>>>     -target-cpu. For instance, blockIdx.x is
bounded by 65535 in
>> >>>>> sm_20
>> >>>>>     but 2^31-1 in sm_30. Exposing -target-cpu to
ValueTracking is
>> >>>>>     probably discouraged.
>> >>>>>
>> >>>>>     Therefore, the approach I am considering is to
have clang
>> >>>>> annotate
>> >>>>>     the ranges of these CUDA special registers
according to the
>> >>>>>     -target-cpu flag, and have ValueTracking pick
the range metadata
>> >>>>> for
>> >>>>>     optimization. By doing so, we hide the
target-specific info from
>> >>>>>     ValueTracking.
>> >>>>>
>> >>>>>     The code change in llvm minus clang won't
be large. The core
>> >>>>> change
>> >>>>>     is only a few lines:
>> >>>>>    
http://reviews.llvm.org/differential/diff/10464/. If this
>> >>>>> extension
>> >>>>>     sounds good to you, I'll definitely add
more tests and revise
>> >>>>> the
>> >>>>>     documents on range metadata.
>> >>>>>
>> >>>>>     Best,
>> >>>>>     Jingyue
>> >>>>>
>> >>>>>    
_______________________________________________
>> >>>>>     LLVM Developers mailing list
>> >>>>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at
cs.uiuc.edu>
>> >>>>> http://llvm.cs.uiuc.edu
>> >>>>>
>> >>>>>    
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> LLVM Developers mailing list
>> >>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> LLVM Developers mailing list
>> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> LLVM Developers mailing list
>> >>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>
>> >>
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>
>

Hal Finkel

2014-Jun-17 22:04 UTC

head link

[LLVMdev] Attaching range metadata to IntrinsicInst

----- Original Message -----> From: "Eric Christopher" <echristo at gmail.com>
> To: "Jingyue Wu" <jingyue at google.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, June 17, 2014 4:43:28 PM
> Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst
> 
> On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com>
> wrote:
> > Hi Eric,
> >
> > In the IR, besides "target datalayout" and "target
triple", we have
> > a
> > special "target cpu" string which is set by the Clang
front-end
> > according to
> > its -target-cpu flag. We also write a Module::getTargetCPU() method
> > to
> > retrieve this string from the IR.
> >
> 
> Not sure that I like this. Each function can have a target cpu
> though.
> That each subtarget cares about the value of the target cpu for how
> the intrinsic works sounds a lot like TargetTransformInfo to me.
> 
> -eric
I also think we should avoid this; as I said earlier, ValueTracking is used
during canonicalization, and the community consensus seems to be to try, to the
extent possible, to make this canonical form backend independent (even for
intrinsics). Having Clang add the range metadata seems preferable in this case
(and, as a side effect, gives us a new generally-useful capability).

 -Hal
> 
> > Jingyue
> >
> >
> > On Tue, Jun 17, 2014 at 2:22 PM, Eric Christopher
> > <echristo at gmail.com>
> > wrote:
> >>
> >> Eh? How do you envision this?
> >>
> >> -eric
> >>
> >> On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at
google.com>
> >> wrote:
> >> > Hi Nick,
> >> >
> >> > That makes sense. I think a main issue here is that the
ranges
> >> > of these
> >> > PTX
> >> > special registers (e.g., threadIdx.x) depend on -target-cpu
> >> > which is
> >> > only
> >> > visible to clang and llc. Would you mind we specify
"target cpu"
> >> > in the
> >> > IR
> >> > similar to what we did for "target triple"?
> >> >
> >> > Thanks,
> >> > Jingyue
> >> >
> >> >
> >> > On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky
> >> > <nlewycky at google.com>
> >> > wrote:
> >> >>
> >> >> On 17 June 2014 06:41, Eli Bendersky <eliben at
google.com> wrote:
> >> >>>
> >> >>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky
> >> >>> <nicholas at mxc.ca> wrote:
> >> >>>>
> >> >>>> Chandler Carruth wrote:
> >> >>>>>
> >> >>>>> This seems fine to me, but I'd like to
make sure it looks OK
> >> >>>>> to Nick
> >> >>>>> as
> >> >>>>> well.
> >> >>>>
> >> >>>>
> >> >>>> I strongly prefer baking in knowledge about the
intrinsics
> >> >>>> themselves
> >> >>>> into the passes if possible. Metadata will always
be
> >> >>>> secondary.
> >> >>>
> >> >>>
> >> >>> So you're saying that in this particular case
you'd prefer
> >> >>> LLVM passes
> >> >>> to
> >> >>> know about the range of these PTX intrinsics, rather
than
> >> >>> Clang adding
> >> >>> them
> >> >>> as metadata?
> >> >>
> >> >>
> >> >> Yep.
> >> >>
> >> >>> ValueTracking.cpp already has some iffy target
knowledge
> >> >>> (someone
> >> >>> sneaked
> >> >>> a direct  Intrinsic::x86_sse42_crc32_64_64 check in
there),
> >> >>> but
> >> >>> extending it
> >> >>> to other intrinsics in other targets seems like too
much...
> >> >>
> >> >>
> >> >> That's not iffy. That's exactly how it should
work, and we
> >> >> should have
> >> >> more of that. There is a major gotcha and that's
dealing with
> >> >> the case
> >> >> where
> >> >> the intrinsics don't exist because the backend
wasn't compiled
> >> >> in. If
> >> >> x86_sse42_crc32_64_64 is in there (and also in
instcombine
> >> >> btw),
> >> >> presumably
> >> >> that problem is solved somehow? Or does llvm actually not
build
> >> >> if you
> >> >> don't
> >> >> enable the x86 target? I feel like we would've heard
about
> >> >> that.
> >> >>
> >> >> Nick
> >> >>
> >> >>> So should target info be passed into it in some way?
Any
> >> >>> suggestions
> >> >>> where to put it? TargetLibraryInfo?
TargetTransformInfo? In
> >> >>> any case
> >> >>> this
> >> >>> seems like the target interface will have to be
augmented, and
> >> >>> we'll
> >> >>> have to
> >> >>> carry an object around into ValueTracking's
compute*
> >> >>> functions. If
> >> >>> this is
> >> >>> the right way, then this is the way it will be done -
design
> >> >>> ideas are
> >> >>> appreciated.
> >> >>>
> >> >>> Eli
> >> >>>
> >> >>>
> >> >>>
> >> >>>>
> >> >>>>
> >> >>>> Separately, should value tracking look use range
metadata
> >> >>>> when it's
> >> >>>> available? Absolutely.
> >> >>>>
> >> >>>> I think it should apply to all CallInst not just
> >> >>>> IntrinsicInst (which
> >> >>>> is
> >> >>>> derived from CallInst).
> >> >>>>
> >> >>>> Nick
> >> >>>>
> >> >>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue Wu
> >> >>>>> <jingyue at google.com
> >> >>>>> <mailto:jingyue at google.com>>
wrote:
> >> >>>>>
> >> >>>>>     Hi,
> >> >>>>>
> >> >>>>>     The range metadata can only be attached
to LoadInst for
> >> >>>>>     now. I
> >> >>>>> am
> >> >>>>>     considering extending its usage to
IntrinsicInst so that
> >> >>>>>     the
> >> >>>>>     frontend can annotate the range of the
return value of
> >> >>>>>     an
> >> >>>>> intrinsic
> >> >>>>>     call. e.g.,
> >> >>>>>     %a = call i32 @llvm.xxx(), !range !0
> >> >>>>>     !0 = metadata !{ i32 0, i23 1024 }
> >> >>>>>
> >> >>>>>     The motivation behind this extension is
some
> >> >>>>>     optimizations we
> >> >>>>> are
> >> >>>>>     working on for CUDA programs. Some
special registers in
> >> >>>>>     CUDA
> >> >>>>> (e.g.,
> >> >>>>>     threadIdx.x) are bounded per CUDA
programming guide, and
> >> >>>>>     knowing
> >> >>>>>     their ranges can improve the precision of
ValueTracking
> >> >>>>>     and
> >> >>>>> benefit
> >> >>>>>     optimizations such as InstCombine.
> >> >>>>>
> >> >>>>>     To implement this idea, we need
ValueTracking to be
> >> >>>>>     aware of the
> >> >>>>>     ranges of these special variables. These
special
> >> >>>>>     registers are
> >> >>>>> so
> >> >>>>>     far read-only and accessed using
intrinsics. e.g.,
> >> >>>>>     %threadIdx.x = call i32
> >> >>>>>     @llvm.nvvm.read.ptx.sreg.tid.x().
> >> >>>>>
> >> >>>>>     One possible approach is to have
ValueTracking compute
> >> >>>>>     the known
> >> >>>>>     bits of these intrinsics as special
cases. This approach
> >> >>>>>     is
> >> >>>>> already
> >> >>>>>     taken for the x86_sse42_crc32_64_64
intrinsic. However,
> >> >>>>>     this
> >> >>>>>     approach may not be elegant because the
ranges of these
> >> >>>>>     CUDA
> >> >>>>> special
> >> >>>>>     registers depend on the GPU compute
capability specified
> >> >>>>>     by
> >> >>>>>     -target-cpu. For instance, blockIdx.x is
bounded by
> >> >>>>>     65535 in
> >> >>>>> sm_20
> >> >>>>>     but 2^31-1 in sm_30. Exposing -target-cpu
to
> >> >>>>>     ValueTracking is
> >> >>>>>     probably discouraged.
> >> >>>>>
> >> >>>>>     Therefore, the approach I am considering
is to have
> >> >>>>>     clang
> >> >>>>> annotate
> >> >>>>>     the ranges of these CUDA special
registers according to
> >> >>>>>     the
> >> >>>>>     -target-cpu flag, and have ValueTracking
pick the range
> >> >>>>>     metadata
> >> >>>>> for
> >> >>>>>     optimization. By doing so, we hide the
target-specific
> >> >>>>>     info from
> >> >>>>>     ValueTracking.
> >> >>>>>
> >> >>>>>     The code change in llvm minus clang
won't be large. The
> >> >>>>>     core
> >> >>>>> change
> >> >>>>>     is only a few lines:
> >> >>>>>    
http://reviews.llvm.org/differential/diff/10464/. If
> >> >>>>>     this
> >> >>>>> extension
> >> >>>>>     sounds good to you, I'll definitely
add more tests and
> >> >>>>>     revise
> >> >>>>> the
> >> >>>>>     documents on range metadata.
> >> >>>>>
> >> >>>>>     Best,
> >> >>>>>     Jingyue
> >> >>>>>
> >> >>>>>    
_______________________________________________
> >> >>>>>     LLVM Developers mailing list
> >> >>>>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev
at cs.uiuc.edu>
> >> >>>>> http://llvm.cs.uiuc.edu
> >> >>>>>
> >> >>>>>    
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
_______________________________________________
> >> >>>>> LLVM Developers mailing list
> >> >>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
> >> >>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >>>>
> >> >>>>
> >> >>>> _______________________________________________
> >> >>>> LLVM Developers mailing list
> >> >>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >>>
> >> >>>
> >> >>>
> >> >>> _______________________________________________
> >> >>> LLVM Developers mailing list
> >> >>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >>>
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >>
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >
> >
> >
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Eric Christopher

2014-Jun-17 22:27 UTC

head link

[LLVMdev] Attaching range metadata to IntrinsicInst

On Tue, Jun 17, 2014 at 3:04 PM, Hal Finkel <hfinkel at anl.gov>
wrote:> ----- Original Message -----
>> From: "Eric Christopher" <echristo at gmail.com>
>> To: "Jingyue Wu" <jingyue at google.com>
>> Cc: "LLVM Developers Mailing List" <llvmdev at
cs.uiuc.edu>
>> Sent: Tuesday, June 17, 2014 4:43:28 PM
>> Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst
>>
>> On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at
google.com>
>> wrote:
>> > Hi Eric,
>> >
>> > In the IR, besides "target datalayout" and "target
triple", we have
>> > a
>> > special "target cpu" string which is set by the Clang
front-end
>> > according to
>> > its -target-cpu flag. We also write a Module::getTargetCPU()
method
>> > to
>> > retrieve this string from the IR.
>> >
>>
>> Not sure that I like this. Each function can have a target cpu
>> though.
>> That each subtarget cares about the value of the target cpu for how
>> the intrinsic works sounds a lot like TargetTransformInfo to me.
>>
>> -eric
>
> I also think we should avoid this; as I said earlier, ValueTracking is used
during canonicalization, and the community consensus seems to be to try, to the
extent possible, to make this canonical form backend independent (even for
intrinsics). Having Clang add the range metadata seems preferable in this case
(and, as a side effect, gives us a new generally-useful capability).
>
Sure. No objections to that solution either.

-eric
>  -Hal
>
>>
>> > Jingyue
>> >
>> >
>> > On Tue, Jun 17, 2014 at 2:22 PM, Eric Christopher
>> > <echristo at gmail.com>
>> > wrote:
>> >>
>> >> Eh? How do you envision this?
>> >>
>> >> -eric
>> >>
>> >> On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at
google.com>
>> >> wrote:
>> >> > Hi Nick,
>> >> >
>> >> > That makes sense. I think a main issue here is that the
ranges
>> >> > of these
>> >> > PTX
>> >> > special registers (e.g., threadIdx.x) depend on
-target-cpu
>> >> > which is
>> >> > only
>> >> > visible to clang and llc. Would you mind we specify
"target cpu"
>> >> > in the
>> >> > IR
>> >> > similar to what we did for "target triple"?
>> >> >
>> >> > Thanks,
>> >> > Jingyue
>> >> >
>> >> >
>> >> > On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky
>> >> > <nlewycky at google.com>
>> >> > wrote:
>> >> >>
>> >> >> On 17 June 2014 06:41, Eli Bendersky <eliben at
google.com> wrote:
>> >> >>>
>> >> >>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky
>> >> >>> <nicholas at mxc.ca> wrote:
>> >> >>>>
>> >> >>>> Chandler Carruth wrote:
>> >> >>>>>
>> >> >>>>> This seems fine to me, but I'd like
to make sure it looks OK
>> >> >>>>> to Nick
>> >> >>>>> as
>> >> >>>>> well.
>> >> >>>>
>> >> >>>>
>> >> >>>> I strongly prefer baking in knowledge about
the intrinsics
>> >> >>>> themselves
>> >> >>>> into the passes if possible. Metadata will
always be
>> >> >>>> secondary.
>> >> >>>
>> >> >>>
>> >> >>> So you're saying that in this particular case
you'd prefer
>> >> >>> LLVM passes
>> >> >>> to
>> >> >>> know about the range of these PTX intrinsics,
rather than
>> >> >>> Clang adding
>> >> >>> them
>> >> >>> as metadata?
>> >> >>
>> >> >>
>> >> >> Yep.
>> >> >>
>> >> >>> ValueTracking.cpp already has some iffy target
knowledge
>> >> >>> (someone
>> >> >>> sneaked
>> >> >>> a direct  Intrinsic::x86_sse42_crc32_64_64 check
in there),
>> >> >>> but
>> >> >>> extending it
>> >> >>> to other intrinsics in other targets seems like
too much...
>> >> >>
>> >> >>
>> >> >> That's not iffy. That's exactly how it should
work, and we
>> >> >> should have
>> >> >> more of that. There is a major gotcha and that's
dealing with
>> >> >> the case
>> >> >> where
>> >> >> the intrinsics don't exist because the backend
wasn't compiled
>> >> >> in. If
>> >> >> x86_sse42_crc32_64_64 is in there (and also in
instcombine
>> >> >> btw),
>> >> >> presumably
>> >> >> that problem is solved somehow? Or does llvm actually
not build
>> >> >> if you
>> >> >> don't
>> >> >> enable the x86 target? I feel like we would've
heard about
>> >> >> that.
>> >> >>
>> >> >> Nick
>> >> >>
>> >> >>> So should target info be passed into it in some
way? Any
>> >> >>> suggestions
>> >> >>> where to put it? TargetLibraryInfo?
TargetTransformInfo? In
>> >> >>> any case
>> >> >>> this
>> >> >>> seems like the target interface will have to be
augmented, and
>> >> >>> we'll
>> >> >>> have to
>> >> >>> carry an object around into ValueTracking's
compute*
>> >> >>> functions. If
>> >> >>> this is
>> >> >>> the right way, then this is the way it will be
done - design
>> >> >>> ideas are
>> >> >>> appreciated.
>> >> >>>
>> >> >>> Eli
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>>
>> >> >>>>
>> >> >>>> Separately, should value tracking look use
range metadata
>> >> >>>> when it's
>> >> >>>> available? Absolutely.
>> >> >>>>
>> >> >>>> I think it should apply to all CallInst not
just
>> >> >>>> IntrinsicInst (which
>> >> >>>> is
>> >> >>>> derived from CallInst).
>> >> >>>>
>> >> >>>> Nick
>> >> >>>>
>> >> >>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue
Wu
>> >> >>>>> <jingyue at google.com
>> >> >>>>> <mailto:jingyue at google.com>>
wrote:
>> >> >>>>>
>> >> >>>>>     Hi,
>> >> >>>>>
>> >> >>>>>     The range metadata can only be
attached to LoadInst for
>> >> >>>>>     now. I
>> >> >>>>> am
>> >> >>>>>     considering extending its usage to
IntrinsicInst so that
>> >> >>>>>     the
>> >> >>>>>     frontend can annotate the range of
the return value of
>> >> >>>>>     an
>> >> >>>>> intrinsic
>> >> >>>>>     call. e.g.,
>> >> >>>>>     %a = call i32 @llvm.xxx(), !range !0
>> >> >>>>>     !0 = metadata !{ i32 0, i23 1024 }
>> >> >>>>>
>> >> >>>>>     The motivation behind this extension
is some
>> >> >>>>>     optimizations we
>> >> >>>>> are
>> >> >>>>>     working on for CUDA programs. Some
special registers in
>> >> >>>>>     CUDA
>> >> >>>>> (e.g.,
>> >> >>>>>     threadIdx.x) are bounded per CUDA
programming guide, and
>> >> >>>>>     knowing
>> >> >>>>>     their ranges can improve the
precision of ValueTracking
>> >> >>>>>     and
>> >> >>>>> benefit
>> >> >>>>>     optimizations such as InstCombine.
>> >> >>>>>
>> >> >>>>>     To implement this idea, we need
ValueTracking to be
>> >> >>>>>     aware of the
>> >> >>>>>     ranges of these special variables.
These special
>> >> >>>>>     registers are
>> >> >>>>> so
>> >> >>>>>     far read-only and accessed using
intrinsics. e.g.,
>> >> >>>>>     %threadIdx.x = call i32
>> >> >>>>>     @llvm.nvvm.read.ptx.sreg.tid.x().
>> >> >>>>>
>> >> >>>>>     One possible approach is to have
ValueTracking compute
>> >> >>>>>     the known
>> >> >>>>>     bits of these intrinsics as special
cases. This approach
>> >> >>>>>     is
>> >> >>>>> already
>> >> >>>>>     taken for the x86_sse42_crc32_64_64
intrinsic. However,
>> >> >>>>>     this
>> >> >>>>>     approach may not be elegant because
the ranges of these
>> >> >>>>>     CUDA
>> >> >>>>> special
>> >> >>>>>     registers depend on the GPU compute
capability specified
>> >> >>>>>     by
>> >> >>>>>     -target-cpu. For instance, blockIdx.x
is bounded by
>> >> >>>>>     65535 in
>> >> >>>>> sm_20
>> >> >>>>>     but 2^31-1 in sm_30. Exposing
-target-cpu to
>> >> >>>>>     ValueTracking is
>> >> >>>>>     probably discouraged.
>> >> >>>>>
>> >> >>>>>     Therefore, the approach I am
considering is to have
>> >> >>>>>     clang
>> >> >>>>> annotate
>> >> >>>>>     the ranges of these CUDA special
registers according to
>> >> >>>>>     the
>> >> >>>>>     -target-cpu flag, and have
ValueTracking pick the range
>> >> >>>>>     metadata
>> >> >>>>> for
>> >> >>>>>     optimization. By doing so, we hide
the target-specific
>> >> >>>>>     info from
>> >> >>>>>     ValueTracking.
>> >> >>>>>
>> >> >>>>>     The code change in llvm minus clang
won't be large. The
>> >> >>>>>     core
>> >> >>>>> change
>> >> >>>>>     is only a few lines:
>> >> >>>>>    
http://reviews.llvm.org/differential/diff/10464/. If
>> >> >>>>>     this
>> >> >>>>> extension
>> >> >>>>>     sounds good to you, I'll
definitely add more tests and
>> >> >>>>>     revise
>> >> >>>>> the
>> >> >>>>>     documents on range metadata.
>> >> >>>>>
>> >> >>>>>     Best,
>> >> >>>>>     Jingyue
>> >> >>>>>
>> >> >>>>>    
_______________________________________________
>> >> >>>>>     LLVM Developers mailing list
>> >> >>>>>     LLVMdev at cs.uiuc.edu
<mailto:LLVMdev at cs.uiuc.edu>
>> >> >>>>> http://llvm.cs.uiuc.edu
>> >> >>>>>
>> >> >>>>>    
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
_______________________________________________
>> >> >>>>> LLVM Developers mailing list
>> >> >>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>> >> >>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>>>
>> >> >>>>
>> >> >>>>
_______________________________________________
>> >> >>>> LLVM Developers mailing list
>> >> >>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>> >> >>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> LLVM Developers mailing list
>> >> >>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>> >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>>
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> LLVM Developers mailing list
>> >> >> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > LLVM Developers mailing list
>> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >
>> >
>> >
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

Nick Lewycky

2014-Jun-18 19:57 UTC

head link

[LLVMdev] Attaching range metadata to IntrinsicInst

Hal Finkel wrote:> ----- Original Message -----
>> From: "Eric Christopher"<echristo at gmail.com>
>> To: "Jingyue Wu"<jingyue at google.com>
>> Cc: "LLVM Developers Mailing List"<llvmdev at
cs.uiuc.edu>
>> Sent: Tuesday, June 17, 2014 4:43:28 PM
>> Subject: Re: [LLVMdev] Attaching range metadata to IntrinsicInst
>>
>> On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu<jingyue at
google.com>
>> wrote:
>>> Hi Eric,
>>>
>>> In the IR, besides "target datalayout" and "target
triple", we have
>>> a
>>> special "target cpu" string which is set by the Clang
front-end
>>> according to
>>> its -target-cpu flag. We also write a Module::getTargetCPU() method
>>> to
>>> retrieve this string from the IR.
>>>
>>
>> Not sure that I like this. Each function can have a target cpu
>> though.
>> That each subtarget cares about the value of the target cpu for how
>> the intrinsic works sounds a lot like TargetTransformInfo to me.
>>
>> -eric
>
> I also think we should avoid this; as I said earlier, ValueTracking is used
during canonicalization, and the community consensus seems to be to try, to the
extent possible, to make this canonical form backend independent (even for
intrinsics).
The general optimizer parts recognize many functions for the behaviour 
they're guaranteed to have, from malloc to strlen to sqrt. Before we had 
a class that would answer the question "is the function named
'malloc'
really malloc", we relied on intrinsics to do this sort of thing. 
Grabbing target intrinsics and making the most of them is safe in the 
optimizer.

It may be a little weird that you can build llvm to only target one 
target and have IR that uses intrinsics for all targets, or even x86 
mips and arm intrinsics in a single function, but the mid-level 
optimizer doesn't need to care.

The distinction I'm drawing is that it's correct that the canonicalizers
shouldn't need to care what backend is being targeted, which is 
different from saying that they only work on target-independent IR. They 
can optimize target-specific intrinsics without needing to care which 
backend will be used later, so long as they don't produce such intrinsics.

Nick

  Having Clang add the range metadata seems preferable in this case 
(and, as a side effect, gives us a new generally-useful
capability).>
>   -Hal
>
>>
>>> Jingyue
>>>
>>>
>>> On Tue, Jun 17, 2014 at 2:22 PM, Eric Christopher
>>> <echristo at gmail.com>
>>> wrote:
>>>>
>>>> Eh? How do you envision this?
>>>>
>>>> -eric
>>>>
>>>> On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu<jingyue at
google.com>
>>>> wrote:
>>>>> Hi Nick,
>>>>>
>>>>> That makes sense. I think a main issue here is that the
ranges
>>>>> of these
>>>>> PTX
>>>>> special registers (e.g., threadIdx.x) depend on -target-cpu
>>>>> which is
>>>>> only
>>>>> visible to clang and llc. Would you mind we specify
"target cpu"
>>>>> in the
>>>>> IR
>>>>> similar to what we did for "target triple"?
>>>>>
>>>>> Thanks,
>>>>> Jingyue
>>>>>
>>>>>
>>>>> On Tue, Jun 17, 2014 at 12:19 PM, Nick Lewycky
>>>>> <nlewycky at google.com>
>>>>> wrote:
>>>>>>
>>>>>> On 17 June 2014 06:41, Eli Bendersky<eliben at
google.com>  wrote:
>>>>>>>
>>>>>>> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky
>>>>>>> <nicholas at mxc.ca>  wrote:
>>>>>>>>
>>>>>>>> Chandler Carruth wrote:
>>>>>>>>>
>>>>>>>>> This seems fine to me, but I'd like to
make sure it looks OK
>>>>>>>>> to Nick
>>>>>>>>> as
>>>>>>>>> well.
>>>>>>>>
>>>>>>>>
>>>>>>>> I strongly prefer baking in knowledge about the
intrinsics
>>>>>>>> themselves
>>>>>>>> into the passes if possible. Metadata will
always be
>>>>>>>> secondary.
>>>>>>>
>>>>>>>
>>>>>>> So you're saying that in this particular case
you'd prefer
>>>>>>> LLVM passes
>>>>>>> to
>>>>>>> know about the range of these PTX intrinsics,
rather than
>>>>>>> Clang adding
>>>>>>> them
>>>>>>> as metadata?
>>>>>>
>>>>>>
>>>>>> Yep.
>>>>>>
>>>>>>> ValueTracking.cpp already has some iffy target
knowledge
>>>>>>> (someone
>>>>>>> sneaked
>>>>>>> a direct  Intrinsic::x86_sse42_crc32_64_64 check in
there),
>>>>>>> but
>>>>>>> extending it
>>>>>>> to other intrinsics in other targets seems like too
much...
>>>>>>
>>>>>>
>>>>>> That's not iffy. That's exactly how it should
work, and we
>>>>>> should have
>>>>>> more of that. There is a major gotcha and that's
dealing with
>>>>>> the case
>>>>>> where
>>>>>> the intrinsics don't exist because the backend
wasn't compiled
>>>>>> in. If
>>>>>> x86_sse42_crc32_64_64 is in there (and also in
instcombine
>>>>>> btw),
>>>>>> presumably
>>>>>> that problem is solved somehow? Or does llvm actually
not build
>>>>>> if you
>>>>>> don't
>>>>>> enable the x86 target? I feel like we would've
heard about
>>>>>> that.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>> So should target info be passed into it in some
way? Any
>>>>>>> suggestions
>>>>>>> where to put it? TargetLibraryInfo?
TargetTransformInfo? In
>>>>>>> any case
>>>>>>> this
>>>>>>> seems like the target interface will have to be
augmented, and
>>>>>>> we'll
>>>>>>> have to
>>>>>>> carry an object around into ValueTracking's
compute*
>>>>>>> functions. If
>>>>>>> this is
>>>>>>> the right way, then this is the way it will be done
- design
>>>>>>> ideas are
>>>>>>> appreciated.
>>>>>>>
>>>>>>> Eli
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Separately, should value tracking look use
range metadata
>>>>>>>> when it's
>>>>>>>> available? Absolutely.
>>>>>>>>
>>>>>>>> I think it should apply to all CallInst not
just
>>>>>>>> IntrinsicInst (which
>>>>>>>> is
>>>>>>>> derived from CallInst).
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>>> On Tue, Jun 17, 2014 at 12:37 AM, Jingyue
Wu
>>>>>>>>> <jingyue at google.com
>>>>>>>>> <mailto:jingyue at google.com>> 
wrote:
>>>>>>>>>
>>>>>>>>>      Hi,
>>>>>>>>>
>>>>>>>>>      The range metadata can only be
attached to LoadInst for
>>>>>>>>>      now. I
>>>>>>>>> am
>>>>>>>>>      considering extending its usage to
IntrinsicInst so that
>>>>>>>>>      the
>>>>>>>>>      frontend can annotate the range of the
return value of
>>>>>>>>>      an
>>>>>>>>> intrinsic
>>>>>>>>>      call. e.g.,
>>>>>>>>>      %a = call i32 @llvm.xxx(), !range !0
>>>>>>>>>      !0 = metadata !{ i32 0, i23 1024 }
>>>>>>>>>
>>>>>>>>>      The motivation behind this extension
is some
>>>>>>>>>      optimizations we
>>>>>>>>> are
>>>>>>>>>      working on for CUDA programs. Some
special registers in
>>>>>>>>>      CUDA
>>>>>>>>> (e.g.,
>>>>>>>>>      threadIdx.x) are bounded per CUDA
programming guide, and
>>>>>>>>>      knowing
>>>>>>>>>      their ranges can improve the precision
of ValueTracking
>>>>>>>>>      and
>>>>>>>>> benefit
>>>>>>>>>      optimizations such as InstCombine.
>>>>>>>>>
>>>>>>>>>      To implement this idea, we need
ValueTracking to be
>>>>>>>>>      aware of the
>>>>>>>>>      ranges of these special variables.
These special
>>>>>>>>>      registers are
>>>>>>>>> so
>>>>>>>>>      far read-only and accessed using
intrinsics. e.g.,
>>>>>>>>>      %threadIdx.x = call i32
>>>>>>>>>      @llvm.nvvm.read.ptx.sreg.tid.x().
>>>>>>>>>
>>>>>>>>>      One possible approach is to have
ValueTracking compute
>>>>>>>>>      the known
>>>>>>>>>      bits of these intrinsics as special
cases. This approach
>>>>>>>>>      is
>>>>>>>>> already
>>>>>>>>>      taken for the x86_sse42_crc32_64_64
intrinsic. However,
>>>>>>>>>      this
>>>>>>>>>      approach may not be elegant because
the ranges of these
>>>>>>>>>      CUDA
>>>>>>>>> special
>>>>>>>>>      registers depend on the GPU compute
capability specified
>>>>>>>>>      by
>>>>>>>>>      -target-cpu. For instance, blockIdx.x
is bounded by
>>>>>>>>>      65535 in
>>>>>>>>> sm_20
>>>>>>>>>      but 2^31-1 in sm_30. Exposing
-target-cpu to
>>>>>>>>>      ValueTracking is
>>>>>>>>>      probably discouraged.
>>>>>>>>>
>>>>>>>>>      Therefore, the approach I am
considering is to have
>>>>>>>>>      clang
>>>>>>>>> annotate
>>>>>>>>>      the ranges of these CUDA special
registers according to
>>>>>>>>>      the
>>>>>>>>>      -target-cpu flag, and have
ValueTracking pick the range
>>>>>>>>>      metadata
>>>>>>>>> for
>>>>>>>>>      optimization. By doing so, we hide the
target-specific
>>>>>>>>>      info from
>>>>>>>>>      ValueTracking.
>>>>>>>>>
>>>>>>>>>      The code change in llvm minus clang
won't be large. The
>>>>>>>>>      core
>>>>>>>>> change
>>>>>>>>>      is only a few lines:
>>>>>>>>>     
http://reviews.llvm.org/differential/diff/10464/. If
>>>>>>>>>      this
>>>>>>>>> extension
>>>>>>>>>      sounds good to you, I'll
definitely add more tests and
>>>>>>>>>      revise
>>>>>>>>> the
>>>>>>>>>      documents on range metadata.
>>>>>>>>>
>>>>>>>>>      Best,
>>>>>>>>>      Jingyue
>>>>>>>>>
>>>>>>>>>     
_______________________________________________
>>>>>>>>>      LLVM Developers mailing list
>>>>>>>>>      LLVMdev at
cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu>
>>>>>>>>> http://llvm.cs.uiuc.edu
>>>>>>>>>
>>>>>>>>>     
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
_______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>

llvm dev - Jun 2014 - [LLVMdev] Attaching range metadata to IntrinsicInst

[LLVMdev] Attaching range metadata to IntrinsicInst

[LLVMdev] Attaching range metadata to IntrinsicInst

[LLVMdev] Attaching range metadata to IntrinsicInst

[LLVMdev] Attaching range metadata to IntrinsicInst