thr3ads.net - llvm dev - [LLVMdev] introducing sign extending halfword loads into the LLVM IR [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Bjorn De Sutter

2013-Jan-21 14:16 UTC

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

On 21 Jan 2013, at 14:39, Justin Holewinski <justin.holewinski at
gmail.com> wrote:
> Instruction selection happens on a different IR: SelectionDAG.  In this IR,
there are sign-extending loads that the IR converter will use, and are used for
example to load 8/16-bit values into 32-bit registers (with sign or zero
extension).  Any optimizations performed during codegen will be in this
representation, or even MachineInstr form, which is post-isel and any
sign-extension information will already be folded into the machine instruction.
> 
> The problem with doing machine-level analysis on LLVM IR is that there is
no guarantee that LLVM IR will be an accurate representation of what the final
code will look like instruction-wise.  LLVM IR can express many operations that
are not legal for a given target and so must be expanded into more than one
operation.  Or the target may support combination instructions that allow
multiple LLVM IR instructions to be folded into one machine instruction.  All of
these are handled during SelectionDAG generation and instruction selection, and
complicate LLVM IR-level analysis of machine-level characteristics.  For the
load example you give, on any architecture that supports sign-extending loads,
the load and sext will be combined into a single instruction (a "load ...
<sext from i16>" in SelectionDAG).
That is indeed the case, and that poses no problem. The problem is that
optimizations long before instruction selection suffer.
> 
> We could definitely try to capture more of the optimization cases you
mention, but I'm not sure adding a sign-extending load to the IR would buy
us much.  In what cases would a front-end choose to generate a 16-bit load
sign-extended to i32 instead of an i16 load?
The front-end doesn't need to choose. It could simply generate an i16 load
followed by a sign-extension (or nothing or a zero extension) as it does now.
But as a very early optimization in opt, we could merge the load and the
sign-extension into one sign-extending load operation such that the
sign-extension (that will be merged into the load in the backend anyway) does
not hinder other optimizations like I described.
> This seems like it would only generate ambiguity.  Generally, we don't
extend the core IR if something is already expressible.
> For what it's worth, the max detection could fairly easily be done in a
back-end isel pattern.
> 
You mean by enabling the recognition of patterns like select (cmp gt src1 src2)
(sign-extend src1) (sign-extend src2)?

In my current backend, but maybe that is wrong, the sign-extend instructions
that become part of the loads are at that point converted into copy operations
(as the src and dst operands are not the same). The superfluous copy operations
are then later removed. But does that mean that during instruction selection, I
have to start looking for patterns like select (cmp gt src1 src2) (copy src1)
(copy src2) to find opportunities for max/min operations? Or should that not be
needed?

Thanks. 

Bjorn

> 
> On Mon, Jan 21, 2013 at 5:32 AM, Bjorn De Sutter <bjorn.desutter at
elis.ugent.be> wrote:
> Hi all,
> 
> when compiling code like
> 
> short ptr * = some_address;
> int val;
> 
> val = *ptr;
> if (val>2047)
>    val = 2047;
> else if (val<-2048)
>    val = -2048.
> // other things done that require val to be an int ...
> 
> The load operation is represented by a load and a sign extension operation
in the LLVM IR. On most target architectures, there exist signed halfword load
instructions, so the load and sign extension are effectively translated into a
single instruction during instruction selection. Nonetheless, this sign
extension operation in the IR prohibits a lot of optimizations:
> 
> - it counts as an instruction in heuristics based on instruction counts
(such as SimplifyCFG); as a result some simplifications are not performed;
> - the sign extension operation gets combined with other operations. In the
example, the sign extension gets combined with the 32-bit comparison
implementing the val>-2048, resulting in a 16-bit comparison on the
non-extended value; the result is a comparison operation on 16-bit operands,
followed by a select operation on 32-bit operands:
> 
>  %0 = load i16* %arrayidx2, align 2, !dbg !502
>   %conv = sext i16 %0 to i32, !dbg !502
>   %cmp16 = icmp sgt i16 %0, 2047, !dbg !510
>   br i1 %cmp16, label %if.end23, label %if.else, !dbg !510
> 
> if.else:                                          ; preds = %for.body
>   %cmp19 = icmp slt i16 %0, -2048, !dbg !511                               
<--- 16-bit comparison
>   %.conv = select i1 %cmp19, i32 -2048, i32 %conv, !dbg !511         
<--- 32-bit select
>   br label %if.end23, !dbg !511
> 
> if.end23:                                         ; preds = %if.else,
%for.body
>   %val1.0 = phi i32 [ 2047, %for.body ], [ %.conv, %if.else ]
>   %conv24 = trunc i32 %val1.0 to i16, !dbg !512
>   store i16 %conv24, i16* %arrayidx2, align 2, !dbg !512
> 
> The problem with this is that during instruction selection, the pair of
comparison and select is no longer recognized as max operation because the
operands of the two operations are not the same.
> 
> It seems to me that these and other limitations on the applied
optimizations could easily be avoided by introducing a sign-extending halfword
load operation into the IR.
> 
> Any ideas on this? And how big of an effort that might be?
> 
> Thanks,
> 
> Bjorn De Sutter
> Computer Systems Lab
> Ghent University
> 
> 
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> 
> -- 
> 
> Thanks,
> 
> Justin Holewinski
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130121/76cb9056/attachment.html>

Justin Holewinski

2013-Jan-21 14:34 UTC

head link

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

On Mon, Jan 21, 2013 at 9:16 AM, Bjorn De Sutter <
bjorn.desutter at elis.ugent.be> wrote:
> On 21 Jan 2013, at 14:39, Justin Holewinski <justin.holewinski at
gmail.com>
> wrote:
>
> Instruction selection happens on a different IR: SelectionDAG.  In this
> IR, there are sign-extending loads that the IR converter will use, and are
> used for example to load 8/16-bit values into 32-bit registers (with sign
> or zero extension).  Any optimizations performed during codegen will be in
> this representation, or even MachineInstr form, which is post-isel and any
> sign-extension information will already be folded into the machine
> instruction.
>
> The problem with doing machine-level analysis on LLVM IR is that there is
> no guarantee that LLVM IR will be an accurate representation of what the
> final code will look like instruction-wise.  LLVM IR can express many
> operations that are not legal for a given target and so must be expanded
> into more than one operation.  Or the target may support combination
> instructions that allow multiple LLVM IR instructions to be folded into one
> machine instruction.  All of these are handled during SelectionDAG
> generation and instruction selection, and complicate LLVM IR-level analysis
> of machine-level characteristics.  For the load example you give, on any
> architecture that supports sign-extending loads, the load and sext will be
> combined into a single instruction (a "load ... <sext from
i16>" in
> SelectionDAG).
>
>
> That is indeed the case, and that poses no problem. The problem is that
> optimizations long before instruction selection suffer.
>
>
> We could definitely try to capture more of the optimization cases you
> mention, but I'm not sure adding a sign-extending load to the IR would
buy
> us much.  In what cases would a front-end choose to generate a 16-bit load
> sign-extended to i32 instead of an i16 load?
>
>
> The front-end doesn't need to choose. It could simply generate an i16
load
> followed by a sign-extension (or nothing or a zero extension) as it does
> now. But as a very early optimization in opt, we could merge the load and
> the sign-extension into one sign-extending load operation such that the
> sign-extension (that will be merged into the load in the backend anyway)
> does not hinder other optimizations like I described.
>
Such a transformation would need to also insert a truncate to i16 to feed
the other 16-bit instructions in your IR.  Unless you upcast everything to
i32.

>
> This seems like it would only generate ambiguity.  Generally, we don't
> extend the core IR if something is already expressible.
>
> For what it's worth, the max detection could fairly easily be done in a
> back-end isel pattern.
>
>
> You mean by enabling the recognition of patterns like select (cmp gt src1
> src2) (sign-extend src1) (sign-extend src2)?
>
> In my current backend, but maybe that is wrong, the sign-extend
> instructions that become part of the loads are at that point converted into
> copy operations (as the src and dst operands are not the same). The
> superfluous copy operations are then later removed. But does that mean that
> during instruction selection, I have to start looking for patterns
> like select (cmp gt src1 src2) (copy src1) (copy src2) to find
> opportunities for max/min operations? Or should that not be needed?
>
I was thinking just "(select (icmp op val1 val2), val1, val2)", but
you're
right that any copies would get in the way.  Though thinking about it more,
why do you get copy operations?  The sext should be folded into the load,
and any uses of the sext would become uses of the load.

>
> Thanks.
>
> Bjorn
>
>
>
> On Mon, Jan 21, 2013 at 5:32 AM, Bjorn De Sutter <
> bjorn.desutter at elis.ugent.be> wrote:
>
>> Hi all,
>>
>> when compiling code like
>>
>> short ptr * = some_address;
>> int val;
>>
>> val = *ptr;
>> if (val>2047)
>>    val = 2047;
>> else if (val<-2048)
>>    val = -2048.
>> // other things done that require val to be an int ...
>>
>> The load operation is represented by a load and a sign extension
>> operation in the LLVM IR. On most target architectures, there exist
signed
>> halfword load instructions, so the load and sign extension are
effectively
>> translated into a single instruction during instruction selection.
>> Nonetheless, this sign extension operation in the IR prohibits a lot of
>> optimizations:
>>
>> - it counts as an instruction in heuristics based on instruction counts
>> (such as SimplifyCFG); as a result some simplifications are not
performed;
>> - the sign extension operation gets combined with other operations. In
>> the example, the sign extension gets combined with the 32-bit
comparison
>> implementing the val>-2048, resulting in a 16-bit comparison on the
>> non-extended value; the result is a comparison operation on 16-bit
>> operands, followed by a select operation on 32-bit operands:
>>
>>  %0 = load i16* %arrayidx2, align 2, !dbg !502
>>   %conv = sext i16 %0 to i32, !dbg !502
>>   %cmp16 = icmp sgt i16 %0, 2047, !dbg !510
>>   br i1 %cmp16, label %if.end23, label %if.else, !dbg !510
>>
>> if.else:                                          ; preds = %for.body
>>   %cmp19 = icmp slt i16 %0, -2048, !dbg !511
>>            <--- 16-bit comparison
>>   %.conv = select i1 %cmp19, i32 -2048, i32 %conv, !dbg !511
>>  <--- 32-bit select
>>   br label %if.end23, !dbg !511
>>
>> if.end23:                                         ; preds = %if.else,
>> %for.body
>>   %val1.0 = phi i32 [ 2047, %for.body ], [ %.conv, %if.else ]
>>   %conv24 = trunc i32 %val1.0 to i16, !dbg !512
>>   store i16 %conv24, i16* %arrayidx2, align 2, !dbg !512
>>
>> The problem with this is that during instruction selection, the pair of
>> comparison and select is no longer recognized as max operation because
the
>> operands of the two operations are not the same.
>>
>> It seems to me that these and other limitations on the applied
>> optimizations could easily be avoided by introducing a sign-extending
>> halfword load operation into the IR.
>>
>> Any ideas on this? And how big of an effort that might be?
>>
>> Thanks,
>>
>> Bjorn De Sutter
>> Computer Systems Lab
>> Ghent University
>>
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
>
> --
>
> Thanks,
>
> Justin Holewinski
>
>
>

-- 

Thanks,

Justin Holewinski
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130121/b4607518/attachment.html>

Jim Grosbach

2013-Jan-21 23:18 UTC

head link

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

On Jan 21, 2013, at 6:34 AM, Justin Holewinski <justin.holewinski at
gmail.com> wrote:
> 
> On Mon, Jan 21, 2013 at 9:16 AM, Bjorn De Sutter <bjorn.desutter at
elis.ugent.be> wrote:
> On 21 Jan 2013, at 14:39, Justin Holewinski <justin.holewinski at
gmail.com> wrote:
> 
>> Instruction selection happens on a different IR: SelectionDAG.  In this
IR, there are sign-extending loads that the IR converter will use, and are used
for example to load 8/16-bit values into 32-bit registers (with sign or zero
extension).  Any optimizations performed during codegen will be in this
representation, or even MachineInstr form, which is post-isel and any
sign-extension information will already be folded into the machine instruction.
>> 
>> The problem with doing machine-level analysis on LLVM IR is that there
is no guarantee that LLVM IR will be an accurate representation of what the
final code will look like instruction-wise.  LLVM IR can express many operations
that are not legal for a given target and so must be expanded into more than one
operation.  Or the target may support combination instructions that allow
multiple LLVM IR instructions to be folded into one machine instruction.  All of
these are handled during SelectionDAG generation and instruction selection, and
complicate LLVM IR-level analysis of machine-level characteristics.  For the
load example you give, on any architecture that supports sign-extending loads,
the load and sext will be combined into a single instruction (a "load ...
<sext from i16>" in SelectionDAG).
> 
> That is indeed the case, and that poses no problem. The problem is that
optimizations long before instruction selection suffer.
> 
>> 
>> We could definitely try to capture more of the optimization cases you
mention, but I'm not sure adding a sign-extending load to the IR would buy
us much.  In what cases would a front-end choose to generate a 16-bit load
sign-extended to i32 instead of an i16 load?
> 
> The front-end doesn't need to choose. It could simply generate an i16
load followed by a sign-extension (or nothing or a zero extension) as it does
now. But as a very early optimization in opt, we could merge the load and the
sign-extension into one sign-extending load operation such that the
sign-extension (that will be merged into the load in the backend anyway) does
not hinder other optimizations like I described.
> 
> Such a transformation would need to also insert a truncate to i16 to feed
the other 16-bit instructions in your IR.  Unless you upcast everything to i32.
>  
> 
>> This seems like it would only generate ambiguity.  Generally, we
don't extend the core IR if something is already expressible.
>> For what it's worth, the max detection could fairly easily be done
in a back-end isel pattern.
>> 
> 
> You mean by enabling the recognition of patterns like select (cmp gt src1
src2) (sign-extend src1) (sign-extend src2)?
> 
> In my current backend, but maybe that is wrong, the sign-extend
instructions that become part of the loads are at that point converted into copy
operations (as the src and dst operands are not the same). The superfluous copy
operations are then later removed. But does that mean that during instruction
selection, I have to start looking for patterns like select (cmp gt src1 src2)
(copy src1) (copy src2) to find opportunities for max/min operations? Or should
that not be needed?
> 
> I was thinking just "(select (icmp op val1 val2), val1, val2)",
but you're right that any copies would get in the way.  Though thinking
about it more, why do you get copy operations?  The sext should be folded into
the load, and any uses of the sext would become uses of the load.
That is odd. Lots of in-tree targets have sext-loads that do the right thing.
ARM and X86, for example.

-Jim
>  
> 
> Thanks. 
> 
> Bjorn
> 
> 
>> 
>> On Mon, Jan 21, 2013 at 5:32 AM, Bjorn De Sutter <bjorn.desutter at
elis.ugent.be> wrote:
>> Hi all,
>> 
>> when compiling code like
>> 
>> short ptr * = some_address;
>> int val;
>> 
>> val = *ptr;
>> if (val>2047)
>>    val = 2047;
>> else if (val<-2048)
>>    val = -2048.
>> // other things done that require val to be an int ...
>> 
>> The load operation is represented by a load and a sign extension
operation in the LLVM IR. On most target architectures, there exist signed
halfword load instructions, so the load and sign extension are effectively
translated into a single instruction during instruction selection. Nonetheless,
this sign extension operation in the IR prohibits a lot of optimizations:
>> 
>> - it counts as an instruction in heuristics based on instruction counts
(such as SimplifyCFG); as a result some simplifications are not performed;
>> - the sign extension operation gets combined with other operations. In
the example, the sign extension gets combined with the 32-bit comparison
implementing the val>-2048, resulting in a 16-bit comparison on the
non-extended value; the result is a comparison operation on 16-bit operands,
followed by a select operation on 32-bit operands:
>> 
>>  %0 = load i16* %arrayidx2, align 2, !dbg !502
>>   %conv = sext i16 %0 to i32, !dbg !502
>>   %cmp16 = icmp sgt i16 %0, 2047, !dbg !510
>>   br i1 %cmp16, label %if.end23, label %if.else, !dbg !510
>> 
>> if.else:                                          ; preds = %for.body
>>   %cmp19 = icmp slt i16 %0, -2048, !dbg !511                           
<--- 16-bit comparison
>>   %.conv = select i1 %cmp19, i32 -2048, i32 %conv, !dbg !511         
<--- 32-bit select
>>   br label %if.end23, !dbg !511
>> 
>> if.end23:                                         ; preds = %if.else,
%for.body
>>   %val1.0 = phi i32 [ 2047, %for.body ], [ %.conv, %if.else ]
>>   %conv24 = trunc i32 %val1.0 to i16, !dbg !512
>>   store i16 %conv24, i16* %arrayidx2, align 2, !dbg !512
>> 
>> The problem with this is that during instruction selection, the pair of
comparison and select is no longer recognized as max operation because the
operands of the two operations are not the same.
>> 
>> It seems to me that these and other limitations on the applied
optimizations could easily be avoided by introducing a sign-extending halfword
load operation into the IR.
>> 
>> Any ideas on this? And how big of an effort that might be?
>> 
>> Thanks,
>> 
>> Bjorn De Sutter
>> Computer Systems Lab
>> Ghent University
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> 
>> 
>> -- 
>> 
>> Thanks,
>> 
>> Justin Holewinski
> 
> 
> 
> 
> -- 
> 
> Thanks,
> 
> Justin Holewinski
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130121/80d69b4c/attachment.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Jan 2013 - [LLVMdev] introducing sign extending halfword loads into the LLVM IR

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

[LLVMdev] introducing sign extending halfword loads into the LLVM IR

Possibly Parallel Threads