thr3ads.net - llvm dev - [LLVMdev] global type legalization? [Sep 2010]

Bob Wilson

2010-Aug-18 17:27 UTC

[LLVMdev] global type legalization?

On Aug 18, 2010, at 9:56 AM, Chris Lattner wrote:
> On Aug 18, 2010, at 9:22 AM, Bob Wilson wrote:
>> I'm looking at llvm-generated ARM code that has some unnecessary
>> UXTB (zero extend) instructions, and it seems to me that doing type legalization
>> as an entirely local transformation is not the best approach.
>
> That's true, but doing isel as a purely local approach isn't the
> best either :-).  We'd really like to get to whole-function selection dags
> at some point.
>
>> I'm thinking in particular about legalizing integer types that need
>> to be promoted to the target register size, i.e., i8 and i16 for ARM promoting
>> to i32.
>>
>> Currently we sign-extend or zero-extend values of these types at every
>> place where they are defined and used.  The definitions are no problem.  Loads
>> from memory can specify whether the value should be zero or sign extended, and
>> arithmetic operations are going to produce 32-bit values regardless.  Uses (in
>> different basic blocks from the defs) are a different matter.  We add explicit
>> zero extend and sign extend operations for every use, despite the fact that the
>> actual register values will have already been extended to 32 bits when they were
>> defined.
>>
>> It seems like we ought to have a pass to globally promote such types to
>> the target register size.
>>
>> Has anyone looked at this issue before?  Is there already a solution in
>> place that we just need to adopt for ARM?  Thoughts on what to do otherwise?
>
> There are a couple of different tradeoffs you have to consider here.
> First, I'm going to assume that the defined value isn't already zero
> extended (so that the zexts on the uses aren't purely redundant).  We have
> code that is supposed to eliminate the purely redundant ones, at least when the
> definition is emitted before the uses.
>
> Some things to consider: When the input to the zext is spilled, the reload
> can be folded into the zext on almost all targets, making the zext free.  When
> the zext *isn't* folded into a load, what you're really looking for is a
> code placement pass which tries to put the zexts in non-redundant (and
> non-partially redundant) places.
>
> This sort of code placement pass could be done at the LLVM IR level (as a
> prelegalization like you mention), it could be done as a pre-regalloc machine
> pass, or as a post-regalloc machine pass.
>
> The right answer depends on what and how much you care about this.  If
> you're seeing fully redundant zexts, then I'd look into why machinecse
> isn't picking this up.  If you're seeing partially redundant cases, then
> machine sink is missing something.  If you're seeing reextends of already
> extended values, then it sounds like the heuristic to track that the live-out
> vreg is extended isn't working.
>
> I tend to think that it isn't worth the compile time to try to
> microoptimize out every compare, but I could be convinced otherwise if there are
> important use cases we're failing to handle.  I also do think that
> whole-function selection dags will solve a lot of grossness (e.g. much of
> codegen prepare) with a very clean model.

I'll take a look at Machine CSE and Machine Sink.  Where is the heuristic
for tracking live-out vregs that you mention?  I'm definitely seeing a
reextend of an already extended value.  Worse, the value is spilled and the zext
is not folded into the reload.

For ARM and possibly other RISC-like targets, you simply can't define an i8
or i16 value -- those aren't legal types.  Since those values will always be
extended at the point where they are defined, the code placement problem is
straightforward: you always want to fold the extends into the def, as long as
the value is always extended the same way (not mixed sign and zero extends). 
Whole function selection DAGs would make that easy.
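
To make the redundancy concrete, here is a minimal C sketch that is not taken from the thread. The helper names load_byte_zext and uxtb are hypothetical stand-ins that model the effect of ARM's LDRB and UXTB on a 32-bit register; they are not LLVM or ARM APIs.

```c
#include <stdint.h>

/* Hypothetical model of LDRB: load one byte and zero-extend it
   into a 32-bit register value. */
uint32_t load_byte_zext(const uint8_t *p) {
    return (uint32_t)*p;
}

/* Hypothetical model of UXTB: keep bits 0..7, clear bits 8..31. */
uint32_t uxtb(uint32_t reg) {
    return reg & 0xFFu;
}
```

Under this model, uxtb(load_byte_zext(p)) always equals load_byte_zext(p), so a UXTB placed at a use of a value that was defined by LDRB does no useful work.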

Chris Lattner

2010-Aug-18 17:42 UTC

On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote:
>> I tend to think that it isn't worth the compile time to try to
>> microoptimize out every compare, but I could be convinced otherwise if there are
>> important use cases we're failing to handle.  I also do think that
>> whole-function selection dags will solve a lot of grossness (e.g. much of
>> codegen prepare) with a very clean model.
>
> I'll take a look at Machine CSE and Machine Sink.  Where is the
> heuristic for tracking live-out vregs that you mention?  I'm definitely
> seeing a reextend of an already extended value.  Worse, the value is spilled and
> the zext is not folded into the reload.

The code I'm thinking of is in SelectionDAGISel::ComputeLiveOutVRegInfo.

> For ARM and possibly other RISC-like targets, you simply can't define
> an i8 or i16 value -- those aren't legal types.  Since those values will
> always be extended at the point where they are defined, the code placement
> problem is straightforward: you always want to fold the extends into the def, as
> long as the value is always extended the same way (not mixed sign and zero
> extends).  Whole function selection DAGs would make that easy.

Right.  This is a bit trickier than you make it sound though, because an
"i8" addition isn't necessarily zero or sign extended when the
add is done in a 32-bit register.

-Chris
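
A small worked example of the point about "i8" additions, written as a C sketch rather than anything from LLVM. The function names are hypothetical; zext8 and sext8 model what UXTB and SXTB would compute from the low byte of a 32-bit register.

```c
#include <stdint.h>

/* An "i8" add performed in a 32-bit register: the inputs may already be
   zero-extended, but the sum can carry into bit 8. */
uint32_t add_in_reg32(uint32_t a, uint32_t b) {
    return a + b;
}

/* Zero-extension of the low byte (the effect of UXTB). */
uint32_t zext8(uint32_t reg) {
    return reg & 0xFFu;
}

/* Sign-extension of the low byte (the effect of SXTB), written without
   relying on implementation-defined narrowing conversions. */
int32_t sext8(uint32_t reg) {
    return (int32_t)((reg & 0xFFu) ^ 0x80u) - 0x80;
}
```

With a = 200 and b = 100, the register holds 300 (0x12C) after the add, while the i8 result is 44 (0x2C). The register therefore matches neither the zero-extended nor the sign-extended i8 value, and an explicit extend is still needed if a later use depends on the upper bits.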

Bob Wilson

2010-Aug-18 17:59 UTC

On Aug 18, 2010, at 10:42 AM, Chris Lattner wrote:
> On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote:
>>> I tend to think that it isn't worth the compile time to try to
>>> microoptimize out every compare, but I could be convinced otherwise if there are
>>> important use cases we're failing to handle.  I also do think that
>>> whole-function selection dags will solve a lot of grossness (e.g. much of
>>> codegen prepare) with a very clean model.
>>
>> I'll take a look at Machine CSE and Machine Sink.  Where is the
>> heuristic for tracking live-out vregs that you mention?  I'm definitely
>> seeing a reextend of an already extended value.  Worse, the value is spilled and
>> the zext is not folded into the reload.
>
> The code I'm thinking of is in SelectionDAGISel::ComputeLiveOutVRegInfo

I will check it out.  Thanks!

>> For ARM and possibly other RISC-like targets, you simply can't
>> define an i8 or i16 value -- those aren't legal types.  Since those values
>> will always be extended at the point where they are defined, the code placement
>> problem is straightforward: you always want to fold the extends into the def, as
>> long as the value is always extended the same way (not mixed sign and zero
>> extends).  Whole function selection DAGs would make that easy.
>
> Right.  This is a bit trickier than you make it sound though, because an
> "i8" addition isn't necessarily zero or sign extended when the
> add is done in a 32-bit register.

Oh right.  I was oversimplifying.  So, you do NOT always want to fold the
extends into the def.  E.g., if the only use is a store-byte, you wouldn't
want to add a separate extend instruction after an i8 add.  That really makes
whole-function selection DAGs sound like the right solution.
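
The store-byte case can be illustrated with a short C sketch. The helper store_byte is a hypothetical model of ARM's STRB, which writes only the low 8 bits of a register; it is not part of any real API.

```c
#include <stdint.h>

/* Hypothetical model of STRB: only bits 0..7 of the register
   reach memory, so bits 8..31 are irrelevant to the store. */
void store_byte(uint8_t *dst, uint32_t reg) {
    *dst = (uint8_t)reg;
}
```

Storing the unextended sum 200 + 100 = 300 writes the same byte (44) as storing the sum after masking it to 8 bits, so an extend inserted after the add would be wasted work when the store is the only use.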

Evan Cheng

2010-Aug-18 20:00 UTC

On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote:
>
> On Aug 18, 2010, at 9:56 AM, Chris Lattner wrote:
>
>> On Aug 18, 2010, at 9:22 AM, Bob Wilson wrote:
>>> I'm looking at llvm-generated ARM code that has some
>>> unnecessary UXTB (zero extend) instructions, and it seems to me that doing type
>>> legalization as an entirely local transformation is not the best approach.
>>
>> That's true, but doing isel as a purely local approach isn't
>> the best either :-).  We'd really like to get to whole-function selection
>> dags at some point.
>>
>>> I'm thinking in particular about legalizing integer types that
>>> need to be promoted to the target register size, i.e., i8 and i16 for ARM
>>> promoting to i32.
>>>
>>> Currently we sign-extend or zero-extend values of these types at
>>> every place where they are defined and used.  The definitions are no problem.
>>> Loads from memory can specify whether the value should be zero or sign extended,
>>> and arithmetic operations are going to produce 32-bit values regardless.  Uses
>>> (in different basic blocks from the defs) are a different matter.  We add
>>> explicit zero extend and sign extend operations for every use, despite the fact
>>> that the actual register values will have already been extended to 32 bits when
>>> they were defined.
>>>
>>> It seems like we ought to have a pass to globally promote such
>>> types to the target register size.
>>>
>>> Has anyone looked at this issue before?  Is there already a
>>> solution in place that we just need to adopt for ARM?  Thoughts on what to do
>>> otherwise?
>>
>> There are a couple of different tradeoffs you have to consider here.
>> First, I'm going to assume that the defined value isn't already zero
>> extended (so that the zexts on the uses aren't purely redundant).  We have
>> code that is supposed to eliminate the purely redundant ones, at least when the
>> definition is emitted before the uses.
>>
>> Some things to consider: When the input to the zext is spilled, the
>> reload can be folded into the zext on almost all targets, making the zext free.
>> When the zext *isn't* folded into a load, what you're really looking for
>> is a code placement pass which tries to put the zexts in non-redundant (and
>> non-partially redundant) places.
>>
>> This sort of code placement pass could be done at the LLVM IR level (as
>> a prelegalization like you mention), it could be done as a pre-regalloc machine
>> pass, or as a post-regalloc machine pass.
>>
>> The right answer depends on what and how much you care about this.  If
>> you're seeing fully redundant zexts, then I'd look into why machinecse
>> isn't picking this up.  If you're seeing partially redundant cases, then
>> machine sink is missing something.  If you're seeing reextends of already
>> extended values, then it sounds like the heuristic to track that the live-out
>> vreg is extended isn't working.
>>
>> I tend to think that it isn't worth the compile time to try to
>> microoptimize out every compare, but I could be convinced otherwise if there are
>> important use cases we're failing to handle.  I also do think that
>> whole-function selection dags will solve a lot of grossness (e.g. much of
>> codegen prepare) with a very clean model.
>
> I'll take a look at Machine CSE and Machine Sink.  Where is the
> heuristic for tracking live-out vregs that you mention?  I'm definitely
> seeing a reextend of an already extended value.  Worse, the value is spilled and
> the zext is not folded into the reload.
> 
> For ARM and possibly other RISC-like targets, you simply can't define
> an i8 or i16 value -- those aren't legal types.  Since those values will
> always be extended at the point where they are defined, the code placement
> problem is straightforward: you always want to fold the extends into the def, as
> long as the value is always extended the same way (not mixed sign and zero
> extends).  Whole function selection DAGs would make that easy.

CodeGen/OptimizeExts.cpp (now folded into CodeGen/PeepholeOptimizer.cpp) can get
rid of some of the exts. Right now it only works for targets like x86 where the
pre-extended value is left in part of the register. The common problem is
that the pre-extended value is live out of the block, so uses in other blocks
will re-extend it.

Evan


Bob Wilson

2010-Sep-14 18:37 UTC

Returning to an old discussion here....

On Aug 18, 2010, at 10:42 AM, Chris Lattner wrote:
> On Aug 18, 2010, at 10:27 AM, Bob Wilson wrote:
>>> I tend to think that it isn't worth the compile time to try to
>>> microoptimize out every compare, but I could be convinced otherwise if there are
>>> important use cases we're failing to handle.  I also do think that
>>> whole-function selection dags will solve a lot of grossness (e.g. much of
>>> codegen prepare) with a very clean model.
>>
>> I'll take a look at Machine CSE and Machine Sink.  Where is the
>> heuristic for tracking live-out vregs that you mention?  I'm definitely
>> seeing a reextend of an already extended value.  Worse, the value is spilled and
>> the zext is not folded into the reload.
>
> The code I'm thinking of is in SelectionDAGISel::ComputeLiveOutVRegInfo

For the testcase I'm looking at, ComputeLiveOutVRegInfo does not help
because it is called prior to selection when the load is an "any_ext"
load.  It gets (arbitrarily) selected to LDRB, which zero-extends to 32 bits,
but that's too late to affect the live-out info.
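
The timing problem can be sketched in C as follows. This is only an illustration of the idea of recording known-zero bits for a live-out register; LiveOutInfo and the helper functions are hypothetical and do not reflect the actual data structures behind ComputeLiveOutVRegInfo.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register record (not an LLVM type): a set bit in
   known_zero means that bit of the register is known to be 0. */
typedef struct {
    uint32_t known_zero;
} LiveOutInfo;

/* Before selection, an any-extending byte load promises nothing
   about bits 8..31. */
LiveOutInfo info_anyext_load8(void) {
    return (LiveOutInfo){ 0u };
}

/* A zero-extending byte load (what LDRB actually does) guarantees
   that bits 8..31 are 0. */
LiveOutInfo info_zext_load8(void) {
    return (LiveOutInfo){ 0xFFFFFF00u };
}

/* A zero-extend from i8 is redundant only when bits 8..31 are
   already known to be 0. */
bool zext8_is_redundant(LiveOutInfo in) {
    return (in.known_zero & 0xFFFFFF00u) == 0xFFFFFF00u;
}
```

If the information is recorded while the load is still an any-extending load, the later zero-extend cannot be proven redundant, even though the selected instruction does clear the upper bits.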

MachineCSE and MachineSink do not help because the first zero-extend is folded
into the load (LDRB), so the redundant zero-extend (UXTB) does not appear to be
a CSE.  In another case, the zero-extend is also folded into an add (UXTAB),
which prevents the add from being selected to a better alternative (UXTAB does
not allow immediate operands).

>> For ARM and possibly other RISC-like targets, you simply can't
>> define an i8 or i16 value -- those aren't legal types.  Since those values
>> will always be extended at the point where they are defined, the code placement
>> problem is straightforward: you always want to fold the extends into the def, as
>> long as the value is always extended the same way (not mixed sign and zero
>> extends).  Whole function selection DAGs would make that easy.
>
> Right.  This is a bit trickier than you make it sound though, because an
> "i8" addition isn't necessarily zero or sign extended when the
> add is done in a 32-bit register.

When you pointed this out earlier, I conceded that I was wrong, but on second
thought, I'm surprised that llvm uses i8 adds.  Other compilers that
I've worked on rely on the integral promotion rules for C/C++ to convert all
integer arithmetic to the default "int" or "unsigned" type.
That tends to work out nicely for register-oriented targets.  Is there a reason
why llvm does not take that approach?
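
For readers unfamiliar with the integral promotion rules mentioned here, a brief C sketch of their effect follows. The function names are made up for illustration.

```c
#include <stdint.h>

/* Both operands are promoted to int before the add, so the sum is
   computed at int width and does not wrap at 8 bits. */
int promoted_sum(uint8_t a, uint8_t b) {
    return a + b;
}

/* Narrowing to 8 bits happens only at the explicit conversion. */
uint8_t narrowed_sum(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}
```

With a = 200 and b = 100, promoted_sum yields 300 and narrowed_sum yields 44. At the source level the addition itself is an int operation, and the truncation is a separate step.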