On Mon, 21 Feb 2005, Jeff Cohen wrote:

> I figured getelementptr exists as it does to facilitate data flow analysis,
> but it does need to be broken down before instruction selection.  It's not
> just the missed optimization opportunities.  It also introduces a huge amount
> of complexity into instruction selection, which has to deal with it.  It
> would also take care of many of the FIXMEs in LoopStrengthReduce.

There are two issues here.  One is an important one that should be addressed
on the LLVM level: we currently do not have a loop strength reduction pass.
Loop strength reduction is an important optimization that turns things like
this:

   for (i = ...; ...; ++i)
     A[i] = ...

into:

   for (p = &A[0]; ...; ++p)
     *p = ...

This transforms multiplies in the addressing arithmetic (e.g. &A + i*sizeof(A[0]))
into additions (e.g. p = p + sizeof(A[0])).  Nate worked on a patch to do this
a while back, but I'm not sure where it stands.

The second issue is that we need to do redundancy elimination and hoisting on
operations that are only exposed once instruction selection is performed.
Getelementptr expansion is just ONE particular case of this, but it can happen
with any instructions (including the use of large integer (or any FP) constants
on RISC machines, addressing globals with PIC code, handling extended/truncated
operations (for RISC machines with a single integer size), etc.).

Note that hacks like "LowerMultiDimRefs" and the Sparc Preselection pass
sorta-kinda work around SOME of this, but they really are just as bad as the
problem: they solve some cases and make other cases worse.  The only way to
make Preselection or LowerMultiDimRefs work is to duplicate all of the
knowledge of how the instruction selector will select the code (e.g. the
advice about allowing constant indices to be grouped together).

This is *exactly* the reason why the new instruction selection framework was
introduced.  It already handles local CSE trivially, and will be extended to
do GCSE, LICM, and eventually PDCE as time permits.  Combined with loop
strength reduction, these problems should go completely away.  Also, with the
new instruction selection framework, targets don't have to handle GEP
instructions at all, so the "complexity" argument above doesn't apply.

If you're interested in working on this, I would suggest starting with the
loop strength reduction code, as I've heard it's pretty close to working.

-Chris

> Vikram Adve wrote:
>
>>> Now the problem is obvious.  A two dimensional array access is being
>>> performed by a single instruction.  The arithmetic needed to address the
>>> element is implicit, and therefore inaccessible to optimizations.  The
>>> redundant calculations cannot be eliminated, nor can strength reduction
>>> be performed.  getelementptr needs to be broken down into its constituent
>>> operations at some point.
>>
>> Jeff,
>>
>> This is exactly right -- any multi-dimensional array indexing operations
>> need to be decomposed into a sequence of single-index operations so that
>> they can be exposed to GCSE and LICM.  This transformation obscures the
>> indexing behavior of the code, so the right place to do this is within each
>> back-end, on LLVM code just before instruction selection.
>>
>> There is a pass called DecomposeMultiDimRefs (which seems to have been
>> incorrectly moved to lib/Target/SparcV9/) which does this transformation.
>> It's used by the SparcV9 back end before instr. selection but is not
>> specific to Sparc.  Running this, followed by LICM and GCSE, should address
>> this problem.
>>
>> --Vikram

-Chris

--
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/
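To make the loop strength reduction transformation Chris describes concrete,
here is a minimal C sketch; the function names, the int element type, and the
bound N are illustrative placeholders, not code from any LLVM pass:

   /* Before: each iteration forms the address A + i*sizeof(A[0]),
      i.e. a multiply (or shift) per access. */
   void fill_indexed(int *A, int N) {
       for (int i = 0; i < N; ++i)
           A[i] = 0;
   }

   /* After strength reduction: the multiply in the address arithmetic
      becomes a pointer increment of sizeof(A[0]) per iteration. */
   void fill_reduced(int *A, int N) {
       int *end = A + N;
       for (int *p = A; p != end; ++p)
           *p = 0;
   }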
> The second issue is that we need to do redundancy elimination and
> hoisting on operations that are only exposed once instruction
> selection is performed.  Getelementptr expansion is just ONE
> particular case of this, but it can happen with any instructions
> (including the use of large integer (or any FP) constants on RISC
> machines, addressing globals with PIC code, handling
> extended/truncated operations (for RISC machines with a single integer
> size), etc.).  Note that hacks like "LowerMultiDimRefs" and the Sparc
> Preselection pass sorta-kinda work around SOME of this,

Actually, they are capable of working around much of this, and most such
optimizations are actually quite architecture-independent, e.g.,
LowerMultiDimRefs, which is a major optimization for array-intensive
languages.

I don't claim such choices are the right long-term solution, but the right
long-term solution is exactly what you're implying below - a new optimization
framework within the back-end, working on its own representation, etc.  This
has taken 4 years and is still under development.  This is why what you call
"hacks" are sometimes the right choice in the short term.

> The only way to make Preselection or LowerMultiDimRefs work is to
> duplicate all of the knowledge of how the instruction selector will
> select the code (e.g. the advice about allowing constant indices to be
> grouped together).

This is why you need a separate, low-level optimization framework - the kind
you were describing.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/
On Feb 22, 2005, at 9:09 AM, Vikram S. Adve wrote:

>> The only way to make Preselection or LowerMultiDimRefs work is to
>> duplicate all of the knowledge of how the instruction selector will
>> select the code (e.g. the advice about allowing constant indices to
>> be grouped together).
>
> This is why you need a separate, low-level optimization framework -
> the kind you were describing.

Incidentally, for those interested in the big picture, the issues raised by
Jeff and Chris come down to a fundamental trade-off between the so-called
"low-level" model of compilation (like GCC and, I believe, IBM's XL compiler
family), and the mid-level model we chose for LLVM.  In GCC and other
low-level systems, code is lowered to expose all the detailed operations
needed on a specific target early in the compilation process, so that they
are all exposed to optimization.  Of course, doing this in a single IR capable
of supporting multiple targets can lead to horrendously complex
representations.  The mid-level model does not expose architecture-specific
operations in the IR, but this means that a separate back-end optimization
framework is needed, in addition to the primary mid-level,
architecture-neutral framework.

My point is that this is a fundamental trade-off made very early on in the
LLVM design, and I think it was the right one.  It greatly simplifies the
higher-level optimizations, which are by far the most complex, and can even
make more optimizations possible (if the mid-level IR is able to expose types
and reference patterns that are obscured in the low-level IR).

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/
Vikram S. Adve wrote:

>> The only way to make Preselection or LowerMultiDimRefs work is to
>> duplicate all of the knowledge of how the instruction selector will
>> select the code (e.g. the advice about allowing constant indices to
>> be grouped together).
>
> This is why you need a separate, low-level optimization framework -
> the kind you were describing.
>
> --Vikram
> http://www.cs.uiuc.edu/~vadve
> http://llvm.cs.uiuc.edu/

Sounds reasonable to me.  It seems to me it would be best if LSR were part of
the low-level optimization framework; otherwise it would be forced to
duplicate too much work on its own.  Also, some of what LSR needs to decide is
architecture-dependent.  For example, it may not want to strength-reduce a
multiplication by a small power of two, as that is handled by the addressing
modes of some architectures.

There is still probably a need for a mid-level LSR that handles the explicit
multiplications, divisions, etc.
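A rough C illustration of Jeff's point about addressing modes; the struct
layout and function names below are hypothetical, and the remarks assume a
target such as x86 with scaled-index addressing and 4-byte ints:

   /* Indexing an int array scales i by sizeof(int) == 4.  On targets with
      scaled-index addressing (e.g. x86's base + index*4), that multiply is
      folded into the load, so strength-reducing it into a pointer increment
      gains little.  A 12-byte stride cannot use such a mode, so the multiply
      is explicit and is a better candidate for strength reduction. */
   struct S { int a, b, c; };              /* 12 bytes on typical ABIs */

   int sum_ints(const int *A, int N) {
       int s = 0;
       for (int i = 0; i < N; ++i)
           s += A[i];                       /* address: A + i*4 (foldable) */
       return s;
   }

   int sum_first_fields(const struct S *A, int N) {
       int s = 0;
       for (int i = 0; i < N; ++i)
           s += A[i].a;                     /* address: A + i*12 (not foldable) */
       return s;
   }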
On Tue, 22 Feb 2005, Vikram S. Adve wrote:

>> The second issue is that we need to do redundancy elimination and hoisting
>> on operations that are only exposed once instruction selection is
>> performed.  Getelementptr expansion is just ONE particular case of this,
>> but it can happen with any instructions (including the use of large integer
>> (or any FP) constants on RISC machines, addressing globals with PIC code,
>> handling extended/truncated operations (for RISC machines with a single
>> integer size), etc.).  Note that hacks like "LowerMultiDimRefs" and the
>> Sparc Preselection pass sorta-kinda work around SOME of this,
>
> Actually, they are capable of working around much of this, and most such
> optimizations are actually quite architecture-independent, e.g.,
> LowerMultiDimRefs, which is a major optimization for array-intensive
> languages.

I obviously agree that high-quality codegen of getelementptr instructions is
very important!  I have two primary problems with LowerMultiDimRefs:

1. We don't want the code generator munching on the LLVM code as it generates
   code.  The code generator should only read the LLVM code.  Right now, if we
   do things like a "quick JIT" followed by a "good code JIT" (when we find
   that a piece of code is hot), the second JIT gets different LLVM code than
   the first one had as input.  We are still not quite to the point where this
   is the case, but we are getting there.

2. The way LowerMultiDimRefs works is by decimating GEP instructions into
   pieces, then relying on GCSE and LICM to clean them up.  The problem with
   this is that GCSE and LICM are not the trivial passes that you'd like them
   to be for a simple cleanup job like this.  In particular, GCSE requires a
   value numbering implementation, and LICM requires loop info and alias
   analysis.

Finally, while the *code* for LowerMultiDimRefs isn't target-specific, the
decisions it makes *are*.  As a trivial example, it should decompose the
following GEP on most RISC machines:

   t2 = GEP P, 1000000, X
   load t2

Assuming the immediate doesn't fit into the immediate field of an instruction,
this GEP will lower to multiple instructions, hypothetically like this
(assuming X indexes into a byte array):

   t1 = P + 900000
   load t1 + X + 100000

In this case, the addition of 900000 could be loop invariant, a common
subexpression, etc.

> I don't claim such choices are the right long-term solution, but the right
> long-term solution is exactly what you're implying below - a new optimization
> framework within the back-end, working on its own representation, etc.  This
> has taken 4 years and is still under development.  This is why what you call
> "hacks" are sometimes the right choice in the short term.

I can appreciate that the LowerMultiDimRefs pass made a *lot* of sense 4 years
ago, but I'm concerned about moving forward.  Now we have more of the
infrastructure that is needed to move beyond it.  Also, the new instruction
selection framework is only a couple of months old, not 4 years. :)

-Chris

--
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/
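For readers who want to see the effect of the decomposition above in source
terms, here is a rough C analogue; the loop, the accumulation, and the
function names are hypothetical, while the pointer P, the index X, and the
constants are taken from Chris's example:

   /* Undecomposed: the full offset 1000000 + X is part of every address
      computation inside the loop. */
   int sum_naive(const char *P, int N) {
       int s = 0;
       for (int X = 0; X < N; ++X)
           s += P[1000000 + X];
       return s;
   }

   /* Decomposed as in the lowering sketch: the large, loop-invariant piece
      (P + 900000) is formed once and can be hoisted or CSE'd, leaving only
      a small constant plus the varying index inside the loop. */
   int sum_decomposed(const char *P, int N) {
       const char *t1 = P + 900000;
       int s = 0;
       for (int X = 0; X < N; ++X)
           s += t1[100000 + X];
       return s;
   }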