thr3ads.net - llvm dev - [LLVMdev] Landing my new development on the trunk ... [Nov 2010]

Daniel Berlin

2010-Nov-14 05:05 UTC

[LLVMdev] Landing my new development on the trunk ...

> A big downside of the current LSR algorithm is it's slow.  I had initially
> hoped that some of the heuristics would protect it better, but the problem
> is more complex than I had expected.  I haven't done any measurements,
> but it's likely that OSR is faster, which may interest some people
> regardless of how the output compares.

A few years ago, I implemented OSR (with some slight modifications) in GCC,
though it was never committed to mainline (it's on a branch somewhere).
It was significantly faster than ivopts (which does what you guys are using
LSR for), and it found more cases than ivopts did. I just never integrated
the same target-dependent stuff, so it never made it into the mainline.

Note that, as written in the paper, OSR is pretty target-independent because
of the order of processing.  It expects to do its processing on each SCC as
it completes the SCC, so it doesn't first gather all the possible things it
could do and then decide which is best.

It is also possible for a "do everything" OSR to completely blow up register
pressure if there are a number of conditional IV updates + operations based
on them in the loop, since it will have to generate a new variable for each
of these cases that will end up live over the entire loop.
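
To make the register pressure point concrete, here is a rough sketch in C
rather than in any compiler's IR. The function names, the multipliers 2, 3
and 5, and the step array are all made up for illustration; this is what a
"reduce everything" pass would conceptually produce, not the output of any
particular OSR implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: i is updated only on some iterations, and three
   different multiples of i are used as array indices. */
long sum_original(const long *a, const int *step, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (size_t k = 0; k < n; k++) {
        if (step[k])
            i++;                      /* conditional IV update */
        total += a[i * 2] + a[i * 3] + a[i * 5];
    }
    return total;
}

/* Hand-written "reduce everything" version: each product gets its own
   variable, updated wherever i was updated. All three are live across
   the whole loop, where the original needed only i. */
long sum_reduced(const long *a, const int *step, size_t n)
{
    long total = 0;
    size_t i2 = 0, i3 = 0, i5 = 0;    /* stand in for i*2, i*3, i*5 */
    for (size_t k = 0; k < n; k++) {
        if (step[k]) {
            i2 += 2;
            i3 += 3;
            i5 += 5;
        }
        total += a[i2] + a[i3] + a[i5];
    }
    return total;
}

/* Convenience check that both versions agree on a given input. */
void check_same(const long *a, const int *step, size_t n)
{
    assert(sum_original(a, step, n) == sum_reduced(a, step, n));
}
```

The multiplies are gone, but one loop-carried value has become three, which
is the trade-off a target-aware cost model would have to weigh.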


So I think you may see good things if you took the OSR code and used it as a
basis for LSR.

There is one thing that the original paper, the original MSCP implementation,
and my GCC implementation all did, which is LFTR (Linear Function Test
Replacement). (Too bad the links to the MSCP implementation point to
ftp.cs.rice.edu, which no longer works; the web files were a great
implementation resource.) LFTR after OSR can help reduce register pressure,
since it enables eliminating the IVs that no longer serve any useful purpose.
I don't see any implementation of it in this code.
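
As a rough illustration of what LFTR buys you, here is a hand-written C
sketch. The function names and the stride of 4 are hypothetical, and the
code is source-level C rather than anything from the GCC or LLVM
implementations discussed here. The first function is what a loop looks
like after strength reduction alone; the second is the same loop after the
exit test has been rewritten in terms of the reduced variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* After strength reduction only: off has replaced i * 4 in the body,
   but i is still kept alive by the exit test. */
long sum_after_osr(const long *a, size_t n)
{
    long total = 0;
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[off];
        off += 4;
    }
    return total;
}

/* After LFTR: the test "i < n" becomes "off < n * 4", so i has no
   remaining uses and disappears. The rewrite assumes n * 4 does not
   overflow, which the assert makes explicit. */
long sum_after_lftr(const long *a, size_t n)
{
    assert(n <= SIZE_MAX / 4);
    long total = 0;
    size_t limit = n * 4;             /* loop-invariant, computed once */
    for (size_t off = 0; off < limit; off += 4)
        total += a[off];
    return total;
}
```

Both functions read the same elements in the same order; the second simply
carries one fewer value around the loop.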

--Dan
Brian West

2010-Nov-15 15:38 UTC


On 11/13/10 11:05 PM, Daniel Berlin wrote:
>     A big downside of the current LSR algorithm is it's slow.  I had
>     initially hoped that some of the heuristics would protect it better,
>     but the problem is more complex than I had expected.  I haven't done
>     any measurements, but it's likely that OSR is faster, which may
>     interest some people regardless of how the output compares.
>
> There is one thing both the original paper, the original MSCP
> implementation did (too bad the links to this point to ftp.cs.rice.edu,
> which no longer works, the web files were a great implementation
> resource), and my GCC implementation did, which is LFTR (Linear Function
> Test Replacement). LFTR after OSR can help reduce register pressure since
> it enables eliminating the IV's that no longer serve any useful purpose.
> I don't see any implementation in this code.
>
> --Dan

Dan,

LFTR (Linear Function Test Replacement) was mentioned in the original 
paper.  I considered including LFTR with OSR, but decided to get OSR to 
trunk first and then add LFTR (-lftr) as a separate pass later.  The 
LLVM development documentation suggests that new work be committed 
piecemeal over time.

LLVM does have an optimization pass, -instcombine, which will delete
unused induction variables. I recommend that -instcombine be run after
OSR.  It is my understanding that LFTR would attempt to remove induction
variables whose only use is to participate in an end-of-loop test
condition.

Thanks for your comments,
Brian West

Daniel Berlin

2010-Nov-16 13:58 UTC


On Mon, Nov 15, 2010 at 10:38 AM, Brian West <bnwest at rice.edu> wrote:
> On 11/13/10 11:05 PM, Daniel Berlin wrote:
>
>> A big downside of the current LSR algorithm is it's slow.  I had
>> initially hoped that some of the heuristics would protect it better,
>> but the problem is more complex than I had expected.  I haven't done
>> any measurements, but it's likely that OSR is faster, which may
>> interest some people regardless of how the output compares.
>
> There is one thing both the original paper, the original MSCP
> implementation did (too bad the links to this point to ftp.cs.rice.edu,
> which no longer works, the web files were a great implementation
> resource), and my GCC implementation did, which is LFTR (Linear Function
> Test Replacement). LFTR after OSR can help reduce register pressure since
> it enables eliminating the IV's that no longer serve any useful purpose.
> I don't see any implementation in this code.
>
> --Dan
>
> Dan,
>
> LFTR (Linear Function Test Replacement) was mentioned in the original
> paper.  I considered including LFTR with OSR, but decided to get OSR to
> trunk first and then add LFTR (-lftr) as a separate pass later.

I'm not sure why you'd add it as a separate pass. It is about 80-150 lines
of code, and adding it as a separate pass requires you to do things like
induction variable detection etc. all over again.

See http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01035.html, record_edge,
apply_lftr_edge, follow_lftr_edge and perform_lftr

> The LLVM development documentation suggests that new work be committed
> piecemeal over time.

Sure, but that doesn't mean you should commit something that is going

> LLVM does have an optimization pass, -instcombine, which will delete unused
> induction variables.

LFTR does not delete unused IVs directly; it does reductions to transform
the IV into something else.
instcombine could only do this if it knew the sequence of reductions we
applied to strength reduce the IV in the first place, or if it computed
equivalence of IVs itself.
Neither of these is a cheap operation.

> I recommend that -instcombine be run after OSR.

In general, there is a careful balance between leaving the IR in a state
that requires expensive cleanup, and doing some of that cleanup yourself
where it's cheap and easy.
If you were to simply fall on the extreme of running cleanup passes after
every optimization, your compiler would be much slower.

> It is my understanding that LFTR would attempt to remove induction
> variables whose only use is to participate in an end-of-loop test
> condition.
>
Well, no. It targets any IV whose only use is a comparison and a linear
function of an existing IV.  This is mostly end-of-loop test conditions,
but you'd be surprised where else this pops up.

Logging or progress tracking, for example, where you do things like

if (i % 100 == 0) {
   printf(".");
}

etc

Anyway, it looks like the consensus so far is that you need to produce some
compile-time and benchmark numbers showing OSR is worth it as its own pass,
as opposed to replacing/augmenting the LSR implementation that exists now
with it.

--Dan