thr3ads.net - llvm dev - [LLVMdev] Landing my new development on the trunk ... [Oct 2010]

If this information is useful, please help other people find it:
Share via:

Brian West

2010-Oct-28 16:38 UTC

[LLVMdev] Landing my new development on the trunk ...

On 10/27/10 8:34 PM, Eli Friedman wrote:> On Wed, Oct 27, 2010 at 1:29 PM, Brian West<bnwest at rice.edu> 
wrote:
>> Here is the patch for the new Operator Strength Reduction optimization
>> pass that I have written.  The bulk of the code is in
>>
>> lib/Transforms/Scalar/OperatorStrengthReduce.cpp
>>
>> The algorithm finds reduction opportunities in both array accesses and
>> explicit multiplications within loops.
>>
>> Next, I plan on writing the regression tests.
>
> Comments:
> 1. Please don't top-post.
> 2. Instruction::getOpcodeName instead of your OpcodeToString function.
Can do. I did not see getOpcodeName() in 
lib/Target/CBackend/CBackend.cpp, so I wrote my own. FWIW I am only 
using OpcodeToString() for debug output.

> 3. LLVM already has a significant amount of infrastructure for loop
> passes; why does this pass have its own code for finding loops?I saw the loop infrastructure for CFG loops. This algorithm finds loops 
in the data flow (more precisely: strongly-connected components in the 
SSA-graph), e.g.

bb5:
%20 = add nsw i32 %i.0, 1
br label %bb6

bb6:
%i.0 = phi i32 [ 0, %entry ], [ %20, %bb5 ]

There is a data flow loop between %20 and %i.0. The OSR paper has a nice 
figure showing data flow loops.

Here is small excerpt from the OSR paper:

"OSR is driven by a simple depth-first search of the SSA-graph, using 
Tarjan’s strongly-connected component finder. The SCC-finder lets OSR 
discover induction variables in topological order and process them as 
they are discovered. As the SCCs are discovered, they are processed by a 
set of mutually-recursive routines that test for induction variables and 
region constants, and then reduce the appropriate operations."

> 4. What's the use case for this pass?  Can this pass improve
> performance of generated code beyond the existing -loop-reduce pass?
> If not, could it usefully act as a faster replacement for -loop-reduce
> for fast code generation? (Perhaps someone doing fast code generation
> with LLVM could comment on whether -loop-reduce is useful, and whether
> the performance is an issue.)
-loop-reduce (LSR) only looks at implicit multiplication in array accesses. The
following code has an explicit multiplication within a loop that LSR will not
reduce:

extern float
loop4(float* a, int start, int stop)
{
int i;
float sum;
float *pEntry;
char *p;
unsigned long offset;
const unsigned int float_size = sizeof(float);

sum = 0.0;
p = (char*)a;
for (i=start-1; i<stop; ++i) {
offset = i * float_size;
pEntry = (float*) (p + offset);
sum += *pEntry;
}

return sum;
}

OSR also finds reductions in some esoteric array accesses that LSR 
misses. I found these LSR misses while writing less-than-real-world code 
to break OSR. I can include these examples, if requested.

For array accesses, OSR and LSR produces similar LLVM IR and I am 
guessing that their execution times should be similar.

Empirically the OSR optimization is compile-time faster than LSR. I have 
also noticed that OSR has more "analysis" requirements: Induction 
Variable User, Natural Loop Information, Canonicalize natural loops, and 
Scalar Evolution Analysis. Both OSR and LSR require the Dominator Tree 
Construction analysis pass.

I did not mention in the original email (and should have) that OSR needs 
-instcombine to be run after it for cleanup. Also -licm, -reassociate, 
-gvn and -sccp can be enabling optimizations for OSR.

> -EliEli,

Thanks for your comments.

Brian West

Eli Friedman

2010-Oct-28 17:56 UTC

head link

[LLVMdev] Landing my new development on the trunk ...

On Thu, Oct 28, 2010 at 9:38 AM, Brian West <bnwest at rice.edu>
wrote:>> 3. LLVM already has a significant amount of infrastructure for loop
>> passes; why does this pass have its own code for finding loops?
>
> I saw the loop infrastructure for CFG loops. This algorithm finds loops in
> the data flow (more precisely: strongly-connected components in the
> SSA-graph), e.g.
>
> bb5:
> %20 = add nsw i32 %i.0, 1
> br label %bb6
>
> bb6:
> %i.0 = phi i32 [ 0, %entry ], [ %20, %bb5 ]
>
> There is a data flow loop between %20 and %i.0. The OSR paper has a nice
> figure showing data flow loops.
>
> Here is small excerpt from the OSR paper:
>
> "OSR is driven by a simple depth-first search of the SSA-graph, using
> Tarjan’s strongly-connected component finder. The SCC-finder lets OSR
> discover induction variables in topological order and process them as they
> are discovered. As the SCCs are discovered, they are processed by a set of
> mutually-recursive routines that test for induction variables and region
> constants, and then reduce the appropriate operations."
Ah, okay, that makes sense.  I didn't really read the algorithm too
carefully...
>> 4. What's the use case for this pass?  Can this pass improve
>> performance of generated code beyond the existing -loop-reduce pass?
>> If not, could it usefully act as a faster replacement for -loop-reduce
>> for fast code generation? (Perhaps someone doing fast code generation
>> with LLVM could comment on whether -loop-reduce is useful, and whether
>> the performance is an issue.)
[...]> For array accesses, OSR and LSR produces similar LLVM IR and I am guessing
> that their execution times should be similar.
For normal compilation, LSR runs rather late (with the code generation
passes), and has some target-specific knowledge.  We really need
benchmarks here, I think.
> Empirically the OSR optimization is compile-time faster than LSR. I have
> also noticed that OSR has more "analysis" requirements: Induction
Variable
> User, Natural Loop Information, Canonicalize natural loops, and Scalar
> Evolution Analysis. Both OSR and LSR require the Dominator Tree
Construction
> analysis pass.
My primary concern here is that if this code gets committed without
anyone interested in actually using it, it will just end up orphaned,
so there's no point to contributing it.
> I did not mention in the original email (and should have) that OSR needs
> -instcombine to be run after it for cleanup. Also -licm, -reassociate, -gvn
> and -sccp can be enabling optimizations for OSR.
Hmm... perhaps that could be partially fixed using the InstSimplify
infrastructure.

-Eli

Brian West

2010-Oct-28 21:23 UTC

head link

[LLVMdev] Landing my new development on the trunk ...

Eli Friedman <eli.friedman <at> gmail.com>
writes:> > Empirically the OSR optimization is compile-time faster than LSR. I
have
> > also noticed that OSR has more "analysis" requirements:
Induction Variable
> > User, Natural Loop Information, Canonicalize natural loops, and Scalar
> > Evolution Analysis. Both OSR and LSR require the Dominator Tree
Construction
> > analysis pass.
> 
> My primary concern here is that if this code gets committed without
> anyone interested in actually using it, it will just end up orphaned,
> so there's no point to contributing it.
This work is part of the PACE project (http://pace.rice.edu/).  My team has
funding for another three years and is committed to maintaining this code.

> > I did not mention in the original email (and should have) that OSR
needs
> > -instcombine to be run after it for cleanup. Also -licm, -reassociate,
-gvn
> > and -sccp can be enabling optimizations for OSR.
> 
> Hmm... perhaps that could be partially fixed using the InstSimplify
> infrastructure.
OSR can produce induction variables that are not used.  -instcombine finds the
unused induction variables and deletes them (which is super cool btw :)).  I am
unsure if InstructionSimplify can fix that.

> 
> -Eli
> 
thanks,
Brian

Evan Cheng

2010-Oct-29 08:31 UTC

head link

[LLVMdev] Landing my new development on the trunk ...

Eli is right. We do need to see some benchmark numbers and understand how the
pass will fit in the target independent optimizer. While we encourage
contribution, we typically don't commit new passes unless it introduce new
functionalities that have active clients. It would also help if you provide us
with compile time numbers.

Evan

On Oct 28, 2010, at 10:56 AM, Eli Friedman wrote:
> On Thu, Oct 28, 2010 at 9:38 AM, Brian West <bnwest at rice.edu>
wrote:
>>> 3. LLVM already has a significant amount of infrastructure for loop
>>> passes; why does this pass have its own code for finding loops?
>> 
>> I saw the loop infrastructure for CFG loops. This algorithm finds loops
in
>> the data flow (more precisely: strongly-connected components in the
>> SSA-graph), e.g.
>> 
>> bb5:
>> %20 = add nsw i32 %i.0, 1
>> br label %bb6
>> 
>> bb6:
>> %i.0 = phi i32 [ 0, %entry ], [ %20, %bb5 ]
>> 
>> There is a data flow loop between %20 and %i.0. The OSR paper has a
nice
>> figure showing data flow loops.
>> 
>> Here is small excerpt from the OSR paper:
>> 
>> "OSR is driven by a simple depth-first search of the SSA-graph,
using
>> Tarjan’s strongly-connected component finder. The SCC-finder lets OSR
>> discover induction variables in topological order and process them as
they
>> are discovered. As the SCCs are discovered, they are processed by a set
of
>> mutually-recursive routines that test for induction variables and
region
>> constants, and then reduce the appropriate operations."
> 
> Ah, okay, that makes sense.  I didn't really read the algorithm too
carefully...
> 
>>> 4. What's the use case for this pass?  Can this pass improve
>>> performance of generated code beyond the existing -loop-reduce
pass?
>>> If not, could it usefully act as a faster replacement for
-loop-reduce
>>> for fast code generation? (Perhaps someone doing fast code
generation
>>> with LLVM could comment on whether -loop-reduce is useful, and
whether
>>> the performance is an issue.)
> [...]
>> For array accesses, OSR and LSR produces similar LLVM IR and I am
guessing
>> that their execution times should be similar.
> 
> For normal compilation, LSR runs rather late (with the code generation
> passes), and has some target-specific knowledge.  We really need
> benchmarks here, I think.
> 
>> Empirically the OSR optimization is compile-time faster than LSR. I
have
>> also noticed that OSR has more "analysis" requirements:
Induction Variable
>> User, Natural Loop Information, Canonicalize natural loops, and Scalar
>> Evolution Analysis. Both OSR and LSR require the Dominator Tree
Construction
>> analysis pass.
> 
> My primary concern here is that if this code gets committed without
> anyone interested in actually using it, it will just end up orphaned,
> so there's no point to contributing it.
> 
>> I did not mention in the original email (and should have) that OSR
needs
>> -instcombine to be run after it for cleanup. Also -licm, -reassociate,
-gvn
>> and -sccp can be enabling optimizations for OSR.
> 
> Hmm... perhaps that could be partially fixed using the InstSimplify
> infrastructure.
> 
> -Eli
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Oct 2010 - [LLVMdev] Landing my new development on the trunk ...

[LLVMdev] Landing my new development on the trunk ...

[LLVMdev] Landing my new development on the trunk ...

[LLVMdev] Landing my new development on the trunk ...

[LLVMdev] Landing my new development on the trunk ...

Reasonably Related Threads