On Oct 4, 2008, at 2:51 PM, Chris Lattner wrote:

>>> I like your approach of using the use lists but I'm not sure the
>>> ordering is guaranteed. If it is, your approach is superior.
>>
>> I got my patch updated to work with TOT. Here it is. Comments
>> welcome.
>
> Hi Dave,
>
> Great. I'd like to get this in, but would really like to do a profile
> first. Could you please include a testcase? Even if it is something
> contrived with llvm-gcc, I would really like to understand what is
> going on.

Looking more closely at the approach, I infer that the (logical in hindsight) issue is cases where mem2reg scans a BB to determine the relative ordering of two instructions. For example, the single store case. I could imagine that if you saw lots of single store allocas in the middle of a large BB, mem2reg would be very, very slow.

As far as the approach goes, here's a counterproposal that is a bit less complex. I suggest adding a single DenseMap<Instruction*, unsigned> to mem2reg. If there is an entry in this map for a load/store instruction, it would store the instruction's index in its basic block. This means that to check dominance of two instructions in a block, you just need to compare their two indices (which is fast). Really small blocks (< 10 instructions?) would never be entered into the map, to keep the size of the map low. The first time a query is made within a large block, it would compute the indices of all loads and stores within the block.

The nice thing about using a single map is that it makes updating the map really trivial: just use TheMap.erase(I) when deleting load and store instructions. The numbering can be purely local to a block, and the relative ordering of memops doesn't get invalidated when a load or store is nuked.

Does this seem like a reasonable approach? If so, please split the various parts of mem2reg out into helper methods as a prepatch. This will make review of the subsequent changes much easier.

-Chris
On Saturday 04 October 2008 17:05, Chris Lattner wrote:

> Looking more closely at the approach, I infer that the (logical in
> hindsight) issue is cases where mem2reg scans a BB to see the
> relative ordering of two instructions. For example, the single store
> case.

Correct.

> I could imagine that if you saw lots of single store allocas in
> the middle of a large BB that mem2reg would be very very slow.

Right.

> As far as the approach goes, here's a counterproposal that is a bit
> less complex. I suggest adding a single DenseMap<Instruction*,
> unsigned> to Mem2reg. If there is an entry in this map for a load/
> store instruction, it would store the instruction's index in its basic
> block. This means that to check dominance of two instrs in a block,
> you just need to compare their two indices (which is fast). Really
> small blocks (< 10 instructions?) would never be entered into the map,
> to keep the size of the map low. The first time a query is made within
> a large block, it would compute the indices of all loads and stores
> within the block.

This is a simpler approach and should have the same effect. I'll code it up.

> The nice thing about using a single map is that it makes updating the
> map really trivial. Just use TheMap.erase(I) when deleting load and
> store instructions. The numbering can be purely local to a block and
> the relative ordering of memops doesn't get invalidated when a load or
> store is nuked.

Good points.

> Does this seem like a reasonable approach? If so, please split the
> various parts of mem2reg out into helper methods as a prepatch. This
> will make review of the subsequent changes much easier.

Which various parts are you referring to? You mean the stuff that populates and updates the map?

-Dave
On Oct 6, 2008, at 9:47 AM, David Greene wrote:

>> As far as the approach goes, here's a counterproposal that is a bit
>> less complex. I suggest adding a single DenseMap<Instruction*,
>> unsigned> to Mem2reg. If there is an entry in this map for a load/
>> store instruction, it would store the instruction's index in its
>> basic block. This means that to check dominance of two instrs in a
>> block, you just need to compare their two indices (which is fast).
>> Really small blocks (< 10 instructions?) would never be entered into
>> the map, to keep the size of the map low. The first time a query is
>> made within a large block, it would compute the indices of all loads
>> and stores within the block.
>
> This is a simpler approach and should have the same effect. I'll
> code it up.

Great! Thanks,

>> Does this seem like a reasonable approach? If so, please split the
>> various parts of mem2reg out into helper methods as a prepatch. This
>> will make review of the subsequent changes much easier.
>
> Which various parts are you referring to? You mean the stuff that
> populates and updates the map?

I'm criticizing the existing code :), which is just very poorly structured. It would be nice to do a refactoring patch first which just splits the various "scan a BB" places out into helper methods, instead of them being inlined into larger pieces of code. This will make the mem2reg algo easier to read and will make the algorithmic change easier to review.

If you want, I can change one of the places that does a scan to show what I mean.

-Chris
Hi Chris,

Sounds like an excellent generic approach. I'd love to lend a hand implementing it, but I'm afraid it involves a little more in-depth LLVM knowledge than what I currently master, so I'll leave it to Dave and you for now...

Talking about performance: for me this optimization was the difference between mem2reg taking 63% of execution time and having it totally 'vanish'. It also marked the transition from sticking with a custom JIT compiler to getting really excited about using LLVM instead. :) Previously I just assumed LLVM to be inherently slow, but now I realize there's still potential for optimization.

Do you consider it useful if I go ahead and identify further bottlenecks and report them, or would you rather concentrate on correctness bugs for now? Is there any list of known optimization opportunities?

Thanks,

Nicolas

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Lattner
Sent: Sunday, 05 October, 2008 00:05
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] mem2reg optimization

[original message snipped]

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Oct 6, 2008, at 10:58 PM, Nicolas Capens wrote:

> Sounds like an excellent generic approach. I'd love to lend a hand to
> implement it but I'm afraid it involves a little more in-depth LLVM
> knowledge than what I currently master, so I'll leave it to Dave and
> you for now...

Ok.

> Talking about performance, for me this optimization was the difference
> between mem2reg taking 63% of execution time, and having it totally
> 'vanish'. It also marked the transition between rather using a custom
> JIT compiler and getting really excited to use LLVM instead. :)
> Previously I just assumed LLVM to be inherently slow but now I realize
> there's still potential for optimization.

Hehe, we can't fix problems we don't know about :)

> Do you consider it useful if I go ahead and further identify
> bottlenecks and report them or would you rather concentrate on
> correctness bugs for now? Is there any list of known optimization
> opportunities?

Absolutely. Different people have different focuses, and I tend to consider correctness/miscompilation bugs a higher priority. However, compile-time speed is a very high second priority to me. Bug reports with testcases are extremely useful.

-Chris
On Monday 06 October 2008 11:47, David Greene wrote:

> > As far as the approach goes, here's a counterproposal that is a bit
> > less complex. I suggest adding a single DenseMap<Instruction*,
> > unsigned> to Mem2reg. If there is an entry in this map for a load/
> > store instruction, it would store the instruction's index in its
> > basic block. This means that to check dominance of two instrs in a
> > block, you just need to compare their two indices (which is fast).
> > Really small blocks (< 10 instructions?) would never be entered into
> > the map, to keep the size of the map low. The first time a query is
> > made within a large block, it would compute the indices of all loads
> > and stores within the block.
>
> This is a simpler approach and should have the same effect. I'll code
> it up.

Urk. After looking at doing this today, I realized it won't work. The problem is that RewriteSingleStoreAlloca has this loop:

  for (; &*I != OnlyStore; ++I) { // scan block for store.
    if (isa<LoadInst>(I) && I->getOperand(0) == AI)
      break;
  }

It's looking for a load that matches the address of the single store. This is the loop that's taking all the time. What I did with the original patch was to make lookups of load and store instructions fast. That won't be possible with a map indexing all instructions.

This loop is also a problem:

  // Otherwise, if this is a different block or if all uses happen
  // after the store, do a simple linear scan to replace loads with
  // the stored value.
  for (BasicBlock::iterator I = UseBlock->begin(), E = UseBlock->end();
       I != E; ) {
    if (LoadInst *LI = dyn_cast<LoadInst>(I++)) {
      if (LI->getOperand(0) == AI) {
        LI->replaceAllUsesWith(OnlyStore->getOperand(0));
        if (AST && isa<PointerType>(LI->getType()))
          AST->deleteValue(LI);
        LI->eraseFromParent();
      }
    }
  }

I suppose we could restrict the map to containing only load and store instructions, but if you've got a block with a lot of memory instructions I'm not sure it will be much faster, especially with the second loop. The artificial testcase I've come up with does exactly this: it basically does load-load-add-store-call over and over again.

I've got the original patch ready to go. Should I commit that? I'm thinking if we want to make things simpler later on we can do that, and we'll have the testcase in the repository to catch whether we've simplified too much and lost the optimization.

-Dave