thr3ads.net - llvm dev - [LLVMdev] mem2reg optimization [Oct 2008]

If this information is useful, please help other people find it:
Share via:

David Greene

2008-Oct-21 17:16 UTC

[LLVMdev] mem2reg optimization

On Monday 20 October 2008 16:14, Chris Lattner wrote:
> Ok, if you need this direction, just use an
> std::vector<pair<unsigned,Instruction*>> as a forward map?  You
can do
> fast binary searches in it.
I don't see how that helps.  I need a map from BasicBlock to a list of loads
and stores.
> > I suppose we could resctrict the map to containing only load and store
> > instructions but if you've got a block with a lot of memory
> > instructions I'm
> > not sure it will be much faster, especially with the second loop.
>
> Yes, you only want load and stores that directly target allocas.  This
> won't impact the complexity of the routine but will be a nice speedup
> in practice I suspect.
But complexity is exactly what we have to attack.  We're talking LARGE basic
blocks here.
> > The artificial testcase I've come up with does exactly this.  It
> > basically
> > does load-load-add-store-call over and over again.
>
> It would be nice if you could share the artificial testcase, e.g. by
> attaching it to a bugzilla.
I've got it and will file the bug.
> > I've got the original patch ready to go.  Should I commit that? 
I'm
> > thinking
> > if we want to make things simpler later on we can do that and
we'll
> > have the
> > testcase in the repository to catch whether we've simplified too
> > much and lost
> > the optimization.
>
> I'd really rather stick with the simple approach, instead of checking
> in the complex one and hoping someone fixes it later.  Thanks Dave,
Fundamentally, we need a map from BasicBlock to a list of ordered loads and 
stores in that block.  So that patch creates two maps: one from BasicBlock to 
a list of loads tagged with ordering information and one from BasicBlock to a 
list of stores tagged with ordering information.  Now, I might be able to 
merge those maps but you'd still be walking through a lot of unnecessary 
instructions (skipping loads when looking for stores and vice-versa).  It's 
this iteration through lists of instructions that's killing compile time.  
Thus we should make it as quick as possible.

                                              -Dave

Chris Lattner

2008-Oct-27 06:09 UTC

head link

[LLVMdev] mem2reg optimization

On Oct 21, 2008, at 10:16 AM, David Greene wrote:
> On Monday 20 October 2008 16:14, Chris Lattner wrote:
>
>> Ok, if you need this direction, just use an
>> std::vector<pair<unsigned,Instruction*>> as a forward map? 
You can
>> do
>> fast binary searches in it.
>
> I don't see how that helps.  I need a map from BasicBlock to a list  
> of loads
> and stores.
This isn't needed after all.
>>> I suppose we could resctrict the map to containing only load and  
>>> store
>>> instructions but if you've got a block with a lot of memory
>>> instructions I'm
>>> not sure it will be much faster, especially with the second loop.
>>
>> Yes, you only want load and stores that directly target allocas.   
>> This
>> won't impact the complexity of the routine but will be a nice
speedup
>> in practice I suspect.
>
> But complexity is exactly what we have to attack.  We're talking  
> LARGE basic
> blocks here.
The point is to have low complexity and low constant factor.
>>> The artificial testcase I've come up with does exactly this. 
It
>>> basically
>>> does load-load-add-store-call over and over again.
>>
>> It would be nice if you could share the artificial testcase, e.g. by
>> attaching it to a bugzilla.
>
> I've got it and will file the bug.
Thank you!  I committed a patch that provides a ~230x speedup on that  
testcase.  If you run into other similar problems, please file another  
bugzilla.
>
>
>>> I've got the original patch ready to go.  Should I commit that?
I'm
>>> thinking
>>> if we want to make things simpler later on we can do that and
we'll
>>> have the
>>> testcase in the repository to catch whether we've simplified
too
>>> much and lost
>>> the optimization.
>>
>> I'd really rather stick with the simple approach, instead of
checking
>> in the complex one and hoping someone fixes it later.  Thanks Dave,
>
> Fundamentally, we need a map from BasicBlock to a list of ordered  
> loads and
> stores in that block.
I don't see why.  Please take a look at my patch, does it seem  
reasonable?  This is a very simple approach and seems like it works  
just fine:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081027/068967.html

Are you seeing cases where lots of time is spent in  
RewriteSingleStoreAlloca[s]?  If so, please provide a testcase and I  
can apply the obvious fix.

-Chris

Chris Lattner

2008-Oct-27 07:08 UTC

head link

[LLVMdev] mem2reg optimization

On Oct 26, 2008, at 11:09 PM, Chris Lattner wrote:>> Fundamentally, we need a map from BasicBlock to a list of ordered
>> loads and
>> stores in that block.
>
> Are you seeing cases where lots of time is spent in
> RewriteSingleStoreAlloca[s]?  If so, please provide a testcase and I
> can apply the obvious fix.
Okay, I decided to just go ahead and make the change, with the  
justification that it is a significant code cleanup to mem2reg.   
Here's the patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081027/068970.html

Are there any other places which exhibit bad behavior with large  
blocks in mem2reg?  I see a linear scan over instructions in  
RenamePass.  If you can produce a testcase which shows a problem  
there, I'd be happy to help fix it.

-Chris

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Oct 2008 - [LLVMdev] mem2reg optimization

[LLVMdev] mem2reg optimization

[LLVMdev] mem2reg optimization

[LLVMdev] mem2reg optimization

Possibly Parallel Threads