On Monday 20 October 2008 16:14, Chris Lattner wrote:> Ok, if you need this direction, just use an > std::vector<pair<unsigned,Instruction*>> as a forward map? You can do > fast binary searches in it.I don't see how that helps. I need a map from BasicBlock to a list of loads and stores.> > I suppose we could resctrict the map to containing only load and store > > instructions but if you've got a block with a lot of memory > > instructions I'm > > not sure it will be much faster, especially with the second loop. > > Yes, you only want load and stores that directly target allocas. This > won't impact the complexity of the routine but will be a nice speedup > in practice I suspect.But complexity is exactly what we have to attack. We're talking LARGE basic blocks here.> > The artificial testcase I've come up with does exactly this. It > > basically > > does load-load-add-store-call over and over again. > > It would be nice if you could share the artificial testcase, e.g. by > attaching it to a bugzilla.I've got it and will file the bug.> > I've got the original patch ready to go. Should I commit that? I'm > > thinking > > if we want to make things simpler later on we can do that and we'll > > have the > > testcase in the repository to catch whether we've simplified too > > much and lost > > the optimization. > > I'd really rather stick with the simple approach, instead of checking > in the complex one and hoping someone fixes it later. Thanks Dave,Fundamentally, we need a map from BasicBlock to a list of ordered loads and stores in that block. So that patch creates two maps: one from BasicBlock to a list of loads tagged with ordering information and one from BasicBlock to a list of stores tagged with ordering information. Now, I might be able to merge those maps but you'd still be walking through a lot of unnecessary instructions (skipping loads when looking for stores and vice-versa). It's this iteration through lists of instructions that's killing compile time. Thus we should make it as quick as possible. -Dave
On Oct 21, 2008, at 10:16 AM, David Greene wrote:> On Monday 20 October 2008 16:14, Chris Lattner wrote: > >> Ok, if you need this direction, just use an >> std::vector<pair<unsigned,Instruction*>> as a forward map? You can >> do >> fast binary searches in it. > > I don't see how that helps. I need a map from BasicBlock to a list > of loads > and stores.This isn't needed after all.>>> I suppose we could resctrict the map to containing only load and >>> store >>> instructions but if you've got a block with a lot of memory >>> instructions I'm >>> not sure it will be much faster, especially with the second loop. >> >> Yes, you only want load and stores that directly target allocas. >> This >> won't impact the complexity of the routine but will be a nice speedup >> in practice I suspect. > > But complexity is exactly what we have to attack. We're talking > LARGE basic > blocks here.The point is to have low complexity and low constant factor.>>> The artificial testcase I've come up with does exactly this. It >>> basically >>> does load-load-add-store-call over and over again. >> >> It would be nice if you could share the artificial testcase, e.g. by >> attaching it to a bugzilla. > > I've got it and will file the bug.Thank you! I committed a patch that provides a ~230x speedup on that testcase. If you run into other similar problems, please file another bugzilla.> > >>> I've got the original patch ready to go. Should I commit that? I'm >>> thinking >>> if we want to make things simpler later on we can do that and we'll >>> have the >>> testcase in the repository to catch whether we've simplified too >>> much and lost >>> the optimization. >> >> I'd really rather stick with the simple approach, instead of checking >> in the complex one and hoping someone fixes it later. Thanks Dave, > > Fundamentally, we need a map from BasicBlock to a list of ordered > loads and > stores in that block.I don't see why. Please take a look at my patch, does it seem reasonable? This is a very simple approach and seems like it works just fine: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081027/068967.html Are you seeing cases where lots of time is spent in RewriteSingleStoreAlloca[s]? If so, please provide a testcase and I can apply the obvious fix. -Chris
On Oct 26, 2008, at 11:09 PM, Chris Lattner wrote:>> Fundamentally, we need a map from BasicBlock to a list of ordered >> loads and >> stores in that block. > > Are you seeing cases where lots of time is spent in > RewriteSingleStoreAlloca[s]? If so, please provide a testcase and I > can apply the obvious fix.Okay, I decided to just go ahead and make the change, with the justification that it is a significant code cleanup to mem2reg. Here's the patch: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081027/068970.html Are there any other places which exhibit bad behavior with large blocks in mem2reg? I see a linear scan over instructions in RenamePass. If you can produce a testcase which shows a problem there, I'd be happy to help fix it. -Chris