On Monday 06 October 2008 11:47, David Greene wrote:> > As far as the approach goes, here's a counterproposal that is a bit > > less complex. I suggest adding a single DenseMap<Instruction*, > > unsigned> to Mem2reg. If there is an entry in this map for a load/ > > store instruction, it would store the instruction's index in its basic > > block. This means that to check dominance of two instrs in a block, > > you just need to compare their two indices (which is fast). Really > > small blocks (< 10 instructions?) would never be entered into map, to > > keep the size of the map low. The first time a query is made within a > > large block, it would compute the indices of all loads and stores > > within the block. > > This is a simpler approach and should have the same effect. I'll code it > up.Urk. After looking at doing this today, I realized it won't work. The problem is that RewriteSingleStoreAlloca has this loop: for (; &*I != OnlyStore; ++I) { // scan block for store. if (isa<LoadInst>(I) && I->getOperand(0) == AI) break; It's looking for a load that matches the address of the single store. This is the loop that's taking all the time. What I did with the original patch was to make lookups of load and store instructions fast. That won't be possible with a map indexing all instructions. This loop is also a problem: // Otherwise, if this is a different block or if all uses happen // after the store, do a simple linear scan to replace loads with // the stored value. for (BasicBlock::iterator I = UseBlock->begin(), E = UseBlock->end(); I != E; ) { if (LoadInst *LI = dyn_cast<LoadInst>(I++)) { if (LI->getOperand(0) == AI) { LI->replaceAllUsesWith(OnlyStore->getOperand(0)); if (AST && isa<PointerType>(LI->getType())) AST->deleteValue(LI); LI->eraseFromParent(); } } } I suppose we could resctrict the map to containing only load and store instructions but if you've got a block with a lot of memory instructions I'm not sure it will be much faster, especially with the second loop. The artificial testcase I've come up with does exactly this. It basically does load-load-add-store-call over and over again. I've got the original patch ready to go. Shoudl I commit that? I'm thinking if we want to make things simpler later on we can do that and we'll have the testcase in the repository to catch whether we've simplified too much and lost the optimization. -Dave
On Oct 20, 2008, at 10:04 AM, David Greene wrote:> On Monday 06 October 2008 11:47, David Greene wrote: > >>> As far as the approach goes, here's a counterproposal that is a bit >>> less complex. I suggest adding a single DenseMap<Instruction*, >>> unsigned> to Mem2reg. If there is an entry in this map for a load/ >>> store instruction, it would store the instruction's index in its >>> basic >>> block. This means that to check dominance of two instrs in a block, >>> you just need to compare their two indices (which is fast). Really >>> small blocks (< 10 instructions?) would never be entered into map, >>> to >>> keep the size of the map low. The first time a query is made >>> within a >>> large block, it would compute the indices of all loads and stores >>> within the block. >> >> This is a simpler approach and should have the same effect. I'll >> code it >> up. > > Urk. After looking at doing this today, I realized it won't work. > The > problem is that RewriteSingleStoreAlloca has this loop: > > for (; &*I != OnlyStore; ++I) { // scan block for store. > if (isa<LoadInst>(I) && I->getOperand(0) == AI) > break; > > It's looking for a load that matches the address of the single > store. This > is the loop that's taking all the time. What I did with the > original patch > was to make lookups of load and store instructions fast. That won't > be > possible with a map indexing all instructions.Ok, if you need this direction, just use an std::vector<pair<unsigned,Instruction*>> as a forward map? You can do fast binary searches in it.> I suppose we could resctrict the map to containing only load and store > instructions but if you've got a block with a lot of memory > instructions I'm > not sure it will be much faster, especially with the second loop.Yes, you only want load and stores that directly target allocas. This won't impact the complexity of the routine but will be a nice speedup in practice I suspect.> The artificial testcase I've come up with does exactly this. It > basically > does load-load-add-store-call over and over again.It would be nice if you could share the artificial testcase, e.g. by attaching it to a bugzilla.> I've got the original patch ready to go. Shoudl I commit that? I'm > thinking > if we want to make things simpler later on we can do that and we'll > have the > testcase in the repository to catch whether we've simplified too > much and lost > the optimization.I'd really rather stick with the simple approach, instead of checking in the complex one and hoping someone fixes it later. Thanks Dave, -Chris
On Monday 20 October 2008 16:14, Chris Lattner wrote:> Ok, if you need this direction, just use an > std::vector<pair<unsigned,Instruction*>> as a forward map? You can do > fast binary searches in it.I don't see how that helps. I need a map from BasicBlock to a list of loads and stores.> > I suppose we could resctrict the map to containing only load and store > > instructions but if you've got a block with a lot of memory > > instructions I'm > > not sure it will be much faster, especially with the second loop. > > Yes, you only want load and stores that directly target allocas. This > won't impact the complexity of the routine but will be a nice speedup > in practice I suspect.But complexity is exactly what we have to attack. We're talking LARGE basic blocks here.> > The artificial testcase I've come up with does exactly this. It > > basically > > does load-load-add-store-call over and over again. > > It would be nice if you could share the artificial testcase, e.g. by > attaching it to a bugzilla.I've got it and will file the bug.> > I've got the original patch ready to go. Should I commit that? I'm > > thinking > > if we want to make things simpler later on we can do that and we'll > > have the > > testcase in the repository to catch whether we've simplified too > > much and lost > > the optimization. > > I'd really rather stick with the simple approach, instead of checking > in the complex one and hoping someone fixes it later. Thanks Dave,Fundamentally, we need a map from BasicBlock to a list of ordered loads and stores in that block. So that patch creates two maps: one from BasicBlock to a list of loads tagged with ordering information and one from BasicBlock to a list of stores tagged with ordering information. Now, I might be able to merge those maps but you'd still be walking through a lot of unnecessary instructions (skipping loads when looking for stores and vice-versa). It's this iteration through lists of instructions that's killing compile time. Thus we should make it as quick as possible. -Dave
On Monday 20 October 2008 16:14, Chris Lattner wrote:> It would be nice if you could share the artificial testcase, e.g. by > attaching it to a bugzilla.Feh. The file you are trying to attach is 1195 kilobytes (KB) in size. Non-patch attachments cannot be more than 1000 KB. We recommend that you store your attachment elsewhere on the web, and then insert the URL to it in a comment, or in the URL field for this bug. Alternately, if your attachment is an image, you could convert it to a compressible format like JPG or PNG and try again. This is not particularly user-friendly. :) -Dave