> On Mar 15, 2016, at 4:09 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> On 03/15/2016 03:07 PM, via llvm-dev wrote:
>> There are a few passes in LLVM that make heavy use of a big DenseMap, one that can end up with an entry for every instruction in the function. EarlyCSE is the best example, but Reassociate and MachineCSE have this to some degree as well (there might be others?). To put it simply: at least in my profile, EarlyCSE spends ~1/5 of its time growing DenseMaps. This is kind of… bad.
>>
>> grow() is inherently slow because it needs to rehash and reinsert everything. This means growing a DenseMap costs much, much more than growing, for example, a vector. I talked about this with a few people, and here are some possibilities we’ve come up with to improve this (some of which probably aren’t what we want):
>>
>> 1. Use a map that doesn’t require rehashing and reinsertion to grow. Chaining lets you do this, but std::unordered_map is probably so much slower than DenseMap that we’d lose more than we gain.
>> 2. Include the hash code in the map so that we don’t have to rehash. That costs 32 more bits per entry (or whatever), and it might not help that much, since we still have to do the whole reinsertion routine.
>> 3. Pre-calculate an estimate of the map size we need. For example, in EarlyCSE, this is a possibly gross overestimate of the size needed:
>>
>>   unsigned InstCount = 0;
>>   unsigned LoadCount = 0;
>>   unsigned CallCount = 0;
>>   for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE; ++FI) {
>>     if (FI->mayReadOrWriteMemory())
>>       ++LoadCount;
>>     else if (isa<CallInst>(*FI))
>>       ++CallCount;
>>     else
>>       ++InstCount;
>>   }
>>   AvailableValues.resize(InstCount);
>>   AvailableLoads.resize(LoadCount);
>>   AvailableCalls.resize(CallCount);
>>
>> But it does the job, and saves ~20% of the time spent in EarlyCSE on my profiles. Yes, iterating over the entire function is way cheaper than grow(). The downside is that, while the estimate is still bounded by function size, it could end up allocating a good bit more than needed depending on — in EarlyCSE’s case — the control flow/dominator structure.
>>
>> Any thoughts on this, or other less ugly alternatives? I estimate that, at least in our pass pipeline, we’re losing at least ~1% of total time to avoidable DenseMap::grow() operations, which feels a little bit… unnecessary.
>
> Slightly OT, but it looks like LoadValue (the value type of the AvailableLoads structure) is relatively memory inefficient. At minimum, we could get rid of the IsAtomic space by using a tagged pointer. That would at least bring us down to 128 bits (a nice power of two). That might help speed up some of the copying.

Just to note, it looks like I was testing on a somewhat older LLVM version that didn’t have LoadValue at all, so my guess is this means it’s even -worse- in trunk.

—escha
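For concreteness, idea #2 (storing each entry's hash code so grow() can re-bucket without re-hashing) might look roughly like the standalone sketch below. This is not DenseMap's actual layout; it is a minimal illustration under simplifying assumptions, and all of the names (HashCachingMap, Bucket, findSlot) are hypothetical.

  // Minimal sketch of idea #2: each bucket caches the hash of its key so that
  // grow() can re-bucket entries without re-running the hash function. This is
  // not DenseMap; names and layout here are hypothetical, for illustration only.
  #include <cstddef>
  #include <cstdint>
  #include <functional>
  #include <utility>
  #include <vector>

  template <typename K, typename V, typename Hasher = std::hash<K>>
  class HashCachingMap {
    struct Bucket {
      uint64_t Hash = 0; // cached hash; 0 marks an empty bucket
      K Key{};           // assumes default-constructible key/value for brevity
      V Value{};
    };
    std::vector<Bucket> Buckets;
    size_t NumEntries = 0;
    Hasher HashFn;

    static uint64_t hashKey(const Hasher &H, const K &Key) {
      uint64_t Hash = H(Key);
      return Hash ? Hash : 1; // reserve 0 as the empty marker
    }

    // Linear probing; Table.size() is always a power of two.
    static Bucket *findSlot(std::vector<Bucket> &Table, uint64_t Hash,
                            const K &Key) {
      size_t Mask = Table.size() - 1;
      for (size_t I = Hash & Mask;; I = (I + 1) & Mask)
        if (Table[I].Hash == 0 || (Table[I].Hash == Hash && Table[I].Key == Key))
          return &Table[I];
    }

    void grow() {
      std::vector<Bucket> New(Buckets.size() * 2);
      for (Bucket &B : Buckets) {
        if (B.Hash == 0)
          continue;
        // Reinsertion still happens, but the hash is reused, not recomputed.
        *findSlot(New, B.Hash, B.Key) = std::move(B);
      }
      Buckets.swap(New);
    }

  public:
    HashCachingMap() : Buckets(16) {}

    V &operator[](const K &Key) {
      if (NumEntries * 4 >= Buckets.size() * 3) // keep load factor <= 3/4
        grow();
      uint64_t Hash = hashKey(HashFn, Key);
      Bucket *B = findSlot(Buckets, Hash, Key);
      if (B->Hash == 0) { // new entry
        B->Hash = Hash;
        B->Key = Key;
        ++NumEntries;
      }
      return B->Value;
    }
  };

The only point of the sketch is that grow() copies memory but never calls HashFn; whether the extra 8 bytes per bucket pays for itself in practice would need measurement.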
On 03/15/2016 04:22 PM, escha at apple.com wrote:
>> On Mar 15, 2016, at 4:09 PM, Philip Reames <listmail at philipreames.com> wrote:
>>
>> [...]
>>
>> Slightly OT, but it looks like LoadValue (the value type of the AvailableLoads structure) is relatively memory inefficient. At minimum, we could get rid of the IsAtomic space by using a tagged pointer. That would at least bring us down to 128 bits (a nice power of two). That might help speed up some of the copying.
>
> Just to note, it looks like I was testing on a somewhat older LLVM version that didn’t have LoadValue at all, so my guess is this means it’s even -worse- in trunk.

Er, LoadValue's been around for a while (6 months). How far back are you testing? I'd strongly suggest switching to something more recent.

Philip
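For what it's worth, the tagged-pointer idea could look roughly like the sketch below, using llvm::PointerIntPair to fold the IsAtomic flag into the low bit of the Value pointer. The struct and field names (Data, Generation, MatchingId, IsAtomic) are assumptions about LoadValue's layout and may not match the exact trunk definition.

  // Rough sketch of folding IsAtomic into the pointer; the struct name and
  // fields are assumed, not necessarily what trunk's LoadValue looks like.
  #include "llvm/ADT/PointerIntPair.h"
  #include "llvm/IR/Value.h"

  namespace {
  struct PackedLoadValue {
    // Value* is sufficiently aligned that its low bit is free, so
    // PointerIntPair can store the IsAtomic flag there for us.
    llvm::PointerIntPair<llvm::Value *, 1, bool> DataAndIsAtomic;
    unsigned Generation = 0;
    int MatchingId = -1;

    PackedLoadValue() = default;
    PackedLoadValue(llvm::Value *Data, unsigned Gen, int MatchId, bool IsAtomic)
        : DataAndIsAtomic(Data, IsAtomic), Generation(Gen), MatchingId(MatchId) {}

    llvm::Value *getData() const { return DataAndIsAtomic.getPointer(); }
    bool isAtomic() const { return DataAndIsAtomic.getInt(); }
  };
  } // namespace
  // On a 64-bit target this is 16 bytes: 8 for the tagged pointer plus
  // 4 + 4 for the two integer fields, i.e. the 128 bits mentioned above.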
> On Mar 15, 2016, at 4:56 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Er, LoadValue's been around for a while (6 months). How far back are you testing? I'd strongly suggest switching to something more recent.

Some of us have to support internal release branches for non-trivial amounts of time.

—Owen