What should we use instead of DenseMap?

—escha

> On Mar 15, 2016, at 3:30 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>
> yes it makes sense. Avoid using DenseMap when the size of the map is
> expected to be large but cannot be pre-determined.
>
> David
>
> On Tue, Mar 15, 2016 at 3:07 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> There are a few passes in LLVM that make heavy use of a big DenseMap, one
> that potentially gets filled with up to one entry for each instruction in
> the function. EarlyCSE is the best example, but Reassociate and MachineCSE
> have this to some degree as well (there might be others?). To put it
> simply: at least in my profile, EarlyCSE spends ~1/5 of its time growing
> DenseMaps. This is kind of… bad.
>
> grow() is inherently slow because it needs to rehash and reinsert
> everything. This means growing a DenseMap costs much, much more than
> growing, for example, a vector. I talked about this with a few people, and
> here are some possibilities we’ve come up with to improve this (some of
> which probably aren’t what we want):
>
> 1. Use a map that doesn’t require rehashing and reinsertion to grow.
> Chaining lets you do this, but std::unordered_map is probably so much
> slower than DenseMap that we’d lose more than we gain.
> 2. Include the hash code in the map so that we don’t have to rehash.
> That’s 32 more bits per entry (or whatever), and it might not help that
> much, since we still have to do the whole reinsertion routine.
> 3. Pre-calculate an estimate of the map size we need. For example, in
> EarlyCSE, this is a possibly gross overestimate of the size needed:
>
>     unsigned InstCount = 0;
>     unsigned LoadCount = 0;
>     unsigned CallCount = 0;
>     for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE; ++FI) {
>       if (FI->mayReadOrWriteMemory())
>         ++LoadCount;
>       else if (isa<CallInst>(*FI))
>         ++CallCount;
>       else
>         ++InstCount;
>     }
>     AvailableValues.resize(InstCount);
>     AvailableLoads.resize(LoadCount);
>     AvailableCalls.resize(CallCount);
>
> But it does the job, and saves ~20% of time in EarlyCSE on my profiles.
> Yes, iterating over the entire function is far cheaper than grow(). The
> downside is that while the estimate is still bounded by function size, it
> could end up allocating a good bit more than needed depending on — in
> EarlyCSE’s case — the control-flow/dominator structure.
>
> Any thoughts on this, or other less ugly alternatives? I estimate that, at
> least in our pass pipeline, we’re losing at least ~1% of total time to
> avoidable DenseMap::grow() operations, which feels a little bit…
> unnecessary.
>
> —escha
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
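Option 2 above (carrying the hash code alongside each entry so grow() never re-hashes a key) can be sketched with a toy open-addressed table. Everything here — the names, the linear probing, the growth policy, the full-size_t cached hash rather than the 32 bits suggested in the thread — is invented for illustration and is not DenseMap's actual layout:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy open-addressed map that caches each entry's hash (option 2 from the
// thread). grow() still reinserts every element, but it reuses the stored
// hash instead of re-hashing the key. Illustration only, not DenseMap.
template <typename K, typename V> class CachedHashMap {
  struct Entry {
    std::size_t Hash = 0;
    K Key{};
    V Value{};
    bool Live = false;
  };
  std::vector<Entry> Buckets = std::vector<Entry>(8);
  std::size_t Size = 0;

  // Linear probing; stops at a matching live entry or the first free slot.
  static std::size_t findSlot(const std::vector<Entry> &B, std::size_t Hash,
                              const K &Key) {
    std::size_t I = Hash & (B.size() - 1);
    while (B[I].Live && !(B[I].Hash == Hash && B[I].Key == Key))
      I = (I + 1) & (B.size() - 1);
    return I;
  }

  void grow() {
    std::vector<Entry> NewBuckets(Buckets.size() * 2);
    for (Entry &E : Buckets)
      if (E.Live) // reinsert, but reuse E.Hash rather than re-hashing E.Key
        NewBuckets[findSlot(NewBuckets, E.Hash, E.Key)] = std::move(E);
    Buckets = std::move(NewBuckets);
  }

public:
  void insert(const K &Key, const V &Value) {
    if ((Size + 1) * 4 >= Buckets.size() * 3) // grow at ~75% load
      grow();
    std::size_t Hash = std::hash<K>{}(Key); // each key is hashed once, here
    std::size_t I = findSlot(Buckets, Hash, Key);
    if (!Buckets[I].Live)
      ++Size;
    Buckets[I] = Entry{Hash, Key, Value, true};
  }

  const V *lookup(const K &Key) const {
    std::size_t I = findSlot(Buckets, std::hash<K>{}(Key), Key);
    return Buckets[I].Live ? &Buckets[I].Value : nullptr;
  }
};
```

As the thread anticipates, this trades extra bytes per entry for skipping the key re-hash on growth; the reinsertion pass over every element remains.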
Xinliang David Li via llvm-dev
2016-Mar-15 22:52 UTC
[llvm-dev] RFC: DenseMap grow() slowness
In the EarlyCSE case, the size of the DenseMap can be determined ahead of
time, so it is fine to use (assuming the iteration overhead is low).

There are other factors to consider when using DenseMap. It removes one
level of indirection by making the buckets an array of key/value pairs
(unlike StringMap, where the bucket array holds pointers to key/value
pairs). When the value type is large and expensive to construct, the
overhead of rehashing a DenseMap becomes even higher. The memory overhead
also becomes larger, as the table has more large holes when the map is
sparse.

I don't have a good answer as to what the right alternative is
(std::unordered_map, std::map, etc.) -- it depends on many factors:
insertion and query patterns, value type, average size of the table, and
worst-case scenarios. It is case by case, so doing some experiments with
performance data will be a helpful exercise.

thanks,

david

On Tue, Mar 15, 2016 at 3:30 PM, <escha at apple.com> wrote:

> What should we use instead of DenseMap?
>
> —escha
>
> On Mar 15, 2016, at 3:30 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>
> yes it makes sense. Avoid using DenseMap when the size of the map is
> expected to be large but cannot be pre-determined.
>
> David
>
> On Tue, Mar 15, 2016 at 3:07 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> There are a few passes in LLVM that make heavy use of a big DenseMap,
>> one that potentially gets filled with up to one entry for each
>> instruction in the function. EarlyCSE is the best example, but
>> Reassociate and MachineCSE have this to some degree as well (there might
>> be others?). To put it simply: at least in my profile, EarlyCSE spends
>> ~1/5 of its time growing DenseMaps. This is kind of… bad.
>>
>> grow() is inherently slow because it needs to rehash and reinsert
>> everything. This means growing a DenseMap costs much, much more than
>> growing, for example, a vector.
>>
>> [...]
>>
>> —escha
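David's caveat — pre-sizing is fine as long as the counting pass is cheap — is easy to demonstrate outside LLVM. A minimal sketch, using std::unordered_map as a stand-in for DenseMap (its reserve() plays the role that resize() plays in the EarlyCSE snippet quoted earlier):

```cpp
#include <cstddef>
#include <unordered_map>

// Option 3 from the thread, sketched with std::unordered_map standing in
// for DenseMap: size the table once up front, and growth never happens
// during insertion. Returns how many times the table rehashed while
// inserting N entries.
static std::size_t countRehashes(std::size_t N, bool PreSize) {
  std::unordered_map<int, int> Map;
  if (PreSize)
    Map.reserve(N); // one allocation up front; guarantees no rehash below
  std::size_t Rehashes = 0;
  std::size_t LastBuckets = Map.bucket_count();
  for (std::size_t I = 0; I != N; ++I) {
    Map.emplace(static_cast<int>(I), 0);
    if (Map.bucket_count() != LastBuckets) { // bucket count changed: rehash
      ++Rehashes;
      LastBuckets = Map.bucket_count();
    }
  }
  return Rehashes;
}
```

With pre-sizing the rehash count is zero; without it, a typical doubling implementation rehashes on the order of log2(N / initial size) times, and each of those passes touches every element already inserted.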
> On Mar 15, 2016, at 3:52 PM, Xinliang David Li <xinliangli at gmail.com> wrote:
>
> In the EarlyCSE case, the size of the DenseMap can be determined ahead of time

Only an upper bound; the actual max size is the number of CSE-able
instructions “live” in scope at any one time (I think), so at least in
theory it could be a gross overestimate.

—escha
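A footnote on option 1 from the original list: the reason chaining sidesteps reinsertion cost is that a node-based table relinks its nodes on rehash rather than moving the stored pairs — the flip side of the extra indirection David describes. A small sketch with an invented counter type, again using std::unordered_map:

```cpp
#include <cstddef>
#include <unordered_map>

// Invented counter type: tallies every copy/move of a stored value so we
// can observe whether rehashing touches the values at all.
struct CountingValue {
  static std::size_t Traffic;
  int Payload = 0;
  CountingValue() = default;
  explicit CountingValue(int P) : Payload(P) {}
  CountingValue(const CountingValue &O) : Payload(O.Payload) { ++Traffic; }
  CountingValue(CountingValue &&O) noexcept : Payload(O.Payload) { ++Traffic; }
  CountingValue &operator=(const CountingValue &O) {
    Payload = O.Payload;
    ++Traffic;
    return *this;
  }
  CountingValue &operator=(CountingValue &&O) noexcept {
    Payload = O.Payload;
    ++Traffic;
    return *this;
  }
};
std::size_t CountingValue::Traffic = 0;

// Insert N entries with or without pre-sizing and return the number of
// value copies/moves. For a chained table the two counts come out equal:
// growth relinks nodes but never re-copies the pairs (which is also why
// references into an unordered_map survive a rehash).
static std::size_t valueTraffic(std::size_t N, bool PreSize) {
  CountingValue::Traffic = 0;
  std::unordered_map<int, CountingValue> Map;
  if (PreSize)
    Map.reserve(N);
  for (std::size_t I = 0; I != N; ++I)
    Map.emplace(static_cast<int>(I), CountingValue(static_cast<int>(I)));
  return CountingValue::Traffic;
}
```

An inline-storage table like DenseMap pays the opposite way: every grow() moves every pair, so a large, expensive-to-construct value type (David's point above) multiplies the cost of each rehash.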