thr3ads.net - llvm dev - [llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?) [Sep 2015]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin via llvm-dev

2015-Sep-14 18:34 UTC

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

I did something similar for dominators, for GVN, etc.
All see significant speedups.

However, the answer i got back when i mentioned this was "things like
ptrset and densemap should only have a small performance difference
from side data when used and sized right", and i've found this to
mostly be true after looking harder.

In the case you are looking at, i see:

-  SmallPtrSet<Instruction*, 128> Alive;

This seems ... wrong.
In fact, it seems optimally bad.

This says the small ptr set size is 128.
That is, the smallsize is 128.

SmallPtrSet will linear search the array when it's size <= smallsize,
and otherwise fall back to building a non-small array and using better
algorithms.

Linear searching a 128 member array == slow

I bet if you change the 128 to 8, you will see significant speedups.

On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Are there any other passes that could benefit from having a single bit
(or similarly small amount) of per-Instruction metadata local to the pass, i.e.
to avoid having to keep a side-Set of instructions...
>
> FWIW, I have a target specific pass that needs exactly this sort of
> tracking.  Once you offer 1 bit, it's a slippery slope to giving a
> whole pointer.  I like your idea regardless.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

escha via llvm-dev

2015-Sep-14 18:41 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

I tried changing it to 16 and only saw a very small speedup. I’m pretty sure
that no matter how you slice it, adding every single instruction in a function
to a set is going to be expensive.

—escha
> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> 
> I did something similar for dominators, for GVN, etc.
> All see significant speedups.
> 
> However, the answer i got back when i mentioned this was "things like
> ptrset and densemap should only have a small performance difference
> from side data when used and sized right", and i've found this to
> mostly be true after looking harder.
> 
> In the case you are looking at, i see:
> 
> -  SmallPtrSet<Instruction*, 128> Alive;
> 
> This seems ... wrong.
> In fact, it seems optimally bad.
> 
> 
> This says the small ptr set size is 128.
> That is, the smallsize is 128.
> 
> SmallPtrSet will linear search the array when it's size <=
smallsize,
> and otherwise fall back to building a non-small array and using better
> algorithms.
> 
> Linear searching a 128 member array == slow
> 
> I bet if you change the 128 to 8, you will see significant speedups.
> 
> 
> 
> On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> Are there any other passes that could benefit from having a single
bit (or similarly small amount) of per-Instruction metadata local to the pass,
i.e. to avoid having to keep a side-Set of instructions...
>> 
>> FWIW, I have a target specific pass that needs exactly this sort of
>> tracking.  Once you offer 1 bit, it's a slippery slope to giving a
>> whole pointer.  I like your idea regardless.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Philip Reames via llvm-dev

2015-Sep-14 19:03 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

Given the use case here, I somewhat agree.  In the common case, you're 
going to need an extra O(I) memory to represent the alive set.  However, 
there are definitely ways to do better without needing transient data 
stored in the instruction itself.

Have you considered a bloom filter or something equivalent for testing 
aliveness?  We don't actually need a precise result here - right?.  If 
we're willing to tolerate a slightly imprecise result (in theory), using 
a bloom filter could give noticeable memory reduction.

p.s. I disagree with Daniel on sizing.  While linear search over a 128 
element array is somewhat slow, memory allocation/churn can be far far 
worse.  I haven't looked at this particular use case, but in general, 
larger small sizes for frequently called functions can make a lot of sense.

Philip

On 09/14/2015 11:41 AM, escha via llvm-dev wrote:> I tried changing it to 16 and only saw a very small speedup. I’m pretty
sure that no matter how you slice it, adding every single instruction in a
function to a set is going to be expensive.
>
> —escha
>
>> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>
>> I did something similar for dominators, for GVN, etc.
>> All see significant speedups.
>>
>> However, the answer i got back when i mentioned this was "things
like
>> ptrset and densemap should only have a small performance difference
>> from side data when used and sized right", and i've found this
to
>> mostly be true after looking harder.
>>
>> In the case you are looking at, i see:
>>
>> -  SmallPtrSet<Instruction*, 128> Alive;
>>
>> This seems ... wrong.
>> In fact, it seems optimally bad.
>>
>>
>> This says the small ptr set size is 128.
>> That is, the smallsize is 128.
>>
>> SmallPtrSet will linear search the array when it's size <=
smallsize,
>> and otherwise fall back to building a non-small array and using better
>> algorithms.
>>
>> Linear searching a 128 member array == slow
>>
>> I bet if you change the 128 to 8, you will see significant speedups.
>>
>>
>>
>> On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> Are there any other passes that could benefit from having a
single bit (or similarly small amount) of per-Instruction metadata local to the
pass, i.e. to avoid having to keep a side-Set of instructions...
>>> FWIW, I have a target specific pass that needs exactly this sort of
>>> tracking.  Once you offer 1 bit, it's a slippery slope to
giving a
>>> whole pointer.  I like your idea regardless.
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Daniel Berlin via llvm-dev

2015-Sep-14 21:31 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

On Mon, Sep 14, 2015 at 11:41 AM, escha <escha at apple.com>
wrote:> I tried changing it to 16 and only saw a very small speedup. I’m pretty
sure that no matter how you slice it, adding every single
> instruction in a function to a set is going to be expensive.
Yes, it will be.

For reference, we had two ways of doing this in GCC:

One, things had id numbers, so we could create bitmaps that told us
what was in a set without storing pointers :)
There were also flags, but in practice, flags were as fast as the bitmaps.

>
> —escha
>
>> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>
>> I did something similar for dominators, for GVN, etc.
>> All see significant speedups.
>>
>> However, the answer i got back when i mentioned this was "things
like
>> ptrset and densemap should only have a small performance difference
>> from side data when used and sized right", and i've found this
to
>> mostly be true after looking harder.
>>
>> In the case you are looking at, i see:
>>
>> -  SmallPtrSet<Instruction*, 128> Alive;
>>
>> This seems ... wrong.
>> In fact, it seems optimally bad.
>>
>>
>> This says the small ptr set size is 128.
>> That is, the smallsize is 128.
>>
>> SmallPtrSet will linear search the array when it's size <=
smallsize,
>> and otherwise fall back to building a non-small array and using better
>> algorithms.
>>
>> Linear searching a 128 member array == slow
>>
>> I bet if you change the 128 to 8, you will see significant speedups.
>>
>>
>>
>> On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> Are there any other passes that could benefit from having a
single bit (or similarly small amount) of per-Instruction metadata local to the
pass, i.e. to avoid having to keep a side-Set of instructions...
>>>
>>> FWIW, I have a target specific pass that needs exactly this sort of
>>> tracking.  Once you offer 1 bit, it's a slippery slope to
giving a
>>> whole pointer.  I like your idea regardless.
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

llvm dev - Sep 2015 - RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)