Owen Anderson via llvm-dev
2015-Sep-15 21:16 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 14, 2015, at 5:02 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Sep 14, 2015, at 2:58 PM, Pete Cooper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> On Sep 14, 2015, at 2:49 PM, Matt Arsenault via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 09/14/2015 02:47 PM, escha via llvm-dev wrote:
>>>> I would assume that it’s just considered to be garbage. I feel like any sort of per-pass side data like this should come with absolutely minimal contracts, to avoid introducing any more inter-pass complexity.
>>> I would think this would need to be a verifier error if it were ever non-zero.
>> +1
>>
>> Otherwise every pass which ever needs this bit would have to first zero it out just to be safe, adding an extra walk over the whole function.
>>
>> Of course, otherwise the pass modifying it will have to zero it, which could also be a walk over the whole function. So either way you have a lot to iterate over, which is why I’m wary of this approach, tbh.
>
> Every pass which ever uses this internally would have to set it to zero when it is done, adding an extra walk over the whole function, as you noticed. This goes against “you don’t pay for what you don’t use”, so definitely -1 for this. Better to clean up before use.
> I agree that the approach does not scale/generalize well, and we should try to find an alternative if possible. Now *if* it is the only way to improve performance significantly, we might have to weigh the tradeoff.

Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

—Owen
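To make the discussion concrete, here is a rough sketch of the mark-live phase of an ADCE-style pass and of where the proposed bit would slot in. The setSideBit/getSideBit names in the comments are purely illustrative stand-ins for the RFC's per-instruction side data, not an existing LLVM API, and this is not the actual ADCE implementation.

    #include "llvm/ADT/SmallPtrSet.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instruction.h"

    using namespace llvm;

    // Mark-live phase of an ADCE-style pass using a hash set for "known live".
    // Every membership query below is a hash probe; the RFC would turn it into
    // a load/store of a bit carried on the Instruction itself.
    static void markLiveInstructions(Function &F) {
      SmallPtrSet<Instruction *, 32> Alive;
      SmallVector<Instruction *, 128> Worklist;

      // Seed with instructions that are trivially live.
      for (BasicBlock &BB : F)
        for (Instruction &I : BB)
          if (I.isTerminator() || I.mayHaveSideEffects()) {
            Alive.insert(&I);
            Worklist.push_back(&I);
          }

      // Propagate liveness to the operands of live instructions.
      while (!Worklist.empty()) {
        Instruction *Live = Worklist.pop_back_val();
        for (Value *Op : Live->operands())
          if (auto *OpI = dyn_cast<Instruction>(Op))
            if (Alive.insert(OpI).second) // hypothetical bit: !OpI->getSideBit()
              Worklist.push_back(OpI);    // hypothetical bit: OpI->setSideBit()
      }
      // A real pass would now erase every instruction not in Alive.
    }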
Hal Finkel via llvm-dev
2015-Sep-15 21:34 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
----- Original Message -----
> From: "Owen Anderson via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Mehdi Amini" <mehdi.amini at apple.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Tuesday, September 15, 2015 4:16:58 PM
> Subject: Re: [llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

Are these actually free bits, or does this increase the size of Value? If they're free, how many free bits do we have in Value, and does this differ on common 32-bit and 64-bit systems?

Also, this:

> I should note, when I was writing my simplifyInstructionsInBlock replacement, I got a similarly large speedup (~1000ms -> 700ms) with the trick where I only added instructions to the worklist if they needed to be /revisited/, as opposed to initializing the entire worklist with all the instructions. This is likely a similar gain to what I got here, because it’s the difference between “put everything in the function in a set” and “put only a few things from the function in a set”, so the gain from Not Putting Tons Of Stuff In A Set seems to be pretty consistent across widely varying use cases.

Can this be generalized to other relevant use cases?

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
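A minimal sketch of the worklist discipline escha describes in the paragraph quoted above, assuming a generic simplifyOne callback (a stand-in for whatever local simplification the real pass performs; this is not the actual simplifyInstructionsInBlock code):

    #include "llvm/ADT/SmallPtrSet.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Instruction.h"

    using namespace llvm;

    // Instead of seeding the worklist with every instruction in the block up
    // front, walk the block once and queue only the users of instructions that
    // actually changed; in the common case the worklist stays tiny.
    static void simplifyRevisitOnly(BasicBlock &BB,
                                    bool (*simplifyOne)(Instruction *)) {
      SmallVector<Instruction *, 16> Worklist;
      SmallPtrSet<Instruction *, 16> InWorklist; // dedup only what we queue

      for (Instruction &I : BB) {
        if (!simplifyOne(&I)) // assumed not to erase I (keeps iteration simple)
          continue;
        for (User *U : I.users()) // only these need to be revisited
          if (auto *UI = dyn_cast<Instruction>(U))
            if (InWorklist.insert(UI).second)
              Worklist.push_back(UI);
      }

      while (!Worklist.empty()) {
        Instruction *I = Worklist.pop_back_val();
        InWorklist.erase(I);
        simplifyOne(I); // a full version would re-queue I's users when it changes
      }
    }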
Pete Cooper via llvm-dev
2015-Sep-15 21:49 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 2:16 PM, Owen Anderson <resistor at mac.com> wrote:
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

Honestly, no, not an alternative to speed up ADCE. This appears to be the fastest way to improve that particular pass.

My worry here comes from prior experience. I made exactly this change years ago on a non-LLVM compiler. Adding one bit to each instruction was great; we used it in a few passes, then we found we needed a second bit, then a third. That quickly became unmanageable, and if we add this bit then we put ourselves on the path to a repeat of that.

I think we should weigh up what this change gets us in overall compile time. A 30% speedup in ADCE is great, and if a few other passes would benefit from this and get similar speedups, then even better. However, if that’s only worth, say, 1% of overall compile time, then I’d argue it’s not worth it.

Pete
Marcello Maggioni via llvm-dev
2015-Sep-15 22:19 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
Here the main problem seems to be that we don’t have a fast way to check “is this in a certain set”. In my opinion the underlying issue is that we don’t have a good (read: fast) way of checking that a certain condition applies to an LLVM Value, and I believe this bit is trying to address that.

Maybe there is room for improvement in LLVM’s set-like data structures? Maybe there is a way to use a BitVector instead, generating a unique linear ID from the Value/Instruction to index into it?

Marcello

> On 15 Sep 2015, at 14:49, Pete Cooper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Honestly, no, not an alternative to speed up ADCE. This appears to be the fastest way to improve that particular pass.
>
> I think we should weigh up what this change gets us in overall compile time. A 30% speedup in ADCE is great, and if a few other passes would benefit from this and get similar speedups, then even better. However, if that’s only worth, say, 1% of overall compile time, then I’d argue it’s not worth it.
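A minimal sketch of the BitVector idea Marcello describes above, under the assumption that the dense numbering is built and owned by the pass itself (nothing here is an existing LLVM facility):

    #include <cassert>
    #include "llvm/ADT/BitVector.h"
    #include "llvm/ADT/DenseMap.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instruction.h"

    using namespace llvm;

    // Pass-local scratch set: number the instructions of F once, then answer
    // "is this instruction flagged?" with a single bit test in a BitVector.
    struct InstructionFlagSet {
      DenseMap<const Instruction *, unsigned> IDs;
      BitVector Flags;

      explicit InstructionFlagSet(const Function &F) {
        unsigned N = 0;
        for (const BasicBlock &BB : F)
          for (const Instruction &I : BB)
            IDs[&I] = N++;
        Flags.resize(N);
      }

      void set(const Instruction *I) {
        assert(IDs.count(I) && "instruction was not numbered");
        Flags.set(IDs.lookup(I));
      }
      bool test(const Instruction *I) const {
        assert(IDs.count(I) && "instruction was not numbered");
        return Flags.test(IDs.lookup(I));
      }
    };

Note that deriving the ID still goes through a DenseMap here, so each query still pays a hash lookup; the win Marcello is asking about would need the linear ID to be recoverable from the Value itself without a map, which is essentially what the proposed side-data bit provides in the one-bit case.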
Mehdi Amini via llvm-dev
2015-Sep-15 23:05 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 2:16 PM, Owen Anderson <resistor at mac.com> wrote:
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

It might also be that no one has really started to think about the alternatives; the RFC is pretty narrow at this point (the scaling/generalizing concern I mentioned: what if we need 2 bits in the future? 3? etc.). The absence of considered alternatives is not a reason by itself to get intrusive changes in. Now, as I said, it might be the best/only way to go; it just hasn’t been demonstrated.

By the way, speaking of speedups: what is the impact on compile time / memory usage measured for a clang run on the public and on our internal test suites? Can we have an LNT run result posted here?

Thanks,

—
Mehdi
Mehdi Amini via llvm-dev
2015-Sep-15 23:07 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 2:49 PM, Pete Cooper <peter_cooper at apple.com> wrote:
>
> I think we should weigh up what this change gets us in overall compile time. A 30% speedup in ADCE is great, and if a few other passes would benefit from this and get similar speedups, then even better. However, if that’s only worth, say, 1% of overall compile time, then I’d argue it’s not worth it.

My email client was out of sync when I just answered; catching up, I see that Pete has summarized perfectly the way I was seeing it.

—
Mehdi
Daniel Berlin via llvm-dev
2015-Sep-16 01:50 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
Can someone provide the file used to demonstrate the speedup here? I'd be glad to take a quick crack at seeing if I can achieve the same speedup.

On Tue, Sep 15, 2015 at 2:16 PM, Owen Anderson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?
>
> —Owen
escha via llvm-dev
2015-Sep-16 02:16 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways.

ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.

—escha

> On Sep 15, 2015, at 6:50 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Can someone provide the file used to demonstrate the speedup here?
> I'd be glad to take a quick crack at seeing if I can achieve the same speedup.
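For readers unfamiliar with the pass, the second cost escha mentions comes from the deletion phase, which in outline looks something like the following. This is a simplified sketch, not the exact ADCE.cpp code; the Alive set is assumed to come from a mark phase like the one sketched earlier in the thread.

    #include "llvm/ADT/SmallPtrSet.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instruction.h"

    using namespace llvm;

    // Delete everything not marked live. References are dropped first so that
    // dead instructions no longer use one another, then each one is erased;
    // the eraseFromParent() calls are the "cost of eraseInstruction" half of
    // the breakdown above.
    static size_t deleteDeadInstructions(Function &F,
                                         const SmallPtrSetImpl<Instruction *> &Alive) {
      SmallVector<Instruction *, 64> Dead;
      for (BasicBlock &BB : F)
        for (Instruction &I : BB)
          if (!Alive.count(&I))
            Dead.push_back(&I);

      for (Instruction *I : Dead)
        I->dropAllReferences();
      for (Instruction *I : Dead)
        I->eraseFromParent();
      return Dead.size();
    }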