escha via llvm-dev
2015-Sep-16 02:16 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.

—escha

> On Sep 15, 2015, at 6:50 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Can someone provide the file used to demonstrate the speedup here?
> I'd be glad to take a quick crack at seeing if I can achieve the same speedup.
>
> On Tue, Sep 15, 2015 at 2:16 PM, Owen Anderson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Sep 14, 2015, at 5:02 PM, Mehdi Amini via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Sep 14, 2015, at 2:58 PM, Pete Cooper via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Sep 14, 2015, at 2:49 PM, Matt Arsenault via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On 09/14/2015 02:47 PM, escha via llvm-dev wrote:
>>
>> I would assume that it’s just considered to be garbage. I feel like any sort of per-pass side data like this should come with absolutely minimal contracts, to avoid introducing any more inter-pass complexity.
>>
>> I would think this would need to be a verifier error if it were ever non-0.
>>
>> +1
>>
>> Otherwise every pass which ever needs this bit would have to first zero it out just to be safe, adding an extra walk over the whole function.
>>
>> Of course, otherwise the pass modifying it will have to zero it, which could also be a walk over the whole function. So either way you have lots to iterate over, which is why I’m wary of this approach, tbh.
>>
>> Every pass which ever uses this internally would have to set it to zero when it is done, adding an extra walk over the whole function, as you noticed.
>> This goes against “you don’t pay for what you don’t use”, so definitely -1 for this. Better to clean up before use.
>> I agree that the approach does not scale/generalize well, and we should try to find an alternative if possible. Now *if* it is the only way to improve performance significantly, we might have to weigh the tradeoff.
>>
>> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?
>>
>> —Owen
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Sean Silva via llvm-dev
2015-Sep-16 02:20 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
On Tue, Sep 15, 2015 at 7:16 PM, escha via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.
>
> —escha

Have you looked at making eraseInstruction faster? Sounds like that would give a greater overall speedup, since it would help many passes.

-- Sean Silva
Daniel Berlin via llvm-dev
2015-Sep-16 03:47 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
On Tue, Sep 15, 2015 at 7:16 PM, escha <escha at apple.com> wrote:
> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.

I asked because I can't really reproduce the results of your massive speedups on even the larger testcases I have around, certainly not on the order you reported.

I can get a speedup of about 10% by changing the SmallPtrSet size to be 16 from 128.
I can get a reduction of about 15% by using a visited bit (i.e. half the speedup you report).
I can get roughly the same speedup by using DenseSet, etc.
escha via llvm-dev
2015-Sep-16 06:56 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 8:47 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> On Tue, Sep 15, 2015 at 7:16 PM, escha <escha at apple.com> wrote:
>> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.
>
> I asked because I can't really reproduce the results of your massive
> speedups on even the larger testcases I have around, certainly not on
> the order you reported.
>
> I can get a speedup of about 10% by changing the SmallPtrSet size to
> be 16 from 128.
> I can get a reduction of about 15% by using a visited bit (i.e. half the
> speedup you report).
> I can get roughly the same speedup by using DenseSet, etc.

Hmm, interesting! I guess it could be a variation in either:

a) the number of instructions per function on average, or
b) the amount of stuff that needs to be DCE’d on average (less stuff -> more time spent in set maintenance, comparatively)

If it turns out we can get most of the gain in practice without this sort of extra bit, that sounds good too.

—escha