escha via llvm-dev
2015-Sep-16 02:16 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.

—escha

> On Sep 15, 2015, at 6:50 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Can someone provide the file used to demonstrate the speedup here?
> I'd be glad to take a quick crack at seeing if I can achieve the same speedup.
>
> On Tue, Sep 15, 2015 at 2:16 PM, Owen Anderson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Sep 14, 2015, at 5:02 PM, Mehdi Amini via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Sep 14, 2015, at 2:58 PM, Pete Cooper via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On Sep 14, 2015, at 2:49 PM, Matt Arsenault via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> On 09/14/2015 02:47 PM, escha via llvm-dev wrote:
>>
>> I would assume that it’s just considered to be garbage. I feel like any sort of per-pass side data like this should come with absolutely minimal contracts, to avoid introducing any more inter-pass complexity.
>>
>> I would think this would need to be a verifier error if it were ever non-0.
>>
>> +1
>>
>> Otherwise every pass which ever needs this bit would have to first zero it out just to be safe, adding an extra walk over the whole function.
>>
>> Of course, otherwise the pass modifying it will have to zero it, which could also be a walk over the whole function. So either way you have lots to iterate over, which is why I’m wary of this approach, tbh.
>>
>> Every pass which ever uses this internally would have to set it to zero when it is done, adding an extra walk over the whole function, as you noticed.
>> This goes against “you don’t pay for what you don’t use”, so definitely -1 for this. Better to clean up before use.
>> I agree that the approach does not scale/generalize well, and we should try to find an alternative if possible. Now *if* it is the only way to improve performance significantly, we might have to weigh the tradeoff.
>>
>> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?
>>
>> —Owen
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Sean Silva via llvm-dev
2015-Sep-16 02:20 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
On Tue, Sep 15, 2015 at 7:16 PM, escha via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.
>
> —escha

Have you looked at making eraseInstruction faster? Sounds like that would give a greater overall speedup, since it would help many passes.

-- Sean Silva
Daniel Berlin via llvm-dev
2015-Sep-16 03:47 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
On Tue, Sep 15, 2015 at 7:16 PM, escha <escha at apple.com> wrote:
> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.

I asked because I can't really reproduce the results of your massive speedups on even the larger testcases I have around, certainly not on the order you reported.

I can get a speedup of about 10% by changing the SmallPtrSet size to be 16 from 128.
I can get a reduction of about 15% by using a visited bit (i.e. half the speedup you report).
I can get roughly the same speedup by using DenseSet, etc.
escha via llvm-dev
2015-Sep-16 06:56 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 8:47 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> On Tue, Sep 15, 2015 at 7:16 PM, escha <escha at apple.com> wrote:
>> You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways. ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.
>
> I asked because I can't really reproduce the results of your massive
> speedups on even the larger testcases I have around, certainly not on
> the order you reported.
>
> I can get a speedup of about 10% by changing the SmallPtrSet size to
> be 16 from 128.
> I can get a reduction of about 15% by using a visited bit (i.e. half the
> speedup you report).
> I can get roughly the same speedup by using DenseSet, etc.

Hmm, interesting! I guess it could be a variation in either:

a) the number of instructions per function on average, or
b) the amount of stuff that needs to be DCE’d on average (less stuff -> more time spent in set maintenance, comparatively)

If it turns out we can get most of the gain in practice without this sort of extra bit, that sounds good too.

—escha