Owen Anderson via llvm-dev
2015-Sep-15 21:16 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 14, 2015, at 5:02 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Sep 14, 2015, at 2:58 PM, Pete Cooper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> On Sep 14, 2015, at 2:49 PM, Matt Arsenault via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 09/14/2015 02:47 PM, escha via llvm-dev wrote:
>>>> I would assume that it’s just considered to be garbage. I feel like any sort of per-pass side data like this should come with absolutely minimal contracts, to avoid introducing any more inter-pass complexity.
>>> I would think this would need to be a verifier error if it were ever non-zero.
>> +1
>>
>> Otherwise every pass that ever needs this bit would have to first zero it out just to be safe, adding an extra walk over the whole function.
>>
>> Of course, otherwise the pass modifying it will have to zero it, which could also be a walk over the whole function. So either way you have a lot to iterate over, which is why I’m wary of this approach, to be honest.
>
> Every pass that ever uses this internally would have to set it to zero when it is done, adding an extra walk over the whole function, as you noticed. This goes against “you don’t pay for what you don’t use”, so definitely -1 for this. Better to clean up before use.
> I agree that the approach does not scale/generalize well, and we should try to find an alternative if possible. Now *if* it is the only way to improve performance significantly, we might have to weigh the tradeoff.

Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

—Owen
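For concreteness, the tradeoff being argued about above can be seen in a small self-contained toy. This is deliberately not LLVM code, and the type and field names are purely illustrative: a marker bit turns the membership test into a single mask, but someone still has to pay a whole-function walk to clear it, either the pass that used it or the next one that relies on it being zero.

// Toy illustration only -- not LLVM's Value/Instruction.
#include <cstdint>
#include <unordered_set>
#include <vector>

struct ToyInst {
  uint32_t Opcode : 31;
  uint32_t Marker : 1;   // the debated per-instruction side bit
};

// Set-based membership: no cleanup contract needed, but every query hashes.
bool isMarkedSet(const std::unordered_set<const ToyInst *> &S,
                 const ToyInst *I) {
  return S.count(I) != 0;
}

// Bit-based membership: a load and a mask per query...
bool isMarkedBit(const ToyInst *I) { return I->Marker; }

// ...plus the extra whole-function walk the thread is arguing over.
void clearMarkers(std::vector<ToyInst> &Func) {
  for (ToyInst &I : Func)
    I.Marker = 0;
}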
Hal Finkel via llvm-dev
2015-Sep-15 21:34 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
----- Original Message -----
> From: "Owen Anderson via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Mehdi Amini" <mehdi.amini at apple.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Tuesday, September 15, 2015 4:16:58 PM
> Subject: Re: [llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
>
> [snip]
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

Are these actually free bits, or does this increase the size of Value? If they're free, how many free bits do we have in Value, and does this differ on common 32-bit and 64-bit systems?

Also, this:

> I should note, when I was writing my simplifyInstructionsInBlock replacement, I got a similarly large speedup (~1000ms -> 700ms) with the trick where I only added instructions to the worklist if they needed to be *revisited*, as opposed to initializing the entire worklist with all the instructions. This is likely a similar gain to what I got here, because it’s the difference between “put everything in the function in a set” and “put only a few things from the function in a set”, so the gain from Not Putting Tons Of Stuff In A Set seems to be pretty consistent across widely varying use cases.

Can this be generalized to other relevant use cases?

 -Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
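For what it's worth, the worklist trick being quoted has roughly the shape sketched below. This is an illustrative sketch only, not the actual simplifyInstructionsInBlock replacement; the simplification step is assumed to be supplied as a callback that folds or RAUWs the instruction in place without erasing it.

#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// TrySimplify is assumed to fold/RAUW the instruction without erasing it and
// to return true if it changed anything; erasure is left out for brevity.
template <typename SimplifyFn>
static bool runLazyWorklist(Function &F, SimplifyFn TrySimplify) {
  SmallSetVector<Instruction *, 16> Worklist;
  bool Changed = false;

  auto Visit = [&](Instruction *I) {
    // Snapshot the users first so they are still known if I gets replaced.
    SmallVector<Instruction *, 8> Users;
    for (User *U : I->users())
      if (auto *UI = dyn_cast<Instruction>(U))
        Users.push_back(UI);
    if (!TrySimplify(I))
      return;
    Changed = true;
    // Only instructions that may now simplify again go into the set.
    Worklist.insert(Users.begin(), Users.end());
  };

  // First pass: walk the function in place; most instructions never touch
  // the worklist at all, which is the point of the trick.
  for (BasicBlock &BB : F)
    for (Instruction &I : BB)
      Visit(&I);

  // Then drain the (hopefully small) worklist of instructions to revisit.
  while (!Worklist.empty())
    Visit(Worklist.pop_back_val());

  return Changed;
}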
Pete Cooper via llvm-dev
2015-Sep-15 21:49 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 2:16 PM, Owen Anderson <resistor at mac.com> wrote:
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

Honestly, not an alternative to speed up ADCE. This appears to be the fastest way to improve that particular pass.

My worry here comes from prior experience. I made this exact change years ago on a non-LLVM compiler. Adding one bit to each instruction was great; we used it in a few passes, then we found we needed a second bit, then a third. That quickly became unmanageable, and if we add this bit then we set ourselves up for a repeat of that.

I think we should weigh what this change gets us in overall compile time. A 30% speedup in ADCE is great, and if a few other passes would benefit from this and get similar speedups, then even better. However, if that’s only worth, say, 1% of overall compile time, then I’d argue it’s not worth it.

Pete
Marcello Maggioni via llvm-dev
2015-Sep-15 22:19 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
The main problem here seems to be that we don’t have a fast way to check “is this in a certain set”. In my opinion the underlying issue is that we don’t have a good (read: fast) way of checking that a certain condition applies to an LLVM Value, and I believe this bit is trying to address that.

Maybe there is room for improvement in LLVM’s set-like data structures? Maybe there is a way to use a BitVector instead, and to generate from the Value/Instruction a unique linear ID that can be used with it?

Marcello

> On 15 Sep 2015, at 14:49, Pete Cooper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Honestly, not an alternative to speed up ADCE. This appears to be the fastest way to improve that particular pass.
>
> My worry here comes from prior experience. I made this exact change years ago on a non-LLVM compiler. Adding one bit to each instruction was great; we used it in a few passes, then we found we needed a second bit, then a third. That quickly became unmanageable, and if we add this bit then we set ourselves up for a repeat of that.
>
> I think we should weigh what this change gets us in overall compile time. A 30% speedup in ADCE is great, and if a few other passes would benefit from this and get similar speedups, then even better. However, if that’s only worth, say, 1% of overall compile time, then I’d argue it’s not worth it.
>
> Pete
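A rough sketch of the BitVector idea (an assumed shape, not an existing LLVM utility): number the instructions of a function once, then answer “is this instruction in the set?” with a single bit test. The caveat is that the numbering map below is itself a hash lookup, so the win only materializes if the dense ID can be obtained more cheaply, e.g. cached next to the instruction or carried along during an ordered walk.

#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

class InstructionBitSet {
  DenseMap<const Instruction *, unsigned> Index; // one-time dense numbering
  BitVector Bits;

public:
  explicit InstructionBitSet(const Function &F) {
    unsigned N = 0;
    for (const BasicBlock &BB : F)
      for (const Instruction &I : BB)
        Index[&I] = N++;        // assign a linear ID per instruction
    Bits.resize(N);
  }

  void insert(const Instruction *I) { Bits.set(Index.lookup(I)); }
  bool contains(const Instruction *I) const {
    return Bits.test(Index.lookup(I));
  }
};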
Mehdi Amini via llvm-dev
2015-Sep-15 23:05 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 2:16 PM, Owen Anderson <resistor at mac.com> wrote:
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?

It might also be that no one has really started to think about alternatives: the RFC is pretty narrow at this point (the scaling/generalizing part I mentioned: what if we need 2 bits in the future? 3? etc.). The absence of considered alternatives is not a reason by itself to get intrusive changes in. Now, as I said, it might be the best/only way to go; it just hasn’t been demonstrated.

By the way, talking about speedup, what is the impact on compile time / memory usage measured for a clang run on the public and on our internal test suites? Can we have an LNT run result posted here?

Thanks,

— Mehdi
Mehdi Amini via llvm-dev
2015-Sep-15 23:07 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
> On Sep 15, 2015, at 2:49 PM, Pete Cooper <peter_cooper at apple.com> wrote:
>
> I think we should weigh what this change gets us in overall compile time. A 30% speedup in ADCE is great, and if a few other passes would benefit from this and get similar speedups, then even better. However, if that’s only worth, say, 1% of overall compile time, then I’d argue it’s not worth it.

My email client was out of sync when I just answered; catching up, I see that Pete summarized perfectly the way I was seeing it.

— Mehdi
Daniel Berlin via llvm-dev
2015-Sep-16 01:50 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
Can someone provide the file used to demonstrate the speedup here? I'd be glad to take a quick crack at seeing if I can achieve the same speedup.

On Tue, Sep 15, 2015 at 2:16 PM, Owen Anderson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Does anyone have any concrete alternative suggestions to achieve the speedup demonstrated here?
>
> —Owen
escha via llvm-dev
2015-Sep-16 02:16 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
You mean the input test data? I was testing performance using our offline perf suite (which is a ton of out-of-tree code), so it’s not something I can share, but I imagine any similar test suite put through a typical compiler pipeline will exercise ADCE in similar ways.

ADCE’s cost is pretty much the sum of (cost of managing the set) + (cost of eraseInstruction), which in our case turns out to be roughly 1/3 the former and 2/3 the latter.

—escha

> On Sep 15, 2015, at 6:50 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Can someone provide the file used to demonstrate the speedup here?
> I'd be glad to take a quick crack at seeing if I can achieve the same speedup.
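For readers who haven't looked at the pass recently, here is a schematic of where those two cost buckets live in an ADCE-style mark-and-sweep. This is a simplified sketch, not the actual ADCE implementation (which also keeps landing pads, debug intrinsics, etc. alive):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

static bool schematicADCE(Function &F) {
  SmallPtrSet<Instruction *, 32> Alive;
  SmallVector<Instruction *, 32> Worklist;

  // Mark phase: every insert()/count() here is "cost of managing the set".
  for (BasicBlock &BB : F)
    for (Instruction &I : BB)
      if (I.isTerminator() || I.mayHaveSideEffects()) {
        Alive.insert(&I);
        Worklist.push_back(&I);
      }
  while (!Worklist.empty()) {
    Instruction *Live = Worklist.pop_back_val();
    for (Use &OI : Live->operands())
      if (auto *Op = dyn_cast<Instruction>(OI))
        if (Alive.insert(Op).second)   // newly discovered live instruction
          Worklist.push_back(Op);
  }

  // Sweep phase: the other cost bucket, eraseFromParent() on everything dead.
  SmallVector<Instruction *, 32> Dead;
  for (BasicBlock &BB : F)
    for (Instruction &I : BB)
      if (!Alive.count(&I)) {
        I.dropAllReferences();   // break use chains among dead instructions
        Dead.push_back(&I);
      }
  for (Instruction *I : Dead)
    I->eraseFromParent();
  return !Dead.empty();
}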