Daniel Berlin via llvm-dev
2015-Sep-14 18:34 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
I did something similar for dominators, for GVN, etc. All see significant speedups. However, the answer i got back when i mentioned this was "things like ptrset and densemap should only have a small performance difference from side data when used and sized right", and i've found this to mostly be true after looking harder. In the case you are looking at, i see: - SmallPtrSet<Instruction*, 128> Alive; This seems ... wrong. In fact, it seems optimally bad. This says the small ptr set size is 128. That is, the smallsize is 128. SmallPtrSet will linear search the array when it's size <= smallsize, and otherwise fall back to building a non-small array and using better algorithms. Linear searching a 128 member array == slow I bet if you change the 128 to 8, you will see significant speedups. On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev <llvm-dev at lists.llvm.org> wrote:> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Are there any other passes that could benefit from having a single bit (or similarly small amount) of per-Instruction metadata local to the pass, i.e. to avoid having to keep a side-Set of instructions... > > FWIW, I have a target specific pass that needs exactly this sort of > tracking. Once you offer 1 bit, it's a slippery slope to giving a > whole pointer. I like your idea regardless. > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
escha via llvm-dev
2015-Sep-14 18:41 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
I tried changing it to 16 and only saw a very small speedup. I’m pretty sure that no matter how you slice it, adding every single instruction in a function to a set is going to be expensive. —escha> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > > I did something similar for dominators, for GVN, etc. > All see significant speedups. > > However, the answer i got back when i mentioned this was "things like > ptrset and densemap should only have a small performance difference > from side data when used and sized right", and i've found this to > mostly be true after looking harder. > > In the case you are looking at, i see: > > - SmallPtrSet<Instruction*, 128> Alive; > > This seems ... wrong. > In fact, it seems optimally bad. > > > This says the small ptr set size is 128. > That is, the smallsize is 128. > > SmallPtrSet will linear search the array when it's size <= smallsize, > and otherwise fall back to building a non-small array and using better > algorithms. > > Linear searching a 128 member array == slow > > I bet if you change the 128 to 8, you will see significant speedups. > > > > On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >>> Are there any other passes that could benefit from having a single bit (or similarly small amount) of per-Instruction metadata local to the pass, i.e. to avoid having to keep a side-Set of instructions... >> >> FWIW, I have a target specific pass that needs exactly this sort of >> tracking. Once you offer 1 bit, it's a slippery slope to giving a >> whole pointer. I like your idea regardless. >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Philip Reames via llvm-dev
2015-Sep-14 19:03 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
Given the use case here, I somewhat agree. In the common case, you're going to need an extra O(I) memory to represent the alive set. However, there are definitely ways to do better without needing transient data stored in the instruction itself. Have you considered a bloom filter or something equivalent for testing aliveness? We don't actually need a precise result here - right?. If we're willing to tolerate a slightly imprecise result (in theory), using a bloom filter could give noticeable memory reduction. p.s. I disagree with Daniel on sizing. While linear search over a 128 element array is somewhat slow, memory allocation/churn can be far far worse. I haven't looked at this particular use case, but in general, larger small sizes for frequently called functions can make a lot of sense. Philip On 09/14/2015 11:41 AM, escha via llvm-dev wrote:> I tried changing it to 16 and only saw a very small speedup. I’m pretty sure that no matter how you slice it, adding every single instruction in a function to a set is going to be expensive. > > —escha > >> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at dberlin.org> wrote: >> >> I did something similar for dominators, for GVN, etc. >> All see significant speedups. >> >> However, the answer i got back when i mentioned this was "things like >> ptrset and densemap should only have a small performance difference >> from side data when used and sized right", and i've found this to >> mostly be true after looking harder. >> >> In the case you are looking at, i see: >> >> - SmallPtrSet<Instruction*, 128> Alive; >> >> This seems ... wrong. >> In fact, it seems optimally bad. >> >> >> This says the small ptr set size is 128. >> That is, the smallsize is 128. >> >> SmallPtrSet will linear search the array when it's size <= smallsize, >> and otherwise fall back to building a non-small array and using better >> algorithms. >> >> Linear searching a 128 member array == slow >> >> I bet if you change the 128 to 8, you will see significant speedups. >> >> >> >> On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >>> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>>> Are there any other passes that could benefit from having a single bit (or similarly small amount) of per-Instruction metadata local to the pass, i.e. to avoid having to keep a side-Set of instructions... >>> FWIW, I have a target specific pass that needs exactly this sort of >>> tracking. Once you offer 1 bit, it's a slippery slope to giving a >>> whole pointer. I like your idea regardless. >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Daniel Berlin via llvm-dev
2015-Sep-14 21:31 UTC
[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)
On Mon, Sep 14, 2015 at 11:41 AM, escha <escha at apple.com> wrote:> I tried changing it to 16 and only saw a very small speedup. I’m pretty sure that no matter how you slice it, adding every single > instruction in a function to a set is going to be expensive.Yes, it will be. For reference, we had two ways of doing this in GCC: One, things had id numbers, so we could create bitmaps that told us what was in a set without storing pointers :) There were also flags, but in practice, flags were as fast as the bitmaps.> > —escha > >> On Sep 14, 2015, at 11:34 AM, Daniel Berlin <dberlin at dberlin.org> wrote: >> >> I did something similar for dominators, for GVN, etc. >> All see significant speedups. >> >> However, the answer i got back when i mentioned this was "things like >> ptrset and densemap should only have a small performance difference >> from side data when used and sized right", and i've found this to >> mostly be true after looking harder. >> >> In the case you are looking at, i see: >> >> - SmallPtrSet<Instruction*, 128> Alive; >> >> This seems ... wrong. >> In fact, it seems optimally bad. >> >> >> This says the small ptr set size is 128. >> That is, the smallsize is 128. >> >> SmallPtrSet will linear search the array when it's size <= smallsize, >> and otherwise fall back to building a non-small array and using better >> algorithms. >> >> Linear searching a 128 member array == slow >> >> I bet if you change the 128 to 8, you will see significant speedups. >> >> >> >> On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >>> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev >>> <llvm-dev at lists.llvm.org> wrote: >>>> Are there any other passes that could benefit from having a single bit (or similarly small amount) of per-Instruction metadata local to the pass, i.e. to avoid having to keep a side-Set of instructions... >>> >>> FWIW, I have a target specific pass that needs exactly this sort of >>> tracking. Once you offer 1 bit, it's a slippery slope to giving a >>> whole pointer. I like your idea regardless. >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >