thr3ads.net - llvm dev - [llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?) [Sep 2015]

If this information is useful, please help other people find it:
Share via:

escha via llvm-dev

2015-Sep-14 16:37 UTC

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

I’ve been playing around with optimizing performance various passes and noticed
something about ADCE: it keeps an Alive set that requires a lot of upkeep
time-wise, but if each instruction had a /single bit/ of side data (to represent
liveness, solely within the pass), the set wouldn’t be needed. I tested this out
and observed a ~1/3 reduction in time in ADCE: 1454ms to 982ms according to a
profile over our test suite (for a total of 0.6% compile time savings in our
pipeline).

Are there any other passes that could benefit from having a single bit (or
similarly small amount) of per-Instruction metadata local to the pass, i.e. to
avoid having to keep a side-Set of instructions that’s only used to test
membership of the set (i.e. not iterated over)? Is this sort of thing something
reasonable (maybe it could be stuffed in the SubclassData short and exposed via
a public API?)

—escha

escha via llvm-dev

2015-Sep-14 17:29 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

> On Sep 14, 2015, at 9:37 AM, escha via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> I’ve been playing around with optimizing performance various passes and
noticed something about ADCE: it keeps an Alive set that requires a lot of
upkeep time-wise, but if each instruction had a /single bit/ of side data (to
represent liveness, solely within the pass), the set wouldn’t be needed. I
tested this out and observed a ~1/3 reduction in time in ADCE: 1454ms to 982ms
according to a profile over our test suite (for a total of 0.6% compile time
savings in our pipeline).
> 
> Are there any other passes that could benefit from having a single bit (or
similarly small amount) of per-Instruction metadata local to the pass, i.e. to
avoid having to keep a side-Set of instructions that’s only used to test
membership of the set (i.e. not iterated over)? Is this sort of thing something
reasonable (maybe it could be stuffed in the SubclassData short and exposed via
a public API?)
Here’s a simple prototype of the kind of thing I mean:

diff --git a/include/llvm/IR/Value.h b/include/llvm/IR/Value.h
index 9d83cef..d48f8f3 100644
--- a/include/llvm/IR/Value.h
+++ b/include/llvm/IR/Value.h
@@ -104,13 +104,17 @@ protected:
   ///
   /// Note, this should *NOT* be used directly by any class other than User.
   /// User uses this value to find the Use list.
-  enum : unsigned { NumUserOperandsBits = 29 };
+  enum : unsigned { NumUserOperandsBits = 28 };
   unsigned NumUserOperands : NumUserOperandsBits;
 
   bool IsUsedByMD : 1;
   bool HasName : 1;
   bool HasHungOffUses : 1;
 
+  /// \brief A 1-bit marker that can be used freely within passes, e.g. for
+  /// mark-sweep algorithms like ADCE, to avoid keeping a side data set.
+  bool Marker : 1;
+
 private:
   template <typename UseT> // UseT == 'Use' or 'const
Use'
   class use_iterator_impl
@@ -388,6 +392,12 @@ public:
   /// \brief Return true if there is a value handle associated with this value.
   bool hasValueHandle() const { return HasValueHandle; }
 
+  /// \brief Return the current value of the Marker.
+  bool getMarker() const { return Marker; }
+
+  /// \brief Set the current value of the Marker.
+  void setMarker(bool Val) { Marker = Val; }
+
   /// \brief Return true if there is metadata referencing this value.
   bool isUsedByMetadata() const { return IsUsedByMD; }
 
diff --git a/lib/Transforms/Scalar/ADCE.cpp b/lib/Transforms/Scalar/ADCE.cpp
index 96a0107..5578860 100644
--- a/lib/Transforms/Scalar/ADCE.cpp
+++ b/lib/Transforms/Scalar/ADCE.cpp
@@ -53,15 +53,17 @@ bool ADCE::runOnFunction(Function& F) {
   if (skipOptnoneFunction(F))
     return false;
 
-  SmallPtrSet<Instruction*, 128> Alive;
+ // SmallPtrSet<Instruction*, 128> Alive;
   SmallVector<Instruction*, 128> Worklist;
 
   // Collect the set of "root" instructions that are known live.
   for (Instruction &I : instructions(F)) {
     if (isa<TerminatorInst>(I) || isa<DbgInfoIntrinsic>(I) ||
I.isEHPad() ||
         I.mayHaveSideEffects()) {
-      Alive.insert(&I);
+      I.setMarker(true);
       Worklist.push_back(&I);
+    } else {
+      I.setMarker(false);
     }
   }
 
@@ -70,8 +72,10 @@ bool ADCE::runOnFunction(Function& F) {
     Instruction *Curr = Worklist.pop_back_val();
     for (Use &OI : Curr->operands()) {
       if (Instruction *Inst = dyn_cast<Instruction>(OI))
-        if (Alive.insert(Inst).second)
+        if (!Inst->getMarker()) {
           Worklist.push_back(Inst);
+          Inst->setMarker(true);
+        }
     }
   }
 
@@ -80,7 +84,7 @@ bool ADCE::runOnFunction(Function& F) {
   // value of the function, and may therefore be deleted safely.
   // NOTE: We reuse the Worklist vector here for memory efficiency.
   for (Instruction &I : instructions(F)) {
-    if (!Alive.count(&I)) {
+    if (!I.getMarker()) {
       Worklist.push_back(&I);
       I.dropAllReferences();
     }

Steve King via llvm-dev

2015-Sep-14 18:23 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Are there any other passes that could benefit from having a single bit (or
similarly small amount) of per-Instruction metadata local to the pass, i.e. to
avoid having to keep a side-Set of instructions...
FWIW, I have a target specific pass that needs exactly this sort of
tracking.  Once you offer 1 bit, it's a slippery slope to giving a
whole pointer.  I like your idea regardless.

Daniel Berlin via llvm-dev

2015-Sep-14 18:34 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

I did something similar for dominators, for GVN, etc.
All see significant speedups.

However, the answer i got back when i mentioned this was "things like
ptrset and densemap should only have a small performance difference
from side data when used and sized right", and i've found this to
mostly be true after looking harder.

In the case you are looking at, i see:

-  SmallPtrSet<Instruction*, 128> Alive;

This seems ... wrong.
In fact, it seems optimally bad.

This says the small ptr set size is 128.
That is, the smallsize is 128.

SmallPtrSet will linear search the array when it's size <= smallsize,
and otherwise fall back to building a non-small array and using better
algorithms.

Linear searching a 128 member array == slow

I bet if you change the 128 to 8, you will see significant speedups.

On Mon, Sep 14, 2015 at 11:23 AM, Steve King via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Are there any other passes that could benefit from having a single bit
(or similarly small amount) of per-Instruction metadata local to the pass, i.e.
to avoid having to keep a side-Set of instructions...
>
> FWIW, I have a target specific pass that needs exactly this sort of
> tracking.  Once you offer 1 bit, it's a slippery slope to giving a
> whole pointer.  I like your idea regardless.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Sean Silva via llvm-dev

2015-Sep-14 21:28 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

What is the contract between passes for the marker bit? I.e. do passes need
to "clean up" the marker bit to a predetermined value after they run?
(or
they need to assume that it might be dirty? (like I think you do below))

-- Sean Silva

On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> I’ve been playing around with optimizing performance various passes and
> noticed something about ADCE: it keeps an Alive set that requires a lot of
> upkeep time-wise, but if each instruction had a /single bit/ of side data
> (to represent liveness, solely within the pass), the set wouldn’t be
> needed. I tested this out and observed a ~1/3 reduction in time in ADCE:
> 1454ms to 982ms according to a profile over our test suite (for a total of
> 0.6% compile time savings in our pipeline).
>
> Are there any other passes that could benefit from having a single bit (or
> similarly small amount) of per-Instruction metadata local to the pass, i.e.
> to avoid having to keep a side-Set of instructions that’s only used to test
> membership of the set (i.e. not iterated over)? Is this sort of thing
> something reasonable (maybe it could be stuffed in the SubclassData short
> and exposed via a public API?)
>
> —escha
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150914/d37e2e8f/attachment-0001.html>

Pete Cooper via llvm-dev

2015-Sep-14 21:31 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

> On Sep 14, 2015, at 2:28 PM, Sean Silva via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> What is the contract between passes for the marker bit? I.e. do passes need
to "clean up" the marker bit to a predetermined value after they run?
(or they need to assume that it might be dirty? (like I think you do below))Good questions, and to add to this, what happens if we run passes in parallel?  

I can imagine a world where we run 2 function passes in parallel so tagging
instructions is ok, but what if we then started to run 2 BB passes in parallel,
or used the tag to track something like GlobalValue.  Going to break horribly in
such scenarios.

Pete> 
> -- Sean Silva
> 
> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> I’ve been playing around with optimizing performance various passes and
noticed something about ADCE: it keeps an Alive set that requires a lot of
upkeep time-wise, but if each instruction had a /single bit/ of side data (to
represent liveness, solely within the pass), the set wouldn’t be needed. I
tested this out and observed a ~1/3 reduction in time in ADCE: 1454ms to 982ms
according to a profile over our test suite (for a total of 0.6% compile time
savings in our pipeline).
> 
> Are there any other passes that could benefit from having a single bit (or
similarly small amount) of per-Instruction metadata local to the pass, i.e. to
avoid having to keep a side-Set of instructions that’s only used to test
membership of the set (i.e. not iterated over)? Is this sort of thing something
reasonable (maybe it could be stuffed in the SubclassData short and exposed via
a public API?)
> 
> —escha
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150914/6ee7a4b8/attachment.html>

escha via llvm-dev

2015-Sep-14 21:47 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

I would assume that it’s just considered to be garbage. I feel like any sort of
per-pass side data like this should come with absolute minimal contracts, to
avoid introducing any more inter-pass complexity.

—escha
> On Sep 14, 2015, at 2:28 PM, Sean Silva <chisophugis at gmail.com>
wrote:
> 
> What is the contract between passes for the marker bit? I.e. do passes need
to "clean up" the marker bit to a predetermined value after they run?
(or they need to assume that it might be dirty? (like I think you do below))
> 
> -- Sean Silva
> 
> On Mon, Sep 14, 2015 at 9:37 AM, escha via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> I’ve been playing around with optimizing performance various passes and
noticed something about ADCE: it keeps an Alive set that requires a lot of
upkeep time-wise, but if each instruction had a /single bit/ of side data (to
represent liveness, solely within the pass), the set wouldn’t be needed. I
tested this out and observed a ~1/3 reduction in time in ADCE: 1454ms to 982ms
according to a profile over our test suite (for a total of 0.6% compile time
savings in our pipeline).
> 
> Are there any other passes that could benefit from having a single bit (or
similarly small amount) of per-Instruction metadata local to the pass, i.e. to
avoid having to keep a side-Set of instructions that’s only used to test
membership of the set (i.e. not iterated over)? Is this sort of thing something
reasonable (maybe it could be stuffed in the SubclassData short and exposed via
a public API?)
> 
> —escha
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150914/441edf37/attachment.html>

Chris Lattner via llvm-dev

2015-Sep-17 03:36 UTC

head link

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

> On Sep 14, 2015, at 4:28 PM, Sean Silva via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> What is the contract between passes for the marker bit? I.e. do passes need
to "clean up" the marker bit to a predetermined value after they run?
(or they need to assume that it might be dirty? (like I think you do below))
This is the key question for this sort of thing (which has come up many times
before).

For the record, I’m not opposed to something like this, so long as:

- It only applies to instructions and basic blocks, not constants, Function,
GlobalVariable, etc.  We don’t want to make parallelization harder than it
already is.
- The contract (enforced by the verifier) is that these bits are always zero
between passes.
- Adding bits doesn’t increase the size of the IR over what it is today.

Beyond that, I think that it is really important to design the right
data-structure around this to provide these invariants.  Here’s a sketch of
something that I think could work well, over-simplified to be just a single bit
on Instruction to illustrate the point:

class Instruction {
  bool TheBit = 0  // squished into something that already exists.
...
public:
  bool isBitSet() const { return TheBit; }
  void setBit(BitTracker &BT, bool Value = true) {
     if (TheBit) return;

     BT.addToSet(this);
     TheBIt = Value;
  }
}

Where BitTracker is basically just an std::vector<Instruction*> that keeps
track of all the instructions which (might) have a bit set, and offers a method
to go zero all the bits.

An API like this would make it very straight-forward for function passes to
preserve the invariant.

-Chris

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Sep 2015 - RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

[llvm-dev] RFC: speedups with instruction side-data (ADCE, perhaps others?)

Reasonably Related Threads