thr3ads.net - llvm dev - [LLVMdev] Analysis of polly-detect overhead in oggenc [Jul 2013]

If this information is useful, please help other people find it:
Share via:

Tobias Grosser

2013-Jul-14 21:59 UTC

[LLVMdev] Analysis of polly-detect overhead in oggenc

On 07/14/2013 08:05 AM, Star Tan wrote:> I have found that the extremely expensive compile-time overhead comes from
the string buffer operation for "INVALID" MACRO in the polly-detect
pass.
> Attached is a hack patch file that simply remove the string buffer
operation. This patch file can significantly reduce compile-time overhead when
compiling big source code. For example, for oggen*8.ll,  the compile time is
reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.
Very nice analysis. I just tried it myself and can verify that for
oggenc 16x, your patch reduces the scop-detection time from 90 seconds
(80 %) to 0.5 seconds (2.5 %).

I think there are two problems:

   1) The cost of printing a single LLVM type/value increases with
      the size of the overall Module. This seems to be because
      TypeFinder::run() is called each time, without caching in place.
      The cost of TypeFinder::run() increases with the size of the
      module, as it basically just performs a scan on the entire Module.

   2) We are formatting the failure messages during normal compilation,
      even though they are only used in debugging tools like -view-scops

In terms of solutions:

It would be interesting to understand why is 1) so slow, especially as
it seems to be either a fundamental problem in LLVM IR printing or the
way we use the IR printing infrastructure. On the other side, for Polly
we need to solve 2) anyway. Even if formatting would be faster, we
should still not do it, if not needed. As we need to solve 2) anyway, 1)
will only hit us when we do debugging/formatting. I assume in case of
debugging the files we are looking into are normally smaller, such that
the formatting overhead will not be that big.

Hence, I would focus on 2). We could probably just put the code under a
NDEBUG ifndef, but I would actually like to keep them available even in
NDEBUG mode, as we may want to use the error messages to hint users to
why their code can not be optimized. For this and also to get rid of
another annoyance, the INVALID macro, I think we need to restructure the
reporting of the last error, such that formatting of the error messages
can be done on-demand. Another problem that could be solved at this
point is to remove the macro use, which hides the fact that the
functions return as soon as INVALID is called, which is plainly ugly.

I am not sure how to structure this, but I could imagine some small
class hierarchy that has a class for each error type. Each class just
stores pointers to the data structures it needs to format its error
message, but only formats the error on-demand. We could then return this
class in case of failure and return a NoError class or a NULL pointer in
case of success.

This change may also help us to later add support to keep track of all
errors we encounter (not just the first one). This is something Andreas
and Johannes found helpful earlier.

Cheers,
Tobias



g

Daniel Berlin

2013-Jul-15 03:15 UTC

head link

[LLVMdev] Analysis of polly-detect overhead in oggenc

On Sun, Jul 14, 2013 at 2:59 PM, Tobias Grosser <tobias at grosser.es>
wrote:> On 07/14/2013 08:05 AM, Star Tan wrote:
>>
>> I have found that the extremely expensive compile-time overhead comes
from
>> the string buffer operation for "INVALID" MACRO in the
polly-detect pass.
>> Attached is a hack patch file that simply remove the string buffer
>> operation. This patch file can significantly reduce compile-time
overhead
>> when compiling big source code. For example, for oggen*8.ll,  the
compile
>> time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this
patch
>> file.
>
>
> Very nice analysis. I just tried it myself and can verify that for
> oggenc 16x, your patch reduces the scop-detection time from 90 seconds
> (80 %) to 0.5 seconds (2.5 %).
>
> I think there are two problems:
>
>   1) The cost of printing a single LLVM type/value increases with
>      the size of the overall Module. This seems to be because
>      TypeFinder::run() is called each time, without caching in place.
>      The cost of TypeFinder::run() increases with the size of the
>      module, as it basically just performs a scan on the entire Module.
>
>   2) We are formatting the failure messages during normal compilation,
>      even though they are only used in debugging tools like -view-scops
>
> In terms of solutions:
>
> It would be interesting to understand why is 1) so slow, especially as
> it seems to be either a fundamental problem in LLVM IR printing or the
> way we use the IR printing infrastructure.
I once analyzed it for GVN when I hit a similar issue (type finder
being rerun again and again while printing statements), and I came up
with the answer that the printing infrastructure does not expect to be
printing things regularly, and the typefinder only gets called so much
during debug printing, because of the call paths it goes through.

The basic cause is that Value::print creates AssemblyWriter objects
every time it is called. This in turn calls the type finder, for every
single call to Value::print.  The type finder is being called to build
up the list of named types in the module, so that when it prints
things, it can use the right name for the type instead of an anonymous
name.

if you do the following:

diff --git a/lib/VMCore/AsmWriter.cpp b/lib/VMCore/AsmWriter.cpp
index 7ef1131..8a2206b 100644
--- a/lib/VMCore/AsmWriter.cpp
+++ b/lib/VMCore/AsmWriter.cpp
@@ -2118,7 +2118,7 @@ void Value::print(raw_ostream &ROS,
AssemblyAnnotationWrit
   if (const Instruction *I = dyn_cast<Instruction>(this)) {
     const Function *F = I->getParent() ? I->getParent()->getParent() :
0;
     SlotTracker SlotTable(F);
-    AssemblyWriter W(OS, SlotTable, getModuleFromVal(I), AAW);
+    AssemblyWriter W(OS, SlotTable, NULL, AAW);
     W.printInstruction(*I);
   } else if (const BasicBlock *BB = dyn_cast<BasicBlock>(this)) {
     SlotTracker SlotTable(BB->getParent());


(This is an old patch, AsmWriter.cpp is not in lib/VMCore anymore, but
you can see what is happening), you will not be shown named types in
the operand printing, but it will not be slow anymore.

This is of course, a complete hack just to work around the issue if
you need to get other work done.
The real fix is either to stop recreating these AssemblyWriter
objects, or improve caching in the bowels of it so that it doesn't
need to rerun typefinder again and again if nothing has changed.





> On the other side, for Polly
> we need to solve 2) anyway. Even if formatting would be faster, we
> should still not do it, if not needed. As we need to solve 2) anyway, 1)
> will only hit us when we do debugging/formatting. I assume in case of
> debugging the files we are looking into are normally smaller, such that
> the formatting overhead will not be that big.
>
> Hence, I would focus on 2). We could probably just put the code under a
> NDEBUG ifndef, but I would actually like to keep them available even in
> NDEBUG mode, as we may want to use the error messages to hint users to
> why their code can not be optimized. For this and also to get rid of
> another annoyance, the INVALID macro, I think we need to restructure the
> reporting of the last error, such that formatting of the error messages
> can be done on-demand. Another problem that could be solved at this
> point is to remove the macro use, which hides the fact that the
> functions return as soon as INVALID is called, which is plainly ugly.
>
> I am not sure how to structure this, but I could imagine some small
> class hierarchy that has a class for each error type. Each class just
> stores pointers to the data structures it needs to format its error
> message, but only formats the error on-demand. We could then return this
> class in case of failure and return a NoError class or a NULL pointer in
> case of success.
>
> This change may also help us to later add support to keep track of all
> errors we encounter (not just the first one). This is something Andreas
> and Johannes found helpful earlier.
>
> Cheers,
> Tobias
>
>
>
>
> g
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Star Tan

2013-Jul-21 16:49 UTC

head link

[LLVMdev] Analysis of polly-detect overhead in oggenc

Hi all,


I have attached a patch file to reduce the polly-detect overhead.


My idea is to avoid calling TypeFinder in Non-DEBUG mode, so TypeFinder is only
called in DEBUG mode with the DEBUG macro.
This patch file did this work with following modifications:


First, it keeps most of string information by replacing  "<<"
with "+" operation. For example, code like this:
    INVALID(CFG, "Non branch instruction terminates BB: " +
BB.getName());
would be converted into:
   LastFailure = "Non branch instruction terminates BB: " +
BB.getName().str();


Second, it simplifies some complex operations like:
       INVALID(AffFunc,
              "Non affine branch in BB '" << BB.getName()
<< "' with LHS: "
                                          << *LHS << " and RHS:
" << *RHS);
into:
      LastFailure = "Non affine branch in BB: " + BB.getName().str();


In such cases, some information for "LHS" and "RHS" are
missed. However, it only has little affects on the variable
"LastFailure", while keeping all information for DEBUG information.
Since the variable "LastFailure" is only used in ScopGraphPrinter,
which should only show critical information in graph, I hope such modification
is acceptable.


Results show that it has almost the same performance improvements as my previous
hack patch file, i.e., reducing the compile-time of Polly-detect pass from 90s
(>80%) to 0.5s (2.5%) when compiling oggenc 16X.


Best wishes,
Star Tan


Postscript to  Tobias:
 I have also implemented your initial proposal, which uses some class hierarchy
to show different Failure information. Unfortunately, I found it would
significantly complicate the source code. It not only introduces a lot of
dynamic object "new" and "copy" operations, but also makes
source code hard to understand. I argue that very detailed information for the
variable "LastFailure" is not essential because it should only show
critical information in the graph. If users want to know detailed failure
information, they should refer to DEBUG information. This new patch file keeps
most of critical information for "LastFailure" except some detailed
information about Instruction and SCEV.
Do you have any suggestion?


At 2013-07-15 05:59:26,"Tobias Grosser" <tobias at grosser.es>
wrote:>On 07/14/2013 08:05 AM, Star Tan wrote:
>> I have found that the extremely expensive compile-time overhead comes
from the string buffer operation for "INVALID" MACRO in the
polly-detect pass.
>> Attached is a hack patch file that simply remove the string buffer
operation. This patch file can significantly reduce compile-time overhead when
compiling big source code. For example, for oggen*8.ll,  the compile time is
reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.
>
>Very nice analysis. I just tried it myself and can verify that for
>oggenc 16x, your patch reduces the scop-detection time from 90 seconds
>(80 %) to 0.5 seconds (2.5 %).
>
>I think there are two problems:
>
>   1) The cost of printing a single LLVM type/value increases with
>      the size of the overall Module. This seems to be because
>      TypeFinder::run() is called each time, without caching in place.
>      The cost of TypeFinder::run() increases with the size of the
>      module, as it basically just performs a scan on the entire Module.
>
>   2) We are formatting the failure messages during normal compilation,
>      even though they are only used in debugging tools like -view-scops
>
>In terms of solutions:
>
>It would be interesting to understand why is 1) so slow, especially as
>it seems to be either a fundamental problem in LLVM IR printing or the
>way we use the IR printing infrastructure. On the other side, for Polly
>we need to solve 2) anyway. Even if formatting would be faster, we
>should still not do it, if not needed. As we need to solve 2) anyway, 1)
>will only hit us when we do debugging/formatting. I assume in case of
>debugging the files we are looking into are normally smaller, such that
>the formatting overhead will not be that big.
>
>Hence, I would focus on 2). We could probably just put the code under a
>NDEBUG ifndef, but I would actually like to keep them available even in
>NDEBUG mode, as we may want to use the error messages to hint users to
>why their code can not be optimized. For this and also to get rid of
>another annoyance, the INVALID macro, I think we need to restructure the
>reporting of the last error, such that formatting of the error messages
>can be done on-demand. Another problem that could be solved at this
>point is to remove the macro use, which hides the fact that the
>functions return as soon as INVALID is called, which is plainly ugly.
>
>I am not sure how to structure this, but I could imagine some small
>class hierarchy that has a class for each error type. Each class just
>stores pointers to the data structures it needs to format its error
>message, but only formats the error on-demand. We could then return this
>class in case of failure and return a NoError class or a NULL pointer in
>case of success.
>
>This change may also help us to later add support to keep track of all
>errors we encounter (not just the first one). This is something Andreas
>and Johannes found helpful earlier.
>
>Cheers,
>Tobias
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130722/bb63f382/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ScopDetect-Optimize-compile-time-cost-for-string-ope.patch
Type: application/octet-stream
Size: 16775 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130722/bb63f382/attachment.obj>

Tobias Grosser

2013-Jul-21 17:40 UTC

head link

[LLVMdev] Analysis of polly-detect overhead in oggenc

On 07/21/2013 09:49 AM, Star Tan wrote:> Hi all,
>
>
> I have attached a patch file to reduce the polly-detect overhead.
Great.
> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so
> TypeFinder is only called in DEBUG mode with the DEBUG macro.  This
> patch file did this work with following modifications:
>
>
> First, it keeps most of string information by replacing 
"<<" with "+" operation. For example, code like this:
>      INVALID(CFG, "Non branch instruction terminates BB: " +
BB.getName());
> would be converted into:
>     LastFailure = "Non branch instruction terminates BB: " +
BB.getName().str();
>
>
> Second, it simplifies some complex operations like:
>         INVALID(AffFunc,
>                "Non affine branch in BB '" <<
BB.getName() << "' with LHS: "
>                                            << *LHS << "
and RHS: " << *RHS);
> into:
>        LastFailure = "Non affine branch in BB: " +
BB.getName().str();
> In such cases, some information for "LHS" and "RHS" are
missed.
Yes. And unfortunately, we may also loose the 'name' of unnamed basic
blocks. Basic blocks without a name get formatted as %111 with '111'
being their sequence number. Your above change will not be able to
derive this number.
> However, it only has little affects on the variable
"LastFailure",
> while keeping all information for DEBUG information.
Why is that? It seems the DEBUG output is the very same that gets
written to "LastFailure".
> Since the
> variable "LastFailure" is only used in ScopGraphPrinter, which
should
> only show critical information in graph, I hope such modification is
> acceptable.
Why should we only show critical information? In the GraphPrinter we do
not worry about compile time so much, such that we can easily show
helpful information. We just need to make sure that we do not slow down
the compile-time in the generic case.
> Results show that it has almost the same performance improvements as
> my previous hack patch file, i.e., reducing the compile-time of
> Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc
> 16X.
Great.
> Postscript to  Tobias:
>   I have also implemented your initial proposal, which uses some class
>   hierarchy to show different Failure information. Unfortunately, I
>   found it would significantly complicate the source code. It not only
>   introduces a lot of dynamic object "new" and "copy"
operations, but
>   also makes source code hard to understand. I argue that very
>   detailed information for the variable "LastFailure" is not
essential
>   because it should only show critical information in the graph. If
>   users want to know detailed failure information, they should refer
>   to DEBUG information. This new patch file keeps most of critical
>   information for "LastFailure" except some detailed information
about
>   Instruction and SCEV.
Interesting. This was also something I was afraid of. Passing new/delete
stuff through the scop detection is probably not what we want.
> Do you have any suggestion?
I do.

Your patch goes in the right direction and it helped to get a better
idea of what should be done. I think the first point that we learned is
that passing class pointers around is probably too distrubing. I also
believe having the formatting of the error messages in the normal scop
detection is not great, as it both adds unrelated stuff to the code, but
also makes it harder to conditionally disable the error reporting. On
the other side, I believe it is good to make the 'return false'
explicit.

Hence, I propose to transform the code in something like the following:

Instead of

  if (checkSomething())
    INVALID(AffFunc, "Test" << SCEV <<);

we should get something like:

  if (checkSomething()) {
    reportInvalidAffFunc(SCEV);
    return false;
  }

The reportInvalidAffFunc is then either a NO-OP (during normal
execution) or it reports the error (in case we need it).

I am not yet fully sure how the reportInvalid* functions should look
like. However, the part of your patch that is easy to commit without
further discussion is to move the 'return false' out of the INVALID
macro. Meaning, translating the above code to:

  if (checkSomething()) {
    INVALID(AffFunc, "Test" << SCEV <<);
    return false;
  }

I propose to submit such a patch first and then focus on the remaining
problems.

Cheers,
Tobias

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Jul 2013 - [LLVMdev] Analysis of polly-detect overhead in oggenc

[LLVMdev] Analysis of polly-detect overhead in oggenc

[LLVMdev] Analysis of polly-detect overhead in oggenc

[LLVMdev] Analysis of polly-detect overhead in oggenc

[LLVMdev] Analysis of polly-detect overhead in oggenc

Reasonably Related Threads