Tobias Grosser
2013-Jul-14  21:59 UTC
[LLVMdev] Analysis of polly-detect overhead in oggenc
On 07/14/2013 08:05 AM, Star Tan wrote:> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. > Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.Very nice analysis. I just tried it myself and can verify that for oggenc 16x, your patch reduces the scop-detection time from 90 seconds (80 %) to 0.5 seconds (2.5 %). I think there are two problems: 1) The cost of printing a single LLVM type/value increases with the size of the overall Module. This seems to be because TypeFinder::run() is called each time, without caching in place. The cost of TypeFinder::run() increases with the size of the module, as it basically just performs a scan on the entire Module. 2) We are formatting the failure messages during normal compilation, even though they are only used in debugging tools like -view-scops In terms of solutions: It would be interesting to understand why is 1) so slow, especially as it seems to be either a fundamental problem in LLVM IR printing or the way we use the IR printing infrastructure. On the other side, for Polly we need to solve 2) anyway. Even if formatting would be faster, we should still not do it, if not needed. As we need to solve 2) anyway, 1) will only hit us when we do debugging/formatting. I assume in case of debugging the files we are looking into are normally smaller, such that the formatting overhead will not be that big. Hence, I would focus on 2). We could probably just put the code under a NDEBUG ifndef, but I would actually like to keep them available even in NDEBUG mode, as we may want to use the error messages to hint users to why their code can not be optimized. For this and also to get rid of another annoyance, the INVALID macro, I think we need to restructure the reporting of the last error, such that formatting of the error messages can be done on-demand. Another problem that could be solved at this point is to remove the macro use, which hides the fact that the functions return as soon as INVALID is called, which is plainly ugly. I am not sure how to structure this, but I could imagine some small class hierarchy that has a class for each error type. Each class just stores pointers to the data structures it needs to format its error message, but only formats the error on-demand. We could then return this class in case of failure and return a NoError class or a NULL pointer in case of success. This change may also help us to later add support to keep track of all errors we encounter (not just the first one). This is something Andreas and Johannes found helpful earlier. Cheers, Tobias g
On Sun, Jul 14, 2013 at 2:59 PM, Tobias Grosser <tobias at grosser.es> wrote:> On 07/14/2013 08:05 AM, Star Tan wrote: >> >> I have found that the extremely expensive compile-time overhead comes from >> the string buffer operation for "INVALID" MACRO in the polly-detect pass. >> Attached is a hack patch file that simply remove the string buffer >> operation. This patch file can significantly reduce compile-time overhead >> when compiling big source code. For example, for oggen*8.ll, the compile >> time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch >> file. > > > Very nice analysis. I just tried it myself and can verify that for > oggenc 16x, your patch reduces the scop-detection time from 90 seconds > (80 %) to 0.5 seconds (2.5 %). > > I think there are two problems: > > 1) The cost of printing a single LLVM type/value increases with > the size of the overall Module. This seems to be because > TypeFinder::run() is called each time, without caching in place. > The cost of TypeFinder::run() increases with the size of the > module, as it basically just performs a scan on the entire Module. > > 2) We are formatting the failure messages during normal compilation, > even though they are only used in debugging tools like -view-scops > > In terms of solutions: > > It would be interesting to understand why is 1) so slow, especially as > it seems to be either a fundamental problem in LLVM IR printing or the > way we use the IR printing infrastructure.I once analyzed it for GVN when I hit a similar issue (type finder being rerun again and again while printing statements), and I came up with the answer that the printing infrastructure does not expect to be printing things regularly, and the typefinder only gets called so much during debug printing, because of the call paths it goes through. The basic cause is that Value::print creates AssemblyWriter objects every time it is called. This in turn calls the type finder, for every single call to Value::print. The type finder is being called to build up the list of named types in the module, so that when it prints things, it can use the right name for the type instead of an anonymous name. if you do the following: diff --git a/lib/VMCore/AsmWriter.cpp b/lib/VMCore/AsmWriter.cpp index 7ef1131..8a2206b 100644 --- a/lib/VMCore/AsmWriter.cpp +++ b/lib/VMCore/AsmWriter.cpp @@ -2118,7 +2118,7 @@ void Value::print(raw_ostream &ROS, AssemblyAnnotationWrit if (const Instruction *I = dyn_cast<Instruction>(this)) { const Function *F = I->getParent() ? I->getParent()->getParent() : 0; SlotTracker SlotTable(F); - AssemblyWriter W(OS, SlotTable, getModuleFromVal(I), AAW); + AssemblyWriter W(OS, SlotTable, NULL, AAW); W.printInstruction(*I); } else if (const BasicBlock *BB = dyn_cast<BasicBlock>(this)) { SlotTracker SlotTable(BB->getParent()); (This is an old patch, AsmWriter.cpp is not in lib/VMCore anymore, but you can see what is happening), you will not be shown named types in the operand printing, but it will not be slow anymore. This is of course, a complete hack just to work around the issue if you need to get other work done. The real fix is either to stop recreating these AssemblyWriter objects, or improve caching in the bowels of it so that it doesn't need to rerun typefinder again and again if nothing has changed.> On the other side, for Polly > we need to solve 2) anyway. Even if formatting would be faster, we > should still not do it, if not needed. As we need to solve 2) anyway, 1) > will only hit us when we do debugging/formatting. I assume in case of > debugging the files we are looking into are normally smaller, such that > the formatting overhead will not be that big. > > Hence, I would focus on 2). We could probably just put the code under a > NDEBUG ifndef, but I would actually like to keep them available even in > NDEBUG mode, as we may want to use the error messages to hint users to > why their code can not be optimized. For this and also to get rid of > another annoyance, the INVALID macro, I think we need to restructure the > reporting of the last error, such that formatting of the error messages > can be done on-demand. Another problem that could be solved at this > point is to remove the macro use, which hides the fact that the > functions return as soon as INVALID is called, which is plainly ugly. > > I am not sure how to structure this, but I could imagine some small > class hierarchy that has a class for each error type. Each class just > stores pointers to the data structures it needs to format its error > message, but only formats the error on-demand. We could then return this > class in case of failure and return a NoError class or a NULL pointer in > case of success. > > This change may also help us to later add support to keep track of all > errors we encounter (not just the first one). This is something Andreas > and Johannes found helpful earlier. > > Cheers, > Tobias > > > > > g > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Hi all,
I have attached a patch file to reduce the polly-detect overhead.
My idea is to avoid calling TypeFinder in Non-DEBUG mode, so TypeFinder is only
called in DEBUG mode with the DEBUG macro.
This patch file did this work with following modifications:
First, it keeps most of string information by replacing  "<<"
with "+" operation. For example, code like this:
    INVALID(CFG, "Non branch instruction terminates BB: " +
BB.getName());
would be converted into:
   LastFailure = "Non branch instruction terminates BB: " +
BB.getName().str();
Second, it simplifies some complex operations like:
       INVALID(AffFunc,
              "Non affine branch in BB '" << BB.getName()
<< "' with LHS: "
                                          << *LHS << " and RHS:
" << *RHS);
into:
      LastFailure = "Non affine branch in BB: " + BB.getName().str();
In such cases, some information for "LHS" and "RHS" are
missed. However, it only has little affects on the variable
"LastFailure", while keeping all information for DEBUG information.
Since the variable "LastFailure" is only used in ScopGraphPrinter,
which should only show critical information in graph, I hope such modification
is acceptable.
Results show that it has almost the same performance improvements as my previous
hack patch file, i.e., reducing the compile-time of Polly-detect pass from 90s
(>80%) to 0.5s (2.5%) when compiling oggenc 16X.
Best wishes,
Star Tan
Postscript to  Tobias:
 I have also implemented your initial proposal, which uses some class hierarchy
to show different Failure information. Unfortunately, I found it would
significantly complicate the source code. It not only introduces a lot of
dynamic object "new" and "copy" operations, but also makes
source code hard to understand. I argue that very detailed information for the
variable "LastFailure" is not essential because it should only show
critical information in the graph. If users want to know detailed failure
information, they should refer to DEBUG information. This new patch file keeps
most of critical information for "LastFailure" except some detailed
information about Instruction and SCEV.
Do you have any suggestion?
At 2013-07-15 05:59:26,"Tobias Grosser" <tobias at grosser.es>
wrote:>On 07/14/2013 08:05 AM, Star Tan wrote:
>> I have found that the extremely expensive compile-time overhead comes
from the string buffer operation for "INVALID" MACRO in the
polly-detect pass.
>> Attached is a hack patch file that simply remove the string buffer
operation. This patch file can significantly reduce compile-time overhead when
compiling big source code. For example, for oggen*8.ll,  the compile time is
reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.
>
>Very nice analysis. I just tried it myself and can verify that for
>oggenc 16x, your patch reduces the scop-detection time from 90 seconds
>(80 %) to 0.5 seconds (2.5 %).
>
>I think there are two problems:
>
>   1) The cost of printing a single LLVM type/value increases with
>      the size of the overall Module. This seems to be because
>      TypeFinder::run() is called each time, without caching in place.
>      The cost of TypeFinder::run() increases with the size of the
>      module, as it basically just performs a scan on the entire Module.
>
>   2) We are formatting the failure messages during normal compilation,
>      even though they are only used in debugging tools like -view-scops
>
>In terms of solutions:
>
>It would be interesting to understand why is 1) so slow, especially as
>it seems to be either a fundamental problem in LLVM IR printing or the
>way we use the IR printing infrastructure. On the other side, for Polly
>we need to solve 2) anyway. Even if formatting would be faster, we
>should still not do it, if not needed. As we need to solve 2) anyway, 1)
>will only hit us when we do debugging/formatting. I assume in case of
>debugging the files we are looking into are normally smaller, such that
>the formatting overhead will not be that big.
>
>Hence, I would focus on 2). We could probably just put the code under a
>NDEBUG ifndef, but I would actually like to keep them available even in
>NDEBUG mode, as we may want to use the error messages to hint users to
>why their code can not be optimized. For this and also to get rid of
>another annoyance, the INVALID macro, I think we need to restructure the
>reporting of the last error, such that formatting of the error messages
>can be done on-demand. Another problem that could be solved at this
>point is to remove the macro use, which hides the fact that the
>functions return as soon as INVALID is called, which is plainly ugly.
>
>I am not sure how to structure this, but I could imagine some small
>class hierarchy that has a class for each error type. Each class just
>stores pointers to the data structures it needs to format its error
>message, but only formats the error on-demand. We could then return this
>class in case of failure and return a NoError class or a NULL pointer in
>case of success.
>
>This change may also help us to later add support to keep track of all
>errors we encounter (not just the first one). This is something Andreas
>and Johannes found helpful earlier.
>
>Cheers,
>Tobias
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130722/bb63f382/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ScopDetect-Optimize-compile-time-cost-for-string-ope.patch
Type: application/octet-stream
Size: 16775 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130722/bb63f382/attachment.obj>
Tobias Grosser
2013-Jul-21  17:40 UTC
[LLVMdev] Analysis of polly-detect overhead in oggenc
On 07/21/2013 09:49 AM, Star Tan wrote:> Hi all, > > > I have attached a patch file to reduce the polly-detect overhead.Great.> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so > TypeFinder is only called in DEBUG mode with the DEBUG macro. This > patch file did this work with following modifications: > > > First, it keeps most of string information by replacing "<<" with "+" operation. For example, code like this: > INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName()); > would be converted into: > LastFailure = "Non branch instruction terminates BB: " + BB.getName().str(); > > > Second, it simplifies some complex operations like: > INVALID(AffFunc, > "Non affine branch in BB '" << BB.getName() << "' with LHS: " > << *LHS << " and RHS: " << *RHS); > into: > LastFailure = "Non affine branch in BB: " + BB.getName().str();> In such cases, some information for "LHS" and "RHS" are missed.Yes. And unfortunately, we may also loose the 'name' of unnamed basic blocks. Basic blocks without a name get formatted as %111 with '111' being their sequence number. Your above change will not be able to derive this number.> However, it only has little affects on the variable "LastFailure", > while keeping all information for DEBUG information.Why is that? It seems the DEBUG output is the very same that gets written to "LastFailure".> Since the > variable "LastFailure" is only used in ScopGraphPrinter, which should > only show critical information in graph, I hope such modification is > acceptable.Why should we only show critical information? In the GraphPrinter we do not worry about compile time so much, such that we can easily show helpful information. We just need to make sure that we do not slow down the compile-time in the generic case.> Results show that it has almost the same performance improvements as > my previous hack patch file, i.e., reducing the compile-time of > Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc > 16X.Great.> Postscript to Tobias: > I have also implemented your initial proposal, which uses some class > hierarchy to show different Failure information. Unfortunately, I > found it would significantly complicate the source code. It not only > introduces a lot of dynamic object "new" and "copy" operations, but > also makes source code hard to understand. I argue that very > detailed information for the variable "LastFailure" is not essential > because it should only show critical information in the graph. If > users want to know detailed failure information, they should refer > to DEBUG information. This new patch file keeps most of critical > information for "LastFailure" except some detailed information about > Instruction and SCEV.Interesting. This was also something I was afraid of. Passing new/delete stuff through the scop detection is probably not what we want.> Do you have any suggestion?I do. Your patch goes in the right direction and it helped to get a better idea of what should be done. I think the first point that we learned is that passing class pointers around is probably too distrubing. I also believe having the formatting of the error messages in the normal scop detection is not great, as it both adds unrelated stuff to the code, but also makes it harder to conditionally disable the error reporting. On the other side, I believe it is good to make the 'return false' explicit. Hence, I propose to transform the code in something like the following: Instead of if (checkSomething()) INVALID(AffFunc, "Test" << SCEV <<); we should get something like: if (checkSomething()) { reportInvalidAffFunc(SCEV); return false; } The reportInvalidAffFunc is then either a NO-OP (during normal execution) or it reports the error (in case we need it). I am not yet fully sure how the reportInvalid* functions should look like. However, the part of your patch that is easy to commit without further discussion is to move the 'return false' out of the INVALID macro. Meaning, translating the above code to: if (checkSomething()) { INVALID(AffFunc, "Test" << SCEV <<); return false; } I propose to submit such a patch first and then focus on the remaining problems. Cheers, Tobias
Reasonably Related Threads
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc