At 2013-07-14 13:20:42,"Tobias Grosser" <tobias at grosser.es> wrote:>On 07/13/2013 09:18 PM, Star Tan wrote: >> >> >> At 2013-07-14 02:30:07,"Tobias Grosser" <tobias at grosser.es> wrote: >>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>> Hi Tobias, >>> >>> Hi Star, >[...] >>> Before we write a patch, we should do some profiling to understand where >>> the overhead comes from. I propose you generate oggenc*16 or even >>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>> >>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>> ...' and then 'perf report'. If we are lucky, this points us exactly to >>> the function we spend all the time in. >>> >>> Cheers, >>> Tobias >> >> Thanks for your very useful suggestion. >> I have profiled the oggenc*16 and oggenc*32 and the results are listed as follows: >> >> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five functions reported by perf are: >> 48.97% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) >> 7.43% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*) >> 7.36% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*) >> 4.04% opt libc-2.17.so [.] 0x0000000000138bea >> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a >> >> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five functions reported by perf are: >> 57.44% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) >> 11.51% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*) >> 7.54% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*) >> 2.66% opt libc-2.17.so [.] 0x0000000000138c02 >> 2.26% opt opt [.] llvm::SlotTracker::processModule() >> >> It is surprise that all compile-time for TypeFinder is added into the compile-time for Polly-detect, but I cannot find the any call instructions to TypeFinder in Polly-detect. > >Yes, this does not seem very conclusive. We probably need a call graph >to see where those are called. > >Did you try running 'perf record' with the '-g' option? This should give >you callgraph information, that should be very helpful to track down the >callers in Polly. Also, if you prefer a graphical view of the >results, you may want to have a look at Gprof2Dot [1]. Finally, if this >all does not work, just running Polly in gdb and randomly breaking a >couple of times (manual sampling), may possibly hint you to the right place. >I also tried perf with -g, but it report nothing useful. the result of perf -g is: - 48.70% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) ` - llvm::TypeFinder::run(llvm::Module const&, bool) + 43.34% 0 - 1.78% 0x480031 + llvm::LoadInst::~LoadInst() - 1.41% 0x460031 + llvm::LoadInst::~LoadInst() - 1.01% 0x18 llvm::BranchInst::~BranchInst() 0x8348007d97fa3d8d - 0.87% 0x233 + llvm::GetElementPtrInst::~GetElementPtrInst() - 0.57% 0x39 + llvm::SExtInst::~SExtInst() - 0.54% 0x460032 + llvm::StoreInst::~StoreInst() GDB is a useful tool! Thanks for Sebastian's advice! By setting a break point on llvm::TypeFinder::run(llvm::Module const&, bool), I find most of calling cases are issued from the following two callsites: 0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () 0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () The detailed backtrace of "isValidMemoryAccess" is: #0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) () #1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) () #2 0x08f76fc9 in llvm::AssemblyWriter::init() () #3 0x08f77176 in llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) () #4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, llvm::AssemblyAnnotationWriter*) const () #5 0xb7c1d044 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #6 0xb7c1ea75 in polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #7 0xb7c1f4aa in polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) () #12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) () #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) () #14 0x081e6040 in main () >Also, can you upload the .ll file somewhere, such that I can access it?>(Please do not attach it to the email)I have attached the source code of oggenc.c and oggen.ll in the bug r16624: http://llvm.org/bugs/show_bug.cgi?id=16624 Best wishes, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/490f6bd9/attachment.html>
Tobi,
it looks like this code is the problem:
    for (std::vector<Value *>::iterator PI = Pointers.begin(),
           PE = Pointers.end();
         ;) {
      Value *V = *PI;
      if (V->getName().size() == 0)
        OS << "\"" << *V <<
"\"";
      else
        OS << "\"" << V->getName() <<
"\"";
      ++PI;
      if (PI != PE)
        OS << ", ";
      else
        break;
    }
    INVALID_NOVERIFY(Alias, OS.str());
it prints values to OS even when debug is not turned on:
if you remember, I have already sent a patch to conditionally execute this code
in a DEBUG() stmt.  You rejected that patch because it would not work for
other reporting tools -polly-show or some other flags.
On Sun, Jul 14, 2013 at 4:26 AM, Star Tan <tanmx_star at yeah.net>
wrote:>
> At 2013-07-14 13:20:42,"Tobias Grosser" <tobias at
grosser.es> wrote:
>>On 07/13/2013 09:18 PM, Star Tan wrote:
>>>
>>>
>>> At 2013-07-14 02:30:07,"Tobias Grosser" <tobias at
grosser.es> wrote:
>>>> On 07/13/2013 10:13 AM, Star Tan wrote:
>>>>> Hi Tobias,
>>>>
>>>> Hi Star,
>>[...]
>>>> Before we write a patch, we should do some profiling to
understand where
>>>> the overhead comes from. I propose you generate oggenc*16 or
even
>>>> oggen*32 to ensure we get to about 90% Polly-Detect overhead.
>>>>
>>>> I would then run Polly under linux 'perf'. Using
'perf record polly-opt
>>>> ...' and then 'perf report'. If we are lucky, this
points us exactly to
>>>> the function we spend all the time in.
>>>>
>>>> Cheers,
>>>> Tobias
>>>
>>> Thanks for your very useful suggestion.
>>> I have profiled the oggenc*16 and oggenc*32 and the results are
listed as
>>> follows:
>>>
>>> oggenc*16: polly-detect compile-time percentage is 71.3%. The top
five
>>> functions reported by perf are:
>>>       48.97%  opt  opt                        [.]
>>> llvm::TypeFinder::run(llvm::Module const&, bool)
>>>         7.43%  opt  opt                        [.]
>>> llvm::TypeFinder::incorporateType(llvm::Type*)
>>>         7.36%  opt  opt                        [.]
>>> llvm::TypeFinder::incorporateValue(llvm::Value const*)
>>>         4.04%  opt  libc-2.17.so           [.] 0x0000000000138bea
>>>         2.06%  opt  [kernel.kallsyms]   [k] 0xffffffff81043e6a
>>>
>>> oggenc*32:  polly-detect compile-time percentage is 82.9%. The top
five
>>> functions reported by perf are:
>>>       57.44%  opt  opt                    [.]
>>> llvm::TypeFinder::run(llvm::Module const&, bool)
>>>       11.51%  opt  opt                    [.]
>>> llvm::TypeFinder::incorporateType(llvm::Type*)
>>>         7.54%  opt  opt                    [.]
>>> llvm::TypeFinder::incorporateValue(llvm::Value const*)
>>>         2.66%  opt  libc-2.17.so       [.] 0x0000000000138c02
>>>         2.26%  opt  opt                    [.]
>>> llvm::SlotTracker::processModule()
>>>
>>> It is surprise that all compile-time for TypeFinder is added into
the
>>> compile-time for Polly-detect, but I cannot find the any call
instructions
>>> to TypeFinder in Polly-detect.
>>
>>Yes, this does not seem very conclusive. We probably need a call graph
>>to see where those are called.
>>
>>Did you try running 'perf record' with the '-g' option?
This should give
>>you callgraph information, that should be very helpful to track down the
>>callers in Polly. Also, if you prefer a graphical view of the
>>results, you may want to have a look at Gprof2Dot [1]. Finally, if this
>>all does not work, just running Polly in gdb and randomly breaking a
>>couple of times (manual sampling), may possibly hint you to the right
>> place.
>>
>
> I also tried perf with -g, but it report nothing useful. the result of perf
> -g is:
> -  48.70%  opt  opt                    [.]
> llvm::TypeFinder::run(llvm::Module const&, bool)
> `
>
>    - llvm::TypeFinder::run(llvm::Module const&, bool)
>       + 43.34% 0
>       - 1.78% 0x480031
>          + llvm::LoadInst::~LoadInst()
>       - 1.41% 0x460031
>          + llvm::LoadInst::~LoadInst()
>       - 1.01% 0x18
>            llvm::BranchInst::~BranchInst()
>            0x8348007d97fa3d8d
>       - 0.87% 0x233
>          + llvm::GetElementPtrInst::~GetElementPtrInst()
>       - 0.57% 0x39
>          + llvm::SExtInst::~SExtInst()
>       - 0.54% 0x460032
>          + llvm::StoreInst::~StoreInst()
>
>
> GDB is a useful tool! Thanks for Sebastian's advice!
>
>  By setting a break point on llvm::TypeFinder::run(llvm::Module const&,
> bool), I find most of calling cases are issued from the following  two
> callsites:
> 0xb7c1c5d2 in
polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&,
> polly::ScopDetection::DetectionContext&) const ()
> 0xb7c1d754 in
polly::ScopDetection::isValidInstruction(llvm::Instruction&,
> polly::ScopDetection::DetectionContext&) const ()
>
> The detailed backtrace of "isValidMemoryAccess" is:
>  #0  0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) ()
> #1  0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module
const&)
> ()
> #2  0x08f76fc9 in llvm::AssemblyWriter::init() ()
> #3  0x08f77176 in
> llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&,
> llvm::SlotTracker&, llvm::Module const*,
llvm::AssemblyAnnotationWriter*) ()
> #4  0x08f79d1a in llvm::Value::print(llvm::raw_ostream&,
> llvm::AssemblyAnnotationWriter*) const ()
> #5  0xb7c1d044 in
> polly::ScopDetection::isValidInstruction(llvm::Instruction&,
> polly::ScopDetection::DetectionContext&) const ()
>    from
> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
> #6  0xb7c1ea75 in
>
polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&)
> const ()
>    from
> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
> #7  0xb7c1f4aa in
>
polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&)
> const ()
>    from
> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
> #8  0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) ()
>    from
> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
> #9  0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) ()
>    from
> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
> #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&)
()
>    from
> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
> #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&)
()
> #12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&)
()
> #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) ()
> #14 0x081e6040 in main ()
>
>
>
>  >Also, can you upload the .ll file somewhere, such that I can access
it?
>>(Please do not attach it to the email)
>
> I have attached the source code of oggenc.c and oggen.ll in the bug r16624:
> http://llvm.org/bugs/show_bug.cgi?id=16624
>
> Best wishes,
> Star Tan
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Polly Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to polly-dev+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file. However, this patch file does not solve this problem. It only shows the reason why polly-detect pass leads to significant compile-time overhead. Best wishes, Star Tan. At 2013-07-14 17:26:19,"Star Tan" <tanmx_star at yeah.net> wrote: At 2013-07-14 13:20:42,"Tobias Grosser" <tobias at grosser.es> wrote:>On 07/13/2013 09:18 PM, Star Tan wrote: >> >> >> At 2013-07-14 02:30:07,"Tobias Grosser" <tobias at grosser.es> wrote: >>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>> Hi Tobias, >>> >>> Hi Star, >[...] >>> Before we write a patch, we should do some profiling to understand where >>> the overhead comes from. I propose you generate oggenc*16 or even >>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>> >>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>> ...' and then 'perf report'. If we are lucky, this points us exactly to >>> the function we spend all the time in. >>> >>> Cheers, >>> Tobias >> >> Thanks for your very useful suggestion. >> I have profiled the oggenc*16 and oggenc*32 and the results are listed as follows: >> >> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five functions reported by perf are: >> 48.97% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) >> 7.43% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*) >> 7.36% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*) >> 4.04% opt libc-2.17.so [.] 0x0000000000138bea >> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a >> >> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five functions reported by perf are: >> 57.44% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) >> 11.51% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*) >> 7.54% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*) >> 2.66% opt libc-2.17.so [.] 0x0000000000138c02 >> 2.26% opt opt [.] llvm::SlotTracker::processModule() >> >> It is surprise that all compile-time for TypeFinder is added into the compile-time for Polly-detect, but I cannot find the any call instructions to TypeFinder in Polly-detect. > >Yes, this does not seem very conclusive. We probably need a call graph >to see where those are called. > >Did you try running 'perf record' with the '-g' option? This should give >you callgraph information, that should be very helpful to track down the >callers in Polly. Also, if you prefer a graphical view of the >results, you may want to have a look at Gprof2Dot [1]. Finally, if this >all does not work, just running Polly in gdb and randomly breaking a >couple of times (manual sampling), may possibly hint you to the right place. >I also tried perf with -g, but it report nothing useful. the result of perf -g is: - 48.70% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) ` - llvm::TypeFinder::run(llvm::Module const&, bool) + 43.34% 0 - 1.78% 0x480031 + llvm::LoadInst::~LoadInst() - 1.41% 0x460031 + llvm::LoadInst::~LoadInst() - 1.01% 0x18 llvm::BranchInst::~BranchInst() 0x8348007d97fa3d8d - 0.87% 0x233 + llvm::GetElementPtrInst::~GetElementPtrInst() - 0.57% 0x39 + llvm::SExtInst::~SExtInst() - 0.54% 0x460032 + llvm::StoreInst::~StoreInst() GDB is a useful tool! Thanks for Sebastian's advice! By setting a break point on llvm::TypeFinder::run(llvm::Module const&, bool), I find most of calling cases are issued from! the following two callsites: 0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () 0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const () The detailed backtrace of "isValidMemoryAccess" is: #0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) () #1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) () #2 0x08f76fc9 in llvm::AssemblyWriter::init() () #3 0x08f77176 in llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) () #4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, llvm::AssemblyAnnotationWriter*) const () #5 0xb7c1d044 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContex! t&) const () from /home/star/llvm/llvm_build/tools/polly/Rel ease+Asserts/lib/LLVMPolly.so #6 0xb7c1ea75 in polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #7 0xb7c1f4aa in polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) const () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) () #12 0x0! 9067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) () #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) () #14 0x081e6040 in main () >Also, can you upload the .ll file somewhere, such that I can access it?>(Please do not attach it to the email)I have attached the source code of oggenc.c and oggen.ll in the bug r16624: http://llvm.org/bugs/show_bug.cgi?id=16624 Best wishes, Star Tan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/74b9c293/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: hack_for_polly_detect.patch Type: application/octet-stream Size: 3915 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/74b9c293/attachment.obj>
Hi Sebastian, Yes, you have pointed an important reason. If we comment this source code you have listed, then the compile-time overhead for oggenc*8.ll can be reduced from 40.5261 ( 51.2%) to 20.3100 ( 35.7%). I just sent another mail to explain why polly-detect pass leads to significant compile-time overhead. Besides the reason you have pointed, another reason is resulted from those string buffer operations in "INVALID" MACRO. If we comment both the string buffer operations in "INVALID" MACRO and in the "isValidMemoryAccess" function, the compile-time overhead for oggenc*8.ll would be reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%). I think we should revise these string buffer operations in Polly-detect pass. Best wishes, Star Tan At 2013-07-14 23:01:18,"Sebastian Pop" <sebpop at gmail.com> wrote:>Tobi, > >it looks like this code is the problem: > > for (std::vector<Value *>::iterator PI = Pointers.begin(), > PE = Pointers.end(); > ;) { > Value *V = *PI; > > if (V->getName().size() == 0) > OS << "\"" << *V << "\""; > else > OS << "\"" << V->getName() << "\""; > > ++PI; > > if (PI != PE) > OS << ", "; > else > break; > } > > INVALID_NOVERIFY(Alias, OS.str()); > >it prints values to OS even when debug is not turned on: >if you remember, I have already sent a patch to conditionally execute this code >in a DEBUG() stmt. You rejected that patch because it would not work for >other reporting tools -polly-show or some other flags. > > >On Sun, Jul 14, 2013 at 4:26 AM, Star Tan <tanmx_star at yeah.net> wrote: >> >> At 2013-07-14 13:20:42,"Tobias Grosser" <tobias at grosser.es> wrote: >>>On 07/13/2013 09:18 PM, Star Tan wrote: >>>> >>>> >>>> At 2013-07-14 02:30:07,"Tobias Grosser" <tobias at grosser.es> wrote: >>>>> On 07/13/2013 10:13 AM, Star Tan wrote: >>>>>> Hi Tobias, >>>>> >>>>> Hi Star, >>>[...] >>>>> Before we write a patch, we should do some profiling to understand where >>>>> the overhead comes from. I propose you generate oggenc*16 or even >>>>> oggen*32 to ensure we get to about 90% Polly-Detect overhead. >>>>> >>>>> I would then run Polly under linux 'perf'. Using 'perf record polly-opt >>>>> ...' and then 'perf report'. If we are lucky, this points us exactly to >>>>> the function we spend all the time in. >>>>> >>>>> Cheers, >>>>> Tobias >>>> >>>> Thanks for your very useful suggestion. >>>> I have profiled the oggenc*16 and oggenc*32 and the results are listed as >>>> follows: >>>> >>>> oggenc*16: polly-detect compile-time percentage is 71.3%. The top five >>>> functions reported by perf are: >>>> 48.97% opt opt [.] >>>> llvm::TypeFinder::run(llvm::Module const&, bool) >>>> 7.43% opt opt [.] >>>> llvm::TypeFinder::incorporateType(llvm::Type*) >>>> 7.36% opt opt [.] >>>> llvm::TypeFinder::incorporateValue(llvm::Value const*) >>>> 4.04% opt libc-2.17.so [.] 0x0000000000138bea >>>> 2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a >>>> >>>> oggenc*32: polly-detect compile-time percentage is 82.9%. The top five >>>> functions reported by perf are: >>>> 57.44% opt opt [.] >>>> llvm::TypeFinder::run(llvm::Module const&, bool) >>>> 11.51% opt opt [.] >>>> llvm::TypeFinder::incorporateType(llvm::Type*) >>>> 7.54% opt opt [.] >>>> llvm::TypeFinder::incorporateValue(llvm::Value const*) >>>> 2.66% opt libc-2.17.so [.] 0x0000000000138c02 >>>> 2.26% opt opt [.] >>>> llvm::SlotTracker::processModule() >>>> >>>> It is surprise that all compile-time for TypeFinder is added into the >>>> compile-time for Polly-detect, but I cannot find the any call instructions >>>> to TypeFinder in Polly-detect. >>> >>>Yes, this does not seem very conclusive. We probably need a call graph >>>to see where those are called. >>> >>>Did you try running 'perf record' with the '-g' option? This should give >>>you callgraph information, that should be very helpful to track down the >>>callers in Polly. Also, if you prefer a graphical view of the >>>results, you may want to have a look at Gprof2Dot [1]. Finally, if this >>>all does not work, just running Polly in gdb and randomly breaking a >>>couple of times (manual sampling), may possibly hint you to the right >>> place. >>> >> >> I also tried perf with -g, but it report nothing useful. the result of perf >> -g is: >> - 48.70% opt opt [.] >> llvm::TypeFinder::run(llvm::Module const&, bool) >> ` >> >> - llvm::TypeFinder::run(llvm::Module const&, bool) >> + 43.34% 0 >> - 1.78% 0x480031 >> + llvm::LoadInst::~LoadInst() >> - 1.41% 0x460031 >> + llvm::LoadInst::~LoadInst() >> - 1.01% 0x18 >> llvm::BranchInst::~BranchInst() >> 0x8348007d97fa3d8d >> - 0.87% 0x233 >> + llvm::GetElementPtrInst::~GetElementPtrInst() >> - 0.57% 0x39 >> + llvm::SExtInst::~SExtInst() >> - 0.54% 0x460032 >> + llvm::StoreInst::~StoreInst() >> >> >> GDB is a useful tool! Thanks for Sebastian's advice! >> >> By setting a break point on llvm::TypeFinder::run(llvm::Module const&, >> bool), I find most of calling cases are issued from the following two >> callsites: >> 0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, >> polly::ScopDetection::DetectionContext&) const () >> 0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, >> polly::ScopDetection::DetectionContext&) const () >> >> The detailed backtrace of "isValidMemoryAccess" is: >> #0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) () >> #1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) >> () >> #2 0x08f76fc9 in llvm::AssemblyWriter::init() () >> #3 0x08f77176 in >> llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, >> llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) () >> #4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, >> llvm::AssemblyAnnotationWriter*) const () >> #5 0xb7c1d044 in >> polly::ScopDetection::isValidInstruction(llvm::Instruction&, >> polly::ScopDetection::DetectionContext&) const () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #6 0xb7c1ea75 in >> polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) >> const () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #7 0xb7c1f4aa in >> polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) >> const () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) () >> from >> /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so >> #11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) () >> #12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) () >> #13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) () >> #14 0x081e6040 in main () >> >> >> >> >Also, can you upload the .ll file somewhere, such that I can access it? >>>(Please do not attach it to the email) >> >> I have attached the source code of oggenc.c and oggen.ll in the bug r16624: >> http://llvm.org/bugs/show_bug.cgi?id=16624 >> >> Best wishes, >> Star Tan >> >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Polly Development" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to polly-dev+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/9a47f6de/attachment.html>
Tobias Grosser
2013-Jul-14  21:59 UTC
[LLVMdev] Analysis of polly-detect overhead in oggenc
On 07/14/2013 08:05 AM, Star Tan wrote:> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. > Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.Very nice analysis. I just tried it myself and can verify that for oggenc 16x, your patch reduces the scop-detection time from 90 seconds (80 %) to 0.5 seconds (2.5 %). I think there are two problems: 1) The cost of printing a single LLVM type/value increases with the size of the overall Module. This seems to be because TypeFinder::run() is called each time, without caching in place. The cost of TypeFinder::run() increases with the size of the module, as it basically just performs a scan on the entire Module. 2) We are formatting the failure messages during normal compilation, even though they are only used in debugging tools like -view-scops In terms of solutions: It would be interesting to understand why is 1) so slow, especially as it seems to be either a fundamental problem in LLVM IR printing or the way we use the IR printing infrastructure. On the other side, for Polly we need to solve 2) anyway. Even if formatting would be faster, we should still not do it, if not needed. As we need to solve 2) anyway, 1) will only hit us when we do debugging/formatting. I assume in case of debugging the files we are looking into are normally smaller, such that the formatting overhead will not be that big. Hence, I would focus on 2). We could probably just put the code under a NDEBUG ifndef, but I would actually like to keep them available even in NDEBUG mode, as we may want to use the error messages to hint users to why their code can not be optimized. For this and also to get rid of another annoyance, the INVALID macro, I think we need to restructure the reporting of the last error, such that formatting of the error messages can be done on-demand. Another problem that could be solved at this point is to remove the macro use, which hides the fact that the functions return as soon as INVALID is called, which is plainly ugly. I am not sure how to structure this, but I could imagine some small class hierarchy that has a class for each error type. Each class just stores pointers to the data structures it needs to format its error message, but only formats the error on-demand. We could then return this class in case of failure and return a NoError class or a NULL pointer in case of success. This change may also help us to later add support to keep track of all errors we encounter (not just the first one). This is something Andreas and Johannes found helpful earlier. Cheers, Tobias g
Star Tan wrote:> I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass. > Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.On top of your patch, I have removed from ScopDetection.cpp all printing of LLVM values, like this: - INVALID(AffFunc, "Non affine access function: " << *AccessFunction); + INVALID(AffFunc, "Non affine access function: "); there are a good dozen or so of these pretty printing. With these changes the compile time spent in ScopDetection drops dramatically to almost 0: here is the longest running one in the compilation of an Android stack: 2.1900 ( 13.7%) 0.0100 ( 7.7%) 2.2000 ( 13.6%) 2.2009 ( 13.4%) Polly - Detect static control parts (SCoPs) Before these changes, the top most expensive ScopDetection time used to be a few hundred of seconds. Sebastian -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Seemingly Similar Threads
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] Analysis of polly-detect overhead in oggenc
- [LLVMdev] asm-verbose (on by default) is really, really slow