Gasiunas, Vaidas
2014-Oct-09  09:37 UTC
[LLVMdev] Performance regression in the LiveIntevals phase
Some time ago we reported a compile-time performance regression in the
LiveIntervals analysis pass (see http://llvm.org/bugs/show_bug.cgi?id=18580). We
detected it at first after migrating from LLVM 3.1 to 3.3, but the problem
persists also in 3.5. This regression is especially critical when compiling long
functions. In one of our benchmarks compile time goes from 200s (in 3.1) up to
1500s (in 3.3).
We also managed to reproduce the regression with a C++ example using clang. Here
are the instructions:
Generate the example C++ file with the Perl script attached to the Bug (5000 is
a good size parameter to clearly see the regression):
        perl create.pl example.cpp 5000
Generate LLVM IR code for the C++ file (the regression is in the backend
functionality, so we don't want to measure frontend compilation)
        llvm31/clang++ example.cpp -c -emit-llvm -o example.ir-31
        llvm35/clang++ example.cpp -c -emit-llvm -o example.ir-35
Now you can run the actual performance measurements using llc:
        llvm31/llc -cppgen=program -o=5000.asm example.ir-31
        llvm35/llc -cppgen=program -o=5000.asm example.ir-35
or you can also compile example.ir-31 with llvm3.5:
        llvm35/llc -cppgen=program -o=5000.asm example.ir-31
Our analysis had shown that the most of time is taken for generation of live
intervals for physical registers. In the biggest example, the live interval of
one of the physical registers consists of about 500.000 live ranges. Insertions
in the middle of the array of live ranges cause quadratic algorithmic
complexity, which is apparently the main reason for the slow-down. I changed the
implementation of computeRegUnitInterval() so that it uses a set instead of the
array, and in this way managed to reduce the execution time of the
computeRegUnitInterval from 1200s down to about 1s. The fix is a bit ugly,
however, because I cannot completely switch to the set, since further analyses
are more efficient on the array. For that reason, I flush the contents of the
set into the array at the end of computeRegUnitInterval. I also had to rewrite
various operations on the live interval so that they use the set structure if it
is available and the array otherwise.
If there is an interest, I can port the patch to the dev version of llvm, but
maybe someone has a better idea of how to deal with the regression.
Regards,
Vaidas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141009/a88b1ed8/attachment.html>
Hal Finkel
2014-Oct-09  16:17 UTC
[LLVMdev] Performance regression in the LiveIntevals phase
----- Original Message -----> From: "Vaidas Gasiunas" <vaidas.gasiunas at sap.com> > To: "LLVM Dev (llvmdev at cs.uiuc.edu)" <llvmdev at cs.uiuc.edu> > Sent: Thursday, October 9, 2014 4:37:39 AM > Subject: [LLVMdev] Performance regression in the LiveIntevals phase > > Some time ago we reported a compile-time performance regression in > the LiveIntervals analysis pass (see > http://llvm.org/bugs/show_bug.cgi?id=18580 ). We detected it at > first after migrating from LLVM 3.1 to 3.3, but the problem persists > also in 3.5. This regression is especially critical when compiling > long functions. In one of our benchmarks compile time goes from 200s > (in 3.1) up to 1500s (in 3.3). > > > > We also managed to reproduce the regression with a C++ example using > clang. Here are the instructions: > > > > Generate the example C++ file with the Perl script attached to the > Bug (5000 is a good size parameter to clearly see the regression): > > perl create.pl example.cpp 5000 > > Generate LLVM IR code for the C++ file (the regression is in the > backend functionality, so we don't want to measure frontend > compilation) > > llvm31/clang++ example.cpp -c -emit-llvm -o example.ir-31 > > llvm35/clang++ example.cpp -c -emit-llvm -o example.ir-35 > > > > Now you can run the actual performance measurements using llc: > > llvm31/llc -cppgen=program -o=5000.asm example.ir-31 > > llvm35/llc -cppgen=program -o=5000.asm example.ir-35 > > or you can also compile example.ir-31 with llvm3.5: > > llvm35/llc -cppgen=program -o=5000.asm example.ir-31 > > > > Our analysis had shown that the most of time is taken for generation > of live intervals for physical registers. In the biggest example, > the live interval of one of the physical registers consists of about > 500.000 live ranges. Insertions in the middle of the array of live > ranges cause quadratic algorithmic complexity, which is apparently > the main reason for the slow-down. I changed the implementation of > computeRegUnitInterval() so that it uses a set instead of the array, > and in this way managed to reduce the execution time of the > computeRegUnitInterval from 1200s down to about 1s. The fix is a bit > ugly, however, because I cannot completely switch to the set, since > further analyses are more efficient on the array. For that reason, I > flush the contents of the set into the array at the end of > computeRegUnitInterval. I also had to rewrite various operations on > the live interval so that they use the set structure if it is > available and the array otherwise. > > If there is an interest, I can port the patch to the dev version of > llvm, but maybe someone has a better idea of how to deal with the > regression.I think this would be interesting, please do. There are a few places in LLVM where we currently keep dual data structures, and needing another one is not necessarily bad (we have SetVector, for example). Thanks! -Hal> > > > Regards, > > Vaidas > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Andrew Trick
2014-Oct-09  19:34 UTC
[LLVMdev] Performance regression in the LiveIntevals phase
> On Oct 9, 2014, at 9:17 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- >> From: "Vaidas Gasiunas" <vaidas.gasiunas at sap.com> >> To: "LLVM Dev (llvmdev at cs.uiuc.edu)" <llvmdev at cs.uiuc.edu> >> Sent: Thursday, October 9, 2014 4:37:39 AM >> Subject: [LLVMdev] Performance regression in the LiveIntevals phase >> >> Some time ago we reported a compile-time performance regression in >> the LiveIntervals analysis pass (see >> http://llvm.org/bugs/show_bug.cgi?id=18580 ). We detected it at >> first after migrating from LLVM 3.1 to 3.3, but the problem persists >> also in 3.5. This regression is especially critical when compiling >> long functions. In one of our benchmarks compile time goes from 200s >> (in 3.1) up to 1500s (in 3.3). >> >> >> >> We also managed to reproduce the regression with a C++ example using >> clang. Here are the instructions: >> >> >> >> Generate the example C++ file with the Perl script attached to the >> Bug (5000 is a good size parameter to clearly see the regression): >> >> perl create.pl example.cpp 5000 >> >> Generate LLVM IR code for the C++ file (the regression is in the >> backend functionality, so we don't want to measure frontend >> compilation) >> >> llvm31/clang++ example.cpp -c -emit-llvm -o example.ir-31 >> >> llvm35/clang++ example.cpp -c -emit-llvm -o example.ir-35 >> >> >> >> Now you can run the actual performance measurements using llc: >> >> llvm31/llc -cppgen=program -o=5000.asm example.ir-31 >> >> llvm35/llc -cppgen=program -o=5000.asm example.ir-35 >> >> or you can also compile example.ir-31 with llvm3.5: >> >> llvm35/llc -cppgen=program -o=5000.asm example.ir-31 >> >> >> >> Our analysis had shown that the most of time is taken for generation >> of live intervals for physical registers. In the biggest example, >> the live interval of one of the physical registers consists of about >> 500.000 live ranges. Insertions in the middle of the array of live >> ranges cause quadratic algorithmic complexity, which is apparently >> the main reason for the slow-down. I changed the implementation of >> computeRegUnitInterval() so that it uses a set instead of the array, >> and in this way managed to reduce the execution time of the >> computeRegUnitInterval from 1200s down to about 1s. The fix is a bit >> ugly, however, because I cannot completely switch to the set, since >> further analyses are more efficient on the array. For that reason, I >> flush the contents of the set into the array at the end of >> computeRegUnitInterval. I also had to rewrite various operations on >> the live interval so that they use the set structure if it is >> available and the array otherwise. >> >> If there is an interest, I can port the patch to the dev version of >> llvm, but maybe someone has a better idea of how to deal with the >> regression. > > I think this would be interesting, please do. There are a few places in LLVM where we currently keep dual data structures, and needing another one is not necessarily bad (we have SetVector, for example).Yes, it would be good to get a fix for this behavior in LLVM trunk even if it isn’t pretty. I’ve also seen extendToUses showing up on profiles. Your description of the fix sounds reasonable. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141009/49bf84cf/attachment.html>
Gerolf Hoflehner
2014-Oct-09  20:31 UTC
[LLVMdev] Performance regression in the LiveIntevals phase
Dual data structures is common in other register allocator algorithms. For example, coloring allocators have a (sparse) triangular bit matrix for constant access to interference and (short) adjacency vectors to access all interfering neighbors quickly. A typical space-time trade-off. -Gerolf> On Oct 9, 2014, at 9:17 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- >> From: "Vaidas Gasiunas" <vaidas.gasiunas at sap.com> >> To: "LLVM Dev (llvmdev at cs.uiuc.edu)" <llvmdev at cs.uiuc.edu> >> Sent: Thursday, October 9, 2014 4:37:39 AM >> Subject: [LLVMdev] Performance regression in the LiveIntevals phase >> >> Some time ago we reported a compile-time performance regression in >> the LiveIntervals analysis pass (see >> http://llvm.org/bugs/show_bug.cgi?id=18580 ). We detected it at >> first after migrating from LLVM 3.1 to 3.3, but the problem persists >> also in 3.5. This regression is especially critical when compiling >> long functions. In one of our benchmarks compile time goes from 200s >> (in 3.1) up to 1500s (in 3.3). >> >> >> >> We also managed to reproduce the regression with a C++ example using >> clang. Here are the instructions: >> >> >> >> Generate the example C++ file with the Perl script attached to the >> Bug (5000 is a good size parameter to clearly see the regression): >> >> perl create.pl example.cpp 5000 >> >> Generate LLVM IR code for the C++ file (the regression is in the >> backend functionality, so we don't want to measure frontend >> compilation) >> >> llvm31/clang++ example.cpp -c -emit-llvm -o example.ir-31 >> >> llvm35/clang++ example.cpp -c -emit-llvm -o example.ir-35 >> >> >> >> Now you can run the actual performance measurements using llc: >> >> llvm31/llc -cppgen=program -o=5000.asm example.ir-31 >> >> llvm35/llc -cppgen=program -o=5000.asm example.ir-35 >> >> or you can also compile example.ir-31 with llvm3.5: >> >> llvm35/llc -cppgen=program -o=5000.asm example.ir-31 >> >> >> >> Our analysis had shown that the most of time is taken for generation >> of live intervals for physical registers. In the biggest example, >> the live interval of one of the physical registers consists of about >> 500.000 live ranges. Insertions in the middle of the array of live >> ranges cause quadratic algorithmic complexity, which is apparently >> the main reason for the slow-down. I changed the implementation of >> computeRegUnitInterval() so that it uses a set instead of the array, >> and in this way managed to reduce the execution time of the >> computeRegUnitInterval from 1200s down to about 1s. The fix is a bit >> ugly, however, because I cannot completely switch to the set, since >> further analyses are more efficient on the array. For that reason, I >> flush the contents of the set into the array at the end of >> computeRegUnitInterval. I also had to rewrite various operations on >> the live interval so that they use the set structure if it is >> available and the array otherwise. >> >> If there is an interest, I can port the patch to the dev version of >> llvm, but maybe someone has a better idea of how to deal with the >> regression. > > I think this would be interesting, please do. There are a few places in LLVM where we currently keep dual data structures, and needing another one is not necessarily bad (we have SetVector, for example). > > Thanks! > > -Hal > >> >> >> >> Regards, >> >> Vaidas >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141009/98c214cc/attachment.html>
Apparently Analagous Threads
- [LLVMdev] Making LLVM safer in out-of-memory situations
- [LLVMdev] Making LLVM safer in out-of-memory situations
- [LLVMdev] Making LLVM safer in out-of-memory situations
- [LLVMdev] Making LLVM safer in out-of-memory situations
- [LLVMdev] Making LLVM safer in out-of-memory situations