Florian Hahn via llvm-dev
2020-Aug-18 21:14 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
> On Aug 18, 2020, at 16:59, Michael Kruse <llvmdev at meinersbur.de> wrote:
>
> Thanks for all the work. The reductions in stores look promising. Do you also have performance numbers how much this improves the execution time? Did you observe any regressions where MSSA resulted in fewer removed stores?

I did not gather numbers for execution time yet, but I’ll try to share some tomorrow. At the current state, for MultiSource/SPEC2000/SPEC2006, there are the following regressions:

test-suite...6/471.omnetpp/471.omnetpp.test     321.00   316.00    -1.6%
test-suite...arks/mafft/pairlocalalign.test      64.00    61.00    -4.7%
test-suite...ks/Prolangs-C++/city/city.test      23.00    21.00    -8.7%
test-suite...oxyApps-C/miniGMG/miniGMG.test      69.00    60.00   -13.0%

I suspect those are caused by the few cases that the MemorySSA version does not yet support. Those will need some more investigating, but I think ideally we would not block the switch on them, so we can switch early and address the issues that pop up early.

Cheers,
Florian
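The percentage column in the regression list above is consistent with the relative change from the first numeric column to the second. The message does not label those two columns, so the names `before` and `after` below are assumptions made only for illustration. A minimal sketch that recomputes the percentages from the values quoted in the message:

```python
# Rows copied from the message: (shortened test name, first value, second value, reported %).
# The names "before" and "after" are assumptions; the message does not label the columns.
rows = [
    ("471.omnetpp", 321.00, 316.00, -1.6),
    ("mafft/pairlocalalign", 64.00, 61.00, -4.7),
    ("Prolangs-C++/city", 23.00, 21.00, -8.7),
    ("miniGMG", 69.00, 60.00, -13.0),
]


def relative_change_percent(before, after):
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for name, before, after, reported in rows:
    computed = round(relative_change_percent(before, after), 1)
    print(f"{name:22s} computed {computed:+.1f}%  reported {reported:+.1f}%")
```

Under that reading, each recomputed value rounds to the percentage given in the message.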
Florian Hahn via llvm-dev
2020-Aug-19 14:37 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
> On Aug 18, 2020, at 22:14, Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Aug 18, 2020, at 16:59, Michael Kruse <llvmdev at meinersbur.de> wrote:
>>
>> Thanks for all the work. The reductions in stores look promising. Do you also have performance numbers how much this improves the execution time? Did you observe any regressions where MSSA resulted in fewer removed stores?
>
> I did not gather numbers for execution time yet, but I’ll try to share some tomorrow.

Here are some execution time results for ARM64 with -O3 -flto with the MemorySSA-DSE compared against the current DSE implementation for CINT2006 (negative % means reduction in execution time with MemorySSA-DSE). This excludes small changes within the noise (<= 0.5%).

                                                Exec_time   number of stores removed
test-suite...T2006/456.hmmer/456.hmmer.test       -1.6%       +70.8%
test-suite.../CINT2006/403.gcc/403.gcc.test       -1.4%       +35.7%
test-suite...0.perlbench/400.perlbench.test       -1.2%       +33.2%
test-suite...3.xalancbmk/483.xalancbmk.test       -1.0%       +3.02%
test-suite...T2006/401.bzip2/401.bzip2.test       -0.8%       +70.6%
Alina Sbirlea via llvm-dev
2020-Aug-19 17:46 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
Hi Florian,

First, thank you for working on this. I'm really glad to see this work so close to being enabled.

I think the numbers look good for run time, and the benefits of switching for all configurations are clear. For compile time, the current regressions are noticeable, but not a deal breaker in my opinion. I'm very much in favor of switching in all configurations.

To address some of the concerns, it may make sense to lower the threshold somewhat to minimize impact at this time (we won't have benefits as large at the time of the switch). I'm talking about getting the geomean closer to 1% in all configurations if possible.

I believe that the regressions introduced by this flag flip can be undone by further using MemorySSA in the other passes currently using MemDepAnalysis, and offsetting the cost of computing MemorySSA in the first place. The threshold could be raised again to enable more stores eliminated once the MemCpyOpt+MSSA and NewGVN become the default.

If reducing the thresholds is not possible or removes most of the run time benefits, I would vote for enabling as is.

Best,
Alina
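The "geomean" mentioned in this message is the geometric mean of per-benchmark ratios, a common way to summarize compile-time changes across a suite. A minimal sketch of that computation follows. The ratios are hypothetical placeholders, not measurements from this thread, and the helper name `geomean` is chosen here only for illustration.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed via the mean of their logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# HYPOTHETICAL compile-time ratios (new time / old time), one per benchmark.
# These values are made up for illustration and do not come from the thread.
ratios = [1.005, 1.012, 1.020, 0.998, 1.015]

overall = geomean(ratios)
print(f"geomean compile-time change: {(overall - 1.0) * 100.0:+.2f}%")
```

A geometric mean is used rather than an arithmetic mean so that a 2x slowdown on one benchmark and a 2x speedup on another cancel out to a ratio of 1.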
Michael Kruse via llvm-dev
2020-Aug-20 17:43 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
Thank you for the performance numbers. IMHO they justify switching to MSSA-DSE for all configurations, even with slight compile-time regressions.

Michael