Alina Sbirlea via llvm-dev
2020-Sep-01 22:03 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
Hi Florian, Following up on D86967, I missed that all the timings were using the legacy pass manager. Did you do any testing on the compile and run time impact for the new pass manager? Thank you, Alina On Tue, Aug 25, 2020 at 12:51 PM Florian Hahn <florian_hahn at apple.com> wrote:> Hi, > > Thanks for all the responses! > > My understanding is that there were no objections so far against trading a > bit of additional compile-time for more eliminated stores. Unless there are > any new concerns raised, I’ll continue working on submitting the remaining > patches to keep compile-time in check and flip the switch once they are in. > > > > On Aug 19, 2020, at 18:46, Alina Sbirlea <alina.sbirlea at gmail.com> > wrote: > > > > Hi Florian, > > > > First, thank you for working on this. I'm really glad to see this work > so close to being enabled. > > > > I think the numbers look good for run time, and the benefits of > switching for all configurations are clear. > > > > For compile time, the current regressions are noticeable, but not a deal > breaker in my opinion. I'm very much in favor of switching in all > configurations. > > > > To address some of the concerns, it may make sense to lower the > threshold somewhat to minimize impact at this time (we won't have benefits > as large at the time of the switch). I'm talking about getting the geomean > closer to 1% in all configurations if possible. > > I believe that the regressions introduced by this flag flip can be > undone by further using MemorySSA in the other passes currently using > MemDepAnalysis, and offsetting the cost of computing MemorySSA in the first > place. The threshold could be raised again to enable more stores eliminated > once the MemCpyOpt+MSSA and NewGVN become the default. > > > > If reducing the thresholds is not possible or removes most of the run > time benefits, I would vote for enabling as is. > > I’ve been working on some additional changes to bring compile-time down a > bit further, while keeping the number of eliminated stores at a similar > level. Geomeans: > -O3 +0.62% > ReleaseThinLTO +1.06%, > ReleaseLTO-g +0.79% Re > ReleaseThinLTO (link-only) +0.96% > ReleaseLTO-G (link only). +0.80% > > > http://llvm-compile-time-tracker.com/compare.php?from=e19ef1aab524ef10a2d118adcd9f6fd6ca2d7ca9&to=c6812412983b4ebb20f1e32b83893284c8117e7f&stat=instructions > > This also includes changes so DSE can re-use MemorySSA constructed by LICM > earlier during LTO. The limits could certainly be a bit tighter still, but > I hope they might encourage a few additional eyes on DSE + MemorySSA > compile-time wise, which would be great. Also, triggering as often as > possible is probably beneficial in terms of flushing out any remaining > issues early. > > Cheers, > Florian-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200901/5d5cd53c/attachment.html>
Florian Hahn via llvm-dev
2020-Sep-03 15:08 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
> On Sep 1, 2020, at 23:03, Alina Sbirlea <alina.sbirlea at gmail.com> wrote: > > Hi Florian, > > Following up on D86967, I missed that all the timings were using the legacy pass manager. > Did you do any testing on the compile and run time impact for the new pass manager?All numbers shared earlier where indeed with the default configuration/LPM. I did some testing with the new PM today (using ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER). In terms of removed stores, the improvements are very similar to the legacy pass manager. In terms of compile-time, the impact is a bit bigger unfortunately, due to the fact that we need more extra computations of MemorySSA. For example, in the regular pipeline, DSE is scheduled just before LICM, both of which use MemorySSA. With the LPM, if I understand correctly, LICM's loop pass manager constructs MemorySSA even if there is no loop in a function. So in the LPM, the number of times MemorySSA is computed stays the same. With the NPW, it seems like LICM’s loop pass manager does only construct MemorySSA if there are actual loops in a function. This is great, but unfortunately means that using DSE + MemorySSA introduces an extra MemorySSA construction for each function without loops. For tramp3d-v4 this introduces 3x more MemorySSA constructions when running DSE + MemorySSA followed by LICM on the LTO module, for example. I do not think there are any short-term solutions for this in the NPM, but once more passes, e.g. NewGVN or MemCpyOptimizer, are using MemorySSA, the cost should be somewhat amortized. So here are the latest CTMark numbers with DSE + MemorySSA enabled. Note that they those measurements do not include the compile-time improvements made recently to MemDepAnalysis a week ago (https://github.com/llvm/llvm-project/commit/3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e <https://github.com/llvm/llvm-project/commit/3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e>). I think this highlights the benefits of switching as soon as possible, so the DSE + MemorySSA implementation can benefit from more additional and fresh eyes with focus on compile-time. New Pass Manager exec instrs. size-text O3 + 0.95% - 0.25% ReleaseThinLTO + 1.28% - 0.41% ReleaseLTO-g. + 1.64% - 0.35% RelThinLTO (link only) + 0.93% - 0.41% RelLO-g (link only). + 2.09% - 0.35% Details: http://195.201.131.214:8000/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions <http://195.201.131.214:8000/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions> Legacy Pass Manager exec instrs. size-text O3 + 0.67% - 0.27% ReleaseThinLTO + 1.07% - 0.42% ReleaseLTO-g. + 0.81% - 0.33% RelThinLTO (link only) + 0.94% - 0.42% RelLO-g (link only). + 0.78% - 0.33% Details: http://llvm-compile-time-tracker.com/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions <http://llvm-compile-time-tracker.com/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions> There currently are 2 patches pending until we should be ready to flip the switch: https://reviews.llvm.org/D86651 <https://reviews.llvm.org/D86651> [MemCpyOpt] Preserve MemorySSA. https://reviews.llvm.org/D86815 <https://reviews.llvm.org/D86815> [LangRef] Adjust guarantee for llvm.memcpy to also allow equal arguments. Cheers, Florian -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200903/1bae2def/attachment.html>
Florian Hahn via llvm-dev
2020-Sep-04 21:14 UTC
[llvm-dev] [RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
Just another quick update: the patch to flip the switch is now up as well https://reviews.llvm.org/D87163 <https://reviews.llvm.org/D87163> I plan on flipping the switch once the review gets approved, unless there are any remaining major concerns. Please let me know if you see any regressions/problems after the switch. Cheers, Florian> On Sep 3, 2020, at 16:08, Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > >> On Sep 1, 2020, at 23:03, Alina Sbirlea <alina.sbirlea at gmail.com <mailto:alina.sbirlea at gmail.com>> wrote: >> >> Hi Florian, >> >> Following up on D86967, I missed that all the timings were using the legacy pass manager. >> Did you do any testing on the compile and run time impact for the new pass manager? > > > All numbers shared earlier where indeed with the default configuration/LPM. > > I did some testing with the new PM today (using ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER). In terms of removed stores, the improvements are very similar to the legacy pass manager. > > In terms of compile-time, the impact is a bit bigger unfortunately, due to the fact that we need more extra computations of MemorySSA. For example, in the regular pipeline, DSE is scheduled just before LICM, both of which use MemorySSA. With the LPM, if I understand correctly, LICM's loop pass manager constructs MemorySSA even if there is no loop in a function. So in the LPM, the number of times MemorySSA is computed stays the same. > > With the NPW, it seems like LICM’s loop pass manager does only construct MemorySSA if there are actual loops in a function. This is great, but unfortunately means that using DSE + MemorySSA introduces an extra MemorySSA construction for each function without loops. For tramp3d-v4 this introduces 3x more MemorySSA constructions when running DSE + MemorySSA followed by LICM on the LTO module, for example. > > I do not think there are any short-term solutions for this in the NPM, but once more passes, e.g. NewGVN or MemCpyOptimizer, are using MemorySSA, the cost should be somewhat amortized. > > So here are the latest CTMark numbers with DSE + MemorySSA enabled. Note that they those measurements do not include the compile-time improvements made recently to MemDepAnalysis a week ago (https://github.com/llvm/llvm-project/commit/3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e <https://github.com/llvm/llvm-project/commit/3a54b6a4b71c21cf3bab4f132cbc2904fb9d997e>). I think this highlights the benefits of switching as soon as possible, so the DSE + MemorySSA implementation can benefit from more additional and fresh eyes with focus on compile-time. > > New Pass Manager > exec instrs. size-text > O3 + 0.95% - 0.25% > ReleaseThinLTO + 1.28% - 0.41% > ReleaseLTO-g. + 1.64% - 0.35% > RelThinLTO (link only) + 0.93% - 0.41% > RelLO-g (link only). + 2.09% - 0.35% > > Details: > http://195.201.131.214:8000/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions <http://195.201.131.214:8000/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions> > > Legacy Pass Manager > exec instrs. size-text > O3 + 0.67% - 0.27% > ReleaseThinLTO + 1.07% - 0.42% > ReleaseLTO-g. + 0.81% - 0.33% > RelThinLTO (link only) + 0.94% - 0.42% > RelLO-g (link only). + 0.78% - 0.33% > > Details: > http://llvm-compile-time-tracker.com/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions <http://llvm-compile-time-tracker.com/compare.php?from=7fa5828950d060a70eb57e721765da7cc67c5695&to=b5e5f6ed0f583ddd7a032c274e013335216d6372&stat=instructions> > > There currently are 2 patches pending until we should be ready to flip the switch: > > https://reviews.llvm.org/D86651 <https://reviews.llvm.org/D86651> [MemCpyOpt] Preserve MemorySSA. > https://reviews.llvm.org/D86815 <https://reviews.llvm.org/D86815> [LangRef] Adjust guarantee for llvm.memcpy to also allow equal arguments. > > Cheers, > Florian > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200904/35b76eb1/attachment.html>