Chijun Sima via llvm-dev
2018-Aug-10 19:27 UTC
[llvm-dev] [GSoC] Implement a single updater class for Dominators - Final Report
Hello everyone, During this summer I was working on implementing a single updater class for Dominators. Note 1): If you are just concern about how this can affect your development, please see section 4 - Actions. Note 2): The full length report, which details the contribution, challenges can be found in [1]. 1. Background The API for updating the dominator and postdominator tree provided by LLVM before this project was fragmented. Different functions affecting these data structures had to decide whether update them eagerly or to perform updates on the DominatorTree using a special wrapper class DeferredDominance to enable lazy updates and deduplication. Passes and utils also needed to handle some corner cases (e.g., explicitly remove forward-unreachable nodes from the PostDominatorTree before deletion) when they updated dominators. Thus, there was a strong need to implement a single updater class for abstracting away the tree update strategies and which trees are actually being updated to provide a uniform and clean interface for pass developers to update dominators. Moreover, after implementing the new updater class, it is also feasible to implement a timing method on it to take a closer look than before on where is beneficial to improve the compilation time. The new updater class is also able to offer developers an easier way to test whether it is profitable to preserve the DominatorTree along the optimization pipeline combing with automatic CFG snapshot and diff algorithms implemented in this new updater. 2. Tasks completed The work during this summer can be divided into the following four parts: a) Explore the drawbacks of the DomintorTree interface, cleanup the codebase, implement the new DomTreeUpdater interface and fit the DomTreeUpdater in all scenarios. The DomTreeUpdater is designed and implemented [2] to have the functionality of the previous DominatorTree wrapper class DeferredDominance. It can also work the same as a raw DominatorTree and PostDominatorTree. First, it can provide 12 ways of updating dominators. ({None, DomTree, PostDomTree, Both Trees} x {Eager, Lazy, (Auto - proposed/not merged)}) It greatly expands and reuses the previous interface of only 3~6 update strategies (often {None, DomTree, Lazy} + {None, DomTree, Eager}) provided by utils in Local.cpp and BasicBlockUtils.cpp. Second, it does not require users to write redundant calls and worry about corner cases mentioned in [3]. It also enables the utils to be cleaner by combining the parameters previously used by three different classes (i.e. DeferredDominance, DominatorTree, PostDominatorTree) into one and removing unnecessary branching logic. Furthermore, it exposes more function (e.g., callbacks on BasicBlock deletion) than its predecessor (DeferredDominance) and is less confusing to users. (e.g., recalculate() under Lazy UpdateStrategy is not related to the internal state of the updater any more) b) Transform codebase to use the new DomTreeUpdater interface to make updating dominators more consistent and uniform. For converting existing passes to use the new interface, now all the interfaces that previously accepted DeferredDominance and all passes that used DeferredDominace or both DomTree and PostDomTree are migrated to use the new interface [4]. It can enable passes to preserve PostDominatorTree easily in the future and make utils easier to maintain. c) Implement timing methods to find out where is profitable to improve the compilation time. Explore and preserve the DominatorTree and PostDominatorTree along the LLVM optimization pipeline in various passes. After I implemented the DomTreeUpdater class, I began implementing a timer and several statistics to find out where the most time is consumed by the DominatorTree related calculation. After the timing patch [5] is ready, I discovered that recalculation takes most of the time in the dominatortree calculation (See [Statistics] Benchmarks on Lazy UpdateStrategy [6]). Because of that, I began to look into how to preserve the DominatorTree along the pipeline. I used the “-debug-pass=Structure” argument to see where the domtree is invalidated in the legacy pass manager and looked into the source code of those passes to find out whether it is good to preserve the dominators in it. I also wrote a tool to analyze where the DominatorTree is used and invalidated in the new Pass Manager. Because the new pass manager schedules passes dynamically, I used multiple inputs to find out the results. I made several improvements to the pipeline [7-9] and wrote a report ([Dominators] Preserving the DominatorTree along the pipeline [10]). d) Explore ways to make passes which cannot submit accurate updates to use the DomTreeUpdater safely and ways to make developers of complex passes to test whether it is profitable to preserve the DomintorTree along the pipeline. During the transformation of existing passes and transforms, Brian Rzycki reminded me of that the JumpThreading pass relies on an undocumented behavior of the predecessor of the DomTreeUpdater (i.e. removing invalid updates which never actually happened). JumpThreading Pass currently relies on some internal state of the DomTreeUpdater to work. Because the pass itself is complicated to test thoroughly, and DominatorTree updates is hard to get synced with the CFG in the pass because of its complexity, the safest choice is to implement a method which uses an algorithm to find out the difference between the CFG of the previous state when the DominatorTree get initialized or updated and the current state. The algorithm views the CFG as a set of edges and the diff between two CFGs can then be calculated using set difference. I implemented this functionality in three different ways. The first one named “Auto[1]” in [11] only needs the user to submit the BasicBlocks going to be deleted. The second one named “Auto[2]” in [12] needs the same information the user submits currently using the Lazy UpdateStrategy. The third one named “Auto[3]” in [13] needs the user to submit extra information before the CFG changes. I implemented timing patches [16-18] on these three implementations. I benchmarked them [19-21] and found the third one was almost in line with the current implementation, which is the most promising. But due to the lack of time before the project ends, its performance was not fully examined to get merged into the LLVM codebase. I think the next step after the project is to find out ways to optimize the storage time used by this implementation to get it faster. Though the first implementation is slower than the third one, I think it can serve as a tool to test whether it is profitable to preserve the DominatorTree Analysis in a complex pass. I implemented this idea on the SimplifyCFG pass in [14]. The statistics [15] shows that currently it is not profitable to do so and it deserves further investigation after the project. 3. Future work a) I mentioned in my initial proposal to implement a pruning algorithm to prune updates to the PostDominatorTree by getting information from the up-to-date DominatorTree to improve the compilation time. After benchmarking [6], I found most time spent by the dominator tree calculation is reconstruction of the (forward) dominator tree, problems with updating dominators in JumpThreading exposed at the same time and we need to figure out a smarter way to update the DominatorTree in JumpThreading and save time consumed by reconstruction. After a discussion with my mentor, we decided to look into ways to improve the compilation time by preserving the DominatorTree / PostDominatorTree along the pipeline and explore ways on CFG snapshotting and diffing with hints-based update strategy. b) The JumpThreading Pass in the current LLVM codebase heavily relies on the internal state of the DomTreeUpdater which makes it hard and tricky to update the DominatorTree and the PostDominatorTree in this pass. The JumpThreading Pass can submit an update which is maybe already known by the DominatorTree which makes updates cancel with each other inside the DomTreeUpdater which results in a DominatorTree getting unsynced with the current CFG. This problem can be solved by applying the Auto UpdateStrategy patches mentioned above. We need to continue to work on improving the compilation time performance before those patches go upstream. c) From the report I created [10], several passes still invalidates the DominatorTree and the PostDominatorTree in the new pass manager. It is possible to improve the compilation time by preserving analysis in those passes. d) The SimplifyCFG Pass consumes most of the time of the DominatorTree calculation by invalidating it several times along the LLVM optimization pipeline. A work-in-progress patch [14] has been created to experimentally preserve the DominatorTree in the SimplifyCFG. Though it suffers a worse performance than the baseline (the current implementation in LLVM revision 339227), it is valuable to look into why it takes so long to update the DominatorTree after SimplifyCFG runs and whether it can be improved. 4. Actions a) For pass developers: If you want to update the DominatorTree lazily or do deduplication before submitting updates, please rebase your patch using DeferredDominance class to use the DomTreeUpdater class. Current passes preserving the DominatorTree by updating it incrementally and using utils which accept the DomTreeUpdater provided by Local.cpp and BasicBlockUtils.cpp can gain free PostDominatorTree preservation ability by migrating from raw DominatorTree updating API to the DomTreeUpdater. If you are currently writing a complex pass and wondering whether it is beneficial to preserve the DominatorTree, you can decide it by having a try using [11] and [16] and see the time consumed by “DomTree Calculation”. See [14] for an example. b) For developers working on utils (Local.cpp and BasicBlockUtils.cpp): Please make functions updating the DominatorTree using the incremental way via the DomTreeUpdater. It is needed to check whether it is compatible with both DominatorTree and PostDominatorTree under both updating strategies (Eager/Lazy). 5. Acknowledgements I would like to offer my special thanks to my mentor Jakub Kuderski for patiently guiding me on this project and providing me with valuable reviews and advice. I would also like to thank Brian Rzycki for advising me and reviewing my patches. Thanks to all LLVM community members for their support during the project. I would like to continue maintaining the DomTreeUpdater after the project :). Best, Chijun Sima [1] https://docs.google.com/document/d/1hubvaiQxQSBR9H9TOnH1nrtSusEr3vg5sXVLGj7xrHE/edit?usp=sharing [2] https://reviews.llvm.org/D48383 [3] http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html [4] https://reviews.llvm.org/D48967 [5] https://reviews.llvm.org/D50300 [6] https://reviews.llvm.org/P8098 [7] https://reviews.llvm.org/D48914 [8] https://reviews.llvm.org/D49982 [9] https://reviews.llvm.org/D49988 [10] http://lists.llvm.org/pipermail/llvm-dev/2018-August/125185.html [11] https://reviews.llvm.org/D50302 [12] https://reviews.llvm.org/D50308 [13] https://reviews.llvm.org/D50311 [14] https://reviews.llvm.org/D50305 [15] https://reviews.llvm.org/P8097 [16] https://reviews.llvm.org/D50303 [17] https://reviews.llvm.org/D50309 [18] https://reviews.llvm.org/D50313 [19] https://reviews.llvm.org/P8099 [20] https://reviews.llvm.org/P8100 [21] https://reviews.llvm.org/P8101