Wan, Xiaofei
2013-Jul-16 10:12 UTC
[LLVMdev] [LLVM Dev] [Discussion] Function-based parallel LLVM backend code generation
Hi, community:

To meet a business need, I want to enable "function-based parallel code generation" to speed up the compilation of a single module. Please see the design details below and give feedback on the following points, thanks.

1. Is this idea a proper solution for my requirement?
2. The new feature is enabled by llc -thd=N and has no impact on the original llc when -thd=1.
3. Can this new llc feature be accepted by the community and merged into the LLVM code tree?

Patches

The patch is divided into four separate parts; the all-in-one patch can be found here:
http://llvm-reviews.chandlerc.com/D1152

Design

https://docs.google.com/document/d/1QSkP6AumMCAVpgzwympD5pI3btPJt4SRgjY-vhyfySg/edit?usp=sharing

Function-based parallel LLVM backend code generation
Wan Xiaofei (xiaofei.wan at intel.com)

Background

* Our business needs to compile C/C++ source files into LLVM IR and link them into one big BC file; the big BC file is then compiled into binary code on different arch/target devices.
* Backend code generation is a time-consuming activity that happens on the target device, which makes it important to the user experience.
* make -j or file-based parallelism can't help here since there is only one big BC file; function-based parallel LLVM backend code generation is a good way to improve compilation time because it fully utilizes multiple cores.

Overall design strategy and goal

* Generate exactly the same binary as the single-threaded output
* No impact on single-thread performance
* Little impact on LLVM code infrastructure

Why not choose module-based parallelism

* Module partition
  - How do we partition one module into several modules that take similar time to compile? Compilation time is not decided by instruction count alone; it depends on instruction categories + instruction count + CFG + others.
  - We tried this solution and stopped after hitting the obstacles below.
    Global variables and functions may be used by other modules. Each global variable and constant has a use list, and the use list has to be reconstructed during module partition. Functions and variables must be cloned since they can't belong to two modules; this may take a lot of effort and waste memory, especially for big BC files.
* Binaries merge
  - The different binaries must be linked after all modules are finished, and linking is usually time-consuming.
* Validation strategy
  - To simplify validation, generating the same binary (including symbols and function order) is the best solution. With module-based parallelism it is not easy to generate totally identical binaries, even when the output is correct. Symbol and temp variable mangling are done in different modules, so it is difficult to ensure the symbols are the same.
  - The function order after linking may not be the same as it would be in one module.
* Potential benefit
  - What is the benefit of module-based parallelism? Can it bring more benefit than function-based parallelism?
  - Module partition and module linkage are two extra overheads.
  - Global variables have to be cloned several times, which is a big memory penalty.

Design of function-based parallelism

[Figure: image001.png (see attachment)]

Step 1: Make LLVM passes reentrant

* Function passes should be thread-safe since function-based parallelism is adopted.
* The UseList and ValueHandleList of the 'Constant' class may be accessed by different functions; all operations on UseList and ValueHandleList should be locked.
* LLVMContext will be shared by different functions; all accesses to LLVMContext should be locked.
* For allocators that use the default SlabAllocator (which is static), operations on these allocators should be locked.
* Symbols in MCContext are accessed by different functions, so accesses to them should be locked.
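
To illustrate the kind of locking Step 1 describes, here is a minimal sketch of a lock-guarded shared table. It is not taken from the patches: SharedSymbolTable, getOrCreate and internFromThreads are hypothetical names used only for this example, and the real MCContext API is different. The only point is that every access to state shared between function threads goes through one mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a context-wide symbol table
// (an assumption for illustration, not the real MCContext API).
class SharedSymbolTable {
public:
  // Returns the id for `name`, creating it on first use.
  // Safe to call from several threads at once.
  std::size_t getOrCreate(const std::string &name) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = ids_.find(name);
    if (it != ids_.end())
      return it->second;
    std::size_t id = ids_.size();
    ids_.emplace(name, id);
    return id;
  }

  // Number of distinct symbols seen so far.
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return ids_.size();
  }

private:
  mutable std::mutex lock_;
  std::unordered_map<std::string, std::size_t> ids_;
};

// Simulates `numThreads` code generation workers that all look up
// the same `numSymbols` names. Returns the number of distinct symbols.
std::size_t internFromThreads(SharedSymbolTable &table, unsigned numThreads,
                              unsigned numSymbols) {
  assert(numThreads > 0 && "need at least one worker");
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
    workers.emplace_back([&table, numSymbols] {
      for (unsigned i = 0; i < numSymbols; ++i)
        table.getOrCreate("sym" + std::to_string(i));
    });
  for (std::thread &w : workers)
    w.join();
  return table.size();
}
```

With 4 workers each asking for the same 100 names, the table ends up with exactly 100 entries. The lock makes the table safe, but the ids still depend on which thread gets there first, so producing identical output across runs needs ordering on top of locking.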
Step 2: Multiple pass managers

* The role of the PM in the LLVM code generator
  - PassManager is the top-level pass manager; it contains all the module-level passes needed to generate the binary code. Among the module-level passes, the function pass manager is the biggest.
  - PassManager controls all steps of code generation; a function must walk through all passes contained in the function pass manager to emit final binary code.
  - A pass can't be shared by different functions/threads simultaneously since a pass holds a lot of intermediate information.
* Multiple pass managers
  - Multiple pass managers are created to implement parallel compilation; each pass manager is owned by one thread.
  - Among all passes/pass managers, there is one parent pass/pass manager which will delegate some activities for the other passes/pass managers.
  - AsmPrinter is the last of the function passes and is shared by all threads; in this pass, the parent AsmPrinter will delegate the code emission for the other threads.

Step 3: Share the last pass "AsmPrinter"

* AsmPrinter is the last function pass, which emits the final binary code; it is shared by different functions/threads.
* AsmPrinter is responsible for merging the instructions generated by different threads.
* AsmPrinter provides a mechanism to make sure the instruction sequences are the same as in the single-threaded case.

Validation methodology & test result

* Parallel llc generates exactly the same binary code as single-threaded llc, so validation similar to the single-threaded case can be used.
* Long-running stress tests are launched to guarantee correctness and robustness.

Current status and test result

* Parallel llc generates the same code as single-threaded llc, as checked with "objdump -d"; it passes a 10-hour stress test for all performance benchmarks.
* Parallel llc gives a ~2.9X performance gain on a Xeon server with 4 threads.

Thanks
Wan Xiaofei
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/6461f125/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 202279 bytes
Desc: image001.png
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/6461f125/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parallel.CG.Threading.patch
Type: application/octet-stream
Size: 5225 bytes
Desc: Parallel.CG.Threading.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/6461f125/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parallel.CG.AsmPrinter.patch
Type: application/octet-stream
Size: 29239 bytes
Desc: Parallel.CG.AsmPrinter.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/6461f125/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parallel.CG.MultiplePM.patch
Type: application/octet-stream
Size: 21920 bytes
Desc: Parallel.CG.MultiplePM.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/6461f125/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parallel.CG.PassReentrant.patch
Type: application/octet-stream
Size: 56511 bytes
Desc: Parallel.CG.PassReentrant.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130716/6461f125/attachment-0003.obj>