César
2013-Apr-29 00:05 UTC
[LLVMdev] GSOC Proposal: Implement Decoupled Software Pipeline
Hello, below is the first draft of my proposal for GSoC. Any comments/advices are apreciated. # Implement Decoupled Software Pipeline (DSWP) ------------------------------------------------------------------------------- # Abstract ------------------------------------------------------------------------------- The goal of this project is to implement the automatic parallelization technique called Decoupled Software Pipeline [1] in LLVM. # Contents ------------------------------------------------------------------------------- ### Personal Details Name: Divino César S. Lucas Email: divcesar AT gmail DOT com Other contact methods: gtalk, +55 19 8294 0276 ### Project Proposal The goal of this project is to implement DSWP in LLVM. DSWP is an automatic parallelizing technique targeted to parallelize loops that have cross iteration dependencies. Being so, DSWP have a more widely application scope than other automatic parallelizing techniques such as DOALL and Vectorization. The DSWP algorithm is shown below and a detailed description follows. %%%%%%%% DSWP Algorithm %%%%%%%%%%%% DSWP_Procedure(L) G = build_dependence_graph(L); SCCs = find_SCCs(G) If (|SCCs| == 1) return ; EndIf DAG_SCC = coalesce_SCCs(G, SCCs) P = TPP_algorithm(DAG_SCC, L) If (|P| == 1) return ; EndIf split_code_into_loops(L, P) insert_necessary_flows(L,P) EndProcedure %%%%%%%% DSWP Algorithm %%%%%%%%%%%% This procedure receive as input the loop L in an intermediate representation and modifies it as a side-effect. The first step in the algorithm is construct the Program Dependence Graph (PDG)[2], this is a graph that represents all data, memory and control dependence for the loop. After constructing the PDG all dependency cycles are identified by finding all the strongly connected components (SCC) in the PDG. At this point DSWP check if only one SCC was found, this means that the whole loop form one dependency cycle and thus it is not possible to decouple the execution of parts of the loop body (it is not possible to split the body into more than one thread). If the loop has more than one SCC the next step coalesce each SCC in just one node and form the DAG_SCC which is a graph that represent the dependency pattern among the SCCs. Potentially each SCC would be executed by one different processing element(PE), however due to load balancing and/or communication latency it may be preferable to execute more than one SCC in each PE. The procedure TPP_Algorithm tries to partition the SCCs the best possible among all PEs. If just one partition was created the algorithm finish since no parallelism would be extracted. The last but one step is responsible for extracting the SCCs from the original loop and creating new thread/loops for each SCCs. The last step is responsible to insert communication between the stages to enable data forwarding and synchronization. The main goal of this project is to implement DSWP as a pass in the LLVM opt tool, however as a "side-effect" of this project new analyzis can be added to opt such as -print-pdg to print the program dependence graph in a DOT[3] format and/or -print-ddg to print the data dependence graph in DOT format. ### Schedule of Deliverables Week 01: The first week I'll be involved in constructing the control dependence graph. Week 02: The second week I'll have finished the CDG and start implementing the data dependence graph construction. Week 03: In this week I'll work to integrate the CDG and the DDG to form the program dependence graph. Week 04: On the first week I'll work to construct the DAG_SCC, that is, to find the SCCs in the PDG and coalesce them forming the DAG_SCC. At the end of the first month I expect to have all analyzis pass implemented, that is, at this point I'll have the infrastructure to construct a program dependence graph, find its strongly connected components and coalesce them to form the DAG_SCC. Week 05: I plan to spend this week implementing the algorith to partition the SCCs in pipeline stages. Week 06: In this week and the next I'll work on implementing the code to split the original loop body among the pipeline stages. Week 07: Split the original loop body into stages. Week 08: In this week and the next I'll work on implementing the necessary infrastructure to enable the communication and synchronization between the pipeline stages. Week 09: Insert communication and synchronization channels. By the end of the nineth week I expect to have implemented all mechanism to enable DSWP working. That is, at this point I'll be ready to start testing the system with real benchmarks. Week 10: Benchmark the system using LLVM test-suite and correct errors. Week 11: Benchmark the system using LLVM test-suite and correct errors. Week 12: Produce final report and documentation of the tasks realized. ### Open Source Development Experience I've started a project by my own for educational purposes, its goal is to enable the emulation of Linux x86 processes [4]. Currently the system is capable of emulating statically linked programs compiled with micro-libc. Plans are to improve the system to emulate dynamically linked programs compiled with micro-libc and libc. ### Academic Experience Currently I'm PhD. student at University of Campinas - Brazil. My research is focused in Compilers and Virtual Machines, specifically I work with automatic parallelization and adaptative optimization systems. ### Why Me and Why LLVM I've always looked for a opportunity to contribute to LLVM, however I find it hard to contribute without defining clear goals. I expect this project to be the first of many contributions I can give to LLVM. In fact I'm decided to start contributing to LLVM independent of this project being accepted or not. I see LLVM as a great opportunity to put in practice many of the concepts I've seem just in theory. ### References [1] http://dl.acm.org/citation.cfm?id=1100543 [2] http://dl.acm.org/citation.cfm?id=24041 [3] http://www.graphviz.org/ [4] https://github.com/JohnTortugo/Box