A third approach is to decouple the backend compilation and parallelism
strategy from the partitioning. The partitioner can spit out partition BC
files and some action records in a standard format. All of this can be fed
into a driver tool that converts the compilation action file into a
make/build file for the underlying build system of your choice:

1) it can simply be a compiler driver that does thread-level parallelism;
2) or a tool that generates Makefiles which are fed into parallel make to
   exploit single-node parallelism;
3) or a tool that generates BUILD files that feed into a distributed build
   system (such as Google's blaze:
   http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html)

Another benefit is that it will make compiler debugging easier.

thanks,

David

On Sun, Jul 14, 2013 at 5:56 PM, Andrew Trick <atrick at apple.com> wrote:
>
> On Jul 12, 2013, at 3:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
> 3.2 Compile partitions independently
> --------------------------------------
>
> There are two camps: one camp advocates compiling partitions via
> multi-process, the other one favors multi-thread.
>
> Inside the Apple compiler teams, I'm the only one belonging to the 1st
> camp. I think that while multi-proc sounds a bit red-neck, it has its
> advantages for this purpose, and while multi-thread is certainly more
> eye-popping, it has its advantages as well.
>
> The advantages of multi-proc are:
> 1) Easier to implement: each process runs in its own address space.
>    We don't need to worry that they can interfere with each other.
>
> 2) Huge, if not unlimited, address space.
>
> The disadvantage is that it's expensive. But I guess the cost is
> almost negligible compared to the overall IPO compilation.
>
> The advantages of multi-thread I can imagine are:
> 1) It sounds fancy.
> 2) It is light-weight.
> 3) Inter-thread communication is easier than IPC.
>
> Its disadvantages are:
> 1) Oftentimes we will come across a race condition, and it takes an
>    awfully long time to figure it out. While the code is supposed
>    to be multi-thread safe, we might miss some tricky case.
>    Trouble-shooting a race condition is a nightmare.
>
> 2) Small address space. This is a big problem if the compiler
>    is built 32-bit. In that case, the compiler is not able to bring
>    lots of stuff into memory even if the HW does provide ample memory.
>
> 3) The thread-safe run-time lib is more expensive.
>    I once linked a compiler using -lpthread (I did not have to) on a
>    UNIX platform, and saw the compiler slow down by about 1/3.
>
> I'm not able to convince the folks in the other camp, neither are they
> able to convince me. I decided to implement both. Fortunately, this
> part is not difficult; it seems to be rather easy to crank one out within
> a short period of time. It would be interesting to compare them
> side-by-side, and see which camp loses :-). On the other hand, if we run
> into race-condition problems, we choose the multi-proc version as a
> fall-back.
>
>
> While I am a self-proclaimed multi-process red-neck, in this case I would
> prefer to see a multi-threaded implementation because I want to verify that
> LLVMContext can be used as advertised. I'm sure some extra care will be
> needed to report failures/diagnostics, but we should start with the
> assumption that this approach is not significantly harder than multi-process
> because that's how we advertise the design.
>
> If any of the multi-threaded disadvantages you point out are real, I would
> like to find out about it.
>
> 1. Race Conditions: We should be able to verify that thread-parallel vs.
> sequential or multi-process compilation generates the same result. If they
> diverge, we would like to know about the bug so it can be fixed--independent
> of LTO.
>
> 2. Small Address Space with LTO. We don't need to design around this
> hypothetical case.
>
> 3. Expensive thread-safe runtime lib. We should not speculate that platforms
> that we, as the LLVM community, care about have this problem. Let's assume
> that our platforms are well implemented unless we have data to the contrary.
> (Personally, I would even love to use TLS in the compiler to vastly simplify
> API design in the backend, but I am not going to be popular for saying so).
>
> We should be able to decompose each step of compilation for debugging. So
> the multi-process "implementation" should just be a degenerate form of
> threading with a bit of driver magic if you want to automate it.
>
> -Andy
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
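
The check Andy describes in his first point, that thread-parallel and sequential compilation generate the same result, could be sketched roughly as follows. This is only an illustration under stated assumptions: `compile_partition` is a hypothetical stand-in that merely hashes its input, not an LLVM API, and a real check would compare the object files emitted by the backend for each partition.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def compile_partition(partition: bytes) -> bytes:
    """Hypothetical stand-in for compiling one partition (just hashes it)."""
    return hashlib.sha256(partition).digest()


def compile_sequential(partitions):
    """Compile the partitions one after another."""
    return [compile_partition(p) for p in partitions]


def compile_threaded(partitions, workers=4):
    """Compile the partitions on a thread pool; map() keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_partition, partitions))


def results_match(partitions, workers=4):
    """True if the threaded build produced the same outputs as the sequential one."""
    return compile_sequential(partitions) == compile_threaded(partitions, workers)
```

If `results_match` ever returned False for a real backend, the diverging partition would be the place to start looking for the race.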
I actually came up with 3 approaches to build the post-IPO objects
independently.

The "3rd approach" here is the 1st solution in my original proposal. Almost
all of my coworkers said it sucks :-)
I now accept that, because it has no way to be adaptive.

Consider the scenario where we compile the LLVM compiler itself. We use
"make -j16" on a computer with 8 processors, and each make job invokes a
compiler which may blindly spawn 16 threads!
So we end up with 16*16 threads.

Being adaptive makes it possible to pick the right parallelism factor
judiciously.

In any case, I will support this approach (i.e. the 3rd approach you
mentioned), at the very least at the beginning.

On 7/16/13 1:35 PM, Xinliang David Li wrote:
> A third approach is to decouple the backend compilation and
> parallelism strategy from the partitioning.
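
The oversubscription arithmetic in the "make -j16" scenario, and one possible meaning of "adaptive", can be sketched as below. Both function names and the clamping policy are assumptions made for illustration; the thread does not specify how an adaptive factor would actually be chosen, and a real implementation would need some way to estimate how many jobs are already running.

```python
def oversubscribed_threads(make_jobs: int, threads_per_compiler: int) -> int:
    """Total threads when every make job blindly spawns its own threads."""
    return make_jobs * threads_per_compiler


def adaptive_thread_count(cpus: int, busy_jobs: int, requested: int) -> int:
    """Clamp the requested thread count to the CPUs not already in use.

    Hypothetical policy: always allow at least one thread so the
    compilation can make progress even on a saturated machine.
    """
    free = max(cpus - busy_jobs, 1)
    return max(1, min(requested, free))
```

With 8 processors and 16 make jobs already running, this policy would let each compiler use a single thread instead of 16.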
On Tue, Jul 16, 2013 at 1:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Consider the scenario where we compile the LLVM compiler itself. We use
> "make -j16" on a computer with 8 processors, and each make job invokes a
> compiler which may blindly spawn 16 threads!
> So we end up with 16*16 threads.
>

Determining the right parallelism is not the job of the compiler (builtin)
nor that of a developer -- the underlying build system should take care of
the scheduling :)

David
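
A driver tool of the kind discussed in this thread, one that turns compilation action records into a Makefile so that parallel make owns the scheduling, might look roughly like this. The record format (dicts with `input`, `output`, and `command` keys) is an assumption made for illustration; the thread only says the records would be in "some standard format".

```python
def actions_to_makefile(actions):
    """Render hypothetical action records as Makefile text.

    Each action is a dict with assumed keys 'input', 'output', 'command'.
    Every partition becomes its own rule, so `make -jN` decides how many
    of them run at once.
    """
    outputs = " ".join(a["output"] for a in actions)
    lines = [f"all: {outputs}", ""]
    for a in actions:
        lines.append(f"{a['output']}: {a['input']}")
        lines.append(f"\t{a['command']}")  # recipe lines must start with a tab
        lines.append("")
    return "\n".join(lines)


# Example records a partitioner might emit (file names are made up).
example_actions = [
    {"input": "part0.bc", "output": "part0.o",
     "command": "llc -filetype=obj part0.bc -o part0.o"},
    {"input": "part1.bc", "output": "part1.o",
     "command": "llc -filetype=obj part1.bc -o part1.o"},
]
```

Writing `actions_to_makefile(example_actions)` to a file and running it under parallel make leaves the degree of parallelism to the build system rather than to the compiler.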