Thank you very much for the quick reply. I wanted to confirm that what I did was correct. I tested a simple form of SC-preserving compilation: insert a fence for every load/store instruction before any optimizations run, apply the standard optimizations, and then remove the fences after assembly code generation. This SC-preserving compilation caused only a ~4% average slowdown across 18 benchmarks on an Intel Xeon machine. The result surprised me, because a recent PLDI paper (which also used LLVM) reported that this naive form of compilation can cause a 20% slowdown, so I posted this question. I will try to check whether the generated binary code really respects the SC fences.

Yuelu

________________________________________
From: 陳韋任 (Wei-Ren Chen) [chenwj at iis.sinica.edu.tw]
Sent: Wednesday, October 17, 2012 9:00 AM
To: Duan, Yue Lu
Cc: llvmdev at cs.illinois.edu
Subject: Re: [LLVMdev] Question on Fence Instruction

On Tue, Oct 16, 2012 at 01:44:57PM +0000, Duan, Yue Lu wrote:
> Hi,
>
> I have a question about the latest released LLVM, which supports the fence
> instruction in its IR. If I intentionally place a sequentially consistent fence
> instruction somewhere in the code, will the other transformation passes that
> are applied later respect the fence and not perform any reordering across it?

In theory, all optimization passes should respect SC. If you find any
counterexample, I think it's a bug.

HTH,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj
On 10/17/12 9:21 AM, Duan, Yue Lu wrote:
> I did a test that could enable a simple way of sc-preserving compilation by inserting fences for every load/store instruction before any opts, applying standard opts and then removing them after assembly code generation.

Perhaps I'm misunderstanding something, but why are you removing the fences before code generation? I would think that removing the fences would permit the hardware to reorder loads and stores in a way that violates sequential consistency. In other words, while you've ensured that the compiler doesn't do anything to violate SC, you're letting the hardware violate SC.

Are you compiling for a machine that is sequentially consistent by default?

Also, to what PLDI paper are you referring?

-- John T.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Wed, Oct 17, 2012 at 02:21:40PM +0000, Duan, Yue Lu wrote:
> It turned out that such sc-preserving compilation only caused ~4% slowdown for 18 benchmarks on average on an Intel Xeon machine. The result surprised me a lot because it was reported that such naive way of compilation can cause 20% slowdown in a recent PLDI paper (they also use LLVM), so I posted this question.

My 2 cents: maybe x86 already has a strong enough memory model that removing those SC fences does not cause much performance loss.

HTH,
chenwj
Hi,

The paper is "A Case for an SC-Preserving Compiler" from PLDI 2011. What I did follows their "naive SC-preserving compilation," which prevents the compiler from reordering potentially shared load/store instructions. The paper reports that the resulting code running on an x86 machine (an SC-preserving binary on non-SC hardware) suffers a 22% slowdown compared with normally optimized code running on the same machine (a non-SC-preserving binary on non-SC hardware).

The purpose of my experiment is to measure how much performance is lost by restricting the reordering of shared load/store instructions, i.e., the cost of the compiler transformations that are disabled. The fences are removed from the assembly code because they are so costly that, if they stayed in, the performance lost to the compilation restrictions could not be measured on its own.

My results show that restricting reordering during compilation leads to only a 4% slowdown, far less than the paper reports. One possible reason is that the compiler does not respect SC fences, so unexpected reordering happens and leads to better performance. Another is that their implementation differs from mine. I am not sure.

-Yuelu

________________________________________
From: John Criswell [criswell at illinois.edu]
Sent: Wednesday, October 17, 2012 9:45 AM
To: Duan, Yue Lu
Cc: "陳韋任 (Wei-Ren Chen)"; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Question on Fence Instruction