thr3ads.net - llvm dev - [LLVMdev] Question on Fence Instruction [Oct 2012]

Duan, Yue Lu

2012-Oct-17 14:21 UTC

[LLVMdev] Question on Fence Instruction

Thank you very much for the quick reply. I was trying to confirm that what I
did was correct. I ran a test that implements a simple form of SC-preserving
compilation: insert fences for every load/store instruction before any
optimizations, apply the standard optimizations, and then remove the fences
after assembly code generation. It turned out that this SC-preserving
compilation caused only a ~4% slowdown on average across 18 benchmarks on an
Intel Xeon machine. The result surprised me because a recent PLDI paper (which
also uses LLVM) reported that this naive way of compiling can cause a 20%
slowdown, so I posted this question. I will try to examine whether the
generated binary code really respects the SC fences.

Yuelu
________________________________________
From: 陳韋任 (Wei-Ren Chen) [chenwj at iis.sinica.edu.tw]
Sent: Wednesday, October 17, 2012 9:00 AM
To: Duan, Yue Lu
Cc: llvmdev at cs.illinois.edu
Subject: Re: [LLVMdev] Question on Fence Instruction

On Tue, Oct 16, 2012 at 01:44:57PM +0000, Duan, Yue Lu wrote:
> Hi,
>
> I have a question with the latest released LLVM which supports Fence
> Instruction as IR. Say if I intentionally place a Sequentially Consistent
> Fence Instruction somewhere in the code, then would the other transformation
> passes that applied later respect the Fence and do not perform any
> reordering across it?

  In theory, all optimization passes should respect sc. If you find any
counter example, I think it's a bug.

HTH,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

John Criswell

2012-Oct-17 14:45 UTC

Perhaps I'm misunderstanding something, but why are you removing the
fences before code generation? I would think that removing the fences
would permit the hardware to re-order loads and stores in a way that
violates sequential consistency. In other words, while you've ensured
that the compiler doesn't do anything to violate sc, you're letting the
hardware violate sc.

Are you compiling for a machine that is sequentially consistent by default?

Also, to what PLDI paper are you referring?

-- John T.

陳韋任 (Wei-Ren Chen)

2012-Oct-17 15:08 UTC

  My 2 cents: maybe x86 already has a strong enough memory model that you
don't see much performance loss once those SC fences are removed.
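
To make the memory-model point concrete, here is a minimal Python sketch (not
from the thread) that exhaustively enumerates a two-thread "store buffering"
litmus test under two toy machine models: sequential consistency, and a
TSO-like model with one FIFO store buffer per thread, which is how x86 is
commonly described. The function `outcomes`, the instruction tuples, and the
`SB` programs are hypothetical names made up for this illustration. It is a
simplification under those assumptions, not a faithful model of any real
processor.

```python
def _freeze(d):
    """Turn a dict into a hashable, order-independent tuple."""
    return tuple(sorted(d.items()))


def _with(seq, i, value):
    """Return a copy of tuple `seq` with position `i` replaced."""
    return seq[:i] + (value,) + seq[i + 1:]


def outcomes(threads, store_buffers):
    """Enumerate every reachable final register state of a litmus test.

    threads: one instruction list per thread; an instruction is
        ("store", var, value), ("load", var, reg) or ("fence",).
    store_buffers: False models SC (a store reaches memory immediately);
        True models a TSO-like machine with a FIFO store buffer per thread.
    """
    n = len(threads)
    start = ((0,) * n, ((),) * n, (), ())  # pcs, buffers, memory, registers
    seen, todo, finals = {start}, [start], set()
    while todo:
        pcs, bufs, mem, regs = todo.pop()
        successors = []
        for t in range(n):
            if bufs[t]:
                # Hardware step: drain thread t's oldest buffered store.
                var, val = bufs[t][0]
                new_mem = dict(mem)
                new_mem[var] = val
                successors.append(
                    (pcs, _with(bufs, t, bufs[t][1:]), _freeze(new_mem), regs))
            if pcs[t] == len(threads[t]):
                continue  # thread t has no instructions left
            op = threads[t][pcs[t]]
            next_pcs = _with(pcs, t, pcs[t] + 1)
            if op[0] == "store":
                _, var, val = op
                if store_buffers:
                    new_bufs = _with(bufs, t, bufs[t] + ((var, val),))
                    successors.append((next_pcs, new_bufs, mem, regs))
                else:
                    new_mem = dict(mem)
                    new_mem[var] = val
                    successors.append((next_pcs, bufs, _freeze(new_mem), regs))
            elif op[0] == "load":
                _, var, reg = op
                # A thread sees its own newest buffered store, else memory.
                val = dict(mem).get(var, 0)
                for buffered_var, buffered_val in bufs[t]:
                    if buffered_var == var:
                        val = buffered_val
                new_regs = dict(regs)
                new_regs[reg] = val
                successors.append((next_pcs, bufs, mem, _freeze(new_regs)))
            elif op[0] == "fence" and not bufs[t]:
                # A full fence can only complete once the buffer is empty.
                successors.append((next_pcs, bufs, mem, regs))
        if not successors:
            finals.add(regs)  # all threads done and all buffers drained
        for s in successors:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return finals


# Store-buffering litmus test: each thread stores to one variable and
# then loads the other one.
SB = [[("store", "x", 1), ("load", "y", "r0")],
      [("store", "y", 1), ("load", "x", "r1")]]
# Same test with a full fence between the store and the load.
SB_FENCED = [[("store", "x", 1), ("fence",), ("load", "y", "r0")],
             [("store", "y", 1), ("fence",), ("load", "x", "r1")]]

if __name__ == "__main__":
    print("SC         :", sorted(outcomes(SB, store_buffers=False)))
    print("TSO        :", sorted(outcomes(SB, store_buffers=True)))
    print("TSO + fence:", sorted(outcomes(SB_FENCED, store_buffers=True)))
```

In this toy model the only outcome the store buffers add is both loads reading
0, and a full fence between the store and the load rules it out again. That
fits both replies: a TSO-like machine permits little reordering, yet stripping
the fences can still let the hardware produce a non-SC result.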

HTH,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

Duan, Yue Lu

2012-Oct-17 15:28 UTC

Hi,

The paper is "A Case for an SC-Preserving Compiler" from PLDI 2011.
What I did follows their "naive SC-preserving compilation", which prevents the
compiler from reordering potentially shared load/store instructions. The paper
says the resulting code running on an x86 machine (an SC-preserving binary on
non-SC hardware) is 22% slower than normally optimized code running on the
same machine (a non-SC binary on non-SC hardware). The experiment measures how
much performance is lost to the compiler transformations that are disabled by
restricting the reordering of shared load/store instructions. The fences are
removed from the assembly code because they are so costly that, with them in
place, the performance lost to the compilation restriction could not be
measured independently.
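
The three steps described above (insert fences, optimize, strip fences) can be
sketched on a toy instruction list in Python. Everything here is hypothetical
and only for illustration: the tuple-based "IR", the function names, and the
single redundant-load-elimination pass stand in for LLVM IR and its real pass
pipeline, which this sketch does not use or model. It also assumes each
register is written only once.

```python
def insert_fences(code):
    """Step 1: put a fence after every load and store."""
    out = []
    for ins in code:
        out.append(ins)
        if ins[0] in ("load", "store"):
            out.append(("fence",))
    return out


def eliminate_redundant_loads(code):
    """Step 2: a toy optimization that reuses an earlier load of the same
    address. It conservatively forgets everything at a store or a fence,
    so it never moves or merges memory accesses across a fence."""
    available = {}  # address -> register that already holds its value
    out = []
    for ins in code:
        if ins[0] == "load":
            _, dst, addr = ins
            if addr in available:
                out.append(("copy", dst, available[addr]))
                continue
            available[addr] = dst
        elif ins[0] in ("store", "fence"):
            available.clear()
        out.append(ins)
    return out


def strip_fences(code):
    """Step 3: drop the fences so only the lost optimizations are measured."""
    return [ins for ins in code if ins[0] != "fence"]


program = [
    ("load", "a", "p"),
    ("load", "b", "q"),
    ("load", "c", "p"),   # redundant only if no other thread writes p
    ("store", "r", "c"),
]

# Normal compilation: the second load of p becomes a register copy.
normal = eliminate_redundant_loads(program)
# Naive SC-preserving compilation: the fences block the optimization.
sc_preserving = strip_fences(eliminate_redundant_loads(insert_fences(program)))

if __name__ == "__main__":
    print("normal       :", normal)
    print("SC-preserving:", sc_preserving)
```

If a pass ignored the fences, the SC-preserving output would match the normal
one, so comparing the two instruction streams is one way to check whether the
fences were respected.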

The result I get shows that this reordering restriction during compilation
leads to only a 4% slowdown, far less than the paper reports. One possible
reason is that the compiler does not respect SC fences, so unexpected
reordering still happens and yields better performance. It could also be that
their implementation differs from mine. I am not sure.

-Yuelu

