thr3ads.net - llvm dev - [LLVMdev] [RFC] Add compiler scheduling barriers [Jun 2014]

Yi Kong

2014-Jun-19 16:35 UTC

[LLVMdev] [RFC] Add compiler scheduling barriers

Hi all,

I'm currently working on implementing ACLE extensions for ARM. There
are some memory barrier intrinsics, namely __dsb and __isb, that require
the compiler not to reorder instructions around their corresponding
built-in intrinsics (__builtin_arm_dsb, __builtin_arm_isb), including
non-memory-access instructions.[1] This is currently not possible.

It is sometimes useful to prevent the compiler from reordering
memory-access instructions as well. The only way to do that in both
GCC and LLVM is to use an in-line assembly hack:
  asm volatile("" ::: "memory")
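As a minimal sketch of how this hack is typically used (GCC/Clang only; the names compiler_barrier, mailbox, flag and post_message are made up for illustration), the empty asm with a "memory" clobber sits between two stores so that the compiler may not move either store across it. It emits no instruction, so it does not constrain the hardware:

```c
/* Hypothetical illustration; none of these names come from LLVM or ACLE. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

static int mailbox;   /* payload, written first */
static int flag;      /* "payload is ready" marker, written second */

/* Store the payload, then the flag. The barrier stops the compiler from
   reordering or merging the two stores; it generates no machine code. */
void post_message(int value) {
    mailbox = value;
    compiler_barrier();
    flag = 1;
}
```

Calling post_message(7) leaves mailbox set to 7 and flag set to 1, with the two stores emitted in source order.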

I propose adding two compiler scheduling barrier intrinsics to LLVM:
__schedule_barrier_memory and __schedule_barrier_full. The former only
prevents memory-access instructions from being reordered across it,
and the latter prevents all reordering. __isb, for example, could then be
implemented something like:
  inline void __isb() {
    __schedule_barrier_full();
    __builtin_arm_isb();
    __schedule_barrier_full();
  }

To implement these intrinsics, I think the best method is to add
target-independent pseudo-instructions with appropriate
properties (hasSideEffects for the memory barrier and isTerminator for the
full barrier) and a pseudo-instruction elimination pass after the
scheduling pass.

What do people think of this idea?

Cheers,

Yi

------

[1] A piece of code that requires such behaviour is:

  Data_array[n] = x; // memory access
  __DSB();
  __WFI();           // This cannot get executed until DSB completed

Moving WFI to before DSB will cause wrong behaviour. Code is taken
from DAI0321A 4.14,
(http://infocenter.arm.com/help/topic/com.arm.doc.dai0321a/DAI0321A_programming_guide_memory_barriers_for_m_profile.pdf)
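One way to get this ordering with today's compilers, sketched under assumptions rather than taken from the RFC: putting both instructions in a single in-line asm statement keeps them adjacent and in order, and the "memory" clobber keeps the preceding store ahead of them. The function name and the USE_ARM_BARRIERS macro are hypothetical; the macro is meant to be defined only for bare-metal ARM builds, and the fallback is a plain compiler barrier so that the sketch also builds on a host machine. This does not address the builtin-based implementation that the RFC is about.

```c
/* Hypothetical sketch: USE_ARM_BARRIERS and store_then_sleep are
   illustrative names, not part of ACLE, CMSIS or LLVM. */
void store_then_sleep(volatile int *slot, int x) {
    *slot = x;                      /* memory access */
#ifdef USE_ARM_BARRIERS
    /* One asm statement: the compiler cannot separate or swap DSB and WFI. */
    __asm__ volatile("dsb sy\n\twfi" ::: "memory");
#else
    /* Host fallback: compiler-only barrier, no instruction emitted. */
    __asm__ volatile("" ::: "memory");
#endif
}
```

The trade-off is that the compiler sees one opaque asm instead of two separate intrinsics.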

Matt Arsenault

2014-Jun-19 16:51 UTC

On Jun 19, 2014, at 9:35 AM, Yi Kong <kongy.dev at gmail.com> wrote:
> Hi all,
> 
> I'm currently working on implementing ACLE extensions for ARM. There
> are some memory barrier intrinsics, i.e.__dsb and __isb that require
> the compiler not to reorder instructions around their corresponding
> built-in intrinsics(__builtin_arm_dsb, __builtin_arm_isb), including
> non-memory-access instructions.[1] This is currently not possible.
> 
> It is sometimes useful to prevent the compiler from reordering
> memory-access instructions as well. The only way to do that in both
> GCC and LLVM is using a in-line assembly hack:
>  asm volatile("" ::: "memory")
> 
> I propose adding two compiler scheduling barriers intrinsics to LLVM:
> __schedule_barrier_memory and __schedule_barrier_full. The former only
> prevents memory-access instructions reordering around the instruction
> and the latter stops all. So that __isb, for example, can be
> implemented something like:
>  inline void __isb() {
>    __schedule_barrier_full();
>    __builtin_arm_isb();
>    __schedule_barrier_full();
>  }
> 
> To implement these intrinsics, I think the best method is to add
> target-independent pseudo-instructions with appropriate
> properties(hasSideEffects for memory barrier and isTerminator for full
> barrier) and a pseudo-instruction elimination pass after the
> scheduling pass.
> 
> What do people think of this idea?
> 
> Cheers,
> 
> Yi

This sounds similar to the problem I want to solve with the nomemfence attribute:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-January/069129.html

I had this about half implemented in December, but I haven’t gotten back to
finishing it yet.

-Matt

David Chisnall

2014-Jun-19 16:54 UTC

On 19 Jun 2014, at 11:35, Yi Kong <kongy.dev at gmail.com> wrote:
> It is sometimes useful to prevent the compiler from reordering
> memory-access instructions as well. The only way to do that in both
> GCC and LLVM is using a in-line assembly hack:
>  asm volatile("" ::: "memory")
> 
> I propose adding two compiler scheduling barriers intrinsics to LLVM:
> __schedule_barrier_memory and __schedule_barrier_full. The former only
> prevents memory-access instructions reordering around the instruction
> and the latter stops all. So that __isb, for example, can be
> implemented something like:
>  inline void __isb() {
>    __schedule_barrier_full();
>    __builtin_arm_isb();
>    __schedule_barrier_full();
>  }
> 
> To implement these intrinsics, I think the best method is to add
> target-independent pseudo-instructions with appropriate
> properties(hasSideEffects for memory barrier and isTerminator for full
> barrier) and a pseudo-instruction elimination pass after the
> scheduling pass.
> 
> What do people think of this idea?

C11 defines the atomic_thread_fence() function for the memory-only part.  Clang
exposes this as __c11_atomic_thread_fence().  This is more flexible than that
part of your proposal, as it allows relaxing the restrictions based on the
memory model.

I can see the use case for preventing any reordering, but it seems somewhat
specialised.  Wouldn't it be simpler to just model the wfi operation as
loading from memory (which, effectively, it does, if you have a memory-mapped
interrupt controller)?

David

Yi Kong

2014-Jun-19 21:47 UTC

On 19 June 2014 17:54, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
> On 19 Jun 2014, at 11:35, Yi Kong <kongy.dev at gmail.com> wrote:
>
>> It is sometimes useful to prevent the compiler from reordering
>> memory-access instructions as well. The only way to do that in both
>> GCC and LLVM is using a in-line assembly hack:
>>  asm volatile("" ::: "memory")
>>
>> I propose adding two compiler scheduling barriers intrinsics to LLVM:
>> __schedule_barrier_memory and __schedule_barrier_full. The former only
>> prevents memory-access instructions reordering around the instruction
>> and the latter stops all. So that __isb, for example, can be
>> implemented something like:
>>  inline void __isb() {
>>    __schedule_barrier_full();
>>    __builtin_arm_isb();
>>    __schedule_barrier_full();
>>  }
>>
>> To implement these intrinsics, I think the best method is to add
>> target-independent pseudo-instructions with appropriate
>> properties(hasSideEffects for memory barrier and isTerminator for full
>> barrier) and a pseudo-instruction elimination pass after the
>> scheduling pass.
>>
>> What do people think of this idea?
>
> C11 defines the atomic_thread_fence() function for the memory-only part.
> Clang exposes this as __c11_atomic_thread_fence().  This is more flexible than
> that part of your proposal, as it allows relaxing the restrictions based on the
> memory model.

atomic_thread_fence() always inserts a machine memory fence on a weak
memory model, which is different from simply stopping reordering.
Memory fences are expensive and might be overkill.
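For comparison, C11 also has atomic_signal_fence(), which is specified to order accesses only with respect to a signal handler running in the same thread; compilers typically implement it as a compiler-only barrier with no fence instruction. A small sketch of the two side by side (the variable and function names are made up for illustration):

```c
#include <stdatomic.h>

static int payload;
static atomic_int ready;

/* Release fence: constrains the compiler and, on weakly ordered
   hardware, normally costs a barrier instruction (e.g. DMB on ARM). */
void publish_with_thread_fence(int v) {
    payload = v;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Signal fence: same compiler-level ordering, but only relative to a
   signal handler on this thread, so no hardware fence is expected. */
void publish_with_signal_fence(int v) {
    payload = v;
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 2, memory_order_relaxed);
}
```

Like the in-line assembly hack, the signal fence only restricts memory accesses; it says nothing about register-only instructions.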

We can leave this intrinsic out for now if it's not particularly
useful, as the in-line assembly hack does work, although it is not elegant.

> I can see the use case for preventing any reordering, but it seems somewhat
> specialised.  Wouldn't it be simpler to just model the wfi operation as
> loading from memory (which, effectively, it does, if you have a memory-mapped
> interrupt controller)?

__WFI() is just one of the instructions where reordering isn't
allowed. There are more examples in the DAI0321A document. I think it
might be required on other architectures as well.

> David

Hal Finkel

2014-Jun-20 08:03 UTC

----- Original Message -----
> From: "Yi Kong" <kongy.dev at gmail.com>
> To: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, June 19, 2014 11:35:05 AM
> Subject: [LLVMdev] [RFC] Add compiler scheduling barriers
> 
> Hi all,
> 
> I'm currently working on implementing ACLE extensions for ARM. There
> are some memory barrier intrinsics, i.e.__dsb and __isb that require
> the compiler not to reorder instructions around their corresponding
> built-in intrinsics(__builtin_arm_dsb, __builtin_arm_isb), including
> non-memory-access instructions.[1] This is currently not possible.
> 
> It is sometimes useful to prevent the compiler from reordering
> memory-access instructions as well. The only way to do that in both
> GCC and LLVM is using a in-line assembly hack:
>   asm volatile("" ::: "memory")
> 
> I propose adding two compiler scheduling barriers intrinsics to LLVM:
> __schedule_barrier_memory and __schedule_barrier_full. The former
> only
> prevents memory-access instructions reordering around the instruction
> and the latter stops all. So that __isb, for example, can be
> implemented something like:
>   inline void __isb() {
>     __schedule_barrier_full();
>     __builtin_arm_isb();
>     __schedule_barrier_full();
>   }
> 
> To implement these intrinsics, I think the best method is to add
> target-independent pseudo-instructions with appropriate
> properties(hasSideEffects for memory barrier and isTerminator for
> full
> barrier) and a pseudo-instruction elimination pass after the
> scheduling pass.
> 
> What do people think of this idea?

I don't believe that we currently support calls that are terminators, and
doing so would be a large change to the infrastructure. I think, however, that
declaring an intrinsic that is marked such that mayHaveSideEffects() is true
will also prevent reordering of anything around the intrinsic that touches
observable state. Is there a reason why this would be insufficient?

 -Hal

>
> Cheers,
> 
> Yi
> 
> ------
> 
> [1] A piece of code that requires such behaviour is:
> 
>   Data_array[n] = x; // memory access
>   __DSB();
>   __WFI();           // This cannot get executed until DSB completed
> 
> Moving WFI to before DSB will cause wrong behaviour. Code is taken
> from DAI0321A 4.14,
> (http://infocenter.arm.com/help/topic/com.arm.doc.dai0321a/DAI0321A_programming_guide_memory_barriers_for_m_profile.pdf)
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Philip Reames

2014-Jun-24 00:55 UTC

On 06/19/2014 09:35 AM, Yi Kong wrote:
> Hi all,
>
> I'm currently working on implementing ACLE extensions for ARM. There
> are some memory barrier intrinsics, i.e.__dsb and __isb that require
> the compiler not to reorder instructions around their corresponding
> built-in intrinsics(__builtin_arm_dsb, __builtin_arm_isb), including
> non-memory-access instructions.[1] This is currently not possible.
>
> It is sometimes useful to prevent the compiler from reordering
> memory-access instructions as well. The only way to do that in both
> GCC and LLVM is using a in-line assembly hack:
>    asm volatile("" ::: "memory")
>
> I propose adding two compiler scheduling barriers intrinsics to LLVM:
> __schedule_barrier_memory and __schedule_barrier_full. The former only
> prevents memory-access instructions reordering around the instruction
> and the latter stops all. So that __isb, for example, can be
> implemented something like:
>    inline void __isb() {
>      __schedule_barrier_full();
>      __builtin_arm_isb();
>      __schedule_barrier_full();
>    }

Given your examples are in C, I want to ask a clarifying question.
Are you proposing adding such intrinsics to the LLVM IR? Or to some
runtime library?  If the latter, *specifically* which one? Or at the
MachineInst layer?

I'm going to run under the assumption you're using C pseudo code for 
IR.  If this is not the case, the rest of this will be off base.

I'm not familiar with the exact semantics of an "isb" barrier, but I
think you should look at the existing fence IR instructions.  These
restrict memory reorderings in the IR.  Depending on the platform, they
may imply hardware barriers, but they always imply compiler barriers.

If all you want is a compiler barrier with the existing fence semantics 
w.r.t. reordering, we could consider extending fence with a "compiler 
only" (bikeshed needed!) attribute.

If you're describing a new memory ordering for existing fences, that 
would seem like a reasonable extension.

I'm not familiar with how we currently handle intrinsics for 
architecture specific memory barriers.  Can anyone else comment on 
that?  Is there a way to tag a particular intrinsic function as *also* 
being a full fence?

> To implement these intrinsics, I think the best method is to add
> target-independent pseudo-instructions with appropriate
> properties(hasSideEffects for memory barrier and isTerminator for full
> barrier) and a pseudo-instruction elimination pass after the
> scheduling pass.

Why would your barrier need to be a basic block terminator?  That
doesn't parse for me.  Could you explain?

> What do people think of this idea?

I'm honestly unclear on what your problem is and what you're trying to
propose.  It may take a few rounds of conversation to clarify.

Philip

Yi Kong

2014-Jun-27 13:19 UTC

On 24 June 2014 01:55, Philip Reames <listmail at philipreames.com> wrote:
>
> On 06/19/2014 09:35 AM, Yi Kong wrote:
>>
>> Hi all,
>>
>> I'm currently working on implementing ACLE extensions for ARM. There
>> are some memory barrier intrinsics, i.e.__dsb and __isb that require
>> the compiler not to reorder instructions around their corresponding
>> built-in intrinsics(__builtin_arm_dsb, __builtin_arm_isb), including
>> non-memory-access instructions.[1] This is currently not possible.
>>
>> It is sometimes useful to prevent the compiler from reordering
>> memory-access instructions as well. The only way to do that in both
>> GCC and LLVM is using a in-line assembly hack:
>>    asm volatile("" ::: "memory")
>>
>> I propose adding two compiler scheduling barriers intrinsics to LLVM:
>> __schedule_barrier_memory and __schedule_barrier_full. The former only
>> prevents memory-access instructions reordering around the instruction
>> and the latter stops all. So that __isb, for example, can be
>> implemented something like:
>>    inline void __isb() {
>>      __schedule_barrier_full();
>>      __builtin_arm_isb();
>>      __schedule_barrier_full();
>>    }
>
> Given your examples are in C, I want to ask a clarifying question.  Are
> you proposing adding such intrinsics to the LLVM IR? Or to some runtime
> library?  If the latter, *specifically* which one? Or at the MachineInst
> layer?
>
> I'm going to run under the assumption you're using C pseudo code for IR.  If
> this is not the case, the rest of this will be off base.

Yes, IR.

> I'm not familiar with the exact semantics of an "isb" barrier, but I think
> you should look at the existing fence IR instructions.  These restrict
> memory reorderings in the IR.  Depending on the platform, they may imply
> hardware barriers, but they always imply compiler barriers.
>
> If all you want is a compiler barrier with the existing fence semantics
> w.r.t. reordering, we could consider extending fence with a "compiler only"
> (bikeshed needed!) attribute.

AFAIK, there isn't an existing fence strong enough for the memory
barrier intrinsics. The current strongest fence still allows
register-register data-processing instructions to be reordered across it.
For DSB and ISB, no instruction should be allowed to move across.
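The same gap exists with the in-line assembly barrier: a "memory" clobber says nothing about values that live only in registers, so a pure arithmetic computation may still be scheduled on either side of it. A common GCC/Clang idiom, shown here as a sketch with a hypothetical function name, is to pass the value through the asm as a "+r" operand, which makes the barrier appear to read and modify it. This pins only the named value, not every instruction, so it is weaker than the full barrier discussed in this thread.

```c
/* Hypothetical helper; not an LLVM or ACLE API. */
int scaled_sum_pinned(int a, int b) {
    int sum = a + b;                 /* register-only computation */
    /* "+r"(sum): the compiler must have sum computed before this point and
       must treat it as possibly changed afterwards. "memory" additionally
       orders loads and stores. No instruction is emitted. */
    __asm__ volatile("" : "+r"(sum) : : "memory");
    return sum * 2;                  /* depends on the asm output, so it stays below */
}
```

Every value that must not cross the barrier has to be listed by hand, which is part of why a dedicated full barrier is being asked for.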
> If you're describing a new memory ordering for existing fences, that would
> seem like a reasonable extension.
>
> I'm not familiar with how we currently handle intrinsics for architecture
> specific memory barriers.  Can anyone else comment on that?  Is there a way
> to tag a particular intrinsic function as *also* being a full fence?

I'm interested in this as well.

>> To implement these intrinsics, I think the best method is to add
>> target-independent pseudo-instructions with appropriate
>> properties(hasSideEffects for memory barrier and isTerminator for full
>> barrier) and a pseudo-instruction elimination pass after the
>> scheduling pass.
>
> Why would your barrier need to be a basic block terminator?  That doesn't
> parse for me.  Could you explain?

The compiler shouldn't reorder instructions across basic blocks. By
implementing the barrier as a basic block terminator, it will stop any
instruction from being reordered across it.

I'm not very familiar with LLVM; can you propose the correct way of
implementing it?

>> What do people think of this idea?
>
> I'm honestly unclear on what your problem is and what you're trying to
> propose.  It may take a few rounds of conversation to clarify.
>
> Philip
