thr3ads.net - llvm dev - [LLVMdev] [RFC] Add compiler scheduling barriers [Jun 2014]

If this information is useful, please help other people find it:
Share via:

Yi Kong

2014-Jun-27 13:19 UTC

[LLVMdev] [RFC] Add compiler scheduling barriers

On 24 June 2014 01:55, Philip Reames <listmail at philipreames.com>
wrote:>
> On 06/19/2014 09:35 AM, Yi Kong wrote:
>>
>> Hi all,
>>
>> I'm currently working on implementing ACLE extensions for ARM.
There
>> are some memory barrier intrinsics, i.e.__dsb and __isb that require
>> the compiler not to reorder instructions around their corresponding
>> built-in intrinsics(__builtin_arm_dsb, __builtin_arm_isb), including
>> non-memory-access instructions.[1] This is currently not possible.
>>
>> It is sometimes useful to prevent the compiler from reordering
>> memory-access instructions as well. The only way to do that in both
>> GCC and LLVM is using a in-line assembly hack:
>>    asm volatile("" ::: "memory")
>>
>> I propose adding two compiler scheduling barriers intrinsics to LLVM:
>> __schedule_barrier_memory and __schedule_barrier_full. The former only
>> prevents memory-access instructions reordering around the instruction
>> and the latter stops all. So that __isb, for example, can be
>> implemented something like:
>>    inline void __isb() {
>>      __schedule_barrier_full();
>>      __builtin_arm_isb();
>>      __schedule_barrier_full();
>>    }
>
> Given your examples are in C, I want to ask a clarification question.  Are
> you proposing adding such intrinsics to the LLVM IR? Or to some runtime
> library?  If the later, *specifically* which one? Or at the MachineInst
> layer?
>
> I'm going to run under the assumption you're using C pseudo code
for IR.  If
> this is not the case, the rest of this will be off base.
Yes, IR.
> I'm not familiar with the exact semantics of an "isb"
barrier, but I think
> you should look at the existing fence IR instructions.  These restrict
> memory reorderings in the IR.  Depending on the platform, they may imply
> hardware barriers, but they always imply compiler barriers.
>
> If all you want is a compiler barrier with the existing fence semantics
> w.r.t. reordering, we could consider extending fence with a "compiler
only"
> (bikeshed needed!) attribute.
AFAIK, there isn't an existing fence strong enough for the memory
barrier intrinsics. The current strongest fence still allows
register-register data-processing instructions reordering across. For
DSB and ISB, no instruction should be allowed.
> If you're describing a new memory ordering for existing fences, that
would
> seem like a reasonable extension.
>
> I'm not familiar with how we currently handle intrinsics for
architecture
> specific memory barriers.  Can anyone else comment on that?  Is there a way
> to tag a particular intrinsic function as *also* being a full fence?
I'm interested in this as well.
>> To implement these intrinsics, I think the best method is to add
>> target-independent pseudo-instructions with appropriate
>> properties(hasSideEffects for memory barrier and isTerminator for full
>> barrier) and a pseudo-instruction elimination pass after the
>> scheduling pass.
>
> Why would your barrier need to be a basic block terminator?  That
doesn't
> parse for me.  Could you explain?
Compiler shouldn't allow instructions to be reordered between basic
blocks. By implementing as a basic block terminator, it will stop any
instruction from reordering.

I'm not very familiar with LLVM, can you propose the correct way of
implementing it?
>> What do people think of this idea?
>
> I'm honestly unclear on what your problem is and what you're trying
to
> propose.  It make take a few rounds of conversation to clarify.
>
> Philip

Tim Northover

2014-Jun-27 14:01 UTC

head link

[LLVMdev] [RFC] Add compiler scheduling barriers

Hi Yi,
>> If all you want is a compiler barrier with the existing fence semantics
>> w.r.t. reordering, we could consider extending fence with a
"compiler only"
>> (bikeshed needed!) attribute.
>
> AFAIK, there isn't an existing fence strong enough for the memory
> barrier intrinsics. The current strongest fence still allows
> register-register data-processing instructions reordering across. For
> DSB and ISB, no instruction should be allowed.
I disagree. DSB and ISB may need to be insulated against reordering
with other instructions with unmodelled side-effects (e.g. MSR/MRS),
but they don't care about a normal instructions. There's no reason the
compiler shouldn't be free to move an arithmetic add around them, for
example. Fortunately, I think LLVM already handles this.

The main argument I'm aware of in favour of stronger barriers is for
things like performance counters where the intervening instructions
are sometimes critical to what's being measured. I'm not entirely
convinced by that either, since I don't think the compiler can ever
provide the needed guarantees anyway.
>> Why would your barrier need to be a basic block terminator?  That
doesn't
>> parse for me.  Could you explain?
>
> Compiler shouldn't allow instructions to be reordered between basic
> blocks. By implementing as a basic block terminator, it will stop any
> instruction from reordering.
It should, and does. The idealised view is that basic blocks should be
no extra barrier at all to reordering. Of course, we don't reach that
because the control-flow is harder to analyse so it's simply easier to
do optimisations within a basic block.

But you shouldn't be taking that as a point of principle when
designing anything.

Cheers.

Tim.

Philip Reames

2014-Jun-27 16:34 UTC

head link

[LLVMdev] [RFC] Add compiler scheduling barriers

On 06/27/2014 06:19 AM, Yi Kong wrote:> On 24 June 2014 01:55, Philip Reames <listmail at philipreames.com>
wrote:
>> On 06/19/2014 09:35 AM, Yi Kong wrote:
>>> Hi all,
>>>
>>> I'm currently working on implementing ACLE extensions for ARM.
There
>>> are some memory barrier intrinsics, i.e.__dsb and __isb that
require
>>> the compiler not to reorder instructions around their corresponding
>>> built-in intrinsics(__builtin_arm_dsb, __builtin_arm_isb),
including
>>> non-memory-access instructions.[1] This is currently not possible.
>>>
>>> It is sometimes useful to prevent the compiler from reordering
>>> memory-access instructions as well. The only way to do that in both
>>> GCC and LLVM is using a in-line assembly hack:
>>>     asm volatile("" ::: "memory")
>>>
>>> I propose adding two compiler scheduling barriers intrinsics to
LLVM:
>>> __schedule_barrier_memory and __schedule_barrier_full. The former
only
>>> prevents memory-access instructions reordering around the
instruction
>>> and the latter stops all. So that __isb, for example, can be
>>> implemented something like:
>>>     inline void __isb() {
>>>       __schedule_barrier_full();
>>>       __builtin_arm_isb();
>>>       __schedule_barrier_full();
>>>     }
>> Given your examples are in C, I want to ask a clarification question. 
Are
>> you proposing adding such intrinsics to the LLVM IR? Or to some runtime
>> library?  If the later, *specifically* which one? Or at the MachineInst
>> layer?
>>
>> I'm going to run under the assumption you're using C pseudo
code for IR.  If
>> this is not the case, the rest of this will be off base.
> Yes, IR.
>
>> I'm not familiar with the exact semantics of an "isb"
barrier, but I think
>> you should look at the existing fence IR instructions.  These restrict
>> memory reorderings in the IR.  Depending on the platform, they may
imply
>> hardware barriers, but they always imply compiler barriers.
>>
>> If all you want is a compiler barrier with the existing fence semantics
>> w.r.t. reordering, we could consider extending fence with a
"compiler only"
>> (bikeshed needed!) attribute.
> AFAIK, there isn't an existing fence strong enough for the memory
> barrier intrinsics. The current strongest fence still allows
> register-register data-processing instructions reordering across. For
> DSB and ISB, no instruction should be allowed. From what I can read about the ISB barrier, this is untrue (in it's 
full generality.)  There are certainly some instructions which can't be 
moved past an ISB (memory accesses, cpu control instructions.) It's not 
clear to me that *all* instruction reorderings need to be blocked.

Aside: From the documentation, it's actually unclear how strong a 
barrier ISB actually is.  The "fetched after the ISB" gives a lot of 
room for interpretation.  (i.e. is it legal for a cpu to fetch every 
instruction in a function, queue them for execution under wait 
conditions, *then* fetch the ISB?  This would be a *really* weird 
implementation, but would it be legal according to this spec?  If so, 
the ISB provides *no* guarantees.)

Reference:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html>
>> If you're describing a new memory ordering for existing fences,
that would
>> seem like a reasonable extension.
>>
>> I'm not familiar with how we currently handle intrinsics for
architecture
>> specific memory barriers.  Can anyone else comment on that?  Is there a
way
>> to tag a particular intrinsic function as *also* being a full fence?
> I'm interested in this as well.
>
>>> To implement these intrinsics, I think the best method is to add
>>> target-independent pseudo-instructions with appropriate
>>> properties(hasSideEffects for memory barrier and isTerminator for
full
>>> barrier) and a pseudo-instruction elimination pass after the
>>> scheduling pass.
>> Why would your barrier need to be a basic block terminator?  That
doesn't
>> parse for me.  Could you explain?
> Compiler shouldn't allow instructions to be reordered between basic
> blocks.Er, no.  Any decent optimizer compiler can, should, and
does.> By implementing as a basic block terminator, it will stop any
> instruction from reordering.
>
> I'm not very familiar with LLVM, can you propose the correct way of
> implementing it?This seems like a fundamentally flawed approach.  Conceptually, there 
should be no difference between:
bb: inst1; inst2;
and
bb: inst1; goto next
next: inst2;

If there is, you've created a nearly impossible to optimize IR.

Given a strong enough barrier, you also don't need such a construct.  
These two will always have the same semantics:
bb: inst1; barrier; inst2;
and
bb: inst1; barrier; goto next;
next: inst2

I would suggest you simply drop the bit about being a
terminator.>
>>> What do people think of this idea?
>> I'm honestly unclear on what your problem is and what you're
trying to
>> propose.  It make take a few rounds of conversation to clarify.
>>
>> Philip

Tim Northover

2014-Jun-27 16:46 UTC

head link

[LLVMdev] [RFC] Add compiler scheduling barriers

Hi Philip,
> Aside: From the documentation, it's actually unclear how strong a
barrier
> ISB actually is.  The "fetched after the ISB" gives a lot of room
for
> interpretation.  (i.e. is it legal for a cpu to fetch every instruction in
a
> function, queue them for execution under wait conditions, *then* fetch the
> ISB?  This would be a *really* weird implementation, but would it be legal
> according to this spec?  If so, the ISB provides *no* guarantees.)
>
> Reference:
>
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html
You've actually linked to the assembler (as in "armasm") reference
manual there. The architectural documentation of these barriers is
better, though still not airtight in my opinion. From section A3.8.3,
the bit about "program order" is new, and it's after the ISB has
*completed*:

"An ISB instruction flushes the pipeline in the processor, so that all
instructions that come after the ISB instruction in program order are
fetched from cache or memory only after the ISB instruction has
completed."

It's behind a registration wall, but the v7 version can be had as a
.pdf here:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html

Cheers.

Tim.

llvm dev - Jun 2014 - [LLVMdev] [RFC] Add compiler scheduling barriers

[LLVMdev] [RFC] Add compiler scheduling barriers

[LLVMdev] [RFC] Add compiler scheduling barriers

[LLVMdev] [RFC] Add compiler scheduling barriers

[LLVMdev] [RFC] Add compiler scheduling barriers