thr3ads.net - llvm dev - [llvm-dev] RFC: non-temporal fencing in LLVM IR [Jan 2016]

JF Bastien via llvm-dev

2016-Jan-14 21:13 UTC

[llvm-dev] RFC: non-temporal fencing in LLVM IR

On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I agree with Tim's assessment for ARM.  That's interesting; I wasn't
>> previously aware of that instruction.
>>
>> My understanding is that Alpha would have the same problem for normal
>> loads.
>>
>> I'm all in favor of more systematic handling of the fences associated
>> with x86 non-temporal accesses.
>>
>> AFAICT, nontemporal loads and stores seem to have different fencing rules
>> on x86, none of them very clear.  Nontemporal stores should probably
>> ideally use an SFENCE.  Locked instructions seem to be documented to work
>> with MOVNTDQA.  In both cases, there seems to be only empirical evidence as
>> to which side(s) of the nontemporal operations they should go on?
>>
>> I finally decided that I was OK with using a LOCKed top-of-stack update
>> as a fence in Java on x86.  I'm significantly less enthusiastic for C++.  I
>> also think that risks unexpected coherence miss problems, though they would
>> probably be very rare.  But they would be very surprising if they did occur.
>>
>
> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
> targeting 32-bit x86 machines which do not support mfence.  What
> instruction sequence should we be using instead?
>
Do they have non-temporal accesses in the ISA?


>> On Wed, Jan 13, 2016 at 10:59 AM, Tim Northover <t.p.northover at gmail.com>
>> wrote:
>>
>>> > I haven't touched ARMv8 in a few years so I'm rusty on the non-temporal
>>> > details for that ISA. I lifted this example from here:
>>> >
>>> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CJACGJJF.html
>>> >
>>> > Which is correct?
>>>
>>> FWIW, I agree with John here. The example I'd give for the unexpected
>>> behaviour allowed in the spec is:
>>>
>>> .Lwait_for_data:
>>>     ldr x0, [x3]
>>>     cbz x0, .Lwait_for_data
>>>     ldnp x2, x1, [x0]
>>>
>>> where another thread first writes to a buffer then tells us where that
>>> buffer is. For a normal ldp, the address dependency rule means we
>>> don't need a barrier or acquiring load to ensure we see the real data
>>> in the buffer. For ldnp, we would need a barrier to prevent stale
>>> data.
>>>
>>> I suspect this is actually even closer to the x86 situation than what
>>> the guide implies (which looks like a straight-up exposed pipeline to
>>> me, beyond even what Alpha would have done).
>>>
>>> Cheers.
>>>
>>> Tim.
>>>
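For readers more comfortable with C++ than AArch64 assembly, the pattern Tim describes (one thread fills a buffer and then publishes its address, another thread spins on the address and then reads the buffer) can be sketched roughly as follows. This is only an illustration under assumptions: `Buffer`, `g_published`, `producer`, `consumer`, and `run_message_passing` are hypothetical names, and the explicit acquire fence stands in for the barrier that the thread says an `ldnp`-style load would need. Portable C++ needs that fence (or an acquire load) even for ordinary loads, since the language does not promise the hardware's address-dependency ordering for relaxed loads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical two-word payload, standing in for the pair read by ldp/ldnp.
struct Buffer {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical publication slot, playing the role of the word at [x3].
inline std::atomic<Buffer*> g_published{nullptr};

// Writer: fill the buffer first, then tell the reader where it is.
inline void producer(Buffer* buf) {
    buf->lo = 1;
    buf->hi = 2;
    g_published.store(buf, std::memory_order_release);
}

// Reader: spin until the pointer is non-null (the ldr/cbz loop), then read.
inline std::uint64_t consumer() {
    Buffer* buf = nullptr;
    while ((buf = g_published.load(std::memory_order_relaxed)) == nullptr) {
        std::this_thread::yield();
    }
    // The barrier: without it, nothing orders the reads below after the
    // pointer load as far as the C++ memory model is concerned.
    std::atomic_thread_fence(std::memory_order_acquire);
    return buf->lo + buf->hi;
}

// Runs one writer and one reader and returns what the reader observed.
inline std::uint64_t run_message_passing() {
    Buffer buf{0, 0};
    g_published.store(nullptr, std::memory_order_relaxed);
    std::uint64_t sum = 0;
    std::thread reader([&sum] { sum = consumer(); });
    std::thread writer(producer, &buf);
    writer.join();
    reader.join();
    return sum;
}
```

With the fence in place the reader is guaranteed to see both fields as written, so `run_message_passing()` returns 3.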

David Majnemer via llvm-dev

2016-Jan-14 21:35 UTC

[llvm-dev] RFC: non-temporal fencing in LLVM IR

On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote:
> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> I agree with Tim's assessment for ARM.  That's interesting; I wasn't
>>> previously aware of that instruction.
>>>
>>> My understanding is that Alpha would have the same problem for normal
>>> loads.
>>>
>>> I'm all in favor of more systematic handling of the fences associated
>>> with x86 non-temporal accesses.
>>>
>>> AFAICT, nontemporal loads and stores seem to have different fencing
>>> rules on x86, none of them very clear.  Nontemporal stores should probably
>>> ideally use an SFENCE.  Locked instructions seem to be documented to work
>>> with MOVNTDQA.  In both cases, there seems to be only empirical evidence as
>>> to which side(s) of the nontemporal operations they should go on?
>>>
>>> I finally decided that I was OK with using a LOCKed top-of-stack update
>>> as a fence in Java on x86.  I'm significantly less enthusiastic for C++.  I
>>> also think that risks unexpected coherence miss problems, though they would
>>> probably be very rare.  But they would be very surprising if they did occur.
>>>
>>
>> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
>> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
>> targeting 32-bit x86 machines which do not support mfence.  What
>> instruction sequence should we be using instead?
>>
>
> Do they have non-temporal accesses in the ISA?
>
I thought not but there appear to be instructions like movntps.  mfence was
introduced in SSE2 while movntps and sfence were introduced in SSE.


JF Bastien via llvm-dev

2016-Jan-14 21:37 UTC

[llvm-dev] RFC: non-temporal fencing in LLVM IR

On Thu, Jan 14, 2016 at 1:35 PM, David Majnemer <david.majnemer at gmail.com>
wrote:
>
> On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote:
>
>> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> I agree with Tim's assessment for ARM.  That's interesting; I wasn't
>>>> previously aware of that instruction.
>>>>
>>>> My understanding is that Alpha would have the same problem for normal
>>>> loads.
>>>>
>>>> I'm all in favor of more systematic handling of the fences associated
>>>> with x86 non-temporal accesses.
>>>>
>>>> AFAICT, nontemporal loads and stores seem to have different fencing
>>>> rules on x86, none of them very clear.  Nontemporal stores should probably
>>>> ideally use an SFENCE.  Locked instructions seem to be documented to work
>>>> with MOVNTDQA.  In both cases, there seems to be only empirical evidence as
>>>> to which side(s) of the nontemporal operations they should go on?
>>>>
>>>> I finally decided that I was OK with using a LOCKed top-of-stack update
>>>> as a fence in Java on x86.  I'm significantly less enthusiastic for C++.  I
>>>> also think that risks unexpected coherence miss problems, though they would
>>>> probably be very rare.  But they would be very surprising if they did occur.
>>>>
>>>
>>> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
>>> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
>>> targeting 32-bit x86 machines which do not support mfence.  What
>>> instruction sequence should we be using instead?
>>>
>>
>> Do they have non-temporal accesses in the ISA?
>>
>
> I thought not but there appear to be instructions like movntps.  mfence
> was introduced in SSE2 while movntps and sfence were introduced in SSE.
>
So the new builtin could be sfence? I think the codegen you point out for
SEQ_CST is fine if we fix the memory model as suggested.
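
For context on what the SEQ_CST fence has to guarantee for ordinary (temporal) accesses, here is a small store-buffering litmus test in C++. It is a sketch only: `SbResult` and `run_store_buffering` are hypothetical names, and `std::atomic_thread_fence(std::memory_order_seq_cst)` is used as the source-level counterpart of the `fence seq_cst` / `__atomic_thread_fence(__ATOMIC_SEQ_CST)` forms quoted above. It says nothing about non-temporal accesses, which are the part of the memory model the thread proposes to fix.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: what each thread read from the other's variable.
struct SbResult {
    int r1;
    int r2;
};

// Each thread stores to its own variable, fences, then loads the other one.
// With seq_cst fences between the store and the load, the C++ memory model
// forbids the outcome r1 == 0 && r2 == 0.
inline SbResult run_store_buffering() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return SbResult{r1, r2};
}
```

Whatever instruction sequence the backend picks for the fence (MFENCE, or a LOCKed read-modify-write on targets without it) has to rule out the "both threads read 0" outcome for this kind of code.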


