thr3ads.net - llvm dev - [llvm-dev] RFC: non-temporal fencing in LLVM IR [Jan 2016]

JF Bastien via llvm-dev

2016-Jan-14 21:37 UTC

[llvm-dev] RFC: non-temporal fencing in LLVM IR

On Thu, Jan 14, 2016 at 1:35 PM, David Majnemer <david.majnemer at gmail.com> wrote:
>
>
> On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote:
>
>> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> I agree with Tim's assessment for ARM.  That's interesting; I wasn't
>>>> previously aware of that instruction.
>>>>
>>>> My understanding is that Alpha would have the same problem for normal
>>>> loads.
>>>>
>>>> I'm all in favor of more systematic handling of the fences associated
>>>> with x86 non-temporal accesses.
>>>>
>>>> AFAICT, nontemporal loads and stores seem to have different fencing
>>>> rules on x86, none of them very clear.  Nontemporal stores should probably
>>>> ideally use an SFENCE.  Locked instructions seem to be documented to work
>>>> with MOVNTDQA.  In both cases, there seems to be only empirical evidence as
>>>> to which side(s) of the nontemporal operations they should go on?
>>>>
>>>> I finally decided that I was OK with using a LOCKed top-of-stack update
>>>> as a fence in Java on x86.  I'm significantly less enthusiastic for C++.  I
>>>> also think that risks unexpected coherence miss problems, though they would
>>>> probably be very rare.  But they would be very surprising if they did occur.
>>>>
>>>
>>> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
>>> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
>>> targeting 32-bit x86 machines which do not support mfence.  What
>>> instruction sequence should we be using instead?
>>>
>>
>> Do they have non-temporal accesses in the ISA?
>>
>
> I thought not but there appear to be instructions like movntps.  mfence
> was introduced in SSE2 while movntps and sfence were introduced in SSE.
>
So the new builtin could be sfence? I think the codegen you point out for
SEQ_CST is fine if we fix the memory model as suggested.
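
For readers following along, here is a minimal sketch (not from the thread) of the store-side pattern being discussed: fill a buffer with non-temporal stores, issue an SFENCE, then publish a flag. The helper name publish_buffer and the flag are hypothetical. The intrinsics _mm_stream_si32 and _mm_sfence are the standard SSE2/SSE ones, and the code falls back to ordinary stores when SSE2 is not enabled at compile time. Whether a compiler should insert such a fence on the programmer's behalf is exactly the open question in this RFC.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32 (SSE2), pulls in _mm_sfence (SSE)
#endif

// Hypothetical helper: write `n` ints with non-temporal stores, then make
// them visible before setting `ready`.
inline void publish_buffer(int* dst, std::size_t n, int value,
                           std::atomic<bool>& ready) {
#if defined(__SSE2__)
    for (std::size_t i = 0; i < n; ++i) {
        _mm_stream_si32(dst + i, value);  // MOVNTI: weakly ordered store
    }
    _mm_sfence();  // order the non-temporal stores before later stores
#else
    // Assumption: without SSE2 we simply use ordinary stores.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
#endif
    // Release store; on x86 this is a plain store, which is why the
    // explicit SFENCE above is needed for the non-temporal writes.
    ready.store(true, std::memory_order_release);
}
```

A consumer would spin on `ready` with an acquire load and then read the buffer. The sketch places the fence after the non-temporal stores, which matches the common practice mentioned in the thread rather than any spec wording.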


>>>> On Wed, Jan 13, 2016 at 10:59 AM, Tim Northover <t.p.northover at gmail.com> wrote:
>>>>
>>>>> > I haven't touched ARMv8 in a few years so I'm rusty on the
>>>>> > non-temporal details for that ISA. I lifted this example from here:
>>>>> >
>>>>> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CJACGJJF.html
>>>>> >
>>>>> > Which is correct?
>>>>>
>>>>> FWIW, I agree with John here. The example I'd give for the unexpected
>>>>> behaviour allowed in the spec is:
>>>>>
>>>>> .Lwait_for_data:
>>>>>     ldr x0, [x3]
>>>>>     cbz x0, .Lwait_for_data
>>>>>     ldnp x2, x1, [x0]
>>>>>
>>>>> where another thread first writes to a buffer then tells us where that
>>>>> buffer is. For a normal ldp, the address dependency rule means we
>>>>> don't need a barrier or acquiring load to ensure we see the real data
>>>>> in the buffer. For ldnp, we would need a barrier to prevent stale
>>>>> data.
>>>>>
>>>>> I suspect this is actually even closer to the x86 situation than what
>>>>> the guide implies (which looks like a straight-up exposed pipeline to
>>>>> me, beyond even what Alpha would have done).
>>>>>
>>>>> Cheers.
>>>>>
>>>>> Tim.
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
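
As an illustration of Tim's pointer-publication example, here is a rough C++ analogue (a sketch, not code from the thread). The Payload type and the function names are made up for the example. It uses an acquire load on the published pointer, so it does not rely on the address dependency that the ldr/ldp version gets for free. How a non-temporal load such as ldnp should be ordered after that pointer load is the part the thread leaves open, and this sketch does not settle it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical buffer layout: two 64-bit words, like the ldnp pair.
struct Payload {
    std::uint64_t a;
    std::uint64_t b;
};

// Spin until the producer has published a buffer address (the
// ldr/cbz loop in the assembly example).
inline const Payload* wait_for_data(const std::atomic<const Payload*>& slot) {
    const Payload* p = slot.load(std::memory_order_acquire);
    while (p == nullptr) {
        p = slot.load(std::memory_order_acquire);
    }
    return p;
}

// Read both words once the pointer is visible. The acquire load above
// orders these reads after the pointer load without needing an
// address dependency.
inline std::uint64_t consume(const std::atomic<const Payload*>& slot) {
    const Payload* p = wait_for_data(slot);
    return p->a + p->b;
}
```

The producer side would fill in the Payload and then store its address into the slot with memory_order_release.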

Hans Boehm via llvm-dev

2016-Jan-15 00:05 UTC

[llvm-dev] RFC: non-temporal fencing in LLVM IR

On Thu, Jan 14, 2016 at 1:37 PM, JF Bastien <jfb at google.com> wrote:
> So the new builtin could be sfence? I think the codegen you point out for
> SEQ_CST is fine if we fix the memory model as suggested.
>
I agree that it's fine to use a locked instruction as a seq_cst fence if
MFENCE is not available.  If you have to dirty a cache line, (%esp) seems
like a relatively safe one.  (I'm assuming that CPUID is appreciably slower
and out of the running?  I haven't tried.  But it also probably clobbers
too many registers.)  It's only the idea of writing to a memory location
when MFENCE is available, and could be used instead, that seems
questionable.
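
For concreteness, a minimal sketch (not from the thread) of the "locked top-of-stack update" used as a full fence. It assumes GCC/Clang-style inline assembly on x86 and falls back to std::atomic_thread_fence elsewhere. The function names are hypothetical. OR-ing zero into the word at the stack pointer leaves memory unchanged but still dirties that cache line, which is the cost discussed above. Whether this also orders non-temporal accesses is not established by this sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a locked read-modify-write on the top of the stack,
// used in place of MFENCE.
inline void locked_stack_fence() {
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__("lock; orl $0, (%%esp)" ::: "memory", "cc");
#else
    // Assumption: on other targets, let the compiler pick the fence.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Store-then-load pattern (as in Dekker-style synchronization) where a
// full fence is required to keep the load from passing the store.
inline int store_then_load(std::atomic<int>& mine,
                           const std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    locked_stack_fence();
    return theirs.load(std::memory_order_relaxed);
}
```

In real code one would normally write std::atomic_thread_fence(std::memory_order_seq_cst) and let the compiler choose the instruction; the explicit assembly is only here to show the sequence under discussion.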

What exactly would the non-temporal fences be?  It seems that on x86, the
load and store case may differ.  In theory, there's also a before vs. after
question.  In practice code using MOVNTA seems to assume that you only need
an SFENCE afterwards.  I can't back that up with spec verbiage.  I don't
know about MOVNTDQA.  What about ARM?

Philip Reames via llvm-dev

2016-Jan-15 00:27 UTC

[llvm-dev] RFC: non-temporal fencing in LLVM IR

On 01/14/2016 04:05 PM, Hans Boehm via llvm-dev wrote:
>
> I agree that it's fine to use a locked instruction as a seq_cst fence
> if MFENCE is not available.

It's not clear to me this is true if the seq_cst fence is expected to
fence non-temporal stores.  I think in practice, you'd be very unlikely
to notice a difference, but I can't point to anything in the Intel docs
which justifies a lock prefixed instruction as sufficient to fence any
non-temporal access.

> If you have to dirty a cache line, (%esp) seems like a relatively safe one.

Agreed.  As we discussed previously, it is possible to get false sharing in
C++, but this would require one thread to be accessing information
stored in the last frame of another running thread's stack.  That seems
sufficiently unlikely to be ignored.

> (I'm assuming that CPUID is appreciably slower and out of the
> running?  I haven't tried.  But it also probably clobbers too many
> registers.)

This is my belief.  I haven't actually tried this experiment, but I've
seen no reports that CPUID is a good choice here.

> It's only the idea of writing to a memory location when MFENCE is
> available, and could be used instead, that seems questionable.

While in principle I agree, it appears in practice that this tradeoff is
worthwhile.  The hardware doesn't seem to optimize for the MFENCE case
whereas lock prefix instructions appear to be handled much better.

> What exactly would the non-temporal fences be?  It seems that on x86,
> the load and store case may differ.  In theory, there's also a before
> vs. after question.  In practice code using MOVNTA seems to assume
> that you only need an SFENCE afterwards.  I can't back that up with
> spec verbiage.  I don't know about MOVNTDQA.  What about ARM?

I'll leave this to JF to answer.  I'm not knowledgeable enough about
non-temporals to answer without substantial research first.