JF Bastien via llvm-dev
2016-Jan-14 21:13 UTC
[llvm-dev] RFC: non-temporal fencing in LLVM IR
On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> I agree with Tim's assessment for ARM. That's interesting; I wasn't
>> previously aware of that instruction.
>>
>> My understanding is that Alpha would have the same problem for normal
>> loads.
>>
>> I'm all in favor of more systematic handling of the fences associated
>> with x86 non-temporal accesses.
>>
>> AFAICT, nontemporal loads and stores seem to have different fencing rules
>> on x86, none of them very clear. Nontemporal stores should probably
>> ideally use an SFENCE. Locked instructions seem to be documented to work
>> with MOVNTDQA. In both cases, there seems to be only empirical evidence as
>> to which side(s) of the nontemporal operations they should go on?
>>
>> I finally decided that I was OK with using a LOCKed top-of-stack update
>> as a fence in Java on x86. I'm significantly less enthusiastic for C++. I
>> also think that risks unexpected coherence miss problems, though they would
>> probably be very rare. But they would be very surprising if they did occur.
>
> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
> targeting 32-bit x86 machines which do not support mfence. What
> instruction sequence should we be using instead?

Do they have non-temporal accesses in the ISA?

>> On Wed, Jan 13, 2016 at 10:59 AM, Tim Northover <t.p.northover at gmail.com> wrote:
>>
>>> > I haven't touched ARMv8 in a few years so I'm rusty on the non-temporal
>>> > details for that ISA. I lifted this example from here:
>>> >
>>> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CJACGJJF.html
>>> >
>>> > Which is correct?
>>>
>>> FWIW, I agree with John here. The example I'd give for the unexpected
>>> behaviour allowed in the spec is:
>>>
>>> .Lwait_for_data:
>>>     ldr x0, [x3]
>>>     cbz x0, .Lwait_for_data
>>>     ldnp x2, x1, [x0]
>>>
>>> where another thread first writes to a buffer then tells us where that
>>> buffer is. For a normal ldp, the address dependency rule means we
>>> don't need a barrier or acquiring load to ensure we see the real data
>>> in the buffer. For ldnp, we would need a barrier to prevent stale
>>> data.
>>>
>>> I suspect this is actually even closer to the x86 situation than what
>>> the guide implies (which looks like a straight-up exposed pipeline to
>>> me, beyond even what Alpha would have done).
>>>
>>> Cheers.
>>>
>>> Tim.
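For readers less familiar with the pattern in Tim's example, here is a rough C++ sketch of it: one thread fills a buffer and then publishes its address, and the other spins until the address is non-null and reads through it. The names (`Buffer`, `g_buffer`, `g_published`, `publish`, `consume`) are made up for illustration and do not come from the thread. The sketch uses a release store and an acquire load instead of relying on the address dependency, which is the conservative choice if the read through the pointer could end up as something like ldnp.

```cpp
#include <atomic>

// Hypothetical payload; stands in for the pair of values ldp/ldnp would load.
struct Buffer {
    long first;
    long second;
};

Buffer g_buffer;                            // data written before publication
std::atomic<Buffer*> g_published{nullptr};  // plays the role of [x3]

// Writer: fill the buffer, then tell the reader where it is.
void publish(long a, long b) {
    g_buffer.first = a;
    g_buffer.second = b;
    // Release: the buffer writes above become visible to any thread
    // whose acquire load observes this pointer.
    g_published.store(&g_buffer, std::memory_order_release);
}

// Reader: spin until the pointer is non-null, then read through it.
long consume() {
    Buffer* p;
    while ((p = g_published.load(std::memory_order_acquire)) == nullptr) {
        // corresponds to the ldr/cbz loop in the assembly
    }
    return p->first + p->second;
}
```

In a real program `publish` and `consume` would run on different threads; called one after the other on a single thread, `publish(1, 2)` followed by `consume()` simply returns the sum of the two values.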
David Majnemer via llvm-dev
2016-Jan-14 21:35 UTC
[llvm-dev] RFC: non-temporal fencing in LLVM IR
On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote:

> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
>> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
>> targeting 32-bit x86 machines which do not support mfence. What
>> instruction sequence should we be using instead?
>
> Do they have non-temporal accesses in the ISA?

I thought not but there appear to be instructions like movntps. mfence was introduced in SSE2 while movntps and sfence were introduced in SSE.
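To make the movntps/sfence pairing concrete, here is a small C++ sketch using the SSE intrinsics `_mm_stream_ps` (which compiles to movntps) and `_mm_sfence`. It follows Hans's suggestion that non-temporal stores should be followed by an SFENCE before the data is published; whether that placement is sufficient in every case is exactly what the thread is debating, so treat it as an assumption. The names `g_data`, `g_ready`, `publish_streaming` and `read_first` are hypothetical, and the non-SSE branch is only there so the sketch builds on other targets.

```cpp
#include <atomic>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_stream_ps, _mm_sfence, _mm_set1_ps (SSE)
#endif

alignas(16) float g_data[4];       // movntps needs a 16-byte aligned address
std::atomic<bool> g_ready{false};  // flag that publishes g_data

// Writer: fill g_data with a non-temporal store, fence, then publish.
void publish_streaming(float v) {
#if defined(__SSE__)
    _mm_stream_ps(g_data, _mm_set1_ps(v));  // non-temporal store (movntps)
    _mm_sfence();                           // order it before the flag store
#else
    for (float& x : g_data) x = v;          // plain stores on non-SSE targets
#endif
    g_ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag, then read the data with ordinary loads.
float read_first() {
    while (!g_ready.load(std::memory_order_acquire)) {
        // spin until the writer has published
    }
    return g_data[0];
}
```

Only SSE is required here, which matches the point above: a target can have movntps and sfence without having mfence.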
JF Bastien via llvm-dev
2016-Jan-14 21:37 UTC
[llvm-dev] RFC: non-temporal fencing in LLVM IR
On Thu, Jan 14, 2016 at 1:35 PM, David Majnemer <david.majnemer at gmail.com> wrote:

> On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote:
>
>> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence
>>> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
>>> targeting 32-bit x86 machines which do not support mfence. What
>>> instruction sequence should we be using instead?
>>
>> Do they have non-temporal accesses in the ISA?
>
> I thought not but there appear to be instructions like movntps. mfence
> was introduced in SSE2 while movntps and sfence were introduced in SSE.

So the new builtin could be sfence? I think the codegen you point out for SEQ_CST is fine if we fix the memory model as suggested.
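For context on the SEQ_CST codegen being discussed, here is a minimal C++ sketch of the store-then-load case where a seq_cst fence matters. `std::atomic_thread_fence(std::memory_order_seq_cst)` is the standard spelling of the fence David lists (alongside `__sync_synchronize` and `__atomic_thread_fence(__ATOMIC_SEQ_CST)`); per his message it is lowered to a locked `or` on 32-bit x86 targets without mfence. The variable and function names are hypothetical, and the sketch involves no non-temporal accesses, which is the case the thread suggests the existing codegen handles fine.

```cpp
#include <atomic>

std::atomic<int> g_x{0};
std::atomic<int> g_y{0};

// Thread A: store to x, full fence, then load y.
int store_x_load_y() {
    g_x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // 'fence seq_cst'
    return g_y.load(std::memory_order_relaxed);
}

// Thread B: the mirror image, store to y, full fence, then load x.
int store_y_load_x() {
    g_y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // 'fence seq_cst'
    return g_x.load(std::memory_order_relaxed);
}
```

With both fences in place, the C++ memory model rules out the outcome where the two functions, run concurrently on different threads, both return 0; without the fences that outcome is allowed and can be observed on x86.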