David Zarzycki via llvm-dev
2019-Sep-14 06:12 UTC
[llvm-dev] Side-channel resistant values
Hi Chandler,

I feel like this conversation has come full circle. So to ask again: how does one force CMOV to be emitted? You suggested “__builtin_unpredictable()” but that gets lost in various optimization passes. Given other architectures have CMOV-like instructions, and given the usefulness of the instruction for performance tuning, it seems like a direct intrinsic would be best. What am I missing?

Dave

> On Sep 14, 2019, at 8:35 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> The x86 backend is extremely aggressive in turning cmov with memory operands into branches because that is often faster even for poorly predicted branches due to the forced stall in the cmov.
>
>> On Fri, Sep 13, 2019 at 11:19 PM David Zarzycki <dave at znu.io> wrote:
>> I’m struggling to find cases where __builtin_unpredictable() works at all. Even if we ignore cmp/br into switch conversion, it still doesn’t work:
>>
>> int test_cmov(int left, int right, int *alt) {
>>     return __builtin_unpredictable(left < right) ? *alt : 999;
>> }
>>
>> Should generate:
>>
>> test_cmov:
>>     movl $999, %eax
>>     cmpl %esi, %edi
>>     cmovll (%rdx), %eax
>>     retq
>>
>> But currently generates:
>>
>> test_cmov:
>>     movl $999, %eax
>>     cmpl %esi, %edi
>>     jge .LBB0_2
>>     movl (%rdx), %eax
>> .LBB0_2:
>>     retq
>>
>> > On Sep 14, 2019, at 12:18 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>> >
>> > I'm not sure if this is the entire problem, but SimplifyCFG loses the 'unpredictable' metadata when it converts a set of cmp/br into a switch:
>> > https://godbolt.org/z/neLzN3
>> >
>> > Filed here:
>> > https://bugs.llvm.org/show_bug.cgi?id=43313
>> >
>> > On Fri, Sep 13, 2019 at 4:02 AM David Zarzycki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >
>> >> On Sep 13, 2019, at 10:45 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>> >>
>> >> On Fri, Sep 13, 2019 at 1:33 AM David Zarzycki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >> Hi Chandler,
>> >>
>> >> The data-invariant feature sounds great but what about the general case? When performance tuning code, people sometimes need the ability to reliably generate CMOV, and right now the best advice is either “use inline assembly” or “keep refactoring until CMOV is emitted” (and hope that future compilers continue to generate CMOV).
>> >>
>> >> Given that a patch already exists to reliably generate CMOV, are there any good arguments against adding the feature?
>> >>
>> >> For *performance* tuning, the builtin that Hal mentioned is IMO the correct design.
>> >>
>> >> Is there some reason why it doesn't work?
>> >
>> > I wasn’t aware of __builtin_unpredictable() until now and I haven’t debugged why it doesn’t work, but here are a couple examples, one using the ternary operator and one using a switch statement:
>> >
>> > https://godbolt.org/z/S46I_q
>> >
>> > Dave
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Chandler Carruth via llvm-dev
2019-Sep-14 06:18 UTC
[llvm-dev] Side-channel resistant values
On Sat, Sep 14, 2019 at 12:12 AM David Zarzycki <dave at znu.io> wrote:

> Hi Chandler,
>
> I feel like this conversation has come full circle. So to ask again: how
> does one force CMOV to be emitted? You suggested
> “__builtin_unpredictable()” but that gets lost in various optimization
> passes. Given other architectures have CMOV like instructions, and given
> the usefulness of the instruction for performance tuning, it seems like a
> direct intrinsic would be best. What am I missing?

LLVM operates at a higher level of abstraction IMO, so I don't really feel like there is something missing here. LLVM is just choosing a lowering that is expected to be superior even in the face of an unpredictable branch.

If there are real world benchmarks that show this lowering strategy is a problem, file bugs with those benchmarks? We can always change the heuristics based on new information.

I think if you want to force a particular instruction to be used, there is already a pretty reasonable approach: inline assembly.
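The inline-assembly route suggested above could be sketched roughly as follows for the example in this thread. This is only an illustration under stated assumptions: the function name select_lt_cmov and its fallback parameter are hypothetical, the asm path assumes x86-64 with GCC-style extended asm (AT&T syntax), and other targets fall back to a plain branchy select. It is not a constant-time guarantee of any kind.

```c
/* Hypothetical helper (not from the thread): returns *alt when
 * left < right, otherwise fallback. On x86-64 with GCC or Clang the
 * select is written as cmp + cmov in extended inline asm, so the
 * compiler cannot turn it back into a branch. */
static inline int select_lt_cmov(int left, int right, const int *alt,
                                 int fallback) {
    int result = fallback;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__("cmpl %[right], %[left]\n\t"  /* flags <- left - right      */
            "cmovl %[alt], %[result]"     /* if (signed) less: load alt */
            : [result] "+r"(result)
            : [left] "r"(left), [right] "r"(right), [alt] "m"(*alt)
            : "cc");
#else
    /* Portable fallback: the compiler is free to emit a branch here. */
    if (left < right)
        result = *alt;
#endif
    return result;
}
```

Note that cmov with a memory source operand reads memory whether or not the condition holds, so unlike the C ternary, alt has to point at readable memory on every call. The asm block is also opaque to the optimizer, which is the usual cost of this approach.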
To confirm with Dave's minimal example, the metadata does survive the IR optimizer in that case:

$ clang -O2 unpred.c -S -o - -emit-llvm | grep unpredictable
  br i1 %cmp, label %cond.true, label %cond.end, !unpredictable !3

So yes, it's the backend (x86 cmov conversion pass) that doesn't have access to the metadata and transforms to branch.

This was discussed here with some compelling perf claims:
https://bugs.llvm.org/show_bug.cgi?id=40027

And similarly/originally filed including different perf harm:
https://bugs.llvm.org/show_bug.cgi?id=37144

And there are side-channel/crypto/constant-time comments here:
https://bugs.llvm.org/show_bug.cgi?id=42901

I'm not sure if this changes/adds anything for the above bugs and the examples in this thread, but we recently made an x86 change to favor cmov for perf in:
https://reviews.llvm.org/D67087
based on perf numbers in:
https://bugs.llvm.org/show_bug.cgi?id=43197
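For anyone wanting to repeat the check, a self-contained unpred.c could look something like the sketch below. The function body is Dave's minimal example from this thread; the UNPREDICTABLE wrapper macro is a hypothetical addition so the file also compiles with compilers that lack __builtin_unpredictable, in which case the hint is simply dropped.

```c
/* unpred.c: Dave's minimal example plus a portability guard.
 * UNPREDICTABLE is an illustrative macro name, not an LLVM or Clang API. */
#ifndef __has_builtin
#define __has_builtin(x) 0 /* compilers without __has_builtin */
#endif

#if __has_builtin(__builtin_unpredictable)
#define UNPREDICTABLE(cond) __builtin_unpredictable(cond)
#else
#define UNPREDICTABLE(cond) (cond) /* no hint available */
#endif

int test_cmov(int left, int right, int *alt) {
    /* The hint marks the condition as hard to predict; whether the
     * backend then keeps a cmov is a separate heuristic decision. */
    return UNPREDICTABLE(left < right) ? *alt : 999;
}
```

Compiling this with the clang command quoted above shows whether the !unpredictable metadata is still attached after the IR optimizer; dropping -emit-llvm shows what the backend finally emits, which may differ between compiler versions.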