Paweł Bylica via llvm-dev
2018-Nov-20 10:45 UTC
[llvm-dev] A pattern for portable __builtin_add_overflow()
Hi LLVM, clang,

I'm trying to write a portable version of __builtin_add_overflow() in a way that the compiler would recognize the pattern and use the add_overflow intrinsic / the best possible machine instruction.

Here are the docs about these builtins: https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins

With unsigned types this is easy:

    int uaddo_native(unsigned a, unsigned b, unsigned* s)
    {
        return __builtin_add_overflow(a, b, s);
    }

    int uaddo_portable(unsigned a, unsigned b, unsigned* s)
    {
        *s = a + b;
        return *s < a;
    }

We get exactly the same assembly:

    uaddo_native:                           # @uaddo_native
            xor     eax, eax
            add     edi, esi
            setb    al
            mov     dword ptr [rdx], edi
            ret
    uaddo_portable:                         # @uaddo_portable
            xor     eax, eax
            add     edi, esi
            setb    al
            mov     dword ptr [rdx], edi
            ret

But with signed types it is not so easy. I tried two versions, but the result is quite far from the optimal assembly.

    int saddo_native(int a, int b, int* s)
    {
        return __builtin_add_overflow(a, b, s);
    }

    int saddo_portable(int a, int b, int* s)
    {
        *s = (unsigned)a + (unsigned)b;
        return (a > 0) ? *s <= b : *s > b;
    }

    int saddo_portable2(int a, int b, int* s)
    {
        *s = (unsigned)a + (unsigned)b;
        int cond = a > 0;
        int check = *s > b;
        return (cond & !check) | (!cond & check);
    }

Assembly:

    saddo_native:                           # @saddo_native
            xor     eax, eax
            add     edi, esi
            seto    al
            mov     dword ptr [rdx], edi
            ret
    saddo_portable:                         # @saddo_portable
            lea     eax, [rsi + rdi]
            mov     dword ptr [rdx], eax
            cmp     eax, esi
            setle   al
            setg    cl
            test    edi, edi
            jg      .LBB3_2
            mov     eax, ecx
    .LBB3_2:
            movzx   eax, al
            ret
    saddo_portable2:                        # @saddo_portable2
            lea     eax, [rsi + rdi]
            mov     dword ptr [rdx], eax
            test    edi, edi
            setg    cl
            cmp     eax, esi
            setg    al
            xor     al, cl
            movzx   eax, al
            ret

Do you know a trick to force the compiler to use the seto instruction?

I also noticed that the transformation for uaddo_portable happens in CodeGen, not in IR.

Bests,
Paweł
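The signed variants rely on comparing the wrapped sum against b: when a > 0 the exact sum is greater than b, so a wrapped result that is less than or equal to b signals overflow, and when a <= 0 the exact sum is at most b, so a wrapped result greater than b signals overflow. As a rough sanity check of that reasoning, the sketch below compares both portable versions with the builtin on a few boundary values. The helper count_mismatches and its table of values are hypothetical and not part of the original message, and the code assumes a GCC- or Clang-compatible compiler for the builtin.

```c
#include <limits.h>
#include <stddef.h>

/* The three functions below are copied from the message above. */
int saddo_native(int a, int b, int* s)
{
    return __builtin_add_overflow(a, b, s);  /* GCC/Clang builtin */
}

int saddo_portable(int a, int b, int* s)
{
    *s = (unsigned)a + (unsigned)b;
    return (a > 0) ? *s <= b : *s > b;
}

int saddo_portable2(int a, int b, int* s)
{
    *s = (unsigned)a + (unsigned)b;
    int cond = a > 0;
    int check = *s > b;
    return (cond & !check) | (!cond & check);
}

/* Hypothetical helper (not from the thread): counts how often a portable
   version disagrees with the builtin, in either the overflow flag or the
   stored sum, over all pairs of a few boundary values. */
int count_mismatches(void)
{
    static const int v[] = { INT_MIN, INT_MIN + 1, -1, 0, 1,
                             INT_MAX - 1, INT_MAX };
    const size_t n = sizeof v / sizeof v[0];
    int bad = 0;

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int s0, s1, s2;
            int o0 = saddo_native(v[i], v[j], &s0);
            int o1 = saddo_portable(v[i], v[j], &s1);
            int o2 = saddo_portable2(v[i], v[j], &s2);
            bad += (o0 != o1) || (s0 != s1);
            bad += (o0 != o2) || (s0 != s2);
        }
    }
    return bad;
}
```

Calling count_mismatches() from a test program should yield zero if the portable versions agree with the builtin on these inputs. This only checks results, not the generated code.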
Sanjoy Das via llvm-dev
2018-Nov-22 17:54 UTC
[llvm-dev] A pattern for portable __builtin_add_overflow()
Going by InstCombiner::foldICmpWithConstant, https://gcc.godbolt.org/z/To_qmm should work.

-- Sanjoy
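The contents of the godbolt link are not reproduced in the thread. As a rough sketch only, assuming the suggestion is along the lines of doing the addition in a wider type and then testing whether the result still fits in the narrow one, a 32-bit version might look like the following. The name saddo_widen is hypothetical, and whether a particular compiler version folds this exact shape into the overflow intrinsic is not confirmed here.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the thread or the linked example. */
int saddo_widen(int32_t a, int32_t b, int32_t *s)
{
    /* The 64-bit sum of two 32-bit values cannot itself overflow. */
    int64_t wide = (int64_t)a + (int64_t)b;

    /* Store the wrapped 32-bit result via unsigned arithmetic, as the
       original message does. Converting an out-of-range value back to a
       signed type is implementation-defined in C; GCC and Clang wrap. */
    *s = (int32_t)((uint32_t)a + (uint32_t)b);

    /* Overflow occurred iff the exact sum does not fit in 32 bits. */
    return wide < INT32_MIN || wide > INT32_MAX;
}
```

The approach needs an integer type twice as wide as the operands, which is its main limitation.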
Paweł Bylica via llvm-dev
2018-Nov-22 20:07 UTC
[llvm-dev] A pattern for portable __builtin_add_overflow()
Thanks Sanjoy,

It looks like it works only up to 32-bit integers (http://llvm.org/doxygen/InstCombineCompares_8cpp_source.html#l01255). I will need it for 64-bit integers as well. Even if the pattern match handled the 64-bit case, I would need to use 128-bit types, which is still not portable.

// P.
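One formulation that avoids a wider type looks only at sign bits: a signed addition overflows exactly when both operands have the same sign and the wrapped sum has the opposite sign. The sketch below applies that to 64-bit operands. It is not from the thread, the name saddo64_signbits is hypothetical, and nothing here establishes whether a given compiler turns it into seto.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the thread. */
int saddo64_signbits(int64_t a, int64_t b, int64_t *s)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t us = ua + ub;   /* unsigned addition wraps modulo 2^64 */

    /* Converting an out-of-range value back to a signed type is
       implementation-defined in C; GCC and Clang wrap. */
    *s = (int64_t)us;

    /* (ua ^ us) has its top bit set when a and the sum differ in sign,
       and likewise (ub ^ us) for b. Both top bits are set only when a and
       b share a sign that the sum does not have, which is overflow. */
    return (int)(((ua ^ us) & (ub ^ us)) >> 63);
}
```

The same shape works for any width by changing the types and the shift count to the bit width minus one.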