Stephen Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Why do you think this requires new intrinsics instead of teaching the
> optimizer what to do with the existing intrinsics?

IMO, as a multiprecision math library maker, the "teaching the optimizer what to do with the existing intrinsics" approach is much better as long as it can be made to work. If one is careful, MSVC does optimize its intrinsics into ADC instructions in a reasonable way, so I think it is probably doable.

(Below, all math is multiprecision.)

There are actually a few different operations besides pure addition and subtraction, e.g. `a - (b >> 1)` instead of just `a - b`. Also consider that we sometimes want "a - b" to be side-channel free and other times we'd rather "a - b" to be as fast as possible and optimized for the case where carries are unlikely to propagate (far). That's already three different operations, just for subtraction.

Cheers,
Brian
--
https://briansmith.org/

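To make the three subtraction flavours above concrete, here is a rough sketch in portable C. Everything in it is an assumption for illustration, not code from the thread: the function names are hypothetical, limbs are 64 bits wide and stored least significant first, and "side-channel free" only means the source has no data-dependent branches. Nothing here guarantees that a compiler preserves that property or emits SBB for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; limbs are little-endian (x[0] is least significant). */

/* r = a - b over n limbs. Always runs n steps with no data-dependent
 * branch in the source. Returns the final borrow (0 or 1). */
static uint64_t mp_sub(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = a[i] - b[i];
        uint64_t b1 = a[i] < b[i];   /* borrow from a[i] - b[i] */
        uint64_t b2 = d < borrow;    /* borrow from subtracting the incoming borrow */
        r[i] = d - borrow;
        borrow = b1 | b2;            /* at most one of the two can be set */
    }
    return borrow;
}

/* r = a - (b >> 1) over n limbs, shifting b on the fly. */
static uint64_t mp_sub_shr1(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t hi = (i + 1 < n) ? b[i + 1] : 0;
        uint64_t s = (b[i] >> 1) | (hi << 63);   /* limb i of (b >> 1) */
        uint64_t d = a[i] - s;
        uint64_t b1 = a[i] < s;
        uint64_t b2 = d < borrow;
        r[i] = d - borrow;
        borrow = b1 | b2;
    }
    return borrow;
}

/* r = a - b where b has only m <= n limbs. Variable time: the borrow is
 * propagated only while it is nonzero, then the rest of a is copied. */
static uint64_t mp_sub_fast(uint64_t *r, const uint64_t *a, size_t n,
                            const uint64_t *b, size_t m)
{
    assert(m <= n);
    uint64_t borrow = mp_sub(r, a, b, m);
    size_t i = m;
    for (; i < n && borrow; i++) {
        r[i] = a[i] - 1;
        borrow = (a[i] == 0);
    }
    for (; i < n; i++)
        r[i] = a[i];
    return borrow;
}
```

The first and third functions compute the same kind of result; they differ only in whether the running time depends on the operands, which is the distinction drawn above.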
I feel like there are two things being conflated here:

1) Does LLVM need new *LLVM* intrinsics to model this? The add-with-overflow intrinsics seem likely fine as-is and don't require memory, but we don't lower the code to use adc for some reason. That seems like an LLVM bug that should be fixed in the x86 backend. If optimizing the LLVM intrinsics is hard, then maybe we need better ones, but that's likely about the (somewhat ridiculous) dance we have to do to "sum" the overflow flags rather than anything to do with memory.

2) Does Clang need new *C* builtins to avoid needing the carry-in / carry-out being in memory? I don't have a strong opinion about this. But the example in the original email gets optimized to have no unnecessary memory accesses already: https://godbolt.org/g/GFij97

#1 seems like the most interesting problem to solve. File a bug?

On Wed, Feb 15, 2017 at 4:24 PM Brian Smith via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> IMO, as a multiprecision math library maker, the "teaching the
> optimizer what to do with the existing intrinsics" approach is much
> better as long as it can be made to work.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

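For readers unfamiliar with the "dance" of summing the overflow flags mentioned in point 1, a minimal sketch in C follows. It assumes a GCC- or Clang-compatible compiler that provides `__builtin_add_overflow`; the helper names `addc_u64` and `mp_add` are hypothetical. It shows only the source-level pattern: each limb takes two overflow-checked adds whose flags are OR-ed into one carry. Whether a given backend turns this into an ADC chain is exactly the open question in this thread, so no particular code generation is implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one limb of add-with-carry built from two
 * overflow-checked adds. Returns the sum and stores the carry-out (0 or 1). */
static uint64_t addc_u64(uint64_t x, uint64_t y, uint64_t carry_in, uint64_t *carry_out)
{
    assert(carry_in <= 1);
    uint64_t t, sum;
    bool c1 = __builtin_add_overflow(x, y, &t);          /* x + y */
    bool c2 = __builtin_add_overflow(t, carry_in, &sum); /* ... + carry_in */
    /* "Summing" the flags: the two adds cannot both overflow, so OR suffices. */
    *carry_out = (uint64_t)(c1 | c2);
    return sum;
}

/* Hypothetical helper: r = a + b over n little-endian limbs.
 * Returns the final carry (0 or 1). */
static uint64_t mp_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++)
        r[i] = addc_u64(a[i], b[i], carry, &carry);
    return carry;
}
```

Note that the carry-out is written through a pointer here, which is the "carry in memory" shape that point 2 asks about.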
I was suggesting #1. I don't care about clang. Am I supposed to file a bug against every backend?

On 02/15/2017 09:44 PM, Chandler Carruth wrote:
> #1 seems like the most interesting problem to solve. File a bug?