Amaury SECHET via llvm-dev
2017-Feb-25 17:51 UTC
[llvm-dev] rL296252 Made large integer operation codegen significantly worse.
Hi,

I'm working with a workload where the bottleneck is cryptographic signature checks or, in compiler terms, mostly large integer operations.

Looking at rL296252, the state of affairs in that area has degraded quite significantly; see test/CodeGen/X86/i256-add.ll for instance.

Is there some kind of work in progress here, and is it expected to get better? Because if not, that's a big problem. It looks like the problem is that the compiler now chooses to use pushfq/popfq in some cases rather than chaining adc to propagate the carry in additions.

I hope this can get sorted out quickly. I'm happy to help if that is necessary.

Thanks,

Amaury SECHET
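For readers unfamiliar with the pattern under discussion, here is a minimal sketch of what "chaining adc" computes: a 256-bit addition split into four 64-bit limbs, with the carry out of each limb fed into the next. It is only a model of the arithmetic, not the contents of i256-add.ll or the code LLVM emits; the names `add_with_carry` and `add_u256` and the little-endian limb layout are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # all ones in a 64-bit limb


def add_with_carry(a, b, carry_in):
    """Model of one add/adc step: 64-bit sum plus the carry-out bit."""
    total = a + b + carry_in
    return total & MASK64, total >> 64


def add_u256(a_limbs, b_limbs):
    """Add two 256-bit values given as four little-endian 64-bit limbs.

    Returns (sum_limbs, carry_out). The carry is threaded from limb to
    limb, which is what a chain of add/adc/adc/adc does in hardware.
    """
    result = []
    carry = 0  # the first step is a plain add (no carry in)
    for a, b in zip(a_limbs, b_limbs):
        limb, carry = add_with_carry(a, b, carry)
        result.append(limb)
    return result, carry
```

On x86-64 the ideal lowering keeps that carry in the CF flag between the four instructions. If anything that clobbers the flags is scheduled between two links of the chain, the carry has to be saved and restored, which is where pushfq/popfq comes from.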
Nirav Davé via llvm-dev
2017-Feb-25 20:06 UTC
[llvm-dev] rL296252 Made large integer operation codegen significantly worse.
rL296252's main change was to turn on anti-aliasing in the DAGCombiner. This should generally be a mild improvement to code due to the relaxed memory constraints, modulo any patterns downstream that are no longer general enough. This looks to be the case here.

I'm going to leave this in for a little while longer to check that all the buildbots pass, but I'll revert this and make sure this test case looks more reasonable.

-Nirav
James Y Knight via llvm-dev
2017-Feb-28 00:01 UTC
[llvm-dev] rL296252 Made large integer operation codegen significantly worse.
This patch only results in relaxing dependencies. This now *allows* new orderings that were not previously allowed, but the fact that we then actually get such a suboptimal output likely indicates an issue elsewhere that this allowance is exacerbating.

Some observations:

1. For some reason, memop folding seems to be generating seriously non-optimal instructions. It is somehow causing there to be 7 adds in the output instead of 4 -- some with the store integrated, but also keeping the original adds without the store integrated. That's no good...and didn't used to happen. I expect this is the main problem.

2. The scheduler is then choosing an ordering that requires spilling eflags. Not sure why; possibly due to the former it's pushed itself into a corner where this appears to be required.

3. Then, even if you need to spill, it's a shame that the x86 backend isn't tracking bits in the flag register separately... Thus, a definition and use of the carry bit requires saving/restoring the entire flags register, even if all you cared about was the one carry bit. That's quite unfortunate, as saving/restoring just the carry bit would be a LOT cheaper than saving/restoring the entire register.

I suspect it'd be possible to define a 1-bit subregister of eflags and mark the various carry-in ops as only using that. Might be worthwhile doing that, separately, even if fixing #1 makes this particular issue disappear for this test case.
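To make observation 3 concrete, here is a small model of one well-known x86 idiom for preserving only the carry bit: `setc` into a byte register to save it, then an 8-bit `add` of 0xFF to that register to regenerate CF. This is a sketch of the arithmetic only. It is not a claim about what the x86 backend does or would do with a 1-bit subregister, and the function names are hypothetical.

```python
CF_MASK = 0x01  # the carry flag is bit 0 of EFLAGS


def save_carry(eflags):
    """Model of `setc r8`: materialize CF as 0 or 1 in a byte register."""
    return eflags & CF_MASK


def restore_carry(saved):
    """Model of an 8-bit `add r8, 0xFF`.

    The add carries out of bit 7 exactly when the saved byte is 1,
    so CF after the add equals the carry that was saved.
    """
    return ((saved + 0xFF) >> 8) & 1
```

Note that the restoring `add` overwrites the other arithmetic flags, so this only works when the carry is the single live flag, which is exactly the situation in an add/adc chain.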