thr3ads.net - llvm dev - [llvm-dev] rL296252 Made large integer operation codegen significantly worse. [Feb 2017]

Amaury SECHET via llvm-dev

2017-Feb-25 17:51 UTC

[llvm-dev] rL296252 Made large integer operation codegen significantly worse.

Hi,

I'm working with a workload where the bottleneck is cryptographic signature
checks or, in compiler terms, mostly large integer operations.

Looking at rL296252, the state of affairs in that area has degraded quite
significantly; see test/CodeGen/X86/i256-add.ll for instance.

Is there some kind of work in progress here, and is it expected to get
better? Because if not, that's a big problem. It looks like the problem is
that the compiler now chooses to use pushfq/popfq in some cases rather than
chaining adc to propagate the carry in additions.
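
For readers unfamiliar with the pattern, here is a minimal C sketch of the kind of large integer addition being discussed. The `u256` type and the `u256_add` function are hypothetical names chosen for illustration; they do not come from the test file. The carry out of each 64-bit limb feeds the next limb, which is the dependency an add/adc chain expresses directly on x86-64. Whether a given compiler version actually emits such a chain for this source is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical 256-bit unsigned integer: four 64-bit limbs,
 * least significant limb first. */
typedef struct {
    uint64_t limb[4];
} u256;

/* Computes *out = (*a + *b) mod 2^256 and returns the carry out of the
 * top limb (0 or 1). Each iteration consumes the carry produced by the
 * previous one, mirroring an add followed by three adc instructions. */
static uint64_t u256_add(u256 *out, const u256 *a, const u256 *b)
{
    uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t s = a->limb[i] + carry;   /* add the incoming carry */
        uint64_t c1 = s < carry;           /* 1 if that addition wrapped */
        uint64_t t = s + b->limb[i];       /* add the second operand */
        uint64_t c2 = t < s;               /* 1 if that addition wrapped */
        out->limb[i] = t;
        carry = c1 | c2;                   /* at most one of c1, c2 is set */
    }
    return carry;
}
```

The point of the sketch is only that the carry is a single bit threaded through four dependent additions; saving and restoring the whole flags register between those steps is far more work than the chain itself.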

I hope this can get sorted out quickly. I'm happy to help if that is
necessary.

Thanks,

Amaury SECHET

Nirav Davé via llvm-dev

2017-Feb-25 20:06 UTC

rL296252's main change was to turn on anti-aliasing in the DAGCombiner.
This should generally be a mild improvement to code due to the relaxed
memory constraints, modulo any patterns downstream that are no longer
general enough. This looks to be the case here.

I'm going to leave this in a little while longer to check that all
the buildbots pass, but then I'll revert it and make sure this test case
looks more reasonable.

-Nirav

James Y Knight via llvm-dev

2017-Feb-28 00:01 UTC

This patch only relaxes dependencies. It now *allows* new orderings that
were not previously allowed, but the fact that we then actually get such
suboptimal output likely indicates an issue elsewhere that this allowance
is exacerbating.

Some observations:

1. For some reason, memop folding seems to be generating seriously
non-optimal instructions. It is somehow causing there to be 7 adds in the
output instead of 4 -- some with the store integrated, but also keeping the
original adds without the store integrated. That's no good... and it didn't
use to happen. I expect this is the main problem.

2. The scheduler is then choosing an ordering that requires spilling
eflags. Not sure why; possibly, because of the former, it has pushed itself
into a corner where this appears to be required.

3. Then, even if you need to spill, it's a shame that the x86 backend isn't
tracking bits in the flag register separately... Thus, a definition and use
of the carry bit requires saving/restoring the entire flags register, even
if all you cared about was the one carry bit. That's quite unfortunate, as
saving/restoring just the carry bit would be a LOT cheaper than
saving/restoring the entire register. I suspect it'd be possible to define
a 1-bit subregister of eflags and mark the various carry-in ops as only
using that. Might be worthwhile doing that, separately, even if fixing #1
makes this particular issue disappear for this test case.
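
To make the third observation concrete, here is a rough C-level sketch, under stated assumptions, of what "saving just the carry bit" amounts to. All names (`add_with_carry`, `add_low_half`, `add_high_half`) are hypothetical and chosen for illustration. This models the idea at the source level only; it is not how the x86 backend represents flags, and it says nothing about which instructions a particular compiler would pick. The carry between two halves of a 256-bit addition is kept as an ordinary 0/1 value, so interrupting the chain only requires preserving that one value rather than the whole flags register.

```c
#include <stdint.h>

/* One limb step: returns (a + b + carry_in) mod 2^64 and writes the
 * carry out (0 or 1) to *carry_out. carry_in must be 0 or 1. */
static uint64_t add_with_carry(uint64_t a, uint64_t b,
                               uint64_t carry_in, uint64_t *carry_out)
{
    uint64_t s = a + b;
    uint64_t c = s < a;          /* 1 if a + b wrapped */
    uint64_t t = s + carry_in;
    c |= t < s;                  /* 1 if adding the carry wrapped */
    *carry_out = c;
    return t;
}

/* Adds the two low limbs and returns the carry to be saved.
 * Limbs are stored least significant first. */
static uint64_t add_low_half(uint64_t out[4],
                             const uint64_t a[4], const uint64_t b[4])
{
    uint64_t carry = 0;
    out[0] = add_with_carry(a[0], b[0], carry, &carry);
    out[1] = add_with_carry(a[1], b[1], carry, &carry);
    return carry;                /* the only state that must survive */
}

/* Resumes the addition from a previously saved carry and returns the
 * carry out of the top limb. */
static uint64_t add_high_half(uint64_t out[4],
                              const uint64_t a[4], const uint64_t b[4],
                              uint64_t saved_carry)
{
    uint64_t carry = saved_carry;
    out[2] = add_with_carry(a[2], b[2], carry, &carry);
    out[3] = add_with_carry(a[3], b[3], carry, &carry);
    return carry;
}
```

Any amount of unrelated work can run between `add_low_half` and `add_high_half`, because the saved state is a single bit held in a normal register.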
