thr3ads.net - llvm dev - [llvm-dev] Sub-optimal register allocation [Jun 2018]

Mohamed Aly via llvm-dev

2018-Jun-08 00:06 UTC

[llvm-dev] Sub-optimal register allocation

Hi,

I am using Halide to generate a simplified version of the inner kernel of a
GEMM operation, similar to this one in gemmlowp
<https://github.com/google/gemmlowp/blob/master/internal/kernel_neon.h#L59>.
The kernel multiplies a 12x1 column vector by a 1x4 row vector and
accumulates the product into a 12x4 accumulator cell. I am targeting 32-bit
ARM NEON.
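
For reference, the update described above can be sketched in plain scalar
C++. This is only an illustration of the arithmetic, not the Halide pipeline
or the gemmlowp kernel: the names rank1_update, Accumulator, kRows and kCols
are hypothetical, and the element types (16-bit operands, 32-bit
accumulators) are assumptions.

```cpp
#include <array>
#include <cstdint>

// Hypothetical tile dimensions: a 12x1 column times a 1x4 row.
constexpr int kRows = 12;
constexpr int kCols = 4;

// Assumed element types: 32-bit accumulators, 16-bit operands.
using Accumulator = std::array<std::array<std::int32_t, kCols>, kRows>;

// Rank-1 (outer product) update: acc[i][j] += lhs[i] * rhs[j].
inline void rank1_update(Accumulator& acc,
                         const std::array<std::int16_t, kRows>& lhs,
                         const std::array<std::int16_t, kCols>& rhs) {
    for (int i = 0; i < kRows; ++i) {
        for (int j = 0; j < kCols; ++j) {
            // Widen before multiplying so the product is formed in 32 bits.
            acc[i][j] += static_cast<std::int32_t>(lhs[i]) *
                         static_cast<std::int32_t>(rhs[j]);
        }
    }
}
```

In the GEMM inner loop this update would run once per step along the depth
dimension, with the 12x4 accumulator kept live across all iterations.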

Ideally, all the accumulators and operands should fit in the q registers,
without spilling to the stack. However, the generated ARM assembly uses the
registers in a sub-optimal way, and keeps spilling registers onto the stack
and reloading them.
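
To make the register budget concrete, here is a rough compile-time count.
32-bit ARM NEON has sixteen 128-bit q registers (q0 to q15). The 32-bit
accumulator width is an assumption, and the constant names are hypothetical.

```cpp
// 32-bit ARM NEON register file: q0..q15, each 128 bits wide.
constexpr int kQRegisters = 16;
constexpr int kQRegisterBits = 128;

// Assumption: each accumulator element is 32 bits wide.
constexpr int kAccumulatorBits = 32;
constexpr int kAccRows = 12;
constexpr int kAccCols = 4;

// Lanes per q register and registers needed for the whole 12x4 tile.
constexpr int kLanesPerQ = kQRegisterBits / kAccumulatorBits;
constexpr int kAccumulatorRegs = (kAccRows * kAccCols) / kLanesPerQ;

// Registers left over for the operands and any temporaries.
constexpr int kFreeRegs = kQRegisters - kAccumulatorRegs;

static_assert(kAccumulatorRegs <= kQRegisters,
              "the accumulator tile alone must fit in the q register file");
```

Under that assumption the accumulators take 12 q registers, leaving 4 for
the operands and temporaries, which is why I would not expect any spills.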

The relevant part of the LLVM IR is here
<https://gist.github.com/mohamedadaly/57d62a71f6acee21be0883b487cf2e7d>,
and the corresponding arm32 assembly is here
<https://gist.github.com/mohamedadaly/b815a4e59e7fcdbae9bc5968f844dab5>.

Any help on how to solve this, or on what might be causing it, would be
greatly appreciated.

Thanks a lot,
Mohamed
