thr3ads.net - llvm dev - [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ? [Feb 2013]


Sebastien DELDON-GNB

2013-Feb-12 10:25 UTC

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Understood,

Same architecture, different micro-arch (implementation). Could it be that
vmlx-forwarding makes sense for Swift but not for the ARM Cortex-A9
implementation? It is enabled by default when -mcpu=cortex-a9 is used, but the
tests we have run show significant improvements when it is disabled for
Cortex-A9 (ST-Ericsson Nova platform).

Best Regards
Seb

From: David Tweed [mailto:david.tweed at arm.com]
Sent: Tuesday, February 12, 2013 11:11 AM
To: Sebastien DELDON-GNB; Lang Hames; Bob Wilson
Cc: llvmdev at cs.uiuc.edu
Subject: RE: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32
instruction ?

| Sorry for my naïve question but what is Swift ?

It's a complicated area. There's the standard Cortex-A9 design from ARM;
Swift is the CPU that Apple has used in its latest products, which is
significantly modified from a basic ARM design; and then there's the
next-generation Cortex-A15 design from ARM. Each of them handles the same
instruction set, but the implementation details of each mean that different
instruction sequences may perform better on each.
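
To make the "different instruction sequences" point concrete, here is a minimal portable sketch (plain C++, no NEON, so it says nothing about timing on any core) of the two shapes involved in a multiply-accumulate: a single accumulate step per element, which is the shape a backend can select as vmla.f32, and a separate multiply followed by an add, which corresponds to vmul.f32 plus vadd.f32. The function names are made up for illustration and do not come from LLVM. At the source level both functions describe the same computation; which sequence is emitted is the compiler's choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shape 1: multiply-accumulate written as one step per element.
// This is the pattern that can be selected as vmla.f32 on ARM.
float dot_mac(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Shape 2: the same computation split into an explicit multiply and add,
// corresponding to a vmul.f32 followed by a vadd.f32.
float dot_mul_add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float prod = a[i] * b[i];  // multiply
        acc = acc + prod;          // then add
    }
    return acc;
}
```

Both functions return the same value for inputs that are exactly representable; the discussion in this thread is only about which sequence runs faster on a given core.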

Cheers,
Dave

Renato Golin

2013-Feb-12 11:08 UTC


[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com>
wrote:
> Same architecture, different micro-arch (implementation). Could this be
> the case that vmlx-forwarding make senses for SWIFT and not for ARM
> Cortex-A9 implementation ? It is enabled by default when -mcpu=cortex-a9 is
> used but test have made show significant improvements when disabled for
> cortex-A9 (STEricsson Nova platform).
>
Hi Sebastien,

The optimization does make sense for Cortex-A9. I remember reviewing the
patch myself, and the A9 document clearly states the delays involved
between VMLAs and that this was a solution.

However, due to micro-architecture differences (as David explained), it may
interfere with other non-Swift steps (or the lack of Swift steps) and
produce worse code. It's not uncommon to see "if (isSwift())" around the
code generation or optimization passes.
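
The per-CPU guard pattern mentioned above can be sketched as follows. This is a self-contained illustration, not LLVM's actual code: the `Cpu` and `Override` enums, the `Subtarget` struct, and `useVMLxForwarding()` are hypothetical names invented for the example. Only the idea of a check such as `isSwift()`, and the suggestion of enabling forwarding by default on Swift alone, come from this thread.

```cpp
#include <cassert>

// Hypothetical CPU list for the sketch; not LLVM's real enumeration.
enum class Cpu { CortexA8, CortexA9, CortexA15, Swift };

// Hypothetical explicit user setting that overrides the per-CPU default.
enum class Override { Default, On, Off };

// Hypothetical stand-in for a backend subtarget description.
struct Subtarget {
    Cpu cpu;
    Override forwarding;

    bool isSwift() const { return cpu == Cpu::Swift; }
    bool isCortexA9() const { return cpu == Cpu::CortexA9; }
};

// Policy suggested in the thread: forwarding is on by default only for
// Swift, while an explicit override always wins.
bool useVMLxForwarding(const Subtarget& st) {
    if (st.forwarding == Override::On) {
        return true;
    }
    if (st.forwarding == Override::Off) {
        return false;
    }
    if (st.isSwift()) {
        return true;
    }
    return false;
}
```

Keeping the explicit override separate from the per-CPU default means a user who has measured a specific core can still turn the feature on or off without changing the default for everyone else.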

I haven't done any benchmarking on that particular issue, but if you can
show that the performance regression occurs on more than one Cortex-A9 core
(ST, TI), then I'd be inclined to suggest only enabling VMLx-forwarding by
default on Swift.

cheers,
--renato

Evan Cheng

2013-Feb-12 15:47 UTC


[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

I did the initial work on vmla formation. The default settings for Cortex-A8 /
A9 are due to micro-architecture differences (I believe the A8 TRM talks about
vmla hazards) and extensive testing. That said, given the limitations of the
current pre-RA scheduling pass, it's likely that the use of vmla can cause
regressions.

I'm not opposed to changing the setting for A9. However, it's not a good idea
to base the decision on one benchmark. I'd like to see, at a minimum,
performance data for the entire LLVM test suite.
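
One way to read the request for suite-wide data is to summarize the effect of a setting across many benchmarks instead of relying on a single result. The sketch below computes the geometric mean of per-benchmark time ratios (time with the setting disabled divided by time with it enabled). The function name and the choice of geometric mean are assumptions made for illustration; the thread does not specify a method, and no real measurements are included here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark time ratios.
// A ratio below 1.0 means that benchmark ran faster with the setting disabled.
double geomean_ratio(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // ratios of run times must be positive
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

With made-up inputs, a single benchmark at a ratio of 0.5 looks like a large win, but combined with another at 2.0 the geometric mean is 1.0, that is, no net change across the pair.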

Evan


