thr3ads.net - llvm dev - [LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ? [Feb 2013]

Evan Cheng

2013-Feb-12 15:47 UTC

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

I did the initial work on vmla formation. The default settings for cortex-a8 /
a9 are due to micro-architecture differences (I believe the a8 TRM talks about
vmla hazards) and extensive testing. That said, given the limitations of the
current pre-RA scheduling pass, it's likely that the use of vmla can cause
regressions.
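
For context, the source pattern behind vmla formation is a multiply whose
result feeds an add. A minimal sketch in plain C follows. It is an illustration
rather than code from this thread, and the function names are hypothetical.
Whether a compiler emits vmla.f32 for these loops or a separate vmul.f32 and
vadd.f32 depends on the target CPU and on settings such as the ones discussed
here.

```c
#include <stddef.h>

/* Hypothetical example: element-wise multiply-accumulate,
 * acc[i] = acc[i] + a[i] * b[i].
 * The multiply feeding the add is the pattern a backend may select as one
 * multiply-accumulate instruction or as a multiply followed by an add. */
void mla_f32(float *acc, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        acc[i] += a[i] * b[i];
    }
}

/* Hypothetical example: dot product. Each iteration accumulates into the
 * same sum, so every multiply-accumulate depends on the previous one. */
float dot_f32(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The second function chains each accumulate on the previous result, which is
presumably the kind of dependency where delays between VMLAs matter most.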

I'm not opposed to changing the setting for a9. However, it's not a good idea
to base the decision on one benchmark. At a minimum, I'd like to see
performance data for the entire LLVM test suite.

Evan

On Feb 12, 2013, at 3:08 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
>> Same architecture, different micro-arch (implementation). Could it be
>> that vmlx-forwarding makes sense for Swift but not for the ARM Cortex-A9
>> implementation? It is enabled by default when -mcpu=cortex-a9 is used, but
>> the tests we ran show significant improvements when it is disabled for
>> Cortex-A9 (ST-Ericsson Nova platform).
>> 
> 
> Hi Sebastien,
> 
> The optimization does make sense for Cortex-A9. I remember reviewing the
> patch myself, and the A9 document clearly states the delays involved between
> VMLAs and that this was a solution.
>
> However, due to micro-architecture differences (as David explained), it may
> interfere with other non-Swift steps (or the lack of Swift steps) and produce
> worse code. It's not uncommon to see "if (isSwift())" around the
> code generation or optimization passes.
>
> I haven't done any benchmarking on that particular issue, but if you can
> show that the performance regression occurs on more than one Cortex-A9 core
> (ST, TI), then I'd be inclined to suggest enabling VMLx-forwarding by
> default only on Swift.
> 
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Renato Golin

2013-Feb-12 15:53 UTC

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 12 February 2013 15:47, Evan Cheng <evan.cheng at apple.com> wrote:
> I'm not opposed to changing the setting for a9.

At least until we identify what the problem is and how to fix it, if it's
another pass messing up the patterns.

> However, it's not a good idea to base the decision on one benchmark. At a
> minimum, I'd like to see performance data for the entire LLVM test suite.

Absolutely.

--renato

Sebastien DELDON-GNB

2013-Feb-12 16:56 UTC

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

If this helps you make your decision, there are at least two benchmarks for
which disabling vmlx-forwarding makes a significant difference.
If I get lucky, I may be able to run on a PandaBoard by next week and have
more info to share.
Best Regards
Seb

________________________________________
From: Evan Cheng [evan.cheng at apple.com]
Sent: Tuesday, 12 February 2013 16:47
To: Renato Golin
Cc: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

I did the initial work on vmla formation. The default settings for cortex-a8 /
a9 are due to micro-architecture differences (I believe the a8 TRM talks about
vmla hazards) and extensive testing. That said, given the limitations of the
current pre-RA scheduling pass, it's likely that the use of vmla can cause
regressions.

I'm not opposed to changing the setting for a9. However, it's not a good idea
to base the decision on one benchmark. At a minimum, I'd like to see
performance data for the entire LLVM test suite.

Evan

On Feb 12, 2013, at 3:08 AM, Renato Golin <renato.golin at linaro.org> wrote:

On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
Same architecture, different micro-arch (implementation). Could it be that
vmlx-forwarding makes sense for Swift but not for the ARM Cortex-A9
implementation? It is enabled by default when -mcpu=cortex-a9 is used, but
the tests we ran show significant improvements when it is disabled for
Cortex-A9 (ST-Ericsson Nova platform).

Hi Sebastien,

The optimization does make sense for Cortex-A9. I remember reviewing the
patch myself, and the A9 document clearly states the delays involved between
VMLAs and that this was a solution.

However, due to micro-architecture differences (as David explained), it may
interfere with other non-Swift steps (or the lack of Swift steps) and produce
worse code. It's not uncommon to see "if (isSwift())" around the
code generation or optimization passes.

I haven't done any benchmarking on that particular issue, but if you can
show that the performance regression occurs on more than one Cortex-A9 core
(ST, TI), then I'd be inclined to suggest enabling VMLx-forwarding by default
only on Swift.

cheers,
--renato

Renato Golin

2013-Feb-12 17:05 UTC

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> If this helps you make your decision, there are at least two benchmarks for
> which disabling vmlx-forwarding makes a significant difference.

I think Evan's concern was that this decision be based on visible and
comprehensible benchmarks, such as the test-suite.

> If I get lucky, I may be able to run on a PandaBoard by next week and
> have more info to share.

That'd be great, thanks!

--renato
