thr3ads.net - llvm dev - [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ? [Feb 2013]


Sebastien DELDON-GNB

2013-Feb-11 15:51 UTC

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Renato,

Indeed, the problem is with the generation of vmla.f64. The affected benchmark
is MILC from the SPEC 2006 suite, and disabling vmlx forwarding gives a 10%
speed-up on complete benchmark execution! So it is worth a try. Now, going back
to vmla generation through LLVM intrinsic usage: I've looked at the .td file,
and it seems to me that when there is a "pattern" to generate an instruction,
no intrinsic is defined to generate it, correct?
Is it possible to also add an LLVM intrinsic for an instruction that is
generated through a "pattern"? My goal here is not to rely on LLVM to generate
VMLA, but rather to have my front-end generate a call to a VMLA intrinsic that
I would have defined, when it thinks it's appropriate to generate one.
Hope that's clear.
Thanks for your answer
Seb

From: Renato Golin [mailto:renato.golin at linaro.org]
Sent: Friday, February 08, 2013 1:49 PM
To: Sebastien DELDON-GNB
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32
instruction ?

On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> Why is vmlx-forwarding enabled by default for cortex-a9? Is it to guarantee
> correctness or for performance purposes? I've made some experiments, and
> DISABLING vmlx-forwarding for cortex-a9 leads to the generation of more
> vmla/vmls .f32 and significantly improves some benchmarks. I've not run into
> a case where it significantly degrades performance or gives incorrect answers.

I believe this is what you're looking for:

http://article.gmane.org/gmane.comp.compilers.llvm.cvs/90709

Performance only, but if you're seeing regressions, I'm interested to
know which benchmarks and how much they are regressing/improving.

> Thus my goal is to use my front-end to generate LLVM NEON intrinsics that map
> to LLVM vmla/vmls f32 when I think it is appropriate, and not to rely on
> disabling/enabling vmlx-forwarding.

In that case, you must disable the pass when you call the back-end.

cheers,
--renato

Renato Golin

2013-Feb-11 16:12 UTC


On 11 February 2013 15:51, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> Indeed problem is with generation of vmla.f64. Affected benchmark is MILC
> from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on
> complete benchmark execution ! So it is worth a try.

Hi Sebastien,

Indeed, worth having a look. Including Bob Wilson (who introduced the code
in the first place, and is a connoisseur of NEON in LLVM) to see if he has
a better idea of the problem.

> Now going back to vmla generation through LLVM intrinsic usage. I’ve
> looked at .td file and it seems to me that when there is a “pattern” to
> generate instruction, no intrinsic is defined to generate it, correct ?

Correct.

> Is it possible for an instruction that is generated through a “pattern”
> to add also an LLVM intrinsic. My goal here is to not rely on LLVM to
> generate VMLA but rather having my front-end to generate call to a VMLA
> intrinsic I would have defined when it thinks it’s appropriate to generate
> one.

No, and I'm not sure we should have one.

I understand why you want one, but that's too much back-end knowledge to a
front-end, and any pass that can transform a pair of VMLAs into an
intrinsic call, can also transform into VMLA+VMUL+VADD. In this case,
disabling the optimization is probably the best course of action.
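
To make the trade-off concrete, here is a minimal scalar C++ sketch (not NEON code) of the two shapes being discussed. It assumes the non-fused VMLA, where the product is computed and then added, so a plain multiply-add expression can be selected either as one VMLA or as a VMUL followed by a VADD. The names `Vec4`, `multiplyAccumulate` and `multiplyThenAdd` are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cstddef>

// Four float lanes, standing in for a 128-bit NEON register.
using Vec4 = std::array<float, 4>;

// Lane-wise multiply-accumulate written as a single expression:
// the shape a back-end pattern could select as one VMLA.
Vec4 multiplyAccumulate(const Vec4 &acc, const Vec4 &a, const Vec4 &b) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = acc[i] + a[i] * b[i];
  return out;
}

// The same computation split into a multiply step and an add step:
// the shape that corresponds to VMUL followed by VADD.
Vec4 multiplyThenAdd(const Vec4 &acc, const Vec4 &a, const Vec4 &b) {
  Vec4 product{};
  for (std::size_t i = 0; i < product.size(); ++i)
    product[i] = a[i] * b[i];
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = acc[i] + product[i];
  return out;
}
```

Both functions describe the same computation, which is why the front-end only needs to emit the ordinary multiply and add; which instruction sequence is faster is then a per-core scheduling question for the back-end.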

In your compiler, you may prefer to leave it always disabled; in that case, you
should set it when creating the Target.
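
As a rough illustration of setting this from a front-end, the sketch below builds a subtarget feature string in the comma-separated "+name,-name" form (the same form llc accepts through -mattr). The feature name vmlx-forwarding is the one mentioned in this thread; the helper names `FeatureToggle`, `buildFeatureString` and `cortexA9FeaturesWithoutVMLxForwarding` are hypothetical, and the actual call that consumes the string when the target machine is created depends on the LLVM version, so it is deliberately not shown.

```cpp
#include <string>
#include <utility>
#include <vector>

// One subtarget feature toggle: a feature name plus whether it is enabled.
using FeatureToggle = std::pair<std::string, bool>;

// Joins the toggles into a comma-separated feature string, prefixing each
// name with '+' (enabled) or '-' (disabled).
std::string buildFeatureString(const std::vector<FeatureToggle> &toggles) {
  std::string result;
  for (const auto &toggle : toggles) {
    if (!result.empty())
      result += ',';
    result += toggle.second ? '+' : '-';
    result += toggle.first;
  }
  return result;
}

// Hypothetical front-end policy: keep NEON on, turn VMLx forwarding off.
std::string cortexA9FeaturesWithoutVMLxForwarding() {
  return buildFeatureString({{"neon", true}, {"vmlx-forwarding", false}});
}
```

A front-end would pass the resulting string as the feature list when it sets up the ARM target, alongside the CPU name (cortex-a9 in this thread), rather than toggling the option per function.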

If we find that this optimization produces worse code in more cases than
not, then we should leave it disabled by default and let the user enable it
when necessary. I'll let Bob follow up on that, since I don't know what
benchmarks he used.

cheers,
--renato

David Tweed

2013-Feb-11 16:45 UTC


Hi,

 

> If we find that this optimization produces worse code in more cases than
> not, then we should leave it disabled by default and let the user enable it
> when necessary. I'll let Bob follow up on that, since I don't know what
> benchmarks he used.

 

Note that it may well be the case that the most "generally performant"
default varies between different ARM cores as well as between various types of
code. It would certainly be as well to try benchmarking on different cores,
as theoretical discussion of which code sequence is better is often at odds
with empirically observed results.

 

Cheers,

Dave

Bob Wilson

2013-Feb-11 18:21 UTC


In theory, the backend should choose the best instructions for the selected
target processor.  VMLA is not always the best choice.  Lang Hames did some
measurements a while back to come up with the current behavior, but I don't
remember exactly what he found.  CC'ing Lang.

On Feb 11, 2013, at 8:12 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 11 February 2013 15:51, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
>> Indeed problem is with generation of vmla.f64. Affected benchmark is MILC
>> from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on
>> complete benchmark execution ! So it is worth a try.
>
> Hi Sebastien,
>
> Indeed, worth having a look. Including Bob Wilson (who introduced the code
> in the first place, and is a connoisseur of NEON in LLVM) to see if he has a
> better idea of the problem.
>
>> Now going back to vmla generation through LLVM intrinsic usage. I’ve looked
>> at .td file and it seems to me that when there is a “pattern” to generate
>> instruction, no intrinsic is defined to generate it, correct ?
>
> Correct.
>
>> Is it possible for an instruction that is generated through a “pattern” to
>> add also an LLVM intrinsic. My goal here is to not rely on LLVM to generate
>> VMLA but rather having my front-end to generate call to a VMLA intrinsic I
>> would have defined when it thinks it’s appropriate to generate one.
>
> No, and I'm not sure we should have one.
>
> I understand why you want one, but that's too much back-end knowledge
> to a front-end, and any pass that can transform a pair of VMLAs into an
> intrinsic call, can also transform into VMLA+VMUL+VADD. In this case,
> disabling the optimization is probably the best course of action.
>
> In your compiler, you may prefer to leave it always disabled; in that case,
> you should set it when creating the Target.
>
> If we find that this optimization produces worse code in more cases than
> not, then we should leave it disabled by default and let the user enable it
> when necessary. I'll let Bob follow up on that, since I don't know what
> benchmarks he used.
>
> cheers,
> --renato
