thr3ads.net - llvm dev - [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ? [Feb 2013]


Sebastien DELDON-GNB

2013-Feb-08 12:28 UTC


Hi Renato,

Thanks for the answer; it confirms what I was suspecting. My problem is that
this behavior is controlled by vmlx-forwarding on Cortex-A9, and despite
asking on this list, I couldn't get a clear understanding of what this option
is meant for.
So here are my new questions:
Why is vmlx-forwarding enabled by default for Cortex-A9? Is it to guarantee
correctness or for performance? I've run some experiments, and DISABLING
vmlx-forwarding for Cortex-A9 leads to generation of more vmla/vmls.f32
instructions and significantly improves some benchmarks. I haven't encountered
a case where it significantly degrades performance or gives incorrect answers.
Thus my goal is to have my front-end generate LLVM NEON intrinsics that map
to vmla/vmls.f32 when I think it is appropriate, rather than relying on
disabling/enabling vmlx-forwarding.

Best Regards
Seb

From: Renato Golin [mailto:renato.golin at linaro.org]
Sent: Friday, February 08, 2013 11:54 AM
To: Sebastien DELDON-GNB
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32
instruction ?

On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at
st.com> wrote:
Hi all,

Everything is in the title: I would like to force generation of the vmla.f32
instruction for scalar operations on Cortex-A9. Is there an LLVM NEON
intrinsic available for that?

Hi Sebastien,

LLVM doesn't use intrinsics when there is a clear way of representing the
same thing in standard IR. In the case of VMLA, it is generated from a pattern:

%mul = fmul <N x type> %a, %b
%sum = fadd <N x type> %mul, %c

So, if you generate FAdd(FMul(a, b), c), you'll probably get a VMLA.

It's not common, but it's not impossible either, for the two instructions to
be reordered or even removed. So you need to make sure that the intermediate
result is not used anywhere else (otherwise it'll probably use VMUL/VADD), that
the final result is used (otherwise it'll be removed), and that the body of the
function/basic block is kept small (to avoid reordering).

cheers,
--renato
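A minimal C sketch of the two cases in Renato's advice, assuming a clang cross-compile for ARMv7 along the lines of `clang -O2 --target=armv7a-linux-gnueabihf -mcpu=cortex-a9 -mfpu=neon -S`. The function names are hypothetical. In the first function the product feeds only the addition and the sum is returned, which is the fadd(fmul) shape the back-end can fold into a single vmla.f32. In the second, the product also escapes through a pointer, so it has a second user and the back-end is expected to keep a separate multiply and add. Neither outcome is guaranteed; the vmlx-forwarding heuristic discussed in this thread can still split the first case.

```c
/* Hypothetical helper: the product's only use is the addition, and the sum
 * is returned, i.e. fadd(fmul(a, b), c) with no other users of the fmul.
 * This is the shape the ARM back-end can select as a single vmla.f32. */
float mla_candidate(float a, float b, float c) {
    return a * b + c;
}

/* Hypothetical helper: the intermediate product is also stored through a
 * pointer, so the fmul has a second user. Following the advice above, the
 * back-end is then likely to emit separate vmul.f32 and vadd.f32 instead. */
float mla_blocked(float a, float b, float c, float *product_out) {
    float product = a * b;
    *product_out = product; /* extra use of the intermediate result */
    return product + c;
}
```

The file has no `main` on purpose: compiling it with `-S` and searching the generated assembly for vmla.f32 is a quick way to see which case the back-end actually produced for a given CPU and set of target features.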

Renato Golin

2013-Feb-08 12:48 UTC


On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at
st.com> wrote:
> Why is vmlx-forwarding enabled by default for Cortex-A9? Is it to
> guarantee correctness or for performance? I've run some
> experiments, and DISABLING vmlx-forwarding for Cortex-A9 leads to generation
> of more vmla/vmls.f32 instructions and significantly improves some
> benchmarks. I haven't encountered a case where it significantly degrades
> performance or gives incorrect answers.

I believe this is what you're looking for:

http://article.gmane.org/gmane.comp.compilers.llvm.cvs/90709

Performance only, but if you're seeing regressions, I'm interested to
know which benchmarks are regressing/improving, and by how much.


> Thus my goal is to use my front-end to generate LLVM NEON intrinsics that
> map to vmla/vmls.f32 when I think it is appropriate, and not to rely
> on disabling/enabling vmlx-forwarding.

In that case, you must disable the pass when you call the back-end.

cheers,
--renato

Sebastien DELDON-GNB

2013-Feb-11 15:51 UTC


Hi Renato,

Indeed, the problem is with generation of vmla.f64. The affected benchmark is
MILC from the SPEC CPU2006 suite, and disabling vmlx-forwarding gives a 10%
speed-up on the complete benchmark execution! So it is worth a try. Now, going
back to generating VMLA through an LLVM intrinsic: I've looked at the .td files,
and it seems to me that when there is a "pattern" to generate an instruction,
no intrinsic is defined to generate it. Is that correct?
Is it possible to also add an LLVM intrinsic for an instruction that is
generated through a "pattern"? My goal here is not to rely on LLVM to generate
VMLA, but rather to have my front-end generate a call to a VMLA intrinsic that
I would have defined, whenever it thinks it's appropriate to generate one.
Hope that's clear.
Thanks for your answer
Seb

