thr3ads.net - llvm dev - [LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ? [Feb 2013]

Sebastien DELDON-GNB

2013-Feb-15 16:00 UTC

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Lang & Renato,

I eventually set up a Panda board with the latest Linaro delivery (eabi-hf). I ran
some experiments using my own compiler with LLVM 3.2 as back-end.
I use the same flag set for my compiler (front-end) and just invoke llc with and
without the vmlx-forwarding attribute. So the base arguments to llc are:

llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard

to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for
cortex-a9.
When I DISABLE vmlx forwarding I observe a 7% speed-up on the ref dataset
for MILC. So I'm seeing something similar to what I observed on the
STE platform available on the SNOWBALL board.
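
As a rough sketch of the comparison described above, the two llc invocations could be generated like this. The helper name llc_cmd and the file names milc.ll, milc.fwd.s and milc.nofwd.s are hypothetical placeholders, not part of the original setup. Only the flags come from the message, and the script prints the command lines rather than running them.

```shell
#!/bin/sh
# Base llc arguments quoted in the message above.
BASE_FLAGS="-march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard"

# Print the llc command line for one configuration (hypothetical helper).
# $1: "fwd" keeps the Cortex-A9 default, "nofwd" disables vmlx forwarding.
# $2: input IR file, $3: output assembly file (both placeholder names).
llc_cmd() {
    extra=""
    if [ "$1" = "nofwd" ]; then
        extra=" -mattr=-vmlx-forwarding"
    fi
    echo "llc $BASE_FLAGS$extra $2 -o $3"
}

llc_cmd fwd   milc.ll milc.fwd.s
llc_cmd nofwd milc.ll milc.nofwd.s
```

Piping the output to sh would run both builds, assuming llc is on the PATH. Something like grep -c vmla on each resulting .s file would then give a rough count of the multiply-accumulate instructions emitted in each configuration.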

Hope this helps
Best Regards
Seb


From: Lang Hames [mailto:lhames at gmail.com]
Sent: Wednesday, February 13, 2013 8:31 AM
To: Renato Golin
Cc: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] RE : Is there any llvm neon intrinsic that maps to
vmla.f32 instruction ?

Hi Sebastien,

How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding?
As I mentioned earlier, I saw only two additional integer vmlx instructions when
I tested.

Could you send me your 433.milc compile setup? (os, flags, compiler version,
etc.). I'd like to try to reproduce your results.

Cheers,
Lang.

On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
>> If this helps taking your decision, there are at least two benchmarks for which
>> disabling vmlx-forwarding makes a significant difference.
>
> I think Evan's worry was to base this decision on visible and comprehensible
> benchmarks, such as the test-suite.
>
>> If I get lucky I may be able to run on a panda board by next week and have more
>> info to share
>
> That'd be great, thanks!

--renato

Renato Golin

2013-Feb-15 16:16 UTC

On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for
> cortex-a9.
>
> When I DISABLE vmlx forwarding I'm observing a 7% speed-up on ref dataset
> for MILC. So I'm observing something similar to what I've observed on STE
> platform available on SNOWBALL board.
>
Hi Seb,

Thanks for doing this. As we expected, it's a general A9 issue. OK, now we
need to get LNT going. Have you ever done that? It's quite simple.

Just follow this guide:
http://llvm.org/docs/lnt/quickstart.html

Optionally, compile LLVM+Clang and use the attached run.sh file to run the
LNT (changing the paths accordingly).

That should take a few hours and, if you have the Perf server running, will
allow you to run it twice (with and without vmlx) and compare the results.

cheers,
--renato
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run.sh
Type: application/x-sh
Size: 281 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130215/c249aa4e/attachment.sh>

Sebastien DELDON-GNB

2013-Feb-15 17:06 UTC

Hi Renato,

No, I've never used LNT before, and it might not be as simple as you think to get it
working here. I'll see what I can do, but it's unlikely I'll have
much time to spend on this topic in the coming weeks.

I'm more interested in coming back to my original question, and would like to
know how to proceed if I want to define my own LLVM intrinsic to generate the VMLA
instruction. My goal is not to get my work committed to LLVM trunk and thus
pollute community work with intrinsics that are only useful to me.

Thanks for your answer
Best Regards
Seb

