Displaying 20 results from an estimated 20000 matches similar to: "[LLVMdev] Vectorizer using Instruction, not opcodes"
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi all,
My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do).
I would propose estimating the "worst case costs" and seeing how far we get with this. My rationale here is that we don't want vectorization to decrease performance relative
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
On 4 February 2013 18:25, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> For cases where this approach breaks really badly we could consider adding
> a specialized api or parameters (like the type of a user/use). But we
> should do so only as a last resort and backed by actual code that would
> benefit from doing so.
>
Very sensible, more or less what I had in
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>, "Nadav Rotem" <nrotem at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
> Sent: Monday, February 4, 2013 1:38:03 PM
>
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
The loop vectorizer does not estimate the cost of vectorization by looking at the IR you list below. It does not vectorize and then run the CostAnalysis pass; it estimates the cost itself before it even performs the vectorization.
The way it works is that it looks at all the scalar instructions and asks: what would it cost to execute this scalar instruction as a vector instruction? Therefore, it
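As a rough, hypothetical illustration of that per-instruction question (plain C with GCC/Clang vector extensions, not code from this thread): the cost model prices each scalar operation as the same operation widened to the candidate vectorization factor.

typedef float v4f32 __attribute__((vector_size(16)));

/* The scalar instruction the vectorizer actually sees... */
float scalar_add(float a, float b) {
    return a + b;              /* one scalar fadd */
}

/* ...and the widened form whose cost it asks the target about (VF = 4). */
v4f32 vector_add(v4f32 a, v4f32 b) {
    return a + b;              /* one <4 x float> fadd */
}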
2012 Jul 05
2
[LLVMdev] RE : Vector argument passing abi for ARM ?
Hi Duncan,
I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by the ABI, I was not sure whether I needed to submit it as a bug.
I wanted to be sure that it was an actual bug before submitting it, and got the not-a-bug answer.
Here is a small example to reproduce the problem I'm experiencing:
; ModuleID = 'bugparam.ll'
target datalayout =
2012 Jul 05
0
[LLVMdev] RE : Vector argument passing abi for ARM ?
Hi Sebastien,
> I also thought it was a bug, especially since it worked with LLVM 3.0, but since it is not defined by the ABI, I was not sure whether I needed to submit it as a bug.
yes it is a bug.
> I wanted to be sure that it was an actual bug before submitting it and got the not-a-bug answer.
I didn't read Nadav's reply as saying there was no bug; in fact, he explicitly
said in his email
2012 Jul 05
0
[LLVMdev] Vector argument passing abi for ARM ?
Hi Sebastien,
> Thanks for the quick answer; how do I know which type is legal/illegal with respect to the calling convention?
the code generators are supposed to produce working code no matter what the
parameter type is. The fact that the ARM ABI doesn't specify how <2 x i8>
is passed just means that the code generators can pass it using whatever
technique they feel like (since it
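For illustration only (not Sebastien's original module), a C analogue of the case under discussion; the typedef below is a hypothetical GCC/Clang vector-extension type that clang lowers to a <2 x i8> argument in IR:

#include <stdint.h>

/* Hypothetical example: a 2-byte vector parameter. The AAPCS does not say how
   it is passed, so the backend may pick any convention it likes, as long as
   caller and callee agree on it. */
typedef uint8_t v2u8 __attribute__((vector_size(2)));

v2u8 add2(v2u8 a, v2u8 b) {
    return a + b;
}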
2014 Dec 07
3
[LLVMdev] NEON intrinsics preventing redundant load optimization?
Hi all,
I’m not sure if this is the right list, so apologies if not.
Doing some profiling, I noticed some of my hand-tuned matrix multiply code with NEON intrinsics was much slower through a C++ template wrapper vs. calling the intrinsics function directly. It turned out clang/LLVM was unable to eliminate a temporary even though the case seemed quite straightforward. Unfortunately any loads
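A hypothetical reduction of that kind of pattern (not the poster's actual wrapper): a temporary written with vst1q_f32 and immediately reloaded with vld1q_f32, a round trip to memory that the optimizer may fail to eliminate even though a plain array access would never have introduced it.

#include <arm_neon.h>

/* Hypothetical sketch: if the optimizer cannot see through the intrinsic
   store/load pair on the temporary, the redundant memory traffic survives
   into the generated code. */
static inline float32x4_t scale4(const float *in, float s) {
    float tmp[4];
    vst1q_f32(tmp, vmulq_n_f32(vld1q_f32(in), s));  /* store to temporary */
    return vld1q_f32(tmp);                          /* immediately reload */
}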
2012 Sep 21
5
[LLVMdev] Question about LLVM NEON intrinsics
Hi all,
I would like to know if the LLVM NEON intrinsics are designed to support only 'Legal' types for NEON units.
Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on the following .ll code:
; ModuleID = 'vmax.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
target triple =
2013 Oct 14
1
[LLVMdev] Vectorization of pointer PHI nodes
On 14 October 2013 19:31, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Renato, can you post the c code for the function and the assembly that gcc
> produces?
>
Attached.
> Your initial example could be well handled by vectorization of strided
> loops (and the mention of VLD3(.8?)/VST3(.8?) led me to assume that
> this is what happened). But the LLVM-IR you
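For context, a hypothetical stride-3 loop of the kind an interleaved-access vectorizer can lower with VLD3.8/VST3.8 (this is not the attached code):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved (RGB-style) loop: three adjacent bytes are loaded
   and stored per iteration, exactly the access pattern VLD3/VST3 implement. */
void brighten(uint8_t *rgb, size_t n, uint8_t delta) {
    for (size_t i = 0; i < n; ++i) {
        rgb[3*i + 0] += delta;
        rgb[3*i + 1] += delta;
        rgb[3*i + 2] += delta;
    }
}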
2012 Jul 05
3
[LLVMdev] Vector argument passing abi for ARM ?
Hi Rotem,
Thanks for the quick answer; how do I know which type is legal/illegal with respect to the calling convention?
Best Regards
Seb
> -----Original Message-----
> From: Rotem, Nadav [mailto:nadav.rotem at intel.com]
> Sent: Thursday, July 05, 2012 11:21 AM
> To: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
> Subject: RE: Vector argument passing abi for ARM ?
>
> The
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements this nested loop:
for (int i = start; i < end; ++i)
    for (int p = 0; p < 4; ++p)
        a[i*4+p] = b[i*4+p] + c[i*4+p];
define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3,
                  float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
  br i1 %arg2, label %L0, label %L1
L0:
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it:
from Transforms/IPO/PassManagerBuilder.cpp:
// Add the various vectorization passes and relevant cleanup passes for
// them since we are no longer in the middle of the main scalar pipeline.
MPM.add(createLoopVectorizePass(DisableUnrollLoops));
MPM.add(createInstructionCombiningPass());
2011 Sep 01
6
[PATCH 0/5] ARM NEON optimization for samplerate converter
From: Jyri Sarha <jsarha at ti.com>
I optimized the Speex resampler for NEON-capable ARM CPUs. The first patch
should speed up resampling on any platform that can spare the
increased memory usage. It would be nice to have these merged to the
master branch. Please let me know if there is anything I can do to
help the merge. The patches have been rebased on top of the master
branch in
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot.
Any idea why there are still shufflevector, insertelement, *and* bitcast
(!!) etc. instructions left? The original loop is so clean, a textbook
example I'd say. There is no need to shuffle anything. At least I don't
see it.
Frank
vector.ph: ; preds = %L5
%broadcast.splatinsert1 = insertelement <4 x
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi,
I am trying to understand the LLVM vectorization implementation and was looking
into both loop and SLP vectorization.
test case 1:
int foo(int *a) {
    int sum = 0, i;
    for (i = 0; i < 16; i++)
        sum += a[i];
    return sum;
}
This code is vectorized by the loop vectorizer, where we calculate the scalar loop
cost as 4 and the vector loop cost as 2.
Since the vector loop cost is lower and the above reduction is legal to
2012 Sep 21
0
[LLVMdev] Question about LLVM NEON intrinsics
On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:
> Hi all,
>
> I would like to know if the LLVM NEON intrinsics are designed to support only 'Legal' types for NEON units.
> Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on the following .ll code:
>
>
> ; ModuleID = 'vmax.ll'
> target datalayout =
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav,
Thanks for the quick reply!!
Ok, so as of now we are lacking the capability to handle large flat reductions.
I did go through the function vectorizeChainsInBlock() (line number 2862). In
this function, we try to vectorize if we have phi nodes in the IR (several if's
check for phi nodes), i.e., we try to
construct a tree that starts at chains.
Any pointers on how to join multiple trees? I
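For reference, a hypothetical example of the kind of flat reduction being discussed: the loop-free analogue of test case 1 above, with no phi node for the SLP vectorizer to seed on, so vectorizing it would mean joining the small trees rooted at the adds.

/* Hypothetical flat (fully unrolled) reduction: no loop, no phi. */
int foo_flat(const int *a) {
    return a[0] + a[1] + a[2]  + a[3]  + a[4]  + a[5]  + a[6]  + a[7]
         + a[8] + a[9] + a[10] + a[11] + a[12] + a[13] + a[14] + a[15];
}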
2012 Sep 21
2
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hi Eli,
Thanks for the answer; it clarifies the situation for me. Do you know if there is a pass in LLVM that could be adapted to 'legalize' intrinsic calls?
Or shall I define my own intrinsics for unsupported types?
Best Regards
Seb
________________________________________
From: Eli Friedman [eli.friedman at gmail.com]
Sent: Friday, 21 September 2012 11:54
To: Sebastien
2015 Jan 05
2
[LLVMdev] NEON intrinsics preventing redundant load optimization?
On 4 Jan 2015, at 21:06, Tim Northover <t.p.northover at gmail.com> wrote:
>>> I’ve managed to replace the load/store intrinsics with pointer dereferences (along with a typedef to get the alignment correct). This generates 100% the same IR + asm as the auto-vectorized C version (both using -O3), and works with the toolchain in the latest XCode. Are there any concerns around doing
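A sketch of the replacement being described, assuming a typedef that lowers float32x4_t's natural 16-byte alignment to the 4-byte alignment a plain float* actually guarantees (the names here are illustrative, not from the original message):

#include <arm_neon.h>

/* Hypothetical sketch: dereference through an alignment-lowered vector type
   instead of calling the NEON load/store intrinsics. */
typedef float32x4_t unaligned_f32x4 __attribute__((aligned(4)));

static inline float32x4_t load4(const float *p) {
    return *(const unaligned_f32x4 *)p;      /* instead of vld1q_f32(p)    */
}

static inline void store4(float *p, float32x4_t v) {
    *(unaligned_f32x4 *)p = v;               /* instead of vst1q_f32(p, v) */
}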