similar to: generate vectorized code

2016 Mar 16
3
vectorization for X86
I'm trying to vectorize a simple piece of C code. My problem is that I don't quite understand the relationship between clang's --target option and the cores mentioned in X86.td, as well as other X86-related options (such as -mtune). Below are the command and the code that I'm trying to vectorize. The code compiles, but I don't see any vector instructions. What am I doing wrong? Any help is appreciated.
2016 Mar 17
2
generate vectorized code
On Wed, Mar 16, 2016 at 11:48 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > Hi Rail, > > Two hints to begin with: > > 1) Make sure your example is vectorized on X86 for example > 2) Is your target correctly overriding the TTI (declaring the vector > register size for example) so that the vectorizer can kick in (see > X86TTIImpl::getRegisterBitWidth for
2016 Mar 17
2
generate vectorized code
On Wed, Mar 16, 2016 at 6:38 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Mar 16, 2016, at 5:38 PM, Rail Shafigulin <rail at esenciatech.com> wrote: > > On Wed, Mar 16, 2016 at 11:48 AM, Mehdi Amini <mehdi.amini at apple.com> > wrote: > >> Hi Rail, >> >> Two hints to begin with: >> >> 1) Make sure your example is
2016 Mar 12
4
clang triple and clang target
> > I assume with target you mean the backend? Consider the x86 backend. It > supports 32bit and 64bit mode, with the GNU x32 ABI in between. There > are three different executable formats support (ELF, PE, MachO) with > different constraints. Some platforms require 32bit alignment of the > stack, others require 128bit alignment. The list goes on. The triple > specifies >
2016 Mar 14
3
clang triple and clang target
On Sat, Mar 12, 2016 at 2:38 PM, Tim Northover <t.p.northover at gmail.com> wrote: > On 12 March 2016 at 11:51, Rail Shafigulin via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > I tried every possible combination of --target I could think of but > nothing > > worked. Would you mind helping me out? > > First, 64-bit x86 is "x86_64", and 32-bit
2016 Mar 18
3
generate vectorized code
On Thu, Mar 17, 2016 at 2:41 PM, Rail Shafigulin <rail at esenciatech.com> wrote: > On Thu, Mar 17, 2016 at 10:10 AM, Rail Shafigulin <rail at esenciatech.com> > wrote: > >> On Wed, Mar 16, 2016 at 6:38 PM, Mehdi Amini <mehdi.amini at apple.com> >> wrote: >> >>> >>> On Mar 16, 2016, at 5:38 PM, Rail Shafigulin <rail at
2016 Mar 11
2
clang triple and clang target
Can someone explain what exactly a clang triple is (--triple option) and what the connection is between a triple and a target? I know there is an article ( http://clang.llvm.org/docs/CrossCompilation.html) that shows how to cross-compile code, but what I'm not clear about is why I need to specify a triple. Why can't I just say compile for a given target? Any help is appreciated. -- Rail
2016 Mar 18
2
generate vectorized code
> On Mar 18, 2016, at 12:52 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > >> >> On Mar 18, 2016, at 12:45 PM, Rail Shafigulin <rail at esenciatech.com <mailto:rail at esenciatech.com>> wrote: >> >> On Thu, Mar 17, 2016 at 2:41 PM, Rail Shafigulin <rail at esenciatech.com <mailto:rail at esenciatech.com>> wrote: >> On Thu,
2018 Jul 24
2
KNL Vectorization with larger vector width
Thank you. Right now, to see the effect, I made the following changes: unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) { if (Vector) { if (ST->hasAVX512()) return 65536; Here I changed 512 to 65536. Then in LoopVectorize.cpp I did the following: assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements" " into
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such
2018 Jul 24
2
KNL Vectorization with larger vector width
Hello, I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it gives <16xi32> and <8xi32> instructions. However, if I replicate the same behavior with the number of iterations at 63 and the vector width set to 64, no vector instructions are emitted. It should behave as before and give <32xi32> and
2018 Jul 23
2
KNL Vectorization with larger vector width
Thank You. I got it. Version issue. TTI.getRegisterBitWidth(true) How to put my target machine info in TTI? Please help. On Mon, Jul 23, 2018 at 11:33 PM, Friedman, Eli <efriedma at codeaurora.org> wrote: > On 7/23/2018 10:49 AM, hameeza ahmed via llvm-dev wrote: > > Thank You. > > But I cannot find your mentioned function
2016 Mar 18
2
generate vectorized code
> On Mar 18, 2016, at 1:37 PM, Rail Shafigulin <rail at esenciatech.com> wrote: > >> I think you created a cycle, this is easy to do with SelectionDAG :) >> Basically SelectionDAG will iterate until it does not see anything to change. So if you insert a transformation on a pattern A, that generates pattern B, while you have another transformation that matches B and
2016 Mar 18
2
generate vectorized code
> On Mar 18, 2016, at 1:47 PM, Rail Shafigulin <rail at esenciatech.com> wrote: > > Yes this IR does not build or shuffle any vector. Try to write a function that takes 8 ints and a pointer to a <4xi32>, builds two vectors with the 8 ints, > > This might sound like a dumb question, but how does one build a vector of ints out of regular ints in IR? See:
2013 Dec 12
0
[LLVMdev] AVX code gen
It probably does not pick the right processor architecture. You could try “clang -mavx” or “clang -march=corei7-avx” for ivy-bridge and “clang -march=core-avx2” or “clang -mavx2” for haswell. $ clang -march=core-avx2 -O3 -S -o - test.c .section __TEXT,__text,regular,pure_instructions .globl _f .align 4, 0x90 _f: ## @f
2016 Mar 18
2
generate vectorized code
> > Here is how I started with SelectionDAG: > > - small IR (bugpoint can help) > Did you mean a break point? - the magic flag: -debug > - read the output of SelectionDAG debugging (especially with cycles) > - matching the log to source code > What log are you talking about? > - single stepping in a debugger sometimes. > > > -- > Mehdi > > -- Rail
2016 Mar 18
4
generate vectorized code
On Fri, Mar 18, 2016 at 2:03 PM, Rail Shafigulin <rail at esenciatech.com> wrote: > On Fri, Mar 18, 2016 at 1:53 PM, Mehdi Amini <mehdi.amini at apple.com> > wrote: > >> >> On Mar 18, 2016, at 1:47 PM, Rail Shafigulin <rail at esenciatech.com> >> wrote: >> >> Yes this IR does not build or shuffle any vector. Try to write a function
2016 Mar 05
2
Enable / Disable a processor feature
I'm trying to enable/disable a target feature through clang. Here is what my target looks like: // Esencia subtarget features //===----------------------------------------------------------------------===// def FeatureMul : SubtargetFeature<"mul", "HasMul", "true", "Enable hardware multiplier">; def FeatureDiv
2014 Sep 14
2
[LLVMdev] Testing the new CFL alias analysis
In lto+pgo, some (5 out of 12, with the usual suspects like perlbench and gcc among them, using -flto -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen) of the CINT2006 benchmarks don’t compile. Has the implementation been tested with lto? If not, please stress the implementation more. Do we know the reasons for the gains? Where did you expect the biggest gains? Some of the losses will likely boil down to
2016 May 27
0
sum elements in the vector
Hi Shahid. Do you mind providing a concrete example of X86 code where an intrinsic was added (preferably with filenames and line numbers)? I'm having difficulty tracking down the steps you provided. Any help is appreciated. On Mon, Apr 4, 2016 at 9:02 PM, Shahid, Asghar-ahmad < Asghar-ahmad.Shahid at amd.com> wrote: > Hi Rail, > > > > We had done this for generation