similar to: [LLVMdev] Alignment of pointee

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Alignment of pointee"

2014 Mar 25
3
[LLVMdev] Alignment of pointee
On 03/25/2014 10:08 AM, Krzysztof Parzyszek wrote: > On 3/25/2014 8:53 AM, Frank Winter wrote: >> >> One can see that if the initial alignment of the pointee of %arg0 was 32 >> bytes and since the vectorizer operates on a loop with a fixed trip >> count of 4 and the size of double is 8 bytes, the vector loads and >> stores could be ideally aligned with 32 bytes
2014 Mar 25
2
[LLVMdev] Alignment of pointee
Related to this subject is the __attribute__(aligned(X)) that can be set on a type in C/C++. It is being used when generating the load / stores / memcpy / ... but is lost with respect to the type's attribute. In many a case this could help various analysis or transforms to provide more accurate results when such a type is used. The __builtin_assume_aligned could be an way to solve this.
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
In the case when coming from C it was probably the loop unroller and SLP vectorizer which vectorized the code. Potentially I could do the same in the IR. However, the loop body that is generated in the IR can get very large. Thus, the loop unroller will refuse to unroll the loop in a large number of (important) cases. Isn't there a way to convince the loop vectorizer that it should
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it: from Transforms/IPO/PassManagerBuilder.cpp: // Add the various vectorization passes and relevant cleanup passes for // them since we are no longer in the middle of the main scalar pipeline. MPM.add(createLoopVectorizePass(DisableUnrollLoops)); MPM.add(createInstructionCombiningPass());
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop: for (int i = start ; i < end ; ++i ) for (int p = 0 ; p < 4 ; ++p ) a[i*4+p] = b[i*4+p] + c[i*4+p]; define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* noalias %arg4, float* noalias %arg5, float* noalias %arg6) { entrypoint: br i1 %arg2, label %L0, label %L1 L0:
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything.At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
With LLVM 3.8 the JIT compiler engine generates an AVX512 instruction although I target an 'avx2' CPU (intel Core I7). I just downloaded the most recent 3.8 and still it happens. It happens with this input module: target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" define void @module_cFFEMJ(i64 %lo, i64 %hi, i64 %myId, i1 %ordered, i64 %start, i32* noalias align 32
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
On 06/23/2016 12:56 PM, Craig Topper wrote: > Can you check what value "getHostCPUName" returned? getHostCPUName() = skylake > > On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > With LLVM 3.8 the JIT compiler engine generates an AVX512 > instruction although I
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
Yes, you need the latest ToT version of llvm or you run -loop-vectorize -earlycse -instcombine -simplifycfg The bitcast essentially is a noop to satisfy the type system. This is how your example looks like for me: vector.body: ; preds = %vector.body, %vector.ph %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %.lhs = shl i64 %6, 2
2013 Nov 11
0
[LLVMdev] loop vectorizer: JIT + AVX segfaults
I changed the code to use the MCJIT engine. As Josh suspected it's the same issue: The program runs fine on SSE based machines, but SEGFAULTs on a CPU with AVX extensions. I attach the repro case. Should I file a bug report? P.S. On bugzilla there is the component 'new-bugs'. Should all new bugs be filed there? Frank On 11/11/13 08:45, Josh Klontz wrote: > For what it's
2013 Nov 10
2
[LLVMdev] loop vectorizer: JIT + AVX segfaults
Is it possible that the AVX support in the JIT engine or x86-64 backend is not mature? I am getting segfaults when switching from a vector length 4 to 8 in my application. I isolated the barfing function and it still segfaults in the minimal setup: The IR attached implements the following simple function: void bar(int start, int end, int ignore , bool add , bool addme , float* out, float* in) {
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
Hi! When compiling the attached module with the JIT engine on an Intel KNL I see wrong code getting emitted. I attach a complete exploit program which shows the bug in LLVM 3.8. It loads and JIT compiles the module and prints the assembler. I stumbled on this since the result of an actual calculation was wrong. So, it's not only the text version of the assembler also the machine
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
I seem to have problem to get the SLP vectorizer to make use of the full 8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The function is attached, the CPU flags are: flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
2020 Jan 14
3
Alignment parameter attributes
"Doerfert, Johannes via llvm-dev" <llvm-dev at lists.llvm.org> writes: > We could use a custom tag on `llvm.assume`, as an extension of > https://reviews.llvm.org/D72475, but that is not yet implemented. I had thought about using llvm.assume but was wondering if there is a better way. Tagging the loads with metadata really seems about the same amount of effort. Both
2013 Nov 10
0
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
I looked more into this. For the previously sent IR the vector width of 256 bit is found mistakenly (and reproducibly) on this hardware: model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz For the same IR the loop vectorizer finds the correct vector width (128 bit) on: model name : Intel(R) Xeon(R) CPU E5630 @ 2.53GHz model name : Intel(R) Core(TM) i7 CPU M 640 @
2015 May 09
4
[LLVMdev] [LSR] hoisting loop invariants in reverse order
Hi, I was tracking down a performance regression and noticed that LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of indvars) in the reverse order of how they appear in the loop. This reverse order creates troubles for the StraightLineStrengthReduce pass I recently add. While I understand ultimately SLSR should be able to sort independent candidates in an optimal order,
2013 Nov 10
3
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
The loop vectorizer is doing an amazing job so far. Most of the time. I just came across one function which led to unexpected behavior: On this function the loop vectorizer finds a 256 bit vector as the wides vector type for the x86-64 architecture. (!) This is strange, as it was always finding the correct size of 128 bit as the widest type. I isolated the IR of the function to check if this is
2015 Jun 22
2
[LLVMdev] bb-vectorizer transforms only part of the block
The loads, stores and float arithmetic in attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vectorshuffles. (The function was designed so that no shuffle would be needed in order to vectorize it). I tested this with llvm 3.6 with the following command:
2013 Nov 10
2
[LLVMdev] loop vectorizer erroneously finds 256 bit vectors
Hi Frank, I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which does have 256-bit vectors. The other two only supports SSE instructions, which are only 128-bit long. cheers, --renato On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote: > I looked more into this. For the previously sent IR the vector width of > 256 bit is found mistakenly