similar to: Vectorizing remainder loop

Displaying 20 results from an estimated 900 matches similar to: "Vectorizing remainder loop"

2018 Aug 02
2
Vectorizing remainder loop
Hi Hameeza, Aside from Ashutosh's patch..... When the vector width is that large, we can't keep vectorizing remainder like below. It'll be a huge code size if nothing else ---- hitting ITLB miss because of this is very bad, for example. VF=2048 // main vector loop VF=1024 // vectorized remainder 1 VF=512 // vectorized remainder 2 ... Vectorize remainder until trip count is
2013 Dec 31
4
[LLVMdev] [Patch][RFC] Change R600 data layout
Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64bit pointers but does not support 64 bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for r600 includes n32:64, which is odd
2018 Aug 03
2
Vectorizing remainder loop
>it cannot afford large size masks for large vectors So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch. I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though. Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s
2013 Feb 07
1
[LLVMdev] alloca scalarization with dynamic indexing into vectors
Hi all, I have a question regarding dynamic indexing into a vector with GEP. I see that in the ScalarReplAggregates pass in the LLVM 3.2 release the call SROA::isSafeGEP() will now allow alloca scalarization in the case where a GEP index into a vector isn’t a constant. My question is: what is the expected behavior when the index is out of bounds of the vector? Is it undefined? I have an
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code; #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[][N], int b[][N]) { int i, j, k; for (k = 0; k < N; k++) { for (i = 1; i <= N-2; i++) for (j = 1; j <= N-2; j++) b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]); for
2013 Aug 10
2
[LLVMdev] Address space extension
> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally
2017 Jul 01
3
Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution to vectorize such codes? please reply. i m waiting. On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > I even tried polly but still my llvm IR does not contain vector > instructions. i used the following command; > > clang -S -emit-llvm stencil.c -march=knl -O3
2017 Oct 24
3
Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which introduces a PHI node the current loop vectorizer cannot handle: opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop remark:
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
Hello, I am trying to vectorize following stencil code; #include <stdio.h> #define N 100351 // This function computes 2D-5 point Jacobi stencil void stencil(int a[restrict][N]) { int i, j, k; for (k = 0; k < 100; k++) { for (i = 1; i <= N-2; i++) { for (j = 1; j <= N-2; j++) { a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
2017 Oct 23
3
Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To me this is an issue in llvm loop vectorizer (if N is large enough to prevent complete unrolling of j-loop).</div><div> </div><div>Woud you mind to share stencil.ll than I would say more definitely what the issue
2017 Aug 07
3
VBROADCAST Implementation Issues
Thank You. Still getting errors.I have modified my instructions as you said as follows: def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2), "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}", [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 (masked_gather
2019 Apr 23
5
StringRef Iterator Variable Display
Hello, I want to display the variable names in stringref iterator. But it is not displayed using following code. for (set<StringRef>::iterator sit = L.begin(); sit != L.end(); sit++) { errs() << *sit << " "; } How to do this? Please help.. -------------- next part -------------- An HTML attachment was scrubbed... URL:
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello, I did as you said, Please tell me whether the following correct now?? def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode
2017 Aug 06
2
VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2017 Jul 08
5
Error in v64i32 type in x86 backend
Thank You. I have seen the opcode is 8 bits and all the combinations are already used in llvm x86. Now what to do? On Sat, Jul 8, 2017 at 10:57 AM, Craig Topper <craig.topper at gmail.com> wrote: > Yes its an opcode conflict. You'll have to look through Intel documents > and find an unused opcode. I've only added instructions based on a real > spec so I don't know
2017 Jul 14
3
error:Ran out of lanemask bits to represent subregister
Do your 32768 registers also have sub registers? I can't tell you exactly what to change. I'm not familiar with the code. I would just be running grep or something. ~Craig On Fri, Jul 14, 2017 at 10:23 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote: > Thank you so much. I think there is no issue with my definitions since i > have to use larger registers i.e 65536 bit
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
I was getting same error when i keep both EVEX/EVEX_4V and TA. So, i restored my original instructions and for that i have to include bool HasTA = TSFlags & X86II::TA; in x86MCCodeEmitter.cpp then used this condition; if(HasTA) ++SrcRegNum; in order to emit binary correctly. Is it right? On Tue, Sep 5, 2017 at 5:45 AM, Craig Topper <craig.topper at gmail.com> wrote: >
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
Thank You, I changed TA to EVEX or EVEX_4V. But now i am getting following error: Invalid prefix! UNREACHABLE executed at /lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:647! On Tue, Sep 5, 2017 at 4:36 AM, Craig Topper <craig.topper at gmail.com> wrote: > Not all instructions can use EVEX_4V. Move instructions in particular > cannot because they don't have 2 sources. >