thr3ads.net - similar to: "Vectorizing remainder loop"

Displaying 20 results from an estimated 900 matches similar to: "Vectorizing remainder loop"

2018 Aug 02

Vectorizing remainder loop

Hi Hameeza, Aside from Ashutosh's patch... When the vector width is that large, we can't keep vectorizing the remainder like below. It'll be a huge code size if nothing else; hitting ITLB misses because of this is very bad, for example.

    VF=2048 // main vector loop
    VF=1024 // vectorized remainder 1
    VF=512 // vectorized remainder 2
    ...

Vectorize remainder until trip count is
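To make the code-size concern concrete, here is a minimal C++ sketch, not taken from the thread, of how a trip count would be split across a cascade of remainder loops whose VF halves at each stage. The helper name `splitTripCount` and the choice to halve all the way down to VF=1 are assumptions for illustration; the snippet is truncated before it says where the cascade stops.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): for each stage of the cascade,
// record (VF, number of vector iterations executed at that VF).
// Assumption: the VF is halved at every stage until it reaches 1.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
splitTripCount(std::uint64_t tripCount, std::uint64_t mainVF) {
    assert(mainVF > 0 && "vectorization factor must be positive");
    std::vector<std::pair<std::uint64_t, std::uint64_t>> stages;
    std::uint64_t remaining = tripCount;
    for (std::uint64_t vf = mainVF; vf >= 1; vf /= 2) {
        std::uint64_t iters = remaining / vf;  // full vectors at this width
        stages.emplace_back(vf, iters);
        remaining -= iters * vf;               // hand the rest to the next stage
    }
    return stages;
}
```

With a main VF of 2048, the returned vector has 12 entries (2048, 1024, ..., 1), one per loop body that would have to be emitted, which is the kind of growth the message warns about.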

2013 Dec 31

[LLVMdev] [Patch][RFC] Change R600 data layout

Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64-bit pointers but does not support 64-bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for r600 includes n32:64, which is odd

2018 Aug 03

Vectorizing remainder loop

>it cannot afford large size masks for large vectors So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch. I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though. Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s

2013 Feb 07

[LLVMdev] alloca scalarization with dynamic indexing into vectors

Hi all, I have a question regarding dynamic indexing into a vector with GEP. I see that in the ScalarReplAggregates pass in the LLVM 3.2 release the call SROA::isSafeGEP() will now allow alloca scalarization in the case where a GEP index into a vector isn’t a constant. My question is: what is the expected behavior when the index is out of bounds of the vector? Is it undefined? I have an

2013 Oct 24

[LLVMdev] Vectorizing alloca instructions

Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >

2013 Oct 24

[LLVMdev] Vectorizing alloca instructions

Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program:

    define void @vector(i32 addrspace(1)* %out, i32 %index) {
    entry:
      %0 = alloca [4 x i32]
      %x = getelementptr [4 x i32]* %0, i32 0, i32 0
      %y = getelementptr [4 x i32]* %0, i32 0, i32 1
      %z = getelementptr [4 x i32]* %0, i32 0, i32 2
      %w = getelementptr [4 x i32]* %0, i32 0, i32 3

2017 Jul 01

Jacobi 5 Point Stencil Code not Vectorizing

I am able to vectorize it with the following code:

    #include <stdio.h>
    #define N 100351
    // This function computes 2D-5 point Jacobi stencil
    void stencil(int a[][N], int b[][N]) {
      int i, j, k;
      for (k = 0; k < N; k++) {
        for (i = 1; i <= N-2; i++)
          for (j = 1; j <= N-2; j++)
            b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
        for
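For reference, a minimal self-contained C++ sketch of one out-of-place sweep in the style of the two-array version in this post, reading from `a` and writing to `b`. The function name `jacobiSweep`, the flat row-major `std::vector<int>` storage, and the explicit `rows`/`cols` parameters are assumptions made so the example runs on a small grid. It keeps the post's `0.25` weight and `int` elements, and it makes no claim about whether a given compiler vectorizes it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): one out-of-place 5-point sweep.
// `a` is only read and `b` is only written, so no iteration of the inner
// loop reads a value produced by another iteration of the same sweep.
// Both vectors are assumed to hold rows * cols elements in row-major order.
void jacobiSweep(const std::vector<int> &a, std::vector<int> &b,
                 std::size_t rows, std::size_t cols) {
    for (std::size_t i = 1; i + 1 < rows; ++i) {
        for (std::size_t j = 1; j + 1 < cols; ++j) {
            // Same expression as the post; the double result is truncated
            // back to int on assignment.
            b[i * cols + j] = static_cast<int>(
                0.25 * (a[i * cols + j] + a[(i - 1) * cols + j] +
                        a[(i + 1) * cols + j] + a[i * cols + j - 1] +
                        a[i * cols + j + 1]));
        }
    }
}
```

Boundary cells of `b` are left untouched, matching the `1 .. N-2` loop bounds in the post.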

2013 Aug 10

[LLVMdev] Address space extension

> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally

2017 Jul 01

Jacobi 5 Point Stencil Code not Vectorizing

Does it happen due to a loop-carried dependence? If yes, what is the solution to vectorize such code? Please reply; I am waiting. On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote: > I even tried polly but still my llvm IR does not contain vector > instructions. i used the following command; > > clang -S -emit-llvm stencil.c -march=knl -O3

2017 Oct 24

Jacobi 5 Point Stencil Code not Vectorizing

Your problem is due to GVN partial redundancy elimination (PRE), which introduces a PHI node the current loop vectorizer cannot handle:

    opt -O3 stencil.ll -pass-remarks=loop-vectorize -pass-remarks-missed=loop-vectorize -pass-remarks-analysis=loop-vectorize
    remark: <unknown>:0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop
    remark:

2017 Jul 01

Jacobi 5 Point Stencil Code not Vectorizing

Hello, I am trying to vectorize the following stencil code:

    #include <stdio.h>
    #define N 100351
    // This function computes 2D-5 point Jacobi stencil
    void stencil(int a[restrict][N]) {
      int i, j, k;
      for (k = 0; k < 100; k++) {
        for (i = 1; i <= N-2; i++) {
          for (j = 1; j <= N-2; j++) {
            a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +

2017 Oct 23

Jacobi 5 Point Stencil Code not Vectorizing

Hello, To me this is an issue in the LLVM loop vectorizer (if N is large enough to prevent complete unrolling of the j-loop). Would you mind sharing stencil.ll? Then I could say more definitely what the issue

2017 Aug 07

VBROADCAST Implementation Issues

Thank You. Still getting errors. I have modified my instructions as you said, as follows:

    def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2), "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}", [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 (masked_gather

2019 Apr 23

StringRef Iterator Variable Display

Hello, I want to display the variable names through a StringRef iterator, but they are not displayed using the following code:

    for (set<StringRef>::iterator sit = L.begin(); sit != L.end(); sit++) {
      errs() << *sit << " ";
    }

How to do this? Please help.
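The loop in the question can be written more compactly with a range-based for. The sketch below is a stand-in that compiles without LLVM: it uses `std::string_view` in place of `llvm::StringRef` and collects the output into a `std::string` instead of writing to `errs()`. The helper name `joinNames` is hypothetical. It does not diagnose why nothing was printed in the original post, which the snippet does not say. Both types are non-owning views, so the strings they refer to must outlive the set.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper (not from the thread): concatenates every name in the
// set, each followed by a space, in the set's sorted order.
// std::string_view stands in for llvm::StringRef so this builds without LLVM.
std::string joinNames(const std::set<std::string_view> &names) {
    std::ostringstream out;
    for (std::string_view name : names) {
        out << name << ' ';
    }
    return out.str();
}
```

In LLVM code the same loop shape would stream each element to `errs()` directly, as the post does.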

2017 Aug 07

VBROADCAST Implementation Issues

Hello, I did as you said. Please tell me whether the following is correct now? def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb), (VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2), "GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}}, $src2}"), [(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32 (GatherNode

2017 Aug 06

VBROADCAST Implementation Issues

I want to implement gather for v64i32. I wrote the following code. def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins i2048mem:$src), "GATHER_256B\t{$src, $dst|$dst, $src}", [(set VR_2048:$dst, (v64i32 (masked_gather addr:$src)))], IIC_MOV_MEM>, TA; def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B

2017 Jul 08

Error in v64i32 type in x86 backend

Thank You. I have seen the opcode is 8 bits and all the combinations are already used in llvm x86. Now what to do? On Sat, Jul 8, 2017 at 10:57 AM, Craig Topper <craig.topper at gmail.com> wrote: > Yes its an opcode conflict. You'll have to look through Intel documents > and find an unused opcode. I've only added instructions based on a real > spec so I don't know

2017 Jul 14

error:Ran out of lanemask bits to represent subregister

Do your 32768 registers also have sub registers? I can't tell you exactly what to change. I'm not familiar with the code. I would just be running grep or something. ~Craig On Fri, Jul 14, 2017 at 10:23 AM, hameeza ahmed <hahmed2305 at gmail.com> wrote: > Thank you so much. I think there is no issue with my definitions since i > have to use larger registers i.e 65536 bit

2017 Sep 05

Issues in Vector Add Instruction Machine Code Emission

I was getting the same error when I kept both EVEX/EVEX_4V and TA. So I restored my original instructions, and for that I have to include bool HasTA = TSFlags & X86II::TA; in X86MCCodeEmitter.cpp and then use this condition: if(HasTA) ++SrcRegNum; in order to emit the binary correctly. Is that right? On Tue, Sep 5, 2017 at 5:45 AM, Craig Topper <craig.topper at gmail.com> wrote: >

2017 Sep 05

Issues in Vector Add Instruction Machine Code Emission

Thank You, I changed TA to EVEX or EVEX_4V. But now I am getting the following error: Invalid prefix! UNREACHABLE executed at /lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:647! On Tue, Sep 5, 2017 at 4:36 AM, Craig Topper <craig.topper at gmail.com> wrote: > Not all instructions can use EVEX_4V. Move instructions in particular > cannot because they don't have 2 sources. >
