Displaying 20 results from an estimated 900 matches similar to: "Vectorizing remainder loop"
2018 Aug 02
2
Vectorizing remainder loop
Hi Hameeza,
Aside from Ashutosh's patch.....
When the vector width is that large, we can't keep vectorizing remainder like below. It'll be a huge code size if nothing else ---- hitting ITLB miss because of this is very bad, for example.
VF=2048 // main vector loop
VF=1024 // vectorized remainder 1
VF=512 // vectorized remainder 2
...
Vectorize remainder until trip count is
2013 Dec 31
4
[LLVMdev] [Patch][RFC] Change R600 data layout
Hi,
I've prepared patches for both LLVM and Clang to change the
datalayout for R600. This may seem like a bold move, but I think it is
warranted. R600/SI is a strange architecture in that it uses 64bit
pointers but does not support 64 bit arithmetic except for load/store
operations that roughly map onto getelementptr.
The current datalayout for r600 includes n32:64, which is odd
2018 Aug 03
2
Vectorizing remainder loop
>it cannot afford large size masks for large vectors
So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch.
I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though.
Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s
2013 Feb 07
1
[LLVMdev] alloca scalarization with dynamic indexing into vectors
Hi all,
I have a question regarding dynamic indexing into a vector with GEP. I see
that in the ScalarReplAggregates pass in the LLVM 3.2 release the call
SROA::isSafeGEP() will now allow alloca scalarization in the case where a
GEP index into a vector isn’t a constant. My question is: what is the
expected behavior when the index is out of bounds of the vector? Is it
undefined? I have an
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom,
Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA.
Thanks,
Nadav
On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
>
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi,
I've been playing around with the SLPVectorizer trying to get it to
vectorize this simple program:
define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
%0 = alloca [4 x i32]
%x = getelementptr [4 x i32]* %0, i32 0, i32 0
%y = getelementptr [4 x i32]* %0, i32 0, i32 1
%z = getelementptr [4 x i32]* %0, i32 0, i32 2
%w = getelementptr [4 x i32]* %0, i32 0, i32 3
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
I am able to vectorize it with the following code;
#include <stdio.h>
#define N 100351
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[][N], int b[][N])
{
int i, j, k;
for (k = 0; k < N; k++) {
for (i = 1; i <= N-2; i++)
for (j = 1; j <= N-2; j++)
b[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
a[i][j+1]);
for
2013 Aug 10
2
[LLVMdev] Address space extension
> -----Original Message-----
> From: Michele Scandale [mailto:michele.scandale at gmail.com]
> Sent: Saturday, August 10, 2013 6:29 AM
> To: Micah Villmow
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Address space extension
>
> On 08/10/2013 02:47 PM, Micah Villmow wrote:
> > Michele,
> > The information you are trying to gather is fundamentally
2017 Jul 01
3
Jacobi 5 Point Stencil Code not Vectorizing
Does it happen due to loop carried dependence? if yes what is the solution
to vectorize such codes?
please reply. i m waiting.
On Jul 1, 2017 12:30 PM, "hameeza ahmed" <hahmed2305 at gmail.com> wrote:
> I even tried polly but still my llvm IR does not contain vector
> instructions. i used the following command;
>
> clang -S -emit-llvm stencil.c -march=knl -O3
2017 Oct 24
3
Jacobi 5 Point Stencil Code not Vectorizing
Your problem is due to GVN partial reduction elimination (PRE) which
introduces a PHI node the current loop vectorizer cannot handle:
opt -O3 stencil.ll -pass-remarks=loop-vectorize
-pass-remarks-missed=loop-vectorize
-pass-remarks-analysis=loop-vectorize
remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop
remark:
2017 Jul 01
2
Jacobi 5 Point Stencil Code not Vectorizing
Hello,
I am trying to vectorize following stencil code;
#include <stdio.h>
#define N 100351
// This function computes 2D-5 point Jacobi stencil
void stencil(int a[restrict][N])
{
int i, j, k;
for (k = 0; k < 100; k++)
{ for (i = 1; i <= N-2; i++)
{ for (j = 1; j <= N-2; j++)
{ a[i][j] = 0.25 * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] +
2017 Oct 23
3
Jacobi 5 Point Stencil Code not Vectorizing
<div> </div><div> </div><div>Hello,</div><div> </div><div>To me this is an issue in llvm loop vectorizer (if N is large enough to prevent complete unrolling of j-loop).</div><div> </div><div>Woud you mind to share stencil.ll than I would say more definitely what the issue
2017 Aug 07
3
VBROADCAST Implementation Issues
Thank You. Still getting errors.I have modified my instructions as you said
as follows:
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
(ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
{${mask}}, $src2}",
[(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
(masked_gather
2019 Apr 23
5
StringRef Iterator Variable Display
Hello,
I want to display the variable names in stringref iterator. But it is not
displayed using following code.
for (set<StringRef>::iterator sit = L.begin(); sit != L.end(); sit++) {
errs() << *sit << " ";
}
How to do this?
Please help..
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2017 Aug 07
2
VBROADCAST Implementation Issues
Hello,
I did as you said,
Please tell me whether the following correct now??
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, _.KRCWM:$mask_wb),
(VR_2048:$src1, _.KRCWM:$mask, ins i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst}{${mask}}|${dst} {${mask}},
$src2}"),
[(set VR_2048:$dst, _.KRCWM:$mask_wb, (v64i32
(GatherNode
2017 Aug 06
2
VBROADCAST Implementation Issues
i want to implement gather for v64i32. i wrote following code.
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst), (ins
i2048mem:$src),
"GATHER_256B\t{$src, $dst|$dst, $src}",
[(set VR_2048:$dst, (v64i32 (masked_gather
addr:$src)))],
IIC_MOV_MEM>, TA;
def: Pat<(v64f32 (masked_gather addr:$src)), (GATHER_256B
2017 Jul 08
5
Error in v64i32 type in x86 backend
Thank You.
I have seen the opcode is 8 bits and all the combinations are already used
in llvm x86.
Now what to do?
On Sat, Jul 8, 2017 at 10:57 AM, Craig Topper <craig.topper at gmail.com>
wrote:
> Yes its an opcode conflict. You'll have to look through Intel documents
> and find an unused opcode. I've only added instructions based on a real
> spec so I don't know
2017 Jul 14
3
error:Ran out of lanemask bits to represent subregister
Do your 32768 registers also have sub registers?
I can't tell you exactly what to change. I'm not familiar with the code. I
would just be running grep or something.
~Craig
On Fri, Jul 14, 2017 at 10:23 AM, hameeza ahmed <hahmed2305 at gmail.com>
wrote:
> Thank you so much. I think there is no issue with my definitions since i
> have to use larger registers i.e 65536 bit
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
I was getting same error when i keep both EVEX/EVEX_4V and TA. So, i
restored my original instructions and for that i have to include
bool HasTA = TSFlags & X86II::TA; in x86MCCodeEmitter.cpp
then used this condition;
if(HasTA)
++SrcRegNum;
in order to emit binary correctly.
Is it right?
On Tue, Sep 5, 2017 at 5:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
>
2017 Sep 05
2
Issues in Vector Add Instruction Machine Code Emission
Thank You,
I changed TA to EVEX or EVEX_4V. But now i am getting following error:
Invalid prefix!
UNREACHABLE executed at
/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:647!
On Tue, Sep 5, 2017 at 4:36 AM, Craig Topper <craig.topper at gmail.com> wrote:
> Not all instructions can use EVEX_4V. Move instructions in particular
> cannot because they don't have 2 sources.
>