search for: v512

Displaying 19 results from an estimated 19 matches for "v512".

Did you mean: 512
2013 Dec 31
4
[LLVMdev] [Patch][RFC] Change R600 data layout
Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64bit pointers but does not support 64 bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for r600 includes n32:64, which is odd
2018 Jul 29
2
Vectorizing remainder loop
...ing on a hardware with very large vector width till v2048. Now when I vectorize using llvm default vectorizer maximum 2047 iterations are scalar remainder loop. These are not vectorized by llvm which increases the cost. However these should be vectorized using next available vector width I.e v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has been there in llvm but this issue is enhanced and can't be ignored with large vector width. This is very important and significant to solve this issue. Please help. I m trying to see loopvectorizer.cpp but unable t...
2013 Feb 07
1
[LLVMdev] alloca scalarization with dynamic indexing into vectors
...nction argument and could be set to any value. (scalar_repl_store_delete.ll): target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024--a64:64:64-f80:128:128-n8:16:32:64" define void @test_fn(<2 x i32>* %src, <2 x i32>* %results, i32 %alignmentOffsets) nounwind alwaysinline { entry: %sPrivateStorage = alloca [3 x <2 x i32>], align 8 %0 = load <2 x i32>* %src, align 8,...
2018 Jan 17
3
Does it make sense to upstream some MVT's?
...her hand having code that isn't used by current backends in tree isn't great. These are the MVT's that we have added: 16x16 element (2D SIMD) 1-bit predicate registers: v256i1 16x16 element (2D SIMD) 16-bit registers: v256i16 20x20 element (2D SIMD) 16-bit registers: (we round up to v512 instead of v400): v512i16 32-bit versions of the above 16-bit registers (to represent 32-bit accumulators for MAD instructions and also dual-issue "wide" instructions to the dual non-MAD ALU's in each lane) v256i32 v512i32 For those interested in more details about Pixel Visual Cor...
2018 Jan 17
0
Does it make sense to upstream some MVT's?
...g code that isn't used by current backends in tree isn't great. These are the MVT's that we have added: 16x16 element (2D SIMD) 1-bit predicate registers: v256i1 16x16 element (2D SIMD) 16-bit registers: v256i16 20x20 element (2D SIMD) 16-bit registers: (we round up to v512 instead of v400): v512i16 32-bit versions of the above 16-bit registers (to represent 32-bit accumulators for MAD instructions and also dual-issue "wide" instructions to the dual non-MAD ALU's in each lane) v256i32 v512i32 For those interested in more details about Pixel...
2018 Jan 17
1
Does it make sense to upstream some MVT's?
...> > These are the MVT's that we have added: > > > > 16x16 element (2D SIMD) 1-bit predicate registers: > > v256i1 > > > > 16x16 element (2D SIMD) 16-bit registers: > > v256i16 > > > > 20x20 element (2D SIMD) 16-bit registers: (we round up to v512 instead of > v400): > > v512i16 > > > > 32-bit versions of the above 16-bit registers (to represent 32-bit > accumulators for MAD instructions and also dual-issue "wide" instructions > to the dual non-MAD ALU's in each lane) > > v256i32 > > v512i...
2018 Aug 02
2
Vectorizing remainder loop
...ing on a hardware with very large vector width till v2048. Now when I vectorize using llvm default vectorizer maximum 2047 iterations are scalar remainder loop. These are not vectorized by llvm which increases the cost. However these should be vectorized using next available vector width I.e v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has been there in llvm but this issue is enhanced and can't be ignored with large vector width. This is very important and significant to solve this issue. Please help. I m trying to see loopvectorizer.cpp but unable t...
2018 Aug 03
2
Vectorizing remainder loop
...ing on a hardware with very large vector width till v2048. Now when I vectorize using llvm default vectorizer maximum 2047 iterations are scalar remainder loop. These are not vectorized by llvm which increases the cost. However these should be vectorized using next available vector width I.e v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has been there in llvm but this issue is enhanced and can't be ignored with large vector width. This is very important and significant to solve this issue. Please help. I m trying to see loopvectorizer.cpp but unable t...
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >
2017 Mar 02
5
Structurizing multi-exit regions
...xits, but I don't think this helps any here. -Matt -------------- next part -------------- ; RUN: opt -S -structurizecfg -si-annotate-control-flow %s target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64" target triple = "amdgcn-amd-amdhsa-opencl" ; Function Attrs: nounwind define amdgpu_kernel void @multi_divergent_region_exit(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 { entry: %tmp...
2013 Aug 11
0
[LLVMdev] Address space extension
...l_id(0); output[x] = input[x] + mask[get_local_id(0)]; } The IR for R600 now is: /// test.r600.ll /// target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-v2048:2048:2048-n32:64" target triple = "r600-none-none" ; Function Attrs: nounwind define void @convolve(i32 addrspace(1)* nocapture readonly %input, i32 addrspace(2)* nocapture readonly %mask, i32 addrspace(1)* nocapture %output) #0 { entry: %call = tail...
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3
2018 Jan 25
2
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
...tice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com>: > Hi Alexandre, > The script uses extended-sed syntax, so you need to run sed with the -E > option. > > For example, when preparing the patch I created a file ( script.sed ) > contain...
2018 Jan 25
3
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
.... I'm not sure which side is correct, > isn't it equivalent (that is, this is the natural ABI alignment of that > type)? > > Here is my datalayout: > > target datalayout = "e-m:e-i64:64-n8:16:32:64- > S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256- > v512:512-v1024:1024" > > > 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com>: > >> Hi Alexandre, >> The script uses extended-sed syntax, so you need to run sed with the -E >> option. >> >> For example, when preparing the patch I creat...
2013 Aug 10
2
[LLVMdev] Address space extension
> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally
2018 Jan 25
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
...tice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com<mailto:dneilson at azul.com>>: Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script.sed ) co...
2018 Jan 25
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
...tice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com<mailto:dneilson at azul.com>>: Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script.sed ) co...
2018 Jan 24
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script.sed ) containing all of the lines that I copied into the commit message. Then, I ran this bash one-liner from the test directory: for f in $(find . -name '*.ll'); do sed -E -i ‘.sedbak' -f script.sed $f; done When I was happy
2018 Jan 24
2
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Hello, Is there a script to update those test cases? I see mention of a sed script in the commit message but when I try it (see attached) on sed I get the following error: sed: file script line 2: invalid reference \3 on `s' command's RHS Did I lose something in a copy-paste? Is it not really a sed script? How do I run it? On Fri, Jan 19, 2018 at 9:15 AM, Daniel Neilson via