thr3ads.net - search: "v512"

[LLVMdev] [Patch][RFC] Change R600 data layout

2013 Dec 31

4

Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64-bit pointers but does not support 64-bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for R600 includes n32:64, which is odd

Vectorizing remainder loop

2018 Jul 29

2

...ing on hardware with very large vector widths, up to v2048. When I vectorize using LLVM's default vectorizer, up to 2047 iterations end up in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4, and so on. The scalar remainder loop issue has long existed in LLVM, but with large vector widths it is amplified and can't be ignored. It is important to solve this issue. Please help. I'm trying to read loopvectorizer.cpp but unable t...
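The idea in this post, covering leftover iterations with progressively narrower vector widths before falling back to scalar code, can be illustrated with a small sketch. It only shows the arithmetic and is not how LLVM's loop vectorizer is implemented; the function name `split_remainder` and the default width bounds are assumptions made for the example.

```python
def split_remainder(remainder, max_width=1024, min_width=4):
    """Greedily cover `remainder` iterations with power-of-two vector widths.

    Hypothetical helper for illustration only. Returns (chunks, scalar_tail):
    `chunks` lists the vector widths used, widest first, and `scalar_tail`
    is the number of iterations left for a scalar loop.
    """
    chunks = []
    width = max_width
    while width >= min_width:
        # Use this width as many times as it fits, then halve it.
        while remainder >= width:
            chunks.append(width)
            remainder -= width
        width //= 2
    return chunks, remainder


if __name__ == "__main__":
    chunks, tail = split_remainder(2047)
    print(chunks, tail)
```

For the worst case mentioned in the post (2047 leftover iterations after a v2048 main loop), the sketch uses each width from 1024 down to 4 once and leaves 3 iterations for scalar code.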

[LLVMdev] alloca scalarization with dynamic indexing into vectors

2013 Feb 07

1

...nction argument and could be set to any value. (scalar_repl_store_delete.ll): target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024--a64:64:64-f80:128:128-n8:16:32:64" define void @test_fn(<2 x i32>* %src, <2 x i32>* %results, i32 %alignmentOffsets) nounwind alwaysinline { entry: %sPrivateStorage = alloca [3 x <2 x i32>], align 8 %0 = load <2 x i32>* %src, align 8,...

Does it make sense to upstream some MVT's?

2018 Jan 17

3

...her hand having code that isn't used by current backends in tree isn't great. These are the MVT's that we have added: 16x16 element (2D SIMD) 1-bit predicate registers: v256i1 16x16 element (2D SIMD) 16-bit registers: v256i16 20x20 element (2D SIMD) 16-bit registers: (we round up to v512 instead of v400): v512i16 32-bit versions of the above 16-bit registers (to represent 32-bit accumulators for MAD instructions and also dual-issue "wide" instructions to the dual non-MAD ALU's in each lane) v256i32 v512i32 For those interested in more details about Pixel Visual Cor...
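The element counts in this post follow from the 2D lane layout: 16x16 gives 256 elements, and 20x20 gives 400, which is rounded up to 512. A minimal sketch of that arithmetic is below. It assumes the rounding is to the next power of two, which the post does not state explicitly; the helper name is hypothetical.

```python
def round_up_to_power_of_two(n):
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for side in (16, 20):
        elements = side * side
        print(side, elements, round_up_to_power_of_two(elements))
```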

Does it make sense to upstream some MVT's?

2018 Jan 17

0

...g code that isn't used by current backends in tree isn't great. These are the MVT's that we have added: 16x16 element (2D SIMD) 1-bit predicate registers: v256i1 16x16 element (2D SIMD) 16-bit registers: v256i16 20x20 element (2D SIMD) 16-bit registers: (we round up to v512 instead of v400): v512i16 32-bit versions of the above 16-bit registers (to represent 32-bit accumulators for MAD instructions and also dual-issue "wide" instructions to the dual non-MAD ALU's in each lane) v256i32 v512i32 For those interested in more details about Pixel...

Does it make sense to upstream some MVT's?

2018 Jan 17

1

...> > These are the MVT's that we have added: > > > > 16x16 element (2D SIMD) 1-bit predicate registers: > > v256i1 > > > > 16x16 element (2D SIMD) 16-bit registers: > > v256i16 > > > > 20x20 element (2D SIMD) 16-bit registers: (we round up to v512 instead of > v400): > > v512i16 > > > > 32-bit versions of the above 16-bit registers (to represent 32-bit > accumulators for MAD instructions and also dual-issue "wide" instructions > to the dual non-MAD ALU's in each lane) > > v256i32 > > v512i...

Vectorizing remainder loop

2018 Aug 02

2

...ing on hardware with very large vector widths, up to v2048. When I vectorize using LLVM's default vectorizer, up to 2047 iterations end up in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4, and so on. The scalar remainder loop issue has long existed in LLVM, but with large vector widths it is amplified and can't be ignored. It is important to solve this issue. Please help. I'm trying to read loopvectorizer.cpp but unable t...

Vectorizing remainder loop

2018 Aug 03

2

...ing on hardware with very large vector widths, up to v2048. When I vectorize using LLVM's default vectorizer, up to 2047 iterations end up in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4, and so on. The scalar remainder loop issue has long existed in LLVM, but with large vector widths it is amplified and can't be ignored. It is important to solve this issue. Please help. I'm trying to read loopvectorizer.cpp but unable t...

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

0

Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >

Structurizing multi-exit regions

2017 Mar 02

5

...xits, but I don't think this helps any here. -Matt -------------- next part -------------- ; RUN: opt -S -structurizecfg -si-annotate-control-flow %s target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64" target triple = "amdgcn-amd-amdhsa-opencl" ; Function Attrs: nounwind define amdgpu_kernel void @multi_divergent_region_exit(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 { entry: %tmp...

[LLVMdev] Address space extension

2013 Aug 11

0

...l_id(0); output[x] = input[x] + mask[get_local_id(0)]; } The IR for R600 now is: /// test.r600.ll /// target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-v2048:2048:2048-n32:64" target triple = "r600-none-none" ; Function Attrs: nounwind define void @convolve(i32 addrspace(1)* nocapture readonly %input, i32 addrspace(2)* nocapture readonly %mask, i32 addrspace(1)* nocapture %output) #0 { entry: %call = tail...
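The `v512:512:512` entries that this search matches are vector alignment specifications in the LLVM datalayout string: `v<size>:<abi>[:<pref>]`, with all values in bits. As a rough sketch of how to read them, the hypothetical helper below pulls the vector entries out of a datalayout string. It is a simplified illustration, not LLVM's own parser, and it assumes the preferred alignment falls back to the ABI alignment when omitted.

```python
def vector_alignments(datalayout):
    """Map vector size in bits to (abi_align, pref_align), both in bits.

    Hypothetical helper: handles only `v<size>:<abi>[:<pref>]` entries and
    skips everything else, including empty or malformed fields.
    """
    result = {}
    for spec in datalayout.split("-"):
        if not spec.startswith("v"):
            continue
        fields = spec[1:].split(":")
        if len(fields) < 2 or not all(f.isdigit() for f in fields):
            continue
        size, abi = int(fields[0]), int(fields[1])
        # Assumption: preferred alignment defaults to the ABI alignment.
        pref = int(fields[2]) if len(fields) > 2 else abi
        result[size] = (abi, pref)
    return result


if __name__ == "__main__":
    layout = "e-p:32:32:32-i64:64:64-v96:128:128-v512:512:512-n32:64"
    print(vector_alignments(layout))
```

Both the three-field form (`v512:512:512`) and the shorter two-field form (`v512:512`) that appear in these results are accepted by the sketch.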

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

4

Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

2

...tice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com>: > Hi Alexandre, > The script uses extended-sed syntax, so you need to run sed with the -E > option. > > For example, when preparing the patch I created a file ( script.sed ) > contain...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

3

.... I'm not sure which side is correct, > isn't it equivalent (that is, this is the natural ABI alignment of that > type)? > > Here is my datalayout: > > target datalayout = "e-m:e-i64:64-n8:16:32:64- > S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256- > v512:512-v1024:1024" > > > 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com>: > >> Hi Alexandre, >> The script uses extended-sed syntax, so you need to run sed with the -E >> option. >> >> For example, when preparing the patch I creat...

[LLVMdev] Address space extension

2013 Aug 10

2

> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

0

...tice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com<mailto:dneilson at azul.com>>: Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script.sed ) co...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 24

0

Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script.sed ) containing all of the lines that I copied into the commit message. Then, I ran this bash one-liner from the test directory: for f in $(find . -name '*.ll'); do sed -E -i '.sedbak' -f script.sed $f; done When I was happy

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 24

2

Hello, Is there a script to update those test cases? I see mention of a sed script in the commit message but when I try it (see attached) on sed I get the following error: sed: file script line 2: invalid reference \3 on `s' command's RHS Did I lose something in a copy-paste? Is it not really a sed script? How do I run it? On Fri, Jan 19, 2018 at 9:15 AM, Daniel Neilson via
