thr3ads.net - search: "v256"

[LLVMdev] [Patch][RFC] Change R600 data layout

2013 Dec 31

4

[LLVMdev] [Patch][RFC] Change R600 data layout

Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64-bit pointers but does not support 64-bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for r600 includes n32:64, which is odd

Vectorizing remainder loop

2018 Jul 29

2

Vectorizing remainder loop

...a hardware with very large vector widths, up to v2048. When I vectorize using the default LLVM vectorizer, up to 2047 iterations are left in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector width, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4... The scalar remainder loop issue has long existed in LLVM, but it is magnified and can't be ignored with large vector widths. Solving it is very important. Please help. I'm trying to read loopvectorizer.cpp but unable to figu...
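The cascade the poster asks for can be sketched outside the compiler. The snippet below is only an illustration of the arithmetic, not how LLVM's loop vectorizer works: the function name `split_remainder` and the default width list are assumptions taken from the widths named in the message.

```python
def split_remainder(remainder, widths=(1024, 512, 256, 128, 64, 32, 16, 8, 4)):
    """Greedily cover `remainder` iterations using the widest widths first.

    Returns (chunks, scalar_tail): `chunks` is a list of (width, count)
    pairs, and `scalar_tail` is the number of iterations that no listed
    width can cover and that would still run as scalar code.
    """
    chunks = []
    for width in widths:
        # How many full vectors of this width fit, and what is left over.
        count, remainder = divmod(remainder, width)
        if count:
            chunks.append((width, count))
    return chunks, remainder


# Worst case from the message: 2047 leftover iterations after a v2048 loop.
chunks, scalar_tail = split_remainder(2047)
print(chunks, scalar_tail)
```

With widths down to v4, the worst-case 2047-iteration remainder leaves only 3 scalar iterations; the rest is covered by one pass at each narrower width.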

[LLVMdev] alloca scalarization with dynamic indexing into vectors

2013 Feb 07

1

[LLVMdev] alloca scalarization with dynamic indexing into vectors

...ector is a function argument and could be set to any value. (scalar_repl_store_delete.ll): target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024--a64:64:64-f80:128:128-n8:16:32:64" define void @test_fn(<2 x i32>* %src, <2 x i32>* %results, i32 %alignmentOffsets) nounwind alwaysinline { entry: %sPrivateStorage = alloca [3 x <2 x i32>], align 8 %0 = load <2 x i32>* %s...
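Several results on this page match only because of the `v256:256` entry in a `target datalayout` string. As a rough aid for reading those strings, the sketch below pulls out the vector alignment entries, assuming the `v<size>:<abi>[:<pref>]` form described in the LLVM language reference, with the preferred alignment defaulting to the ABI alignment when omitted. The helper name `vector_alignments` is hypothetical, and this is not LLVM's own parser.

```python
def vector_alignments(datalayout):
    """Map vector size in bits -> (abi_align, preferred_align) in bits."""
    result = {}
    for spec in datalayout.split("-"):
        # Only vector specs are of interest; empty specs (from "--") are skipped.
        if not spec.startswith("v"):
            continue
        fields = spec[1:].split(":")
        if len(fields) < 2 or not all(f.isdigit() for f in fields):
            continue
        size, abi = int(fields[0]), int(fields[1])
        # Preferred alignment is optional and falls back to the ABI alignment.
        pref = int(fields[2]) if len(fields) > 2 else abi
        result[size] = (abi, pref)
    return result


layout = ("e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-"
          "v96:128-v192:256-v256:256-v512:512-v1024:1024")
print(vector_alignments(layout)[256])
```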

Limit of matrix + naming

2006 Feb 21

2

Limit of matrix + naming

...ssage such as Error in "colnames<-"(`*tmp*`, value = list(c(2, 3, 4, 5, 6, 7, 8, 9, : length of 'dimnames' [2] not equal to array extent and I don't know why. But, if I look at the original names in terms of V1, V2, the rownames are repeated again after V256. Is there a limitation? Thanks. Yen Lin [[alternative HTML version deleted]]

long data frame selection error

2008 Jul 14

2

long data frame selection error

...V206, V207, V208, V209, V210, V211, V212, V213, V214, V215, V216, V217, V218, V219, V220, V221, V222, V223, V224, V225, V226, V227, V228, V229, V230, V231, V232, V233, V234, V235, V236, V237, V238, V239, V240, V241, V242, V243, V244, V245, V246, V247, V248, V249, V250, V251, V252, V253, V254, V255, V256, V257, V258, V259, V260, V261, V262, V263, V264, V265, V266, V267, V268, V269, V270, V271, V272, V273, V274, V275, V276, V277, V278, V279, V280, V281, V282, V283, V284, V285, V286, V287, V288, V289, V290, V291, V292, V293, V294, V295, V296, V297, V298, V299, V300, V301, V302, V303, V304, V305, V306...

Vectorizing remainder loop

2018 Aug 02

2

Vectorizing remainder loop

...a hardware with very large vector widths, up to v2048. When I vectorize using the default LLVM vectorizer, up to 2047 iterations are left in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector width, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4... The scalar remainder loop issue has long existed in LLVM, but it is magnified and can't be ignored with large vector widths. Solving it is very important. Please help. I'm trying to read loopvectorizer.cpp but unable to figu...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 17

2

IR canonicalization: shufflevector or vector trunc?

We use InstCombiner::ShouldChangeType() to prevent transforms to illegal integer types, but I'm not sure how that would apply to vector types. Ie, let's say v256 is a legal type in your example. DataLayout doesn't appear to specify what configurations of a 256-bit vector are legal, so I don't think we can currently use that to say v2i128 should be treated differently than v16i16. Is this a valid argument to not canonicalize the IR? On Mon, Jan 16,...

ffmpeg2theora bugs ?

2005 Sep 02

4

ffmpeg2theora bugs ?

...about an error was found; anyway the video output is generated correctly. Using ffmpeg2theora instead, the coding progress goes into a loop and we have to stop the process. As you can see below the time [1.21.34] is probably the same in seconds [81.4] where ffmpeg found the error. F:\th15 -x320 -y256 -V256 -A45 -o vogg.ogg vi1.avi Input #0, avi, from 'vi1.avi': Duration: 00:08:04.8, start: 0.000000, bitrate: 30345 kb/s Stream #0.0: Video: dvvideo, yuv420p, 720x576, 25.00 fps Stream #0.1: Audio: pcm_s16le, 48000 Hz, stereo, 1536 kb/s Resize: 720x576 => 320x256 Resample: 48000Hz =>...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 21

2

IR canonicalization: shufflevector or vector trunc?

...lvm.org> > *Subject:* Re: [llvm-dev] IR canonicalization: shufflevector or vector > trunc? > > > > We use InstCombiner::ShouldChangeType() to prevent transforms to illegal > integer types, but I'm not sure how that would apply to vector types. > > Ie, let's say v256 is a legal type in your example. DataLayout doesn't > appear to specify what configurations of a 256-bit vector are legal, so I > don't think we can currently use that to say v2i128 should be treated > differently than v16i16. > > Is this a valid argument to not canonicalize...

Vectorizing remainder loop

2018 Aug 03

2

Vectorizing remainder loop

...a hardware with very large vector widths, up to v2048. When I vectorize using the default LLVM vectorizer, up to 2047 iterations are left in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector width, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4... The scalar remainder loop issue has long existed in LLVM, but it is magnified and can't be ignored with large vector widths. Solving it is very important. Please help. I'm trying to read loopvectorizer.cpp but unable to figu...

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

0

[LLVMdev] Vectorizing alloca instructions

Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >

Structurizing multi-exit regions

2017 Mar 02

5

Structurizing multi-exit regions

...ultiple exits, but I don't think this helps any here. -Matt -------------- next part -------------- ; RUN: opt -S -structurizecfg -si-annotate-control-flow %s target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64" target triple = "amdgcn-amd-amdhsa-opencl" ; Function Attrs: nounwind define amdgpu_kernel void @multi_divergent_region_exit(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 { entry...

IR canonicalization: shufflevector or vector trunc?

2017 Jan 13

2

IR canonicalization: shufflevector or vector trunc?

Right - I think that case looks like this for little endian: define <2 x i32> @zextshuffle(<2 x i16> %x) { %zext_shuffle = shufflevector <2 x i16> %x, <2 x i16> zeroinitializer, <4 x i32> <i32 0, i32 2, i32 1, i32 2> %bc = bitcast <4 x i16> %zext_shuffle to <2 x i32> ret <2 x i32> %bc } define <2 x i32> @zextvec(<2 x i16>

[LLVMdev] Address space extension

2013 Aug 11

0

[LLVMdev] Address space extension

...x = get_global_id(0); output[x] = input[x] + mask[get_local_id(0)]; } The IR for R600 now is: /// test.r600.ll /// target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-v2048:2048:2048-n32:64" target triple = "r600-none-none" ; Function Attrs: nounwind define void @convolve(i32 addrspace(1)* nocapture readonly %input, i32 addrspace(2)* nocapture readonly %mask, i32 addrspace(1)* nocapture %output) #0 { entry:...

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

4

[LLVMdev] Vectorizing alloca instructions

Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

2

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

...alse) Notice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com>: > Hi Alexandre, > The script uses extended-sed syntax, so you need to run sed with the -E > option. > > For example, when preparing the patch I created a file ( script.sed ) >...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

3

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

..."align 1". I'm not sure which side is correct, > isn't it equivalent (that is, this is the natural ABI alignment of that > type)? > > Here is my datalayout: > > target datalayout = "e-m:e-i64:64-n8:16:32:64- > S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256- > v512:512-v1024:1024" > > > 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com>: > >> Hi Alexandre, >> The script uses extended-sed syntax, so you need to run sed with the -E >> option. >> >> For example, when preparing th...

[LLVMdev] Address space extension

2013 Aug 10

2

[LLVMdev] Address space extension

> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

0

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

...alse) Notice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024" 2018-01-23 20:14 GMT-08:00 Daniel Neilson <dneilson at azul.com<mailto:dneilson at azul.com>>: Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script...
