similar to: [LLVMdev] [Patch][RFC] Change R600 data layout

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] [Patch][RFC] Change R600 data layout"

2013 Feb 07
1
[LLVMdev] alloca scalarization with dynamic indexing into vectors
Hi all, I have a question regarding dynamic indexing into a vector with GEP. I see that in the ScalarReplAggregates pass in the LLVM 3.2 release the call SROA::isSafeGEP() will now allow alloca scalarization in the case where a GEP index into a vector isn’t a constant. My question is: what is the expected behavior when the index is out of bounds of the vector? Is it undefined? I have an
2013 Oct 24
0
[LLVMdev] Vectorizing alloca instructions
Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >
2018 Jul 29
2
Vectorizing remainder loop
Hello, I m working on a hardware with very large vector width till v2048. Now when I vectorize using llvm default vectorizer maximum 2047 iterations are scalar remainder loop. These are not vectorized by llvm which increases the cost. However these should be vectorized using next available vector width I.e v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has
2013 Oct 24
4
[LLVMdev] Vectorizing alloca instructions
Hi, I've been playing around with the SLPVectorizer trying to get it to vectorize this simple program: define void @vector(i32 addrspace(1)* %out, i32 %index) { entry: %0 = alloca [4 x i32] %x = getelementptr [4 x i32]* %0, i32 0, i32 0 %y = getelementptr [4 x i32]* %0, i32 0, i32 1 %z = getelementptr [4 x i32]* %0, i32 0, i32 2 %w = getelementptr [4 x i32]* %0, i32 0, i32 3
2013 Aug 10
2
[LLVMdev] Address space extension
> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally
2008 Jul 14
2
long data frame selection error
Hello, I am trying to select the following headers from a data frame but when I try and run the command it executes halfway through and give me an error at V188 and V359. Temp <- data.frame(V4, V5, V6, V7, V8, V9, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V20, V21, V22, V23, V24, V25, V26, V27, V28, V29, V30, V31, V32, V33, V34, V35, V36, V37, V38, V39, V40, V41, V42, V43, V44, V45,
2013 Dec 31
2
[LLVMdev] [PATCH] R600 - Fix zero extend of i1
Hi, When trying to compile a trivial opencl kernel such as: __kernel void if_eq(__global int * out, int arg0, int arg1){ out[0] = arg0==arg1?0:1; } Clang generates IR like: %1 = icmp eq i32 %arg0, %arg1 %. = zext i1 %1 to i32 This eventually crashes ISel on R600. Attached patch adds a selector so it will compile. Regards, Jon Pry jonpry at gmail.com -------------- next
2018 Aug 02
2
Vectorizing remainder loop
Hi Hameeza, Aside from Ashutosh's patch..... When the vector width is that large, we can't keep vectorizing remainder like below. It'll be a huge code size if nothing else ---- hitting ITLB miss because of this is very bad, for example. VF=2048 // main vector loop VF=1024 // vectorized remainder 1 VF=512 // vectorized remainder 2 ... Vectorize remainder until trip count is
2006 Feb 01
2
sort columns
Hi. I have a simple (I think) question My dataset have these variables: names(data) [1] "v1" "v2" "v3" "v4" "v5" "v6" "v7" "v8" "v9" "v10" "v11" "v12" "v13" "v14" "v15" "v16" "v17"
2018 Aug 03
2
Vectorizing remainder loop
>it cannot afford large size masks for large vectors So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch. I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though. Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s
2018 Jan 25
2
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Thanks, that worked like a charm except for the following: llvm generate: call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* align 1 bitcast ([512 x float] addrspace(3)* @a_scratchpad to i8 addrspace(3)*), i8 addrspace(1)* align 1 %0, i64 2048, i1 false) And we expected: call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8
2018 Jan 25
3
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Yes, all that is correct. My question is more a long term question: why do the .ll printer specify the alignment if it is equivalent to the default one? That is, it seems the sed script expect the printer to not specify it (this would match the load/store behavior), but the ll-printer does specify it, which either means the printer is not ideal on this case and I should fix it, or in this case
2018 Jan 25
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Good question. AFAIK, the IR-printer doesn’t understand the semantics of parameter attributes. In this case, it only knows that there is an attribute on the parameter that is integer valued (with value 1) and that has the name “align”, so it prints it out. If we don’t want it printing out ‘align 1’ then it’s up to us to not set the alignment parameter attribute to a value if that value would be 1.
2013 Aug 11
0
[LLVMdev] Address space extension
Hello Micah, I first apologize for the mail length, but I think that using an example would be better to clarify the case and the objections. > [Micah Villmow] In the case of OpenCL, you can't correctly use the standard C calling convention and still be OpenCL compliant, the C calling convention is too permissive. The second you use OpenCL, you are using an OpenCL specific calling
2015 Aug 22
3
sprintf error: "only 100 arguments allowed"
I'm trying to apply a function defined in the VW R docs, that attemps to convert a data.table object to Vowpal Wabbit format. In the process i'm getting the error in printf mentioned in the subject. The original function is here: https://github.com/JohnLangford/vowpal_wabbit/blob/master/R/dt2vw.R Below there is a small example that reproduces the error. The function works great with
2018 Jan 25
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Hi Alexandre, Before the change you would have been expecting one of the following, correct? a) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* [[APTR]], i64 2048, i32 0, i1 false) b) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8
2015 Aug 26
1
sprintf error: "only 100 arguments allowed"
Wouldn't it make sense to have this in the man page? The 8192-byte limitation for 'fmt' is mentioned but not this one. Thanks, H. On 08/25/2015 02:08 AM, Prof Brian Ripley wrote: > From the sources: > > #define MAXNARGS 100 > /* ^^^ not entirely arbitrary, but strongly linked to > allowing %$1 to %$99 !*/ > > > > On 22/08/2015 04:21, Martin
2017 Mar 02
5
Structurizing multi-exit regions
Hi, I'm trying to solve a problem from StructurizeCFG not actually handling regions with multiple exits. Sample IR attached. StructurizeCFG doesn't touch this function, exiting early on the isTopLevelRegion check. SIAnnotateControlFlow then gets confused and ends up inserting an if into one of the blocks, and the matching end.cf into one of the return/unreachable blocks. The input to
2018 Jan 24
0
[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Hi Alexandre, The script uses extended-sed syntax, so you need to run sed with the -E option. For example, when preparing the patch I created a file ( script.sed ) containing all of the lines that I copied into the commit message. Then, I ran this bash one-liner from the test directory: for f in $(find . -name '*.ll'); do sed -E -i ‘.sedbak' -f script.sed $f; done When I was happy
2014 Jan 02
2
[LLVMdev] [PATCH] R600 - Fix zero extend of i1
> This patch looks good, but you need to add a test case. You can add it > to the file test/CodeGen/R600/zero_extend.ll Version 2 of patch attached which includes test case. -Jon -------------- next part --------------