thr3ads.net - search: "a

Displaying 13 results from an estimated 13 matches for "a_ptr".

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 04

...he inline cost model, but I'm unsure how to proceed. Let me begin by first describing the problem I'm trying to solve. Consider the following pseudo C code: typedef struct element { unsigned idx; } element_t; static inline unsigned char fn2 (element_t *dst_ptr, const element_t *a_ptr, const element_t *b_ptr, unsigned char changed) { if (a_ptr && b_ptr && a_ptr->idx == b_ptr->idx) { if (!changed && dst_ptr && dst_ptr->idx == a_ptr->idx) { /* Do something */ } else { changed = 1; if...

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 07

...2 and only one of them has the property that the 1st and 2nd arguments are the same (as is shown in my pseudo code). Internally, we have another developer, Matt Simpson, working on a function specialization patch that might be of value here. Specifically, you could clone fn2 based on the fact that a_ptr == dst_ptr and then simplify a great deal of the function. However, that patch is still a WIP. GCC will do IPA-CP/const-prop with cloning, and I'm wildly curious if new GCCs catch this case for you at higher optimization levels? GCC does inline fn2 into fn1 in this particular case, but...
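
As a rough illustration of the cloning idea mentioned in this excerpt, the sketch below specializes a function for call sites where two pointer arguments are known to be equal. Only the `element_t` typedef comes from the thread; `idx_matches` and `idx_matches_same_ptr` are hypothetical stand-ins and are not the thread's `fn2`, whose body is truncated in the results above.

```c
#include <stddef.h>

/* Typedef as quoted in the thread's pseudo C code. */
typedef struct element { unsigned idx; } element_t;

/* Hypothetical general function: both pointers must be non-null
 * and must agree on idx. */
unsigned idx_matches(const element_t *dst_ptr, const element_t *a_ptr)
{
    return dst_ptr != NULL && a_ptr != NULL && dst_ptr->idx == a_ptr->idx;
}

/* Hypothetical clone for call sites where a_ptr == dst_ptr is known.
 * The idx comparison is trivially true, so only the null check remains. */
unsigned idx_matches_same_ptr(const element_t *p)
{
    return p != NULL;
}
```

A caller that passes the same pointer twice could be redirected to the clone, which is the kind of simplification the excerpt attributes to specializing on `a_ptr == dst_ptr`.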

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 04

...and only one of them has the property that the 1st and 2nd arguments are the same (as is shown in my pseudo code). Internally, we have another developer, Matt Simpson, working on a function specialization patch that might be of value here. Specifically, you could clone fn2 based on the fact that a_ptr == dst_ptr and then simplify a great deal of the function. However, that patch is still a WIP. > GCC will do IPA-CP/const-prop with cloning, and I'm wildly curious if > new GCCs catch this case for you at higher optimization levels? GCC does inline fn2 into fn1 in this particula...

[LLVMdev] Pass a struct on windows

2011 Aug 11

...ddr, i8*, i64) nounwind { entry: %agg_addr.0 = getelementptr inbounds %0* %agg_addr, i32 0, i32 0 store i8* %0, i8** %agg_addr.0, align 8 %agg_addr.1 = getelementptr inbounds %0* %agg_addr, i32 0, i32 1 store i64 %1, i64* %agg_addr.1, align 8 ret void } define i64 @call_by_pointer() { %a_ptr = alloca float, align 4 %a_opq = bitcast float* %a_ptr to i8* %agg_addr = alloca %0, align 8 call void @foo_by_pointer(%0* %agg_addr, i8* %a_opq, i64 4) %agg_addr.1 = getelementptr inbounds %0* %agg_addr, i32 0, i32 1 %result.1 = load i64* %agg_addr.1, align 8 ret i64 %result.1 } (2) P...

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

2017 Aug 07

...and 2nd arguments are the same (as is shown in my > pseudo code). Internally, we have another developer, Matt > Simpson, working on a function specialization patch that might > be of value here. Specifically, you could clone fn2 based on > the fact that a_ptr == dst_ptr and then simplify a great deal > of the function. However, that patch is still a WIP. > > > GCC will do IPA-CP/const-prop with cloning, and I'm wildly > curious if new GCCs catch this case for you at higher > optimiz...

[LLVMdev] Pass a struct on windows

2011 Aug 11

....0 = getelementptr inbounds %0* %agg_addr, i32 0, i32 0 > store i8* %0, i8** %agg_addr.0, align 8 > %agg_addr.1 = getelementptr inbounds %0* %agg_addr, i32 0, i32 1 > store i64 %1, i64* %agg_addr.1, align 8 > ret void > } > > define i64 @call_by_pointer() { > %a_ptr = alloca float, align 4 > %a_opq = bitcast float* %a_ptr to i8* > %agg_addr = alloca %0, align 8 > call void @foo_by_pointer(%0* %agg_addr, i8* %a_opq, i64 4) > %agg_addr.1 = getelementptr inbounds %0* %agg_addr, i32 0, i32 1 > %result.1 = load i64* %agg_addr.1, align...

[LLVMdev] Pass a struct on windows

2011 Aug 11

...r, i32 0, i32 0 > > store i8* %0, i8** %agg_addr.0, align 8 > > %agg_addr.1 = getelementptr inbounds %0* %agg_addr, i32 0, i32 1 > > store i64 %1, i64* %agg_addr.1, align 8 > > ret void > > } > > > > define i64 @call_by_pointer() { > > %a_ptr = alloca float, align 4 > > %a_opq = bitcast float* %a_ptr to i8* > > %agg_addr = alloca %0, align 8 > > call void @foo_by_pointer(%0* %agg_addr, i8* %a_opq, i64 4) > > %agg_addr.1 = getelementptr inbounds %0* %agg_addr, i32 0, i32 1 > > %result.1 = load...

[LLVMdev] structured types as function arguments

2011 Aug 15

Hi, When calling a function, does the LLVM code generator support passing structured types (arrays, structs, etc.) by _value_? I wrote some small examples, and it seemed to work, but I was wondering if anything can go wrong if the structured types are very large... Thanks, N

Use GPU in R with .Call

2012 Jul 21

...*************************************/ /* "VecAdd_cuda.c" adds two double vectors using GPU. */ /************************************************/ extern void vecAdd_kernel(double *ain,double *bin,double *cout,int len); SEXP VecAdd_cuda(SEXP a,SEXP b) { int len; double *a_ptr,*b_ptr,*resout_ptr; /*Digest R objects*/ len=length(a); a_ptr=REAL(a); b_ptr=REAL(b); SEXP resout; PROTECT(resout=allocVector(REALSXP,len)); resout_ptr=REAL(resout); vecAdd_kernel(a_ptr,b_ptr,resout_ptr,len); UNPROTECT(1); return resout; } (b) Next, the host function and the k...

[LLVMdev] llc -O# / opt -O# differences

2012 Jun 30

...different. Here's a silly unoptimized bit of code which I'm generating from my LLVM-backed program ; ModuleID = 'foo' %Coord = type { double, double, double } define double @foo(%Coord*, %Coord*) nounwind uwtable ssp { entry: %dx_ptr = alloca double %b_ptr = alloca %Coord* %a_ptr = alloca %Coord* store %Coord* %0, %Coord** %a_ptr store %Coord* %1, %Coord** %b_ptr %a = load %Coord** %a_ptr %addr = getelementptr %Coord* %a, i64 0 %2 = getelementptr inbounds %Coord* %addr, i32 0, i32 0 %3 = load double* %2 %b = load %Coord** %b_ptr %addr1 = getelementptr %Coord...

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 23

...rizontal reduction I mention above is not needed if you have no use inside the loop like in the case of: r=0 for (i = …) { r += a[i]; } return r; This is simply (I am leaving out the induction variable for “i”): %r_red = phi <2 x i32> [preheader, <%r, 0>] , [loop, %r_red_next] %a_ptr = gep %a, %i %val_of_a_sub_i = load <2 x i32> * %a_ptr %r_red_next = add %r_red, %val_of_a_sub_i Outside of the loop we reduce the vectorized reduction to the final value of “r” %r= horizontal_add %r_red_next ret %r > In my example, I'm aggregating a computation in an array...
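
The vectorized reduction in this excerpt can be mimicked in scalar C, as a minimal sketch: two partial sums stand in for the lanes of the `<2 x i32>` accumulator `%r_red`, and the single addition after the loop stands in for `horizontal_add`. The function name `sum_two_lanes` and the scalar tail for odd lengths are assumptions made for illustration, not part of the thread.

```c
#include <stddef.h>

/* Sums a[0..n) using two independent accumulators (lane 0 and lane 1). */
int sum_two_lanes(const int *a, size_t n)
{
    int lane0 = 0, lane1 = 0;   /* together they play the role of %r_red */
    size_t i = 0;

    for (; i + 1 < n; i += 2) { /* one "vector" iteration covers two elements */
        lane0 += a[i];
        lane1 += a[i + 1];
    }

    int r = lane0 + lane1;      /* horizontal add, done once outside the loop */

    if (i < n)                  /* scalar tail when n is odd (assumed handling) */
        r += a[i];
    return r;
}
```

Because nothing inside the loop reads the combined value of `r`, the lanes only need to be merged once, after the loop exits.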

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 23

On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > In the examples you gave there are no reduction variables in the loop > vectorizer’s sense. But, they all have memory accesses that are strided. > This is what I don't get. As far as I understood, a reduction variable is the one that aggregates the computation done by the loop, and is used

[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)

2014 Oct 03

...AG: SUB_INT > -; EG-DAG: ADD_INT > +; EG-DAG: SUB_INT {{[* ]*}}[[HI]] > +; EG-NOT: SUB > define void @v_sub_i64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %inA, i64 addrspace(1)* noalias %inB) nounwind { > %tid = call i32 @llvm.r600.read.tidig.x() readnone > %a_ptr = getelementptr i64 addrspace(1)* %inA, i32 %tid > diff --git a/test/CodeGen/R600/uaddo.ll b/test/CodeGen/R600/uaddo.ll > index 0b854b5..ce30bbc 100644 > --- a/test/CodeGen/R600/uaddo.ll > +++ b/test/CodeGen/R600/uaddo.ll > @@ -1,5 +1,5 @@ > ; RUN: llc -march=r600 -mcpu=SI -verif...
