thr3ads.net - search: "aptr"

2019 Nov 28

2

Question on TBAA and optimization

TBAA Question. Please consider the following test case. ---Snip-- struct B { int b1; int b2; }; struct C { int b1; }; struct A { int a1; struct C SC; int a2; }; int foo1(struct A * Aptr, struct B* Bptr) { int *a = &Aptr->SC.b1; *a=10; Bptr->b1 = 11; return *a; } int foo2(struct A * Aptr, struct B* Bptr) { Aptr->SC.b1=10; Bptr->b1 = 11; return Aptr->SC.b1; } ---Snip-- The structure pointers "Aptr" and "Bptr" will not ali...
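The test case in the snippet above can be laid out as compilable C. This is a minimal sketch: the struct and function definitions are copied from the post, while `check_distinct_objects` is a hypothetical driver that does not appear in the thread. It only exercises the case where the two pointers refer to separate objects, and it makes no claim about how any particular compiler's TBAA treats the two functions.

```c
#include <assert.h>

struct B { int b1; int b2; };
struct C { int b1; };
struct A { int a1; struct C SC; int a2; };

/* Variant 1 from the post: the store goes through a plain int*
   derived from the nested member. */
int foo1(struct A *Aptr, struct B *Bptr) {
    int *a = &Aptr->SC.b1;
    *a = 10;
    Bptr->b1 = 11;
    return *a;
}

/* Variant 2 from the post: the store goes through the struct
   access path directly. */
int foo2(struct A *Aptr, struct B *Bptr) {
    Aptr->SC.b1 = 10;
    Bptr->b1 = 11;
    return Aptr->SC.b1;
}

/* Hypothetical driver (an assumption, not part of the thread):
   calls both variants with two separate objects and reports
   whether each returned the value stored through Aptr. */
int check_distinct_objects(void) {
    struct A a = {0, {0}, 0};
    struct B b = {0, 0};
    int r1 = foo1(&a, &b);
    int r2 = foo2(&a, &b);
    assert(b.b1 == 11);   /* the store through Bptr landed in b */
    return r1 == 10 && r2 == 10;
}
```

With separate objects, the store through `Bptr` cannot change `Aptr->SC.b1`, so both variants return 10. The question in the thread concerns what the optimizer is allowed to assume when it cannot see the callers.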

[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)

2014 Oct 03

2

...,9 @@ define void @s_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 > > ; FUNC-LABEL: @v_uaddo_i32 > ; SI: V_ADD_I32 > + > +; EG: ADDC_UINT > +; EG: ADD_INT > define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind { > %a = load i32 addrspace(1)* %aptr, align 4 > %b = load i32 addrspace(1)* %bptr, align 4 > @@ -45,6 +54,9 @@ define void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 > ; FUNC-LABEL: @s_uaddo_i64 > ; SI: S_ADD_U32 ...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

3

...neilson at azul.com> wrote: > Hi Alexandre, > Before the change you would have been expecting one of the following, > correct? > a) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x > float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* > [[APTR]], i64 2048, i32 0, i1 false) > b) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x > float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* > [[APTR]], i64 2048, i32 1, i1 false) > > Functionally, (a) & (b) are both saying that the src & d...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

0

Hi Alexandre, Before the change you would have been expecting one of the following, correct? a) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* [[APTR]], i64 2048, i32 0, i1 false) b) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* [[APTR]], i64 2048, i32 1, i1 false) Functionally, (a) & (b) are both saying that the src & dest pointers are 1-byte...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

0

...son at azul.com> wrote: Hi Alexandre, Before the change you would have been expecting one of the following, correct? a) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* [[APTR]], i64 2048, i32 0, i1 false) b) call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* [[APTR]], i64 2048, i32 1, i1 false) Functionally, (a) & (b) are both saying that the src & dest pointers are 1-byte...

[RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension

2017 Jan 28

3

...h additional examples. #pragma omp parallel for(int i = 0; i < n; ++i) { A[i] = C[i]; } preheader: br label %header header: %i = phi [ i32 0, %preheader ], [ %inc, %latch ] %done = icmp ge %i, %n br i1 %done, label %exit, label %body body: fork label %task, label %latch task: %aptr = getelementptr i32, i32* %A, i32 0, i32 %i %aval = load i32* %aptr %cptr = getelementptr i32, i32* %C, i32 0, i32 %i store i32 %aval, i32* %aptr halt label %latch latch: %inc = add i32, i32 %i, i32 1 br label %header exit: join label %afterloop afterloop: ... (2) Reasoning: The...

(no subject)

2017 Mar 08

5

...label %header > > > > header: > > %i = phi [ i32 0, %preheader ], [ %inc, %latch ] > > %done = icmp ge %i, %n > > br i1 %done, label %exit, label %body > > > > body: > > fork label %task, label %latch > > > > task: > > %aptr = getelementptr i32, i32* %A, i32 0, i32 %i > > %aval = load i32* %aptr > > %cptr = getelementptr i32, i32* %C, i32 0, i32 %i > > store i32 %aval, i32* %aptr > > halt label %latch > > > > latch: > > %inc = add i32, i32 %i, i32 1 > > br la...

LV: predication

2020 May 18

2

...l, here is another idea to make more explicit hwloops work with the VP intrinsics - in a way that does not break with optimizations: vector.preheader: %evl = i32 llvm.hwloop.set.elements(%n) vector.body: %lastevl = phi 32 [%evl, %preheader, %next.evl, vector.body] %aval = call @llvm.vp.load(Aptr, .., %evl) call @llvm.vp.store(Bptr, %aval, ..., %evl) %next.evl = call i32 @llvm.hwloop.decrement(%evl) Note that the way VP intrinsics are designed, it is not possible to break this code by hoisting the VP calls out of the loop: passing "%evl >= the operation's vector size"...

[PATCH] D41675: Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

2018 Jan 25

2

...space(3)* align 1 bitcast ([512 x float] addrspace(3)* @a_scratchpad to i8 addrspace(3)*), i8 addrspace(1)* align 1 %0, i64 2048, i1 false) And we expected: call void @llvm.memcpy.p3i8.p1i8.i64(i8 addrspace(3)* bitcast ([512 x float] addrspace(3)* [[SPM0]] to i8 addrspace(3)*), i8 addrspace(1)* [[APTR]], i64 2048, i1 false) Notice the presence of "align 1". I'm not sure which side is correct, isn't it equivalent (that is, this is the natural ABI alignment of that type)? Here is my datalayout: target datalayout = "e-m:e-i64:64-n8:16:32:64-S128-v16:16-v24:32-v32:32-v48:64...

LV: predication

2020 May 19

3

...lar loop trip count My previous mail had an example on how %evl could be tied to the scalar trip count. Re-posting that here: vector.preheader: %init.evl = i32 llvm.hwloop.set.elements(%n) vector.body: %evl = phi 32 [%init.evl, %preheader, %next.evl, vector.body] %aval = call @llvm.vp.load(Aptr, .., %evl) call @llvm.vp.store(Bptr, %aval, ..., %evl) %next.evl = call i32 @llvm.hwloop.decrement(%evl) - Simon Cheers. ________________________________ From: Simon Moll <Simon.Moll at EMEA.NEC.COM> Sent: 18 May 2020 14:11 To: Sjoerd Meijer ...

(no subject)

2017 Mar 08

3

...> header: >>> %i = phi [ i32 0, %preheader ], [ %inc, %latch ] >>> %done = icmp ge %i, %n >>> br i1 %done, label %exit, label %body >>> >>> body: >>> fork label %task, label %latch >>> >>> task: >>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i >>> %aval = load i32* %aptr >>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i >>> store i32 %aval, i32* %aptr >>> halt label %latch >>> >>> latch: >>> %inc = add i32, i32 %i...

LV: predication

2020 May 19

2

...here's no difference in your example whether all instructions consume some predicate or only masked loads/stores: vector.preheader: %init.evl = i32 llvm.hwloop.set.elements(%n) vector.body: %evl = phi 32 [%init.evl, %preheader, %next.evl, vector.body] %aval = call @llvm.vp.load(Aptr, .., %evl) call @llvm.vp.store(Bptr, %aval, ..., %evl) %next.evl = call i32 @llvm.hwloop.decrement(%evl) No difference in that the problem remains that we have a random intrinsic sitting in the preheader describing a loop property that needs to be maintained. So, eliminating hardware loop...

(no subject)

2017 Mar 08

3

...> %i = phi [ i32 0, %preheader ], [ %inc, %latch ] >>>> %done = icmp ge %i, %n >>>> br i1 %done, label %exit, label %body >>>> >>>> body: >>>> fork label %task, label %latch >>>> >>>> task: >>>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i >>>> %aval = load i32* %aptr >>>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i >>>> store i32 %aval, i32* %aptr >>>> halt label %latch >>>> >>>> latch: >>>> %inc = a...

(no subject)

2017 Mar 08

4

...i32 0, %preheader ], [ %inc, %latch ] > >>> %done = icmp ge %i, %n > >>> br i1 %done, label %exit, label %body > >>> > >>> body: > >>> fork label %task, label %latch > >>> > >>> task: > >>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i > >>> %aval = load i32* %aptr > >>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i > >>> store i32 %aval, i32* %aptr > >>> halt label %latch > >>> > >>> latch: > ...

(no subject)

2017 Mar 08

2

... %done = icmp ge %i, %n >>>>>> br i1 %done, label %exit, label %body >>>>>> >>>>>> body: >>>>>> fork label %task, label %latch >>>>>> >>>>>> task: >>>>>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i >>>>>> %aval = load i32* %aptr >>>>>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i >>>>>> store i32 %aval, i32* %aptr >>>>>> halt label %latch >>>>...

(no subject)

2017 Mar 08

2

... > >>> %done = icmp ge %i, %n > > >>> br i1 %done, label %exit, label %body > > >>> > > >>> body: > > >>> fork label %task, label %latch > > >>> > > >>> task: > > >>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i > > >>> %aval = load i32* %aptr > > >>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i > > >>> store i32 %aval, i32* %aptr > > >>> halt label %latch > > >>> > ...

[RFC][PIR] Parallel LLVM IR -- Stage 0 --

2017 Mar 08

3

... %done = icmp ge %i, %n >>>>>> br i1 %done, label %exit, label %body >>>>>> >>>>>> body: >>>>>> fork label %task, label %latch >>>>>> >>>>>> task: >>>>>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i >>>>>> %aval = load i32* %aptr >>>>>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i >>>>>> store i32 %aval, i32* %aptr >>>>>> halt label %latch >>>>...

[RFC][PIR] Parallel LLVM IR -- Stage 0 --

2017 Mar 08

2

...> br i1 %done, label %exit, label %body >>>>>>>> >>>>>>>> body: >>>>>>>> fork label %task, label %latch >>>>>>>> >>>>>>>> task: >>>>>>>> %aptr = getelementptr i32, i32* %A, i32 0, i32 %i >>>>>>>> %aval = load i32* %aptr >>>>>>>> %cptr = getelementptr i32, i32* %C, i32 0, i32 %i >>>>>>>> store i32 %aval, i32* %aptr >>>>>>>> halt...

[LLVMdev] Work in progress patch to speed up andersen's implementation

2007 Apr 25

2

...stack_int_grow(OBSTACK,datum) \ __extension__ \ ({ struct obstack *__o = (OBSTACK); \ if (__o->next_free + sizeof (int) > __o->chunk_limit) \ _obstack_newchunk (__o, sizeof (int)); \ obstack_int_grow_fast (__o, datum); }) # define obstack_ptr_grow_fast(OBSTACK,aptr) \ __extension__ \ ({ struct obstack *__o1 = (OBSTACK); \ *(const void **) __o1->next_free = (aptr); \ __o1->next_free += sizeof (const void *); \ (void) 0; }) # define obstack_int_grow_fast(OBSTACK,aint) \ __extension__ \ ({ struct obstack *__o1 = (OBS...
