thr3ads.net - search: "xi32"

Displaying 5 results from an estimated 5 matches for "xi32".

[LLVMdev] where is F7 opcode for TEST instruction on X86?

2014 Apr 22

...{ let Defs = [EFLAGS] in { let isCommutable = 1 in { def TEST8rr : BinOpRR_F<0x84, "test", Xi8 , X86testpat, MRMSrcReg>; def TEST16rr : BinOpRR_F<0x84, "test", Xi16, X86testpat, MRMSrcReg>; def TEST32rr : BinOpRR_F<0x84, "test", Xi32, X86testpat, MRMSrcReg>; def TEST64rr : BinOpRR_F<0x84, "test", Xi64, X86testpat, MRMSrcReg>; } // isCommutable def TEST8rm : BinOpRM_F<0x84, "test", Xi8 , X86testpat>; def TEST16rm : BinOpRM_F<0x84, "test", Xi16, X86testpat>...

[LLVMdev] Missed optimization on array initialization

2012 Feb 25

... target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" define void @_Z3fooi(i32 %a) uwtable { %ar = alloca [100 x i32], align 16 %1 = bitcast [100 x i32]* %ar to i8* call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 400, i32 16, i1 false) %2 = getelementptr inbounds [100 x i32]* %ar, i64 0, i64 0 store i32 %a, i32* %2, align 16, !tbaa !0 %3 = icmp eq i32 %a, 0 br i1 %3, label %4, label %5 ;...

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 16

The vectorizer will now emit = load <8 x i32>, align #TargetAlignmentOfScalari32 where before it would emit = load <8 x i32> (which has the semantics of “= load <8 x i32>, align 0”, meaning the address is aligned to the target ABI alignment; see http://llvm.org/docs/LangRef.html#load-instruction). When the backend generates code for the former it will emit an unaligned move: = vmovups ... whereas for the latter it will use an aligned move: = vmovaps … vmovups...

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 16

I confirm that r194876 fixes the issue, i.e. the segfault no longer occurs. My program still passed 16-byte-aligned pointers to the function, which the loop vectorizer processes successfully: LV: Vector loop of width 8 costs: 1. LV: Selecting VF = : 8. LV: Found a vectorizable loop (8) in func_orig.ll LV: Unroll Factor is 1 Since the program runs fine, it seems to be allowed for the CPU to issue a vector

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

A fix for this is in r194876. Thanks for reporting this! On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to
