similar to: [LLVMdev] Simple Loop Vectorize Question

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] Simple Loop Vectorize Question"

2013 May 09
0
[LLVMdev] Simple Loop Vectorize Question
Hi Josh, Your modules does not have a triple, so the target machine and TargetTransformInfo have no way of knowing if you are running on a machine with vector registers. Try adding the '-mcpu=XXXX' to opt and see what happens. Thanks, Nadav On May 9, 2013, at 1:42 PM, Josh Klontz <josh.klontz at gmail.com> wrote: > Hi! I am trying to get the loop vectorizer to work on a
2013 May 10
2
[LLVMdev] Simple Loop Vectorize Question
Nadav, Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S -debug double.ll' doesn't appear to make a difference. In fact it seems to be ignored as garbage values for -mcpu don't raise an error. Am I overlooking something else also? Many Thanks, Josh On Thu, May 9, 2013 at 6:06 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Josh, > > Your
2013 May 10
0
[LLVMdev] Simple Loop Vectorize Question
Hi Josh, This line works for me: opt file.ll -loop-vectorize -S -o - -mtriple=x86_64 -mcpu=corei7-avx -debug You need to specify the triple on the command line if it is not inside the module. Thanks, Nadav On May 9, 2013, at 5:53 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S
2013 Nov 15
6
[LLVMdev] Limit loop vectorizer to SSE
On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: > Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! > > I think
2013 Nov 15
4
[LLVMdev] Limit loop vectorizer to SSE
Something like: index 6db7f68..68564cb 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); unsigned Alignment = LI ?
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]
2020 Jun 09
2
LoopStrengthReduction generates false code
Hi. In my backend I get false code after using StrengthLoopReduction. In the generated code the loop index variable is multiplied by 8 (correct, everything is 64 bit aligned) to get an address offset, and the index variable is incremented by 1*8, which is not correct. It should be incremented by 1 only. The factor 8 appears again. I compared the debug output
2020 Jun 09
2
LoopStrengthReduction generates false code
Hm, no. I expect byte addresses - everywhere. The compiler should not know that the arch needs word addresses. During lowering LOAD and STORE get explicit conversion operations for the memory address. Even if my arch was byte addressed the code would be false/illegal. Boris > Am 09.06.2020 um 19:36 schrieb Eli Friedman <efriedma at quicinc.com>: > > Blindly guessing here,
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
----- Original Message ----- > From: "Arnold Schwaighofer" <aschwaighofer at apple.com> > To: "Joshua Klontz" <josh.klontz at gmail.com> > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, November 15, 2013 4:05:53 PM > Subject: Re: [LLVMdev] Limit loop vectorizer to SSE > > > Something like: > > index
2013 Nov 15
2
[LLVMdev] Limit loop vectorizer to SSE
Yes, I was just about to send out: DL->getABITypeAlignment(ScalarDataTy); The question is: “… ABI alignment for the target …" is that getPrefTypeAlignment or getABITypeAlignment I would have thought the latter. On Nov 15, 2013, at 4:12 PM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> From: "Arnold Schwaighofer"
2020 Jun 10
2
LoopStrengthReduction generates false code
The IR after LSR is: *** IR Dump After Loop Strength Reduction *** ; Preheader: entry: tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2 br label %while.body ; Loop: while.body: ; preds = %while.body, %entry %lsr.iv = phi i32 [ %lsr.iv.next, %while.body ], [ 0, %entry ] %uglygep = getelementptr
2013 Nov 15
2
[LLVMdev] Limit loop vectorizer to SSE
Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! -Josh On Fri, Nov 15, 2013 at 3:20 PM, Frank Winter <fwinter at jlab.org> wrote: > Hmm.. I don't quite understand. How can a module validator > catch this, when it's the
2013 Nov 15
2
[LLVMdev] Limit loop vectorizer to SSE
A fix for this is in r194876. Thanks for reporting this! On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to
2013 Oct 27
0
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Arnold, thanks for the detailed setup. Still, I haven't figured out the right thing to do. I would need only the native target since all generated code will execute on the JIT execution machine (right now, the old JIT interface). There is no need for other targets. Maybe it would be good to ask specific questions: How do I get the triple for the native target? How do I setup the
2013 Nov 19
0
[LLVMdev] Limit loop vectorizer to SSE
On 16/11/2013 7:58 AM, Nadav Rotem wrote: > > On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org > <mailto:renato.golin at linaro.org>> wrote: > >> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com >> <mailto:josh.klontz at gmail.com>> wrote: >> >> Agreed, is there a pass that will insert a
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: > Agreed, is there a pass that will insert a runtime alignment check? Also, > what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() > so I don't have to hard code 32? Thanks! > I think that's a fair question, and it's about safety. If you're getting this on the
2013 Oct 27
3
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Frank, On Oct 26, 2013, at 6:29 PM, Frank Winter <fwinter at jlab.org> wrote: > I would need this to work when calling the vectorizer through > the function pass manager. Unfortunately I am having the same > problem there: I am not sure which function pass manager you are referring here. I assume you create your own (you are not using opt but configure your own pass
2013 Nov 16
0
[LLVMdev] Limit loop vectorizer to SSE
I confirm that r194876 fixes the issue, i.e. segfault not caused. My program still passed 16 byte aligned pointers to the function which the loop vectorizer processes successfully: LV: Vector loop of width 8 costs: 1. LV: Selecting VF = : 8. LV: Found a vectorizable loop (8) in func_orig.ll LV: Unroll Factor is 1 Since the program runs fine, it seems to be allowed for the CPU to issue a vector
2013 Nov 16
1
[LLVMdev] Limit loop vectorizer to SSE
The vectorizer will now emit = load <8 x i32>, align #TargetAlignmentOfScalari32 where before it would emit = load <8 x i32> (which has the semantics of “= load <8 xi32>, align 0” which means the address is aligned with target abi alignment, see http://llvm.org/docs/LangRef.html#load-instruction). When the backend generates code for the former it will emit an unaligned move:
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs I am exploring Loop Vectorizer to vectorize i8 scalar operations into 8xi8 vector operation. I was expecting the Loop Vectorizer to analyze the profitability for vectorization factor(VF) of 8, However it is not doing so due to the widest type calculation done for the blocks inside the loop. May be I am missing something, however, I am curious to know why Loop Vectorizer limits the