Hi Nicolas (at least, I suspect your signing of your mail with "Anton" was not intentional :-p),

> I assume that's the same as the online demo's "Show LLVM C++ API code"
> option (http://llvm.org/demo/)? I've tried that with a structure containing
> four floating-point components but it also appears to add them individually
> using extract/insert. Maybe I have to try an array of floats...

Did you turn off the link-time optimization flag (or something like that)? If not, the compiler will optimize things like small structs away (though a struct of more than 3 elements should not be scalarized directly AFAIK...).

Gr.

Matthijs
Hi Matthijs,

Yes, I've turned off the link-time optimizations (otherwise it just propagates my constant vectors and immediately prints the result). :-)

Here's essentially what I try to generate:

void add(float z[4], float x[4], float y[4])
{
    z[0] = x[0] + y[0];
    z[1] = x[1] + y[1];
    z[2] = x[2] + y[2];
    z[3] = x[3] + y[3];
}

And here's part of the output from the online demo:

LoadInst* float_tmp2 = new LoadInst(ptr_x, "tmp2", false, label_entry);
LoadInst* float_tmp5 = new LoadInst(ptr_y, "tmp5", false, label_entry);
BinaryOperator* float_tmp6 = BinaryOperator::create(Instruction::Add,
    float_tmp2, float_tmp5, "tmp6", label_entry);
StoreInst* void_20 = new StoreInst(float_tmp6, ptr_z, false, label_entry);
GetElementPtrInst* ptr_tmp10 = new GetElementPtrInst(ptr_x, const_int32_13,
    "tmp10", label_entry);
LoadInst* float_tmp11 = new LoadInst(ptr_tmp10, "tmp11", false, label_entry);
GetElementPtrInst* ptr_tmp13 = new GetElementPtrInst(ptr_y, const_int32_13,
    "tmp13", label_entry);
LoadInst* float_tmp14 = new LoadInst(ptr_tmp13, "tmp14", false, label_entry);
BinaryOperator* float_tmp15 = BinaryOperator::create(Instruction::Add,
    float_tmp11, float_tmp14, "tmp15", label_entry);
...

So it just processes one element at a time instead of with one (SIMD) operation.

Thank you,

-Nicolas (not Anton) :-P

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Matthijs Kooijman
Sent: Thursday, 08 May, 2008 21:39
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Vector code

Hi Nicolas (at least, I suspect your signing of your mail with "Anton" was not intentional :-p),

> I assume that's the same as the online demo's "Show LLVM C++ API code"
> option (http://llvm.org/demo/)? I've tried that with a structure
> containing four floating-point components but it also appears to add
> them individually using extract/insert. Maybe I have to try an array of
> floats...

Did you turn off the link-time optimization flag (or something like that)? If not, the compiler will optimize things like small structs away (though a struct of more than 3 elements should not be scalarized directly AFAIK...).

Gr.

Matthijs
LLVM does not automatically vectorize your scalar code (at least for now). You have to write gcc generic vector code or use vector builtins.

Evan

On May 8, 2008, at 1:46 PM, Nicolas Capens wrote:

> Hi Matthijs,
>
> Yes, I've turned off the link-time optimizations (otherwise it just
> propagates my constant vectors and immediately prints the result). :-)
>
> Here's essentially what I try to generate:
>
> void add(float z[4], float x[4], float y[4])
> {
>     z[0] = x[0] + y[0];
>     z[1] = x[1] + y[1];
>     z[2] = x[2] + y[2];
>     z[3] = x[3] + y[3];
> }
>
> And here's part of the output from the online demo:
>
> LoadInst* float_tmp2 = new LoadInst(ptr_x, "tmp2", false, label_entry);
> LoadInst* float_tmp5 = new LoadInst(ptr_y, "tmp5", false, label_entry);
> BinaryOperator* float_tmp6 = BinaryOperator::create(Instruction::Add,
>     float_tmp2, float_tmp5, "tmp6", label_entry);
> StoreInst* void_20 = new StoreInst(float_tmp6, ptr_z, false, label_entry);
> GetElementPtrInst* ptr_tmp10 = new GetElementPtrInst(ptr_x, const_int32_13,
>     "tmp10", label_entry);
> LoadInst* float_tmp11 = new LoadInst(ptr_tmp10, "tmp11", false, label_entry);
> GetElementPtrInst* ptr_tmp13 = new GetElementPtrInst(ptr_y, const_int32_13,
>     "tmp13", label_entry);
> LoadInst* float_tmp14 = new LoadInst(ptr_tmp13, "tmp14", false, label_entry);
> BinaryOperator* float_tmp15 = BinaryOperator::create(Instruction::Add,
>     float_tmp11, float_tmp14, "tmp15", label_entry);
> ...
>
> So it just processes one element at a time instead of with one (SIMD)
> operation.
>
> Thank you,
>
> -Nicolas (not Anton) :-P
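For illustration, the "gcc generic vector code" that Evan mentions could look roughly like the sketch below. It assumes a GCC- or Clang-compatible compiler that supports the vector_size attribute; the type name v4sf is only an illustrative choice, and memcpy is used for the loads and the store so that nothing is assumed about the alignment of the arrays. Whether the addition actually becomes a single SIMD instruction depends on the target and the compiler version.

```c
#include <string.h>

/* GCC/Clang generic vector type: four floats packed into 16 bytes.
   The name v4sf is an illustrative choice, not something from the thread. */
typedef float v4sf __attribute__((vector_size(16)));

/* Same signature as the scalar add() in the thread: z = x + y, element-wise. */
void add(float z[4], float x[4], float y[4])
{
    v4sf xs, ys, zs;

    /* Copy the arrays into vector values without assuming 16-byte alignment. */
    memcpy(&xs, x, sizeof xs);
    memcpy(&ys, y, sizeof ys);

    /* One vector addition in the source instead of four scalar additions. */
    zs = xs + ys;

    memcpy(z, &zs, sizeof zs);
}
```

Compiling this with the front end and inspecting the emitted IR is one way to check whether the addition is kept as a <4 x float> operation rather than being split into scalar adds.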
Nicolas Capens wrote:
> Here's essentially what I try to generate:
>
> void add(float z[4], float x[4], float y[4])
> {
>     z[0] = x[0] + y[0];
>     z[1] = x[1] + y[1];
>     z[2] = x[2] + y[2];
>     z[3] = x[3] + y[3];
> }

This is the vectorized llvm-assembly equivalent:

-----
define void @add(<4 x float>* %z, <4 x float>* %x, <4 x float>* %y) {
entry:
  %xs = load <4 x float>* %x
  %ys = load <4 x float>* %y
  %zs = add <4 x float> %xs, %ys
  store <4 x float> %zs, <4 x float>* %z
  ret void
}
-----

Run that through "llvm-as < code.ll | llc -march=cpp" to see the code to construct it.