Robert Haskett
2012-Feb-21 19:12 UTC
[LLVMdev] Strange behaviour with x86-64 windows, bad call instruction address
Hi all, me again! Well, after much hacking of code and thinking and frustration, I finally figured out what I was doing wrong. It turns out my initial attempts at using various gflags settings were causing VirtualAlloc to return GIANT addresses. In particular, the Application Verifier flag ( -vrf ) seems to cause VirtualAlloc to do what looks like top-down allocations, and then llvm happily starts using the addresses without checking whether the next function stub address goes beyond the Windows 8-terabyte limit. And why should it care, really? So, the lesson here is: DON'T use the Microsoft Application Verifier flag with anything that uses llvm 3.0, because if you are JIT'ing large amounts of IR, you'll end up with a bad address eventually, and in my case, immediately. Guh.

.r.

Date: Tue, 14 Feb 2012 20:31:31 +0000
From: Robert Haskett <rhaskett at opentext.com>
Subject: [LLVMdev] Strange behaviour with x86-64 windows, bad call instruction address
To: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>
Message-ID: <4895D06E53674F498EF4F6406C6436460A517E at otwlxg21.opentext.net>
Content-Type: text/plain; charset="us-ascii"

Hi all,

Some background: I'm working on a project to replace a custom VM with various components of llvm. We have everything running just peachy keen, with one recent exception: one of our executables crashes when attempting to run a JIT'd function. We have llvm building and running on 64-bit Windows and Linux, using Visual Studio 2008 on Windows and gcc on Linux, and we have the llvm static libs linked into one of our DLLs, which is then linked to several different EXEs. The DLL contains the code to compile to llvm IR, JIT and run code written in our proprietary language. Each EXE calls into this DLL the same way. The same chunk of IR, when JIT'd in 3 of the EXEs, runs perfectly, but in the last program it dies in a call instruction out into an invalid memory location. All compiler and linker options are the same for all 4 EXEs.
The one difference I've seen when debugging the assembly is that the 3 that work all have JIT function pointer addresses less than a 32-bit value, but the one that is failing has a 64-bit address, as indicated in the snippet below:

000007FFFFC511D7 pop rbp
000007FFFFC511D8 ret
000007FFFFC511D9 sub rsp,20h
000007FFFFC511DD mov rcx,qword ptr [rbp-70h]
000007FFFFC511E1 mov edx,0FFFFFFFEh
000007FFFFC511E6 xor r8d,r8d
000007FFFFC511E9 call rsi
000007FFFFC511EB add rsp,20h
000007FFFFC511EF test al,1
000007FFFFC511F2 je 000007FFFFC511C3
000007FFFFC511F8 sub rsp,20h
000007FFFFC511FC mov rax,7FFFFC30030h
000007FFFFC51206 mov rcx,rdi
000007FFFFC51209 mov edx,0FFFFFFFEh
000007FFFFC5120E xor r8d,r8d
000007FFFFC51211 call rax
000007FFFFC51213 add rsp,20h
000007FFFFC51217 test al,1
000007FFFFC5121A je 000007FFFFC511C3
000007FFFFC51220 mov qword ptr [rbp-68h],rdi
000007FFFFC51224 mov eax,10h
000007FFFFC51229 call 0000080077B3F1D0
000007FFFFC5122E sub rsp,rax
000007FFFFC51231 mov rdx,rsp
000007FFFFC51234 mov qword ptr [rbp-0F0h],rdx
000007FFFFC5123B sub rsp,20h

The call instruction at 000007FFFFC51229 is the one that jumps into invalid memory at 80077B3F1D0. I'm not sure why this particular EXE causes llvm to use such large address values, but it looks like there might be some 32-bit vs 64-bit address calculation/offset problem when emitting the assembly.
The code that works looks like this:

0000000002931211 call rax
0000000002931213 add rsp,20h
0000000002931217 test al,1
000000000293121A je 00000000029311C3
0000000002931220 mov qword ptr [rbp-68h],rdi
0000000002931224 mov eax,10h
0000000002931229 call 0000000077B3F1D0
000000000293122E sub rsp,rax
0000000002931231 mov rdx,rsp
0000000002931234 mov qword ptr [rbp-0F0h],rdx
000000000293123B sub rsp,20h
000000000293123F mov r12,180071AD0h
0000000002931249 mov ecx,0FFFFFFEEh
000000000293124E xor r8d,r8d
0000000002931251 mov r9,29C02EAh
000000000293125B call r12
000000000293125E add rsp,20h
0000000002931262 mov eax,10h
0000000002931267 call 0000000077B3F1D0
000000000293126C sub rsp,rax
000000000293126F mov rdx,rsp
0000000002931272 mov qword ptr [rbp-58h],rdx
0000000002931276 sub rsp,20h
000000000293127A mov ecx,0FFFFFFEEh
000000000293127F xor r8d,r8d
0000000002931282 mov r9,29C02EAh
000000000293128C call r12
000000000293128F add rsp,20h
0000000002931293 mov eax,10h
0000000002931298 call 0000000077B3F1D0
000000000293129D sub rsp,rax
00000000029312A0 mov rax,rsp

And the code at 77B3F1D0 is this:

0000000077B3F1BE nop
0000000077B3F1BF nop
0000000077B3F1C0 int 3
0000000077B3F1C1 int 3
0000000077B3F1C2 int 3
0000000077B3F1C3 int 3
0000000077B3F1C4 int 3
0000000077B3F1C5 int 3
0000000077B3F1C6 nop word ptr [rax+rax]
0000000077B3F1D0 sub rsp,10h
0000000077B3F1D4 mov qword ptr [rsp],r10
0000000077B3F1D8 mov qword ptr [rsp+8],r11
0000000077B3F1DD xor r11,r11
0000000077B3F1E0 lea r10,[rsp+18h]
0000000077B3F1E5 sub r10,rax
0000000077B3F1E8 cmovb r10,r11
0000000077B3F1EC mov r11,qword ptr gs:[10h]
0000000077B3F1F5 cmp r10,r11
0000000077B3F1F8 jae 0000000077B3F210
0000000077B3F1FA and r10w,0F000h
0000000077B3F200 lea r11,[r11-1000h]
0000000077B3F207 mov byte ptr [r11],0
0000000077B3F20B cmp r10,r11
0000000077B3F20E jne 0000000077B3F200
0000000077B3F210 mov r10,qword ptr [rsp]
0000000077B3F214 mov r11,qword ptr [rsp+8]
0000000077B3F219 add rsp,10h
0000000077B3F21D ret
0000000077B3F21E nop
0000000077B3F21F nop
0000000077B3F220 int 3
0000000077B3F221 int 3

I searched the bug database for various topics but didn't see anything specific, other than one mention in bug 5201 to do with 32-bit address truncation. My dev system is a dual-core Xeon with 16 GB of RAM. I'm no expert in how llvm works to output the asm, but I'm not afraid to delve into it to see what's happening. Has anyone else run into this? Does anyone have a suggestion of where I might start to debug in the X86 emitter code? I'm not even sure how to create a test case that would use a large starting address for the JIT. Any help is greatly appreciated.

Thanks in advance,
.r.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120214/06679ec8/attachment-0001.html

------------------------------

Message: 3
Date: Tue, 14 Feb 2012 23:51:57 +0100
From: Carl-Philip Hänsch <cphaensch at googlemail.com>
Subject: Re: [LLVMdev] Vectorization: Next Steps
To: Hal Finkel <hfinkel at anl.gov>
Cc: llvmdev at cs.uiuc.edu
Message-ID: <CAO_gjAVJBcN==XwfJBJ6UL+=pPxjRkPXHoLNuOBQDVQjUciD0A at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

That works. Thank you. Will -vectorize become default later?

2012/2/14 Hal Finkel <hfinkel at anl.gov>

> If you run with -vectorize instead of -bb-vectorize it will schedule the
> cleanup passes for you.
>
> -Hal
>
> *Sent from my Verizon Wireless Droid*
>
> -----Original message-----
> From: "Carl-Philip Hänsch" <cphaensch at googlemail.com>
> To: Hal Finkel <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Tue, Feb 14, 2012 16:10:28 GMT+00:00
> Subject: Re: [LLVMdev] Vectorization: Next Steps
>
> I tested the "restricted" keyword and it works well :)
> The generated code is a bunch of shufflevector instructions, but after a
> second -O3 pass, everything looks fine.
> This problem is described in my ML post "passes propose passes" and occurs
> here again. LLVM has so many great passes, but they cannot start again when
> the code has been somewhat simplified :(
> Maybe that's one more reason to tell the pass scheduler to redo some
> passes to find all optimizations. The core really simplifies to what I
> expected.
>
> 2012/2/13 Hal Finkel <hfinkel at anl.gov>
>
>> On Mon, 2012-02-13 at 11:11 +0100, Carl-Philip Hänsch wrote:
>> > I will test your suggestion, but I designed the test case to load the
>> > memory directly into <4 x float> registers. So there is absolutely no
>> > permutation and other swizzle or move operations. Maybe the heuristic
>> > should not only count the depth but also the surrounding load/store
>> > operations.
>>
>> I've attached two variants of your file, both of which vectorize as you'd
>> expect. The core difference between these and your original file is that
>> I added the 'restrict' keyword so that the compiler can assume that the
>> arrays don't alias (or, in the first case, I made them globals). You
>> also probably need to specify some alignment information, otherwise the
>> memory operations will be scalarized in codegen.
>>
>> -Hal
>>
>> > Are the load/store operations vectorized, too? (I designed the test
>> > case to completely fit the SSE registers)
>> >
>> > 2012/2/10 Hal Finkel <hfinkel at anl.gov>
>> >
>> > Carl-Philip,
>> >
>> > The reason that this does not vectorize is that it cannot vectorize
>> > the stores; this leaves only the mul-add chains (and some chains with
>> > loads), and they only have a depth of 2 (the threshold is 6).
>> >
>> > If you give clang -mllvm -bb-vectorize-req-chain-depth=2 then it will
>> > vectorize. The reason the heuristic has such a large default value is
>> > to prevent cases where it costs more to permute all of the necessary
>> > values into and out of the vector registers than is saved by
>> > vectorizing. Does the code generated with
>> > -bb-vectorize-req-chain-depth=2 run faster than the unvectorized code?
>> >
>> > The heuristic can certainly be improved, and these kinds of test cases
>> > are very important to that improvement process.
>> >
>> > -Hal
>> >
>> > On Thu, 2012-02-09 at 13:27 +0100, Carl-Philip Hänsch wrote:
>> > > I have a super-simple test case (4x4 matrix * 4-vector) which gets
>> > > correctly unrolled, but is not vectorized by -bb-vectorize. (I used
>> > > llvm 3.1svn)
>> > > I attached the test case so you can see what is going wrong there.
>> > >
>> > > 2012/2/3 Hal Finkel <hfinkel at anl.gov>
>> > >
>> > > As some of you may know, I committed my basic-block autovectorization
>> > > pass a few days ago. I encourage anyone interested to try it out
>> > > (pass -vectorize to opt or -mllvm -vectorize to clang) and provide
>> > > feedback. Especially in combination with -unroll-allow-partial, I
>> > > have observed some significant benchmark speedups, but I have also
>> > > observed some significant slowdowns. I would like to share my
>> > > thoughts, and hopefully get feedback, on next steps.
>> > >
>> > > 1. "Target Data" for vectorization - I think that in order to improve
>> > > the vectorization quality, the vectorizer will need more information
>> > > about the target. This information could be provided in the form of a
>> > > kind of extended target data. This extended target data might
>> > > contain:
>> > > - What basic types can be vectorized, and how many of them will fit
>> > > into (the largest) vector registers
>> > > - What classes of operations can be vectorized (division,
>> > > conversions / sign extension, etc. are not always supported)
>> > > - What alignment is necessary for loads and stores
>> > > - Is scalar-to-vector free?
>> > >
>> > > 2. Feedback between passes - We may want to implement a closer
>> > > coupling between optimization passes than currently exists.
>> > > Specifically, I have in mind two things:
>> > > - The vectorizer should communicate more closely with the loop
>> > > unroller. First, the loop unroller should try to unroll to preserve
>> > > maximal load/store alignments. Second, I think it would make a lot of
>> > > sense to be able to unroll and, only if this helps vectorization,
>> > > should the unrolled version be kept in preference to the original.
>> > > With basic-block vectorization, it is often necessary to (partially)
>> > > unroll in order to vectorize. Even when we also have real loop
>> > > vectorization, however, I still think that it will be important for
>> > > the loop unroller to communicate with the vectorizer.
>> > > - After vectorization, it would make sense for the vectorization pass
>> > > to request further simplification, but only on those parts of the
>> > > code that it modified.
>> > >
>> > > 3. Loop vectorization - It would be nice to have, in addition to
>> > > basic-block vectorization, a more-traditional loop vectorization
>> > > pass. I think that we'll need a better loop analysis pass in order
>> > > for this to happen. Some of this was started in
>> > > LoopDependenceAnalysis, but that pass is not yet finished. We'll need
>> > > something like this to recognize affine memory references, etc.
>> > >
>> > > I look forward to hearing everyone's thoughts.
>> > >
>> > > -Hal
>> > >
>> > > --
>> > > Hal Finkel
>> > > Postdoctoral Appointee
>> > > Leadership Computing Facility
>> > > Argonne National Laboratory
>> > >
>> > > _______________________________________________
>> > > LLVM Developers mailing list
>> > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> > --
>> > Hal Finkel
>> > Postdoctoral Appointee
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>>
>> --
>> Hal Finkel
>> Postdoctoral Appointee
>> Leadership Computing Facility
>> Argonne National Laboratory
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120214/29e445a0/attachment-0001.html

------------------------------

Message: 4
Date: Tue, 14 Feb 2012 17:12:54 -0800
From: Welson Sun <welson.sun at gmail.com>
Subject: [LLVMdev] Wrong AliasAnalysis::getModRefInfo result
To: llvmdev at cs.uiuc.edu
Message-ID: <CAD3rk=0yOD693BPTGVGwTuWaSa8ckuZ=dg88RmQ3FFp_vw_m3A at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

I just want to test out LLVM's AliasAnalysis::getModRefInfo API. The input C code is very simple:

void foo(int *a, int *b) {
  for(int i=0; i<10; i++)
    b[i] = a[i]*a[i];
}

int main() {
  int a[10];
  int b[10];
  for(int i=0; i<10; i++)
    a[i] = i;
  foo(a,b);
  return 0;
}

Obviously, for "foo", it only reads from array "a" and only writes to array "b".
The LLVM pass:

virtual bool runOnFunction(Function &F) {
  ++HelloCounter;
  errs() << "Hello: ";
  errs().write_escaped(F.getName()) << '\n';
  AliasAnalysis &AA = getAnalysis<AliasAnalysis>();
  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) {
    Instruction *Inst = &*I;
    if (CallInst *ci = dyn_cast<CallInst>(Inst)) {
      ci->dump();
      for (int i = 0; i < ci->getNumArgOperands(); i++) {
        Value *v = ci->getArgOperand(i);
        if (GetElementPtrInst *vi = dyn_cast<GetElementPtrInst>(v)) {
          Value *vPtr = vi->getPointerOperand();
          vPtr->dump();
          if (AllocaInst *allo = dyn_cast<AllocaInst>(vPtr)) {
            const Type *t = allo->getAllocatedType();
            if (const ArrayType *at = dyn_cast<ArrayType>(t)) {
              int64_t size = at->getNumElements() *
                             at->getElementType()->getPrimitiveSizeInBits() / 8;
              ImmutableCallSite cs(ci);
              AliasAnalysis::Location loc(v, size);
              errs() << AA.getModRefInfo(ci, loc) << "\n";
            }
          }
        }
      }
    }
  }
  return false;
}

However, the result is "3" for both a and b, which is both read and write. What's the problem? I am not quite sure whether I got the AliasAnalysis::Location right: what exactly are "address-units" for the size of the location? And did I get the starting address of the Location right? I tried v, vi and vPtr, same result.

Any insight helps,
Welson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120214/fab3ba30/attachment-0001.html

------------------------------

Message: 5
Date: Tue, 14 Feb 2012 18:33:34 -0800
From: Lang Hames <lhames at gmail.com>
Subject: Re: [LLVMdev] [llvm-commits] [PATCH] MachineRegisterInfo: Don't emit the same livein copy more than once
To: Tom Stellard <thomas.stellard at amd.com>
Cc: "Stellard, Thomas" <Tom.Stellard at amd.com>, "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>
Message-ID: <CALLttgr8Bxj_Ttbs=ez_qRCuS_xNwWjcrsu4SEbcsT4OyQf7nQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Tom,

As far as I can tell EmitLiveInCopies is just there to handle physreg arguments and return values. Is there any reason for these to change late in your backend?

- Lang.

On Tue, Feb 14, 2012 at 7:22 AM, Tom Stellard <thomas.stellard at amd.com> wrote:

> On Mon, Feb 13, 2012 at 10:17:11PM -0800, Lang Hames wrote:
> > Hi Tom,
> >
> > I'm pretty sure this function should only ever be called once, by
> > SelectionDAG. Do you know where the second call is coming from in your
> > code?
> >
> > Cheers,
> > Lang.
>
> Hi Lang,
>
> I was calling EmitLiveInCopies() from one of my backend-specific passes.
> If the function can only be called once, then I'll just try to merge
> that pass into the SelectionDAG.
>
> Thanks,
> Tom
>
> > On Mon, Feb 13, 2012 at 7:03 PM, Stellard, Thomas <Tom.Stellard at amd.com> wrote:
> > > This patch seems to have been lost on the llvm-commits mailing list.
> > > Would someone be able to review it?
> > >
> > > Thanks,
> > > Tom
> > > ________________________________________
> > > From: llvm-commits-bounces at cs.uiuc.edu [llvm-commits-bounces at cs.uiuc.edu]
> > > on behalf of Tom Stellard [thomas.stellard at amd.com]
> > > Sent: Friday, February 03, 2012 1:55 PM
> > > To: llvm-commits at cs.uiuc.edu
> > > Subject: Re: [llvm-commits] [PATCH] MachineRegisterInfo: Don't emit the
> > > same livein copy more than once
> > >
> > > On Fri, Jan 27, 2012 at 02:56:03PM -0500, Tom Stellard wrote:
> > > > ---
> > > >
> > > > Is MachineRegisterInfo::EmitLiveInCopies() only meant to be called
> > > > once per compile? If I call it more than once, it emits duplicate
> > > > copies which causes the live interval analysis to fail.
> > > >
> > > >  lib/CodeGen/MachineRegisterInfo.cpp |    4 +++-
> > > >  1 files changed, 3 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/lib/CodeGen/MachineRegisterInfo.cpp b/lib/CodeGen/MachineRegisterInfo.cpp
> > > > index 266ebf6..fc787f2 100644
> > > > --- a/lib/CodeGen/MachineRegisterInfo.cpp
> > > > +++ b/lib/CodeGen/MachineRegisterInfo.cpp
> > > > @@ -227,7 +227,9 @@ MachineRegisterInfo::EmitLiveInCopies(MachineBasicBlock *EntryMBB,
> > > >        // complicated by the debug info code for arguments.
> > > >        LiveIns.erase(LiveIns.begin() + i);
> > > >        --i; --e;
> > > > -    } else {
> > > > +    //Make sure we don't emit the same livein copies twice, in case this
> > > > +    //function is called more than once.
> > > > +    } else if (def_empty(LiveIns[i].second)) {
> > > >        // Emit a copy.
> > > >        BuildMI(*EntryMBB, EntryMBB->begin(), DebugLoc(),
> > > >                TII.get(TargetOpcode::COPY), LiveIns[i].second)
> > > > --
> > > > 1.7.6.4
> > > >
> > >
> > > Reposting this as a diff that can be applied via patch -P0 for SVN
> > > users.
> > >
> > > -Tom
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120214/6eba3eed/attachment-0001.html

------------------------------

Message: 6
Date: Tue, 14 Feb 2012 22:23:11 -0600
From: Hal Finkel <hfinkel at anl.gov>
Subject: Re: [LLVMdev] Vectorization: Next Steps
To: Carl-Philip Hänsch <cphaensch at googlemail.com>
Cc: llvmdev at cs.uiuc.edu
Message-ID: <1329279791.2835.4.camel at sapling2>
Content-Type: text/plain; charset="UTF-8"

On Tue, 2012-02-14 at 23:51 +0100, Carl-Philip Hänsch wrote:
> That works. Thank you.
> Will -vectorize become default later?

I don't know, but I think there is a lot of improvement to be made first.

-Hal

> 2012/2/14 Hal Finkel <hfinkel at anl.gov>
>         If you run with -vectorize instead of -bb-vectorize it will
>         schedule the cleanup passes for you.
>
>         -Hal
>
>         Sent from my Verizon Wireless Droid
>
>         -----Original message-----
>         From: "Carl-Philip Hänsch" <cphaensch at googlemail.com>
>         To: Hal Finkel <hfinkel at anl.gov>
>         Cc: llvmdev at cs.uiuc.edu
>         Sent: Tue, Feb 14, 2012 16:10:28 GMT+00:00
>         Subject: Re: [LLVMdev] Vectorization: Next Steps
>
>         I tested the "restricted" keyword and it works well :)
>
>         The generated code is a bunch of shufflevector instructions,
>         but after a second -O3 pass, everything looks fine.
>         This problem is described in my ML post "passes propose passes"
>         and occurs here again. LLVM has so many great passes, but they
>         cannot start again when the code has been somewhat simplified :(
>         Maybe that's one more reason to tell the pass scheduler to redo
>         some passes to find all optimizations. The core really
>         simplifies to what I expected.
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
1-630-252-0023
hfinkel at anl.gov

------------------------------

_______________________________________________
LLVMdev mailing list
LLVMdev at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

End of LLVMdev Digest, Vol 92, Issue 30
***************************************