At 2013-12-15 22:43:34, "Caldarale, Charles R" <Chuck.Caldarale at unisys.com> wrote:
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Haishan
>> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3
>
>> My clang version is 3.3 and debug build.
>
>> //test.c
>> int a[6] = {1, 2, 3, 4, 5, 6};
>> int main() {
>>   a[0] = a[5];
>>   a[1] = a[4];
>>   a[2] = a[5];
>> }
>> //end test.c
>> Then test.dump is generated by using the objdump tool.
>> //test.dump
>> ldr r1, [r0, #20]
>> str r1, [r0]
>> ldr r1, [r0, #16]
>> str r1, [r0, #4]
>> ldr r1, [r0, #12]
>> str r1, [r0, #8]
>> bx lr
>> //end test.dump
>
>It appears you have a typo in the above, since the generated array reference offsets do not correspond to the code in test.c. Presumably, the last array reference in test.c was really from a[3], not a[5].

I'm sorry for the mistake in test.c above. Your presumption is right: the last statement should read a[2] = a[3].

>> However, the 3rd and 4th instructions should be allocated a different
>> register from the second instruction.
>
>Why?
>
> - Chuck

Thank you for your answer.
If the 3rd and 4th instructions are allocated a different register from the second instruction, the dependence on the same machine register disappears, and this instruction sequence would execute with fewer stalls and cycles.
However, in the latest version of LLVM, the Pre-RA-sched builds the following scheduling graph (original graph).
//original graph
----> data flow
====> control flow
load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3
//end original graph
So Pre-RA-sched is unable to schedule the load/store instruction pairs apart. Because of the resulting live ranges in the register allocation stage, all load/store instruction pairs are allocated the same register.

If we change the control flow in the original graph above, we get the following modified graph.
//modified graph
----> data flow
====> control flow
load1 ----> store1 ====> store2 ====> store3
load2 ----> store2
load3 ----> store3
//end modified graph

I think Pre-RA-sched would then be able to schedule the load/store instruction pairs apart, so that each pair uses a different register. The scheduled instruction order for test.c could be load1, load2, load3, store1, store2, store3.

Best Wishes
- Haishan
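The proposed reordering can be checked in plain C. The following is only a minimal sketch, not anything LLVM emits: the function names copy_interleaved, copy_loads_first and orders_agree are hypothetical, the temporaries t1..t3 stand in for the distinct registers, and the code assumes the corrected statement a[2] = a[3]. Because the loaded elements a[3..5] and the stored elements a[0..2] do not overlap, both orders should leave the array in the same state.

```c
#include <string.h>

/* Original order: load1, store1, load2, store2, load3, store3.
   A single temporary is reused, like r1 in test.dump. */
void copy_interleaved(int a[6]) {
    int t;
    t = a[5]; a[0] = t;
    t = a[4]; a[1] = t;
    t = a[3]; a[2] = t;
}

/* Proposed order: load1, load2, load3, store1, store2, store3.
   Each load/store pair gets its own temporary. */
void copy_loads_first(int a[6]) {
    int t1 = a[5];
    int t2 = a[4];
    int t3 = a[3];
    a[0] = t1;
    a[1] = t2;
    a[2] = t3;
}

/* Returns 1 when both orders produce the same array contents. */
int orders_agree(void) {
    int x[6] = {1, 2, 3, 4, 5, 6};
    int y[6] = {1, 2, 3, 4, 5, 6};
    copy_interleaved(x);
    copy_loads_first(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

The equivalence holds only because no store writes a location that a later load reads; if the sources and destinations overlapped, hoisting the loads could change the result.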
The flag -enable-aa-sched-mi should do what you want in the MachineScheduler pass.

If you want to do it in the selection DAG, there is a subtarget hook that might do it:

TargetSubtargetInfo::useAA()

LLVM won’t generate the schedule you want anyway for Intel core processors, but the alias analysis can be useful in general.

-Andy

On Dec 16, 2013, at 6:03 AM, Haishan <hndxvon at 163.com> wrote:
> I think the Pre-RA-sched is able to schedule apart load/store instruction pairs.
> Then each instruction pair uses different register.
> The order of scheduled instruction of test.c may be load1, load2, load3, store1, store2, store3.
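Alias information matters here because a scheduler may only move load2 above store1 if it can prove that the two accesses touch different memory. A minimal sketch of that check, for accesses off a common base register with constant byte offsets as in test.dump, might look like the following. It illustrates the idea only and is not LLVM's alias analysis; the mem_access struct and the may_overlap and can_hoist_load functions are hypothetical names.

```c
#include <stdbool.h>

/* A memory access relative to a common base (r0 in test.dump). */
struct mem_access {
    long offset;  /* byte offset from the base */
    long size;    /* access width in bytes, assumed > 0 */
};

/* True when the byte ranges [offset, offset + size) intersect. */
bool may_overlap(struct mem_access x, struct mem_access y) {
    return x.offset < y.offset + y.size && y.offset < x.offset + x.size;
}

/* True when a later load may be hoisted above an earlier store. */
bool can_hoist_load(struct mem_access load, struct mem_access store) {
    return !may_overlap(load, store);
}
```

For example, load2 (ldr r1, [r0, #16], 4 bytes) and store1 (str r1, [r0], 4 bytes) cover bytes 16..19 and 0..3, so can_hoist_load reports that the load can move; a 4-byte load from offset 0 could not be hoisted past that store.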
----- Original Message -----
> From: "Andrew Trick" <atrick at apple.com>
> To: "Haishan" <hndxvon at 163.com>
> Cc: "llvmdev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, December 20, 2013 8:41:40 PM
> Subject: Re: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3
>
> The flag -enable-aa-sched-mi should do what you want in the
> MachineScheduler pass.
>
> If you want to do it in the selection DAG, there is a subtarget hook
> that might do it:
>
> TargetSubtargetInfo::useAA()

To be clear, returning true from useAA enables the use of AA both in the selection DAG and also in the MachineScheduler pass.

 -Hal

> LLVM won’t generate the schedule you want anyway for Intel core
> processors, but the alias analysis can be useful in general.
>
> -Andy

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory