Hi,

I compile a test case (test.c) to an object file (test.o) using clang as follows:

    clang -target arm -integrated-as -c test.c -o test.o

My clang version is 3.3 (debug build).

//test.c
int a[6] = {1, 2, 3, 4, 5, 6};
int main() {
  a[0] = a[5];
  a[1] = a[4];
  a[2] = a[5];
}
//end test.c

Then test.dump is generated with the objdump tool.

//test.dump
ldr r1, [r0, #20]
str r1, [r0]
ldr r1, [r0, #16]
str r1, [r0, #4]
ldr r1, [r0, #12]
str r1, [r0, #8]
bx lr
//end test.dump

From test.dump we can see that the first and second instructions use register r1, the 3rd and 4th use the same register r1, and so do the 5th and 6th. That is to say, all six load/store instructions use the same register. However, the 3rd and 4th instructions should be allocated a different register from the second instruction.

So I set a breakpoint in the BuildSchedGraph function in ScheduleDAGSDNodes.cpp to debug the source code. Then I get the schedule graph of this basic block:

[graph image not preserved in the archive]

As the graph above shows, Pre-RA-sched (ScheduleDAGRRList.cpp) is unable to insert the 3rd SDNode (the load2 instruction) between the first SDNode (the load1 instruction) and the second SDNode (store1). Then, in the register allocation step, each load/store pair is allocated the same register.

However, if we build a schedule graph like the following:

[graph image not preserved in the archive]

I think Pre-RA-sched has a chance to schedule load1 and store1 apart, and likewise load2 and store2. Has anyone considered building such a schedule graph? Thank you very much for any suggestions.

-Haishan

[Attachments scrubbed by the archive: 截图2.png, 截图3.png]
Haishan wrote on 15.12.2013 17:47:
> Hi,
> I compile a case (test.c) to get object machine file (test.o) using
> clang as follows:
> "clang -target arm -integrated-as -c test.c -o test.o"
> [...]
> Have someone considered building such a schedule graph?
> Thank you very much if any suggestion.
> -Haishan

Try -mllvm -pre-RA-sched=list-burr

-- 
WBR, Peter Zotov.
Caldarale, Charles R
2013-Dec-15 14:43 UTC
[LLVMdev] Question about Pre-RA-schedule in LLVM3.3
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Haishan
> Subject: [LLVMdev] Question about Pre-RA-schedule in LLVM3.3

> My clang version is 3.3 and debug build.

> //test.c
> int a[6] = {1, 2, 3, 4, 5, 6}
> int main() {
> a[0] = a[5];
> a[1] = a[4];
> a[2] = a[5];
> }
> //end test.c
> Then test.dump is generated by using the objdump tool.
> //test.dump
> ldr r1, [r0, #20]
> str r1, [r0]
> ldr r1, [r0, #16]
> str r1, [r0, #4]
> ldr r1, [r0, #12]
> str r1, [r0, #8]
> bx lr
> //end test.dump

It appears you have a typo in the above, since the generated array reference offsets do not correspond to the code in test.c. Presumably, the last array reference in test.c was really from a[3], not a[5].

> However, for 3th and 4th instructions, they should be allocated different
> register from the second instruction.

Why?

 - Chuck
At 2013-12-15 22:43:34, "Caldarale, Charles R" <Chuck.Caldarale at unisys.com> wrote:
>> //test.c
>> [...]
>> //end test.dump
>
> It appears you have a typo in the above, since the generated array reference offsets do not correspond to the code in test.c. Presumably, the last array reference in test.c was really from a[3], not a[5].

I'm sorry for the mistake in test.c above. Your presumption is right.

>> However, for 3th and 4th instructions, they should be allocated different
>> register from the second instruction.
>
> Why?
>
>  - Chuck

Thank you for your answer. If the 3rd and 4th instructions are allocated a different register from the second instruction, the dependence on the same machine register disappears, and this instruction sequence can execute with fewer stalls and cycles. However, in the latest version of LLVM, Pre-RA-sched builds the scheduling graph (the original graph) shown below.

//original graph
----> data flow
====> control flow
load1 ----> store1 ====> load2 ----> store2 ====> load3 ----> store3
//end original graph

So Pre-RA-sched is unable to schedule the instructions of a load/store pair apart. Due to the live ranges seen in the register allocation stage, all load/store pairs are then allocated the same register. If we change the control flow in the original graph above, we get the modified graph shown below.

//modified graph
----> data flow
====> control flow
load1 ----> store1 ====> store2 ====> store3
load2 ----> store2
load3 ----> store3
//end modified graph

With this graph, I think Pre-RA-sched would be able to schedule the load/store pairs apart, and each pair would then use a different register. The scheduled instruction order for test.c might be load1, load2, load3, store1, store2, store3.

Best Wishes
- Haishan