Dear Prof.Adve and Bill, I deeply appreciate your comments and concerns. (Please forgive my late response. I've tried some more cases to make this issue) As Prof.Adve mentioned, I need to explain exactly what my problem is, but I have no good ability that I can explain it in this plain text space. For this reason, I made a .pdf file and linked it as follows: https://netfiles.uiuc.edu/lee225/www/MakingLoops.pdf Would you please explain to me how I can access to this problem in a better way if you can figure out? Once again, I really appreciate your time and favor, Bill and Prof.Adve. Truly yours, Seung J. Lee ---- Original message ---->Date: Mon, 28 Jan 2008 09:39:52 -0600 >From: "Vikram S. Adve" <vadve at uiuc.edu> >Subject: Re: [LLVMdev] Question to Chris >To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> > >Bill, > >Depending on what Seung's problem is, converting *out* of SSA form may >actually be the wrong thing to do. Seung needs to explain exactly >problem he is unable solve using SSA form. > >--Vikram >http://www.cs.uiuc.edu/~vadve >http://llvm.org/ > >
Ok, here are a few suggestions and comments: 1) LLVM has the capabilities to do everything that you are trying to re-implement. 2) Have you looked at the C backend? It recreates loops. It may not create "for" loops but you can hack on it to do that. 3) The way you are converting out of SSA is wrong. You will suffer from lost copies. You should look at using demotePHI(). see include/ llvm/Transforms/Utils/Local.h 4) LLVM will compute your trip counts for you, use LoopInfo. You need to be in SSA form to do this. -Tanya On Feb 1, 2008, at 11:51 PM, Seung Jae Lee wrote:> Dear Prof.Adve and Bill, > > I deeply appreciate your comments and concerns. > (Please forgive my late response. I've tried some more cases to > make this issue) > > As Prof.Adve mentioned, I need to explain exactly what my problem > is, but I have no good ability that I can explain it in this plain > text space. > > For this reason, I made a .pdf file and linked it as follows: > > https://netfiles.uiuc.edu/lee225/www/MakingLoops.pdf > > Would you please explain to me how I can access to this problem in > a better way if you can figure out? > > Once again, I really appreciate your time and favor, Bill and > Prof.Adve. > > Truly yours, > Seung J. Lee > > > ---- Original message ---- >> Date: Mon, 28 Jan 2008 09:39:52 -0600 >> From: "Vikram S. Adve" <vadve at uiuc.edu> >> Subject: Re: [LLVMdev] Question to Chris >> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> >> >> Bill, >> >> Depending on what Seung's problem is, converting *out* of SSA form >> may >> actually be the wrong thing to do. Seung needs to explain exactly >> problem he is unable solve using SSA form. >> >> --Vikram >> http://www.cs.uiuc.edu/~vadve >> http://llvm.org/ >> >> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Feb 1, 2008, at 11:51 PM, Seung Jae Lee wrote:> As Prof.Adve mentioned, I need to explain exactly what my problem > is, but I have no good ability that I can explain it in this plain > text space.Let me rephrase your question for you. You can then see the right question, and the answer for it. The question is, why is, why can't I change: bb8: sum_1 = i_0_pn + sum_0_pn; br bool %exitcond, label %bb10, label %bb3 into bb8: br bool %exitcond, label %bb10, label %bb3 sum_1 = i_0_pn + sum_0_pn; ? The answer is, because that is an invalid transform. A: B; C; has to be transformed into: B; A: C; (where the motion of B is copy it to the other side of all control flows into A) When you do this, you'll notice you get: $ cat t.c int foo() { unsigned indvar_next27, indvar_next; int sum_1, sum_0_pn, i_0_pn; //entry: unsigned indvar26 = 0; int sum_0_pn_ph = 0; //brlabel %bb8_outer //bb8_outer: for (indvar_next27=0; indvar_next27 != 10; ){ int i_0_0_ph = (int) indvar26; unsigned indvar= 0; int sum_0_pn = sum_0_pn_ph; int i_0_pn = i_0_0_ph; //brlabel %bb8 //bb8: sum_1 = i_0_pn + sum_0_pn; /* LOOK HERE */ for (;indvar!=3;){ //%exitcond= setequint%indvar, 3 //brbool %exitcond, label %bb10, label %bb3 //bb3: indvar_next= indvar+ 1; indvar= indvar_next; sum_0_pn = sum_1; i_0_pn = 2; sum_1 = i_0_pn + sum_0_pn; /* LOOK HERE */ //brlabel %bb8 } //bb10: indvar_next27 = indvar26 + 1; //%exitcond28 = setequint%indvar_next27, 10 indvar26 = indvar_next27; sum_0_pn_ph = sum_1; //brbool %exitcond28, label %bb16, label %bb8_outer } //bb16: return sum_1; } int main() { printf("%d\n", foo()); } $ gcc t.c t.c: In function ‘main’: t.c:40: warning: incompatible implicit declaration of built-in function ‘printf’ $ ./a.out 105 The cost of the .pdf file outweighed the benefit for me. I can see the pretty graphs in the straight text code. If you want to help me see them further, just indent. The short answer is you can't guess at valid transforms. You have to know they are valid, or prove the are valid. The only way to get: for (;C;) { S } is to transform: top: if (!C) goto end; S goto top; end: into it. If you have any other form, it is wrong to transform into it. You _must_transform into the form just above first, and then that form goes into the for statement. At the end of the day, you have a set of valid transforms, each one of which you can talk about and audit separately. If you have any bugs, they are never in the valid transforms, and the invalid ones stand out to people skilled in the art. So, if you had written down the transform you were using, we could have identified it. You didn't, so, we could not. I didn't check your other loop for correctness, or any of the other implied transforms that you used for correctness. If you want to ensure it is correct, you can run large amount of code through it and check the results (I'd call that the industrial approach) and hope it is approximately right, or, you have to reason about each and every transform you're using and hopefully even prove each one. The latter approach I'd call the academic approach. :-) You're splitting and moving phi nodes seems reasonable. The transform of: top: S; goto top; into for (;;) { S; } is valid. What isn't valid is putting the induction variable into the condition slot and calling it done. You must first transform it into the form given above. To do that, you have to use other transforms, each one of which you should write down and list and prove is valid. If you want us to audit each transform for you, write them all down, one by one.
Seung,> 3) The way you are converting out of SSA is wrong. You will suffer > from lost copies. You should look at using demotePHI(). see include/ > llvm/Transforms/Utils/Local.hUsing demotePHI() to remove PHI nodes should be the easiest way. However, if you want to keep values in registers, you must take into account the problem mentioned above, and also the "swap problem". You might take a look at this article, for example: http://www.cs.ucsb.edu/~ckrintz/papers/spe98-ssaconstruct-deconstruct.pdf.gz to see when these problems appear and how to deal with them. If you address these issues, I think, your algorithm to remove PHI nodes should be okay. -Wojtek
Seung,> Would you please explain to me how I can access to this problem in a better way if you can figure out?Here are my thoughts on your first question: "how to make it count correctly for the reconstruction of loops?". I believe it can't be done for an arbitrary loop. However, IIRC from previous posts your loops are simple - for example: they don't contain any breaks, gotos. I think, your reconstruction could be done for two forms of loops that are very common: 1) rotated canonical loop (bb8.outer in your example): The loop has only one backedge, and an induction variable counting from 0 with step 1. The only basic block that branches out of the loop (so called, exiting block) is the backedge. header: %i = phi [ 0, %preheader ], [ %i.next, %body ] ... br label %body ;%header and %body may be the same block body: ... %i.next = add %i, 1 %c = seteq %i.next, %n br %c, label %exit, label %header exit: ... The loop iteration count is %n. In this case, it means that _every_ basic block is executed %n times. Your current version of reconstruction seems to be okay for loops in this form. 2) unrotated canonical loop (bb8 in your example): The loop has only one backedge and an induction variable counting from 0 with step 1. The only basic block that branches out of the loop is the loop header. header: %i = phi [ 0, %preheader ], [ %i.next, %body ] ... %c = seteq %i, %n br %c, label %exit, label %body body: ... %i.next = add %i, 1 br label %header exit: ... The loop iteration count is also %n. However, this time it means that the loop header is executed %n times, while the rest of blocks - %n-1 times. The reconstruction should take it into account. For example, it could make a "for" loop from all loop basic blocks setting the iteration count to %n-1, and then insert a copy of the loop header with %i==%n after the "for". I also suggest taking look at the LoopInfo pass. It has many useful methods to analyze loops. To determine the loop iteration count you may use the ScalarEvolution pass as well. -Wojtek