Hi, I am planning to generate code for a peculiar architecture with _no_ branch instructions (!), but with predicated loads and stores to memory. This means the architecture is not Turing complete, is going to waste a lot of computation, and any input program that can hope to get compiled for this architecture must have loops that can be fully unrolled, and all its functions must get fully inlined. Since I have no branches, I think I will need to predicate code before DAG Selection. I was thinking of doing this: 1) Run an analysis on the LLVM bitcode that calculates the condition under which an instruction will be "reached" -- this is straightforward since my input program is not allowed to have cycles in the CFG. 2) Run a pass that requires the above analysis and uses it to: - merge all basic blocks in topological sort order (which exists, because CFG is acyclic). - insert appropriate instructions to generate the predicates. - change all PHI-nodes into Select nodes. - predicate memory operations (well, at least the stores). It is this final predication step that I am not sure how to handle. Since LLVM does not have predicated load/store instructions, will I have to upgrade the memory operations to a call or something that would then have to be custom handled by the SelectionDAG mechanism? Or should I just leave the load/stores alone, and somehow, later after the machine instructions are created predicate them using this predicate information? If so, how do I ensure that the dependency on the predicate generating instructions is preserved (the predicate instructions are not dce'd away)? Or should I be trying to do the predication right before Legalize? If so, I'll need to think about it a little more. Thank you for your time :) nikhil -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20071007/9f2a1407/attachment.html>
On 07/10/2007, Nikhil A. Patil <patil.nikhil at gmail.com> wrote:> Hi, > > I am planning to generate code for a peculiar architecture with _no_ branch > instructions (!), but with predicated loads and stores to memory. This means > the architecture is not Turing complete, is going to waste a lot of > computation, and any input program that can hope to get compiled for this > architecture must have loops that can be fully unrolled, and all its > functions must get fully inlined. > > Since I have no branches, I think I will need to predicate code before DAG > Selection. I was thinking of doing this:oops, I meant to say before "Initial SelectionDAG construction"> 1) Run an analysis on the LLVM bitcode that calculates the condition under > which an instruction will be "reached" -- this is straightforward since my > input program is not allowed to have cycles in the CFG. > 2) Run a pass that requires the above analysis and uses it to: > - merge all basic blocks in topological sort order (which exists, because > CFG is acyclic). > - insert appropriate instructions to generate the predicates. > - change all PHI-nodes into Select nodes. > - predicate memory operations (well, at least the stores). > > It is this final predication step that I am not sure how to handle. Since > LLVM does not have predicated load/store instructions, will I have to > upgrade the memory operations to a call or something that would then have to > be custom handled by the SelectionDAG mechanism? > > Or should I just leave the load/stores alone, and somehow, later after the > machine instructions are created predicate them using this predicate > information? If so, how do I ensure that the dependency on the predicate > generating instructions is preserved (the predicate instructions are not > dce'd away)? > > Or should I be trying to do the predication right before Legalize? If so, > I'll need to think about it a little more. > > Thank you for your time :) > nikhil
On Oct 7, 2007, at 12:51 AM, Nikhil A. Patil wrote:> Hi, > > I am planning to generate code for a peculiar architecture with > _no_ branch instructions (!), but with predicated loads and stores > to memory. This means the architecture is not Turing complete, is > going to waste a lot of computation, and any input program that can > hope to get compiled for this architecture must have loops that can > be fully unrolled, and all its functions must get fully inlined. > > Since I have no branches, I think I will need to predicate code > before DAG Selection. I was thinking of doing this: > 1) Run an analysis on the LLVM bitcode that calculates the > condition under which an instruction will be "reached" -- this is > straightforward since my input program is not allowed to have > cycles in the CFG. > 2) Run a pass that requires the above analysis and uses it to: > - merge all basic blocks in topological sort order (which > exists, because CFG is acyclic). > - insert appropriate instructions to generate the predicates. > - change all PHI-nodes into Select nodes. > - predicate memory operations (well, at least the stores).I am not aware of a mechanism to completely remove branches for any general program. Don't you have to restrict it a loop with known iteration count?> > It is this final predication step that I am not sure how to handle. > Since LLVM does not have predicated load/store instructions, will I > have to upgrade the memory operations to a call or something that > would then have to be custom handled by the SelectionDAG mechanism?Target specific instructions can have predicate operands. See ARM. I've implemented a if-converter but it runs after register allocation so it won't fit your needs. Ideally (because that's what everyone does) you want a pass that operate on 3-address machine specific instructions. Then you can predicate code and merge blocks together before you do scheduling. The complication our codegen scheme is we do instruction scheduling on DAGs and only translate into 3-address code at the end of scheduling. Sometime down the road we will be implementing whole function instruction scheduling. Once that's done, you can perhaps do a predication pass that operate on the DAG that represent the whole function. But that's not going to help you right now. So for now, your choice is either 1) Operate on DAGs that represent basic blocks, predicate nodes and merge multiple DAGs into a larger DAG. 2) Operate on the machine instructions and machine basic blocks after instruction scheduling and redo scheduling afterwards. Evan> > Or should I just leave the load/stores alone, and somehow, later > after the machine instructions are created predicate them using > this predicate information? If so, how do I ensure that the > dependency on the predicate generating instructions is preserved > (the predicate instructions are not dce'd away)? > > Or should I be trying to do the predication right before Legalize? > If so, I'll need to think about it a little more. > > Thank you for your time :) > nikhil > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On 08/10/2007, Evan Cheng <evan.cheng at apple.com> wrote:> > On Oct 7, 2007, at 12:51 AM, Nikhil A. Patil wrote: > > > Hi, > > > > I am planning to generate code for a peculiar architecture with > > _no_ branch instructions (!), but with predicated loads and stores > > to memory. This means the architecture is not Turing complete, is > > going to waste a lot of computation, and any input program that can > > hope to get compiled for this architecture must have loops that can > > be fully unrolled, and all its functions must get fully inlined. > > > > Since I have no branches, I think I will need to predicate code > > before DAG Selection. I was thinking of doing this: > > 1) Run an analysis on the LLVM bitcode that calculates the > > condition under which an instruction will be "reached" -- this is > > straightforward since my input program is not allowed to have > > cycles in the CFG. > > 2) Run a pass that requires the above analysis and uses it to: > > - merge all basic blocks in topological sort order (which > > exists, because CFG is acyclic). > > - insert appropriate instructions to generate the predicates. > > - change all PHI-nodes into Select nodes. > > - predicate memory operations (well, at least the stores). > > I am not aware of a mechanism to completely remove branches for any > general program. Don't you have to restrict it a loop with known > iteration count?Yes, I am going to have to live with that restriction. I also need to assume no recursive functions, so all calls can be fully inlined.> > > > > It is this final predication step that I am not sure how to handle. > > Since LLVM does not have predicated load/store instructions, will I > > have to upgrade the memory operations to a call or something that > > would then have to be custom handled by the SelectionDAG mechanism? > > Target specific instructions can have predicate operands. See ARM. > I've implemented a if-converter but it runs after register allocation > so it won't fit your needs. > > Ideally (because that's what everyone does) you want a pass that > operate on 3-address machine specific instructions. Then you can > predicate code and merge blocks together before you do scheduling. > The complication our codegen scheme is we do instruction scheduling > on DAGs and only translate into 3-address code at the end of scheduling. > > Sometime down the road we will be implementing whole function > instruction scheduling. Once that's done, you can perhaps do a > predication pass that operate on the DAG that represent the whole > function. But that's not going to help you right now. So for now, > your choice is either 1) Operate on DAGs that represent basic blocks, > predicate nodes and merge multiple DAGs into a larger DAG. 2) > Operate on the machine instructions and machine basic blocks after > instruction scheduling and redo scheduling afterwards.Thanks, that solves my confusion. Actually I did think of this earlier but rejected the idea since this will mean that the Legalize phase is going to generate "illegal" output -- it will still have branches in it. I guess that can't be avoided anyway, because all input programs aren't even legal :) Also, is there a straightforward way to directly go from LLVM bitcode to some generic form of machine instructions -- something like lib/Target/Generic? The docs mention an "old-style 'simple' instruction selector, which effectively peephole selects each LLVM instruction into a series of machine instructions." Is this still available? Thanks, nikhil> > Evan > > > > > Or should I just leave the load/stores alone, and somehow, later > > after the machine instructions are created predicate them using > > this predicate information? If so, how do I ensure that the > > dependency on the predicate generating instructions is preserved > > (the predicate instructions are not dce'd away)? > > > > Or should I be trying to do the predication right before Legalize? > > If so, I'll need to think about it a little more. > > > > Thank you for your time :) > > nikhil > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >