hi, we're trying to come up with a static compiler based on llvm for a 4- way vliw architecture with full support for predicated execution. a distinguishing feature is the omission of flag registers. instead, conditions can be paired with a particular instruction within the same bundle. a performance critical issue will be proper use of predicated execution. if-conversion can either be performed early in the code generation process [1] (exposing larger basic blocks to the optimizers) or deferred until code generation is almost complete [2]. for the time being, i'm planning to go with the second approach and have a late optimization pass over the selected machine instructions that a] preliminarily schedules and bundles the selected instructions b] speculatively executes instructions in the predecessor block if there are unused resources c] converts blocks B into a appropriately predicated version (eliminating branches) if it's profitable for the particular architecture. this approach will require only some minor extensions to the existing class hierarchy to represent instruction bundles and predicates. additionally, it appears to be more reasonable to make the decision what to execute conditionally late in the compilation process when we can easily account for architectural constraints than perform if- conversion early and undoing it if necessary. also, previous work [2] indicates, that this approach is able to retain almost all opportunities for if-conversion present in the input program. for what i've seen so far (i'm not yet fully familiar with llvm), there's almost no existing support for predicated execution. however, architectures such as arm and ia64 should show a significant speedup. are there already people working on those enhancements or are there any short-term plans? also, does anybody have strong objections against the approach outlined above? any comments are kindly welcome! cheers and thanks in advance, - dietmar [1] August et al. : A Framework for Balancing Control Flow and Predication http://portal.acm.org/citation.cfm?id=266800.266810 [2] Snavely et al. : Predicate Analysis and If-Conversion in an Itanium Link-Time Optimizer http://systems.cs.colorado.edu/EPIC2/papers/s3-3-snavely.pdf --------------------------------------------------------------------- Dietmar Ebner CD Laboratory - Compilation Techniques for Embedded Processors Institut fuer Computersprachen E: ebner at complang.tuwien.ac.at Technische Universitaet Wien T: (+431) 58801-18598 Argentinierstrasse 8 / E1851 F: (+431) 58801-58521 1040 Wien, Austria W: www.complang.tuwien.ac.at/cd/ebner
On Mar 7, 2007, at 6:45 AM, Dietmar Ebner wrote:> hi, > > we're trying to come up with a static compiler based on llvm for a 4- > way vliw architecture with full support for predicated execution. a > distinguishing feature is the omission of flag registers. instead, > conditions can be paired with a particular instruction within the > same bundle.Ok.> a performance critical issue will be proper use of predicated > execution. if-conversion can either be performed early in the code > generation process [1] (exposing larger basic blocks to the > optimizers) or deferred until code generation is almost > complete [2].Option 1 makes a lot of sense, most predication aware compilers go this route. I haven't read the paper. But I am guessing their system is rediscovering CFG etc. at link time, reverse compiling assembly code into some internal representation? That is an interesting approach for a company who is servicing clients who do not provide their source code. If you use LLVM and you build option 1, you can do something like option 2 without a lot of horribleness. :-) LLVM does link time optimization at bytecode level.> > for the time being, i'm planning to go with the second approach and > have a late optimization pass over the selected machine instructions > that > a] preliminarily schedules and bundles the selected instructions > b] speculatively executes instructions in the predecessor block if > there are unused resources > c] converts blocks B into a appropriately predicated version > (eliminating branches) if it's profitable for the particular > architecture.Ok. It doesn't sound like option 2 though (it has nothing to do with LTO). Item (b) sounds like meld scheduling, if that what you are considering?> > this approach will require only some minor extensions to the existing > class hierarchy to represent instruction bundles and predicates. > additionally, it appears to be more reasonable to make the decision > what to execute conditionally late in the compilation process when we > can easily account for architectural constraints than perform if- > conversion early and undoing it if necessary. also, previous work [2] > indicates, that this approach is able to retain almost all > opportunities for if-conversion present in the input program.Sounds reasonable especially given the current LLVM infrastructure. The current register allocator is not predication aware. Therefor if- conversion too aggressively would over-subscribe the available resources. Once you have the exact instruction count and register requirement, you should be able to make a better if-conversion decision.> > for what i've seen so far (i'm not yet fully familiar with llvm), > there's almost no existing support for predicated execution. however, > architectures such as arm and ia64 should show a significant speedup. > are there already people working on those enhancements or are there > any short-term plans? also, does anybody have strong objections > against the approach outlined above? any comments are kindly welcome!I don't know enough about your target architecture to critique your choice. This isn't the "traditional" flow for a fully predicated VLIW model, but it's likely you can get it up and running at reasonable time frame. LLVM does not have much support for predicated execution. You can introduce predicate register class, but the backend passes would not treat them differently. However, you are in luck! It's extremely likely we will be starting on predication work for the ARM backend in the very near future. ARM has an extremely limited predication model (just condition codes), so it's not yet clear how much of our work will be applicable to your needs. But we'll try to keep our work as target independent as possible. Evan> > > cheers and thanks in advance, > > - > dietmar > > [1] August et al. : A Framework for Balancing Control Flow and > Predication > http://portal.acm.org/citation.cfm?id=266800.266810 > > [2] Snavely et al. : Predicate Analysis and If-Conversion in an > Itanium Link-Time Optimizer > http://systems.cs.colorado.edu/EPIC2/papers/s3-3-snavely.pdf > > > --------------------------------------------------------------------- > Dietmar Ebner > CD Laboratory - Compilation Techniques for Embedded Processors > Institut fuer Computersprachen E: ebner at complang.tuwien.ac.at > Technische Universitaet Wien T: (+431) 58801-18598 > Argentinierstrasse 8 / E1851 F: (+431) 58801-58521 > 1040 Wien, Austria W: www.complang.tuwien.ac.at/cd/ebner > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Evan, thanks for your detailed answer! On Mar 8, 2007, at 10:03 AM, Evan Cheng wrote:>> a performance critical issue will be proper use of predicated >> execution. if-conversion can either be performed early in the code >> generation process [1] (exposing larger basic blocks to the >> optimizers) or deferred until code generation is almost >> complete [2]. > > Option 1 makes a lot of sense, most predication aware compilers go > this route. I haven't read the paper. But I am guessing their system > is rediscovering CFG etc. at link time, reverse compiling assembly > code into some internal representation? That is an interesting > approach for a company who is servicing clients who do not provide > their source code. > > If you use LLVM and you build option 1, you can do something like > option 2 without a lot of horribleness. :-) LLVM does link time > optimization at bytecode level.right, i agree option 1 might be favorable in many terms. however, this would involve major changes in various components (instruction selection, register allocation, probably various optimization passes) and i don't have the time now to do this. also, it would be very hard to predict on the llvm level how code will look like after code generation and register allocation, i.e., aggressive if-conversion and hyperblock generation will probably require partial reverse if- conversion to be effective.>> for the time being, i'm planning to go with the second approach and >> have a late optimization pass over the selected machine instructions >> that >> a] preliminarily schedules and bundles the selected instructions >> b] speculatively executes instructions in the predecessor block if >> there are unused resources >> c] converts blocks B into a appropriately predicated version >> (eliminating branches) if it's profitable for the particular >> architecture. > > Ok. It doesn't sound like option 2 though (it has nothing to do with > LTO). Item (b) sounds like meld scheduling, if that what you are > considering?not exactly and yes, it has nothing to do with LTO. the idea is just to move instructions to the predecessor block (if there is only one), if they can execute there (conditionally) for free, i.e., there are unused slots/nops that might be replaced with a particular instruction. this decreases the size of blocks for (c).> However, you are in luck! It's extremely likely we will be starting > on predication work for the ARM backend in the very near future. ARM > has an extremely limited predication model (just condition codes), so > it's not yet clear how much of our work will be applicable to your > needs. But we'll try to keep our work as target independent as > possible.thats what i was hoping. for the time being, i need a fast implementation that gets me an idea of the performance that can be reached with the existing infrastructure (our target doesn't exist yet in silicon but we do have a simulator). i'll report first results as they become available. cheers, - dietmar