Gordon reminded me that most people seem to generate code using code, whereas it has become natural for me to use templates to generate code. Let me include an example from an article I'm working on (Java-centric). The same argument goes for LLVM IR templates versus the C++ interface. Generating LLVM IR is super simple for me. Here are a few of my templates for generating IR from simple expressions with arrays (only i32 in this example):

load(id,reg) ::= "<reg> = load i32* %<id>"

/** reg = *(id+a) and addr is a temporary register name. a is the
    template with code to compute the index */
index(reg,addr,size,id,a) ::= <<
<a>
<addr> = getelementptr [<size> x i32]* %<id>, i32 0, i32 <a.reg>;
<reg> = load i32* <addr>
>>

intval(reg,v) ::= "<reg> = add i32 <v>,0"

Creating instances of those templates and setting the attributes that fill in the holes is very simple from an ANTLR grammar. :)

Just a few thoughts in case people are wondering about the various approaches.

regards,
Ter
-------------
Imagine that you would like to write a program that generates the Java byte codes for the following Java code:

System.out.println("Hello");

The javac compiler generates the following byte codes:

getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hello"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

Either you use templates that render to text, or you use a library such as BCEL to create a data structure that will render to text. Here is a StringTemplate template definition that generates byte codes to print any string:

println(s) ::= <<
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "<s>"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
>>

Then, to generate the byte codes above, just create an instance of template println and set its attribute to the appropriate string.
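StringTemplate does the instantiation for you, but to make "create an instance and set its attribute" concrete, here is a rough plain-Java stand-in. The class name `PrintlnTemplate` and its `render` method are hypothetical and are not the StringTemplate API; the sketch only substitutes `<name>` holes from a map of attribute values.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for a template engine (NOT the StringTemplate API):
 * each <name> hole is replaced with the value bound to that attribute.
 */
public class PrintlnTemplate {
    // Body of the println(s) template; <s> is its only hole.
    static final String PRINTLN =
        "getstatic java/lang/System/out Ljava/io/PrintStream;\n" +
        "ldc \"<s>\"\n" +
        "invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V";

    // Matches holes of the form <identifier>.
    private static final Pattern HOLE = Pattern.compile("<([A-Za-z_][A-Za-z0-9_]*)>");

    /** Fills every <name> hole from attrs; unknown attributes render as empty text. */
    static String render(String template, Map<String, String> attrs) {
        Matcher m = HOLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = attrs.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // "Create an instance of template println and set its attribute."
        System.out.println(render(PRINTLN, Map.of("s", "Hello")));
    }
}
```

The output text is exactly the three javac instructions shown above, which is the point of the template approach: the template body already looks like its output.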
Using ANTLR, this is particularly easy:

/** Match things like print "hi"; */
printStat
    :   'print' STRING ';' -> println(s={$STRING.text})
    ;

Now compare that to what you would have to do using Java code to create a data structure:

InstructionHandle start =
    il.append(factory.createFieldAccess("java.lang.System", "out",
                                        p_stream, Constants.GETSTATIC));
il.append(new PUSH(cp, "Hello"));
il.append(factory.createInvoke("java.io.PrintStream", "println",
                               Type.VOID, new Type[] { Type.STRING },
                               Constants.INVOKEVIRTUAL));

Templates differ from code snippets in that templates specify exactly what the output looks like, whereas with code snippets you have to imagine the emergent behavior. Certainly writing templates is much faster: the byte code template is one third the size of the Java code necessary to build the structure representing the byte codes.
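The LLVM IR templates at the top of the message compose the same way: `index` embeds the template `a` that computes the index (`<a>`) and refers to the register holding its result (`<a.reg>`). A minimal plain-Java sketch of that composition follows; the `IrTemplates` class and its `Frag` record are hypothetical, and the output keeps the templates' typed-pointer IR syntax and the trailing `;` from the `index` template (which LLVM's assembler reads as the start of a comment).

```java
/**
 * Hypothetical plain-Java versions of the load/index/intval templates.
 * Each fragment carries the register holding its result, playing the
 * role of <a.reg> in the index template.
 */
public class IrTemplates {
    /** Rendered IR text plus the register that holds its value. */
    record Frag(String reg, String text) {}

    // load(id,reg) ::= "<reg> = load i32* %<id>"
    static Frag load(String id, String reg) {
        return new Frag(reg, reg + " = load i32* %" + id);
    }

    // intval(reg,v) ::= "<reg> = add i32 <v>,0"
    static Frag intval(String reg, int v) {
        return new Frag(reg, reg + " = add i32 " + v + ",0");
    }

    // index(reg,addr,size,id,a): emit a's code, compute the element address, load it.
    static Frag index(String reg, String addr, int size, String id, Frag a) {
        String text = a.text() + "\n"
            + addr + " = getelementptr [" + size + " x i32]* %" + id
            + ", i32 0, i32 " + a.reg() + ";\n"
            + reg + " = load i32* " + addr;
        return new Frag(reg, text);
    }

    public static void main(String[] args) {
        // Read x[3], assuming %x names a local [10 x i32] array (e.g. from an alloca).
        Frag i = intval("%t0", 3);
        Frag elem = index("%t2", "%t1", 10, "x", i);
        System.out.println(elem.text());
    }
}
```

Running `main` prints the three instructions for `x[3]`: the constant, the `getelementptr`, and the final `load` into `%t2`.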
Hi Terence,

The reason I remarked, actually, is that for LLVM in particular the C++ API offers more safety, and emitting .ll generally requires at least partially reimplementing the IR object model. I think this is a topic in the FAQ. But both are perfectly valid approaches!

- Gordon

On Apr 23, 2008, at 14:04, Terence Parr <parrt at cs.usfca.edu> wrote:

> Gordon reminded me that most people seem to generate code using code,
> whereas it has become natural for me to use templates to generate
> code.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Apr 23, 2008, at 5:08 PM, Gordon Henriksen wrote:

> The reason I remarked, actually, is that for LLVM in particular the C++
> API offers more safety, and emitting .ll generally requires at least
> partially reimplementing the IR object model. I think this is a topic
> in the FAQ. But both are perfectly valid approaches!

:) Yep. The advantage of the code-based mechanism is that the compiler, thanks to static typing, can do all sorts of well-formedness checking for you. That is a major advantage of that approach and a weakness of the template-based approach.

My main goal is programmer productivity. An extreme example illustrates my point: imagine a bunch of support functions you need to generate in the output code (functions that cannot be linked in from C, for example). What would you rather type in: IR as text, or the massive amount of code needed to build up the object model? Surely, in your head you are imagining the target IR language (we evolved over millions of years to be good at extracting and generating structure from linear sequences of symbols). :)

Ter
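To illustrate both sides of this exchange, here is a toy "code-based" emitter for the same `x[3]` expression. The `TypedIr` class and its `I32`/`Ptr` records are hypothetical and are not LLVM's C++ API or any Java binding. Because operand kinds are Java types, javac rejects some ill-formed IR, such as loading through an i32 value (the static checking point). Even this toy needs noticeably more code than the three templates (the productivity point).

```java
/**
 * Hypothetical typed object model for a few i32 instructions.
 * Operand kinds are Java types, so some ill-formed IR will not compile.
 */
public class TypedIr {
    /** An i32 value held in a register. */
    record I32(String reg) {}
    /** A pointer value (pointee types are not tracked in this sketch). */
    record Ptr(String reg) {}

    private final StringBuilder out = new StringBuilder();
    private int next = 0;

    // Allocate a fresh temporary register name: %t0, %t1, ...
    private String fresh() { return "%t" + next++; }

    /** Materialize a constant: <reg> = add i32 <v>,0 */
    I32 constant(int v) {
        I32 r = new I32(fresh());
        out.append(r.reg()).append(" = add i32 ").append(v).append(",0\n");
        return r;
    }

    /** Address of element idx in a [size x i32] array. */
    Ptr elementPtr(Ptr array, int size, I32 idx) {
        Ptr r = new Ptr(fresh());
        out.append(r.reg()).append(" = getelementptr [").append(size)
           .append(" x i32]* ").append(array.reg())
           .append(", i32 0, i32 ").append(idx.reg()).append('\n');
        return r;
    }

    /** <reg> = load i32* <p>; only a Ptr can be loaded from. */
    I32 load(Ptr p) {
        I32 r = new I32(fresh());
        out.append(r.reg()).append(" = load i32* ").append(p.reg()).append('\n');
        return r;
    }

    String text() { return out.toString(); }

    public static void main(String[] args) {
        TypedIr b = new TypedIr();
        I32 i = b.constant(3);
        Ptr addr = b.elementPtr(new Ptr("%x"), 10, i);
        b.load(addr);
        // b.load(i);  // rejected by javac: an I32 is not a Ptr
        System.out.print(b.text());
    }
}
```

A real object model such as LLVM's checks far more than this (types of every operand, dominance, terminators), which is the safety Gordon refers to; the sketch only shows the mechanism.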