Hi, I'm retargetting llvm for a new architecture, basiclaly a target that has v4f32 registers and a few operations, so it's mostly common stuff. However, it also has some kind of options on top of the operations, kinda like VLIW. For example, there is an addition operation, but there is also a source operand negation bit which you can set. So you can implement SUB using the ADD opcode (and there is actually no SUB opcode proper). Besides negating the source operands, there are also bits to take the absolute value of the source operands as input, bits to scale the result by power of 2 constants, and bits to do arbitrary shuffling of the 4 components. Now obviously I'd like to take advantage of that. I have taken a look at the ia64 target which has similar requirements, and seen that it gets around the problem by using an external optimizing assembler that reorders and packs instructions into bundles. However since my requirements are so simple I'm hoping I can do everything inside llvm (and I'm only interested in JIT anyway). So I guess the question became obvious by now: how can I do this in llvm? I've been thinking about a couple of solutions: - Use the tablegen template system to generate all the variants for eacho operation (with/without absolute value, with/without negation, with/without post-scaling, with/without all kinds of shuffling, and all combinations of those). Is this possible ? We're talking about approximately 70000 variants per instruction here, approximately 300 if we ignore the shuffling which has 256 combinations by itself. - Use an additional pass that reorders and packs instructions together (similary to what the ia64 optimizing assembler does), maybe relying on the instruction scheduler to place the instructions for me. - Rewrite some of tablegen to support that feature. I'm under the impression it would be a lot of work for a very specific case, but I could be wrong here. Ideas ? Thanks in advance, Stephane
I'm targeting an machine where the operands have a number of operations that are orthogonal to the instruction, but have the good fortune of there being an aggressively optimizing post pass. I've considered the same option - "folding" the operand options into the instructions, but as you point out, this causes an explosion in the number of instructions. Because of the post pass, I can afford to sequence operand options, and count on later optimization. However, it would be nice for me (necessary for you) to be more direct. I think an extended set of control bits in operands would be the most natural approach (to those of us used to working with such machines, at least), but you'd like to be able to have transformations in tablegen that let you do things like (fneg operand) -> operand with negate bit flipped I'm not sure exactly how this could be expressed (I'm admittedly new to compilers in general, LLVM in particular, so my limitations generally get in the way before LLVM's - my knowledge is more in the target machine). Dan On Sep 25, 2008, at 9:51 AM, Stephane Marchesin wrote:> Hi, > > I'm retargetting llvm for a new architecture, basiclaly a target that > has v4f32 registers and a few operations, so it's mostly common stuff. > However, it also has some kind of options on top of the operations, > kinda like VLIW. For example, there is an addition operation, but > there is also a source operand negation bit which you can set. So you > can implement SUB using the ADD opcode (and there is actually no SUB > opcode proper). Besides negating the source operands, there are also > bits to take the absolute value of the source operands as input, bits > to scale the result by power of 2 constants, and bits to do arbitrary > shuffling of the 4 components. Now obviously I'd like to take > advantage of that. > > I have taken a look at the ia64 target which has similar requirements, > and seen that it gets around the problem by using an external > optimizing assembler that reorders and packs instructions into > bundles. However since my requirements are so simple I'm hoping I can > do everything inside llvm (and I'm only interested in JIT anyway). > > So I guess the question became obvious by now: how can I do this in > llvm? I've been thinking about a couple of solutions: > - Use the tablegen template system to generate all the variants for > eacho operation (with/without absolute value, with/without negation, > with/without post-scaling, with/without all kinds of shuffling, and > all combinations of those). Is this possible ? We're talking about > approximately 70000 variants per instruction here, approximately 300 > if we ignore the shuffling which has 256 combinations by itself. > - Use an additional pass that reorders and packs instructions together > (similary to what the ia64 optimizing assembler does), maybe relying > on the instruction scheduler to place the instructions for me. > - Rewrite some of tablegen to support that feature. I'm under the > impression it would be a lot of work for a very specific case, but I > could be wrong here. > > Ideas ? > > Thanks in advance, > Stephane > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Sep 25, 2008, at 6:51 AM, Stephane Marchesin wrote:> > So I guess the question became obvious by now: how can I do this in > llvm? I've been thinking about a couple of solutions: > - Use the tablegen template system to generate all the variants for > eacho operation (with/without absolute value, with/without negation, > with/without post-scaling, with/without all kinds of shuffling, and > all combinations of those). Is this possible ? We're talking about > approximately 70000 variants per instruction here, approximately 300 > if we ignore the shuffling which has 256 combinations by itself. > - Use an additional pass that reorders and packs instructions together > (similary to what the ia64 optimizing assembler does), maybe relying > on the instruction scheduler to place the instructions for me. > - Rewrite some of tablegen to support that feature. I'm under the > impression it would be a lot of work for a very specific case, but I > could be wrong here.LLVM codegen is missing some pieces for wide machines. So you should expect to roll up your sleeves and write your own. :-) I am not sure if I completely understand the description. Is an instruction bundle made up of fixed number operations? If so, you could model each operation as a separate machine instruction. You can add a "bundle formation" (using e.g. bottom up greedy or clustering technique) pass to reorder instructions to form the bundles. Evan> > > Ideas ? > > Thanks in advance, > Stephane > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Thu, Sep 25, 2008 at 5:59 PM, Evan Cheng <evan.cheng at apple.com> wrote:> > On Sep 25, 2008, at 6:51 AM, Stephane Marchesin wrote: > > > LLVM codegen is missing some pieces for wide machines. So you should > expect to roll up your sleeves and write your own. :-)Yeah, I'm not sure if that could be tackled by a newcomer though...> > I am not sure if I completely understand the description. Is an > instruction bundle made up of fixed number operations?Well, the bundle has one "major" operation (add, mul, mad...) and a couple of "additional" operations that will modify the inputs/outputs of the major operation. It's not exactly VLIW in that the additional operations don't change the value of a register, but instead change the inputs and outputs of the major operation. Or to put it differently, you could see the additional operations as some kind of data routing to and from the major operation. And yes, there is always exactly one major operation, and there are bits allocated for each of the additional operations for each input/output. So you can do abs+neg+shuffle on all inputs and scaling on the output, and the instruction is still 128 bit.> If so, you > could model each operation as a separate machine instruction. You can > add a "bundle formation" (using e.g. bottom up greedy or clustering > technique) pass to reorder instructions to form the bundles.Yes, that's the second point I had in mind. That solution is becoming increasingly interesting to me since I don't think I can tackle modifying llvm overnight :) Stephane
Maybe Matching Threads
- [LLVMdev] Generating instruction bundles ?
- [Bug 14821] New: nv11 does not work with Randr1.2 enabled
- geforce 4 mx + 2.6.22 problem
- [Bug 14820] New: Double mouse cursor on nv4
- [Bug 16765] New: white rectangles instead of fonts and bitmaps lost when switching to virtual console and back