Jeff, thanks for those suggestions, that's exactly what I would like to do. However, I don't know how to do it with my current knowledge :\ As far as I understand, patterns only take one instruction as input (while the pattern you wrote before takes two). Also, I don't know how to handle register copying (COPY) in the .td file, because copies are handled differently from the rest of the instructions.

Jakob, I know my approach is not very suitable, judging by the results I got. The example I wrote in my previous email was just a trivial function to see whether what I was trying to do was any good. Without doing what I mentioned, and letting LLVM expand all operations wider than 8 bits as you asked, the code produced is excellent, assuming that many of the moves there become 16-bit moves (reducing code size) and that register allocation is right. Something else important for me is that the code is better than gcc's. When I say "right register allocation" I don't mean it's doing things wrong; I mean it's picking registers freely without pairing them, because I don't know how to make it do that. So now I have to push things further and implement these details so that the backend introduces those 16-bit instructions I don't know how to insert, and this is where I need help.

My 2 big questions are:

- How do I tell the register allocator to store 16-bit data in two contiguous 8-bit regs, with the low part in an even reg? Remember that data would be expanded into 8-bit ops, so when we're working with DAGs after type legalization, do we really know the original type before expansion? As an example, storing a short in r21:r20 would be valid, but r20:r19 or r20:r18 would be invalid, because in the first case the low reg is odd and in the second case the regs aren't contiguous. To store data wider than 16 bits, for example a 32-bit int, we would use 2 register pairs (4 8-bit regs), but here the pairs don't need to be contiguous, so storing it in r25:r24:r19:r18 is completely fine. As I said in my previous email, this is achieved by using HARD_REGNO_MODE_OK in gcc.

- Second, assuming the previous point works so that the register constraints are in place, how would I then proceed to combine two 8-bit instructions into a 16-bit one, as Jeff pointed out in his email? For example, I want to combine a 16-bit add like this:

    // b = b + 1: (b stored in r25:r24)
    add r24, 1
    adc r25, 0

  into

    adw r25:r24, 1

  and the one I've been talking about in all my mails:

    move r25, r23
    move r24, r22

  into

    movw r25:r24, r23:r22

  or

    move r18, r2
    move r19, r3

  into

    movw r19:r18, r3:r2

  Any combination of moves between reg pairs is valid.

I wrote a function pass as a test. It scanned for moves and checked whether the next instruction was also a move, to see if together they formed a 16-bit move, and then replaced those 2 insts with a movw. But this pass lacked a lot of details; for example, if a different instruction was placed in between the two moves, it wouldn't be able to detect the pair.

Your help is really appreciated, and thanks to both of you for your patience.
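To make the pairing rule in the first question concrete, here is a minimal sketch of it as a plain predicate. It assumes registers are identified by their number (r0 = 0, r1 = 1, ...). The helper names `isValidPair` and `isValidWide` are hypothetical and exist only for illustration; they are not LLVM or gcc APIs, and the sketch says nothing about how an allocator would enforce the rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if hi:lo is a legal 16-bit register pair,
// i.e. the low register is even and the high register is the next one up.
bool isValidPair(unsigned hi, unsigned lo) {
  return lo % 2 == 0 && hi == lo + 1;
}

// Hypothetical helper for values wider than 16 bits. `regs` lists the 8-bit
// registers from most to least significant byte, e.g. {25, 24, 19, 18}.
// Every two consecutive entries must form a valid pair, but the pairs
// themselves do not have to be adjacent to each other.
bool isValidWide(const std::vector<unsigned> &regs) {
  if (regs.empty() || regs.size() % 2 != 0)
    return false;
  for (std::size_t i = 0; i < regs.size(); i += 2)
    if (!isValidPair(regs[i], regs[i + 1]))
      return false;
  return true;
}
```

With these definitions, r21:r20 is accepted, r20:r19 and r20:r18 are rejected, and r25:r24:r19:r18 is accepted for a 32-bit value, matching the examples in the email above.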
Hi Borja,

> Without doing what i mentioned and letting LLVM expand all operations wider
> than 8 bits as you asked, the code produced is excellent supposing that many
> of the moves there should be 16 bit moves reducing code size and right
> register allocation, also something important for me is that the code is
> better than gcc's. When i say right reg allocation it doesnt mean it's doing
> things wrong, i mean it's getting regs freely without pairing regs because i
> dont know how to do it. So now i have to push things further and implement
> these details to make the backend introduce those 16 bit instructions i dont
> know how to insert, and this is where i need help.
>

I'm not sure how to help with inserting the 16-bit instructions, but if you go with allocating 8-bit virtual registers, the PBQP allocator can handle the pairing constraint. An overview of how to use the allocator is given in http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/034781.html and the follow-up emails. If you think it might be useful and have any questions, I'd be happy to help.

Cheers,
Lang.
Hello Lang, thanks for the suggestion :) It's very interesting. I'll read the email you pointed to so I can understand how it works. Btw, does this mean that only your allocator is able to handle or support this type of constraint?

As a follow-up to my previous email, the following code is a real example of what I was explaining. Lang, this example is exactly why I need the constraints and a way to combine instructions.

    typedef short t;
    extern t mcos(t a);
    extern t mdiv(t a, t b);

    t foo(t a, t b)
    {
        short p1 = mcos(b);
        short p2 = mcos(a);
        return mdiv(p1&p2, p1^p2);
    }

This C code produces:

    ; a <- r25:r24, b <- r23:r22
    mov r18, r24
    mov r19, r25    <-- can be combined into movw r19:r18, r25:r24
    mov r25, r23
    mov r24, r22    <-- can be combined into movw r25:r24, r23:r22
    call mcos
    ; here we have the case I was explaining: the pairs don't match because
    ; they're the other way round. The function result is in r25:r24, but
    ; it's storing the hi part in r20 instead of r21, so we can't insert a movw
    mov r20, r25
    mov r21, r24    <-- should be mov r21, r25; mov r20, r24 to be able to insert a movw
    mov r25, r19
    mov r24, r18    <-- can be combined into movw r25:r24, r19:r18
    call mcos
    ; same problem as above: again it's moving the hi part in r25 into r18
    ; instead of r19, so we've lost another movw here
    mov r18, r25    <-- should be mov r19, r25
    and r18, r20
    mov r19, r24    <-- should be mov r18, r24
    and r19, r21
    mov r23, r25    <-- this (*)
    eor r23, r20
    mov r22, r24    <-- (*) and this could be combined into movw r23:r22, r25:r24
    eor r22, r21
    mov r25, r18
    mov r24, r19    <-- because the result returned by the second call to mcos is
                        stored in r18:r19 (other way round), we've lost another movw
    call mdiv
    ret

As you can see, there are three cases where we've lost the opportunity to insert a movw. This is a very important instruction, because if we're able to insert it everywhere it's possible, we can reduce the code by 7 instructions and 7 cycles, which is a huge gain. Ignoring these issues, the code produced is optimal. I hope this example makes things clearer.

Also, as I said in my previous email, I wrote a function pass that searches for 2 moves in a row and combines them into a movw, since I don't know a better way of combining machine instructions. However, in this case it wouldn't work for the second half of the program, because there are "and" and "eor" instructions in between the moves, which breaks the simple heuristic the function pass uses.

Thanks for the help.
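One way the "two moves in a row" heuristic could be relaxed is to let the second mov be hoisted past unrelated instructions. The following is a rough sketch on a toy instruction model, not on LLVM's MachineInstr API. The `Inst` struct, `isBarrier` and `combineMoves` are all hypothetical names. The sketch assumes straight-line code in which every non-mov two-operand instruction reads and writes `dst` and reads `src`, and it treats `call` and `ret` as barriers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: "mov dst, src", "eor dst, src", "call", ... (hypothetical model).
struct Inst {
  std::string op;
  unsigned dst;
  unsigned src;
};

// Calls and returns stop the search; their operands are ignored.
bool isBarrier(const Inst &i) { return i.op == "call" || i.op == "ret"; }

// Combine "mov d, s" with a later "mov d^1, s^1" into one movw placed at the
// first mov. The second mov may only be hoisted if no instruction in between
// reads or writes its destination, or writes its source.
std::vector<Inst> combineMoves(std::vector<Inst> code) {
  for (std::size_t i = 0; i < code.size(); ++i) {
    if (code[i].op != "mov")
      continue;
    unsigned d1 = code[i].dst, s1 = code[i].src;
    if ((d1 & 1u) != (s1 & 1u))
      continue;                      // lo must map to lo, hi to hi
    unsigned d2 = d1 ^ 1u, s2 = s1 ^ 1u; // the other half of each pair
    for (std::size_t j = i + 1; j < code.size(); ++j) {
      const Inst &k = code[j];
      if (k.op == "mov" && k.dst == d2 && k.src == s2) {
        // movw operands are named by the even (low) register of each pair.
        code[i] = Inst{"movw", d1 & ~1u, s1 & ~1u};
        code.erase(code.begin() + static_cast<std::ptrdiff_t>(j));
        break;
      }
      if (isBarrier(k) || k.dst == d2 || k.src == d2 || k.dst == s2)
        break;                       // hoisting would change behaviour
    }
  }
  return code;
}
```

On the `mov r23, r25; eor r23, r20; mov r22, r24; eor r22, r21` sequence from the listing above, this yields a single movw (r23:r22 from r25:r24) followed by the two eor instructions. The mismatched `mov r20, r25; mov r21, r24` is left alone, because a pass like this cannot repair a bad register assignment; that still depends on the allocator honouring the pairing constraint.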