Вадим Марковцев
2011-Feb-18 06:35 UTC
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
Hello everyone,

I've added the "S"-suffixed versions of ARM and Thumb2 instructions to tablegen, for example "movs" and "muls". Some instructions already had their twins, such as add/adds, and I left those untouched.

On top of that, I propose a codegen optimization based on them, which removes the redundant comparison in patterns like

    orr r1, r2, r3    ---->    orrs r1, r2, r3
    cmp r1, 0

This optimization gives a noticeable speedup, e.g. 3.3% on SQLite on Cortex-A8, and works fine. I have some questions though.

1) "neverHasSideEffects" in tablegen means that CPSR is not implicitly defined, doesn't it?

2) What else can be done using that super "S" power?

3) My current implementation works like a peephole pass (I disabled the existing, rather limited peephole cmp optimization) and runs right before ifcvt. Should I move it earlier in the pipeline? What do you think is the right place for such a thing?

4) Consider the following C code:

    int a, b, c;
    ...
    a = b * c;
    if (a > 0) {
        ...
    }

One gets the corresponding ARM assembly

    mul r(a), r(b), r(c)
    cmp r(a), 1
    blt LABEL
    // r(x) is the register where x is

The other cases ("if (a == 0)", "if (a < 0)") produce the expected

    cmp r(a), 0

So what is the idea behind this comparison with 1? Where should I look for the code behind that logic?

Thanks,
Vadim Markovtsev, ISP RAS.
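On question 4, the two branch forms can at least be checked for equivalence on integers. The sketch below is not LLVM code and says nothing about where the compiler picks the constant 1; the function names are hypothetical and only model the two compare-and-branch sequences as predicates for "skip the if body".

```cpp
#include <cassert>
#include <climits>

// Hypothetical model of: cmp r(a), 1 ; blt LABEL  (skip the body when a < 1)
bool skipWithCmpOne(int a) { return a < 1; }

// Hypothetical model of: cmp r(a), 0 ; ble LABEL  (skip the body when a <= 0)
bool skipWithCmpZero(int a) { return a <= 0; }

// Both predicates must agree with the source-level test `a > 0`.
void checkSameBranch() {
    const int samples[] = {INT_MIN, -2, -1, 0, 1, 2, INT_MAX};
    for (int a : samples) {
        assert(skipWithCmpOne(a) == skipWithCmpZero(a));
        assert(skipWithCmpOne(a) == !(a > 0));
    }
}
```

For integers, "a < 1" and "a <= 0" select the same values, so the comparison with 1 encodes the same branch as a comparison with 0 under a different condition code.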
On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:

> Hello everyone,
>
> I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> Of course, some instructions have already had their twins, such as add/adds, and I left them untouched.

Adding separate "s" instructions is not the right thing to do. We've been trying hard to avoid adding those "twins". The instructions that can optionally set the condition codes have an "optional def" operand. For example, look at the "cc_out" operand in the "sI" class defined in ARMInstrFormats.td. If that operand is set to the CPSR register, then the instruction becomes the "s" variant.

There are some existing peephole optimizations to make use of this, but there are some unresolved issues as well. Do you have some example testcases that show where we're missing opportunities?
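The optional-def idea described in this reply can be pictured with a toy model. This is not LLVM code: ToyInstr, Reg, setFlagsDef and printMnemonic are hypothetical names invented for the illustration, and the real "cc_out" handling in the ARM backend is more involved. The only point is that a single instruction definition carries an operand that is either "no register" or CPSR, and the printed mnemonic follows from it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the register an optional def can name.
enum class Reg { NoReg, CPSR };

// Hypothetical instruction with one optional flag-setting operand.
struct ToyInstr {
    std::string mnemonic;
    Reg ccOut;
    explicit ToyInstr(std::string m) : mnemonic(std::move(m)), ccOut(Reg::NoReg) {}
};

// Turn the instruction into its flag-setting form by filling the optional def.
void setFlagsDef(ToyInstr &mi) { mi.ccOut = Reg::CPSR; }

// Print the "s" suffix only when the optional def names CPSR.
std::string printMnemonic(const ToyInstr &mi) {
    return mi.ccOut == Reg::CPSR ? mi.mnemonic + "s" : mi.mnemonic;
}

// Small self-check of the toy model.
void checkToggle() {
    ToyInstr mi("orr");
    assert(printMnemonic(mi) == "orr");
    setFlagsDef(mi);
    assert(printMnemonic(mi) == "orrs");
}
```

In this picture there is no separate "orrs" definition; the twin exists only as a state of the one instruction.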
Вадим Марковцев
2011-Feb-28 09:23 UTC
[LLVMdev] Adding "S" suffixed ARM/Thumb2 instructions
I've just revised the current LLVM trunk.

>> Adding separate "s" instructions is not the right thing to do. We've been trying hard to avoid adding those "twins". The instructions that can optionally set the condition
>> codes have an "optional def" operand. For example, look at the "cc_out" operand in the "sI" class defined in ARMInstrFormats.td. If that operand is set to the CPSR
>> register, then the instruction becomes the "s" variant.

Alright, but things are not as rosy as one might expect. For example, when I set the "mov" instruction to define CPSR, the generated assembly is still "mov", not "movs", even though "movs" is a perfectly valid instruction that sets CPSR. The same operation on "add" has the desired effect. So, if one should go the way you propose instead of adding separate instructions to tablegen, what has to be modified in the LLVM code to resolve such issues? There are lots of similar instructions, unsupported by LLVM, which definitely have an "s"-suffixed twin.

>> There are some existing peephole optimizations to make use of this, but there are some unresolved issues as well. Do you have some example testcases that show where
>> we're missing opportunities?

Oh yeah. Consider the following existing peephole optimization, reached via PeepholeOptimizer.cpp -> PeepholeOptimizer::OptimizeCmpInstr -> ARMBaseInstrInfo::OptimizeCompareInstr:

    case ARM::ADDri:
    case ARM::ANDri:
    case ARM::t2ANDri:
    case ARM::SUBri:
    case ARM::t2ADDri:
    case ARM::t2SUBri:
      // Toggle the optional operand to CPSR.
      MI->getOperand(5).setReg(ARM::CPSR);
      MI->getOperand(5).setIsDef(true);
      CmpInstr->eraseFromParent();
      return true;

...and that's all. This switch should be giant, however: 88 instructions could be supported so far, instead of 6. Another thing unclear to me is the origin of the comment above,

    // Set the "zero" bit in CPSR.

Why not also "negative"? Moreover, that peephole in particular can be dramatically improved with some more advanced analysis.
For example, consider the following program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main()
    {
        srand(time(NULL));
        int x, y;
        x = rand();
        y = rand();
        int z = x * y;
        if (z == 0) {
            printf("Zero");
        }
        z = x|y;
        if (z > 0) {
            printf("Greater");
        } else {
            printf("Smaller");
        }
        return 0;
    }

It compiles to

        .syntax unified
        .cpu cortex-a8
        .eabi_attribute 6, 10
        .eabi_attribute 7, 65
        .eabi_attribute 8, 1
        .eabi_attribute 9, 2
        .eabi_attribute 10, 2
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .file "test.bc"
        .text
        .globl main
        .align 2
        .type main,%function
    main:                                   @ @main
    @ BB#0:                                 @ %entry
        push {r4, r5, r11, lr}
        mov r0, #0
        bl time
        bl srand
        bl rand
        mov r4, r0
        bl rand
        mov r5, r0
        mul r0, r5, r4
        cmp r0, #0
        bne .LBB0_2
    @ BB#1:                                 @ %bb
        movw r0, :lower16:.L.str
        movt r0, :upper16:.L.str
        bl printf
    .LBB0_2:                                @ %bb1
        orr r0, r5, r4
        cmp r0, #1
        blt .LBB0_5
    @ BB#3:                                 @ %bb2
        movw r0, :lower16:.L.str1
        movt r0, :upper16:.L.str1
    .LBB0_4:                                @ %bb2
        bl printf
        mov r0, #0
        ldmia sp!, {r4, r5, r11, pc}
    .LBB0_5:                                @ %bb3
        movw r0, :lower16:.L.str2
        movt r0, :upper16:.L.str2
        b .LBB0_4
    .Ltmp0:
        .size main, .Ltmp0-main

        .type .L.str,%object                @ @.str
        .section .rodata,"a",%progbits
        .align 2
    .L.str:
        .asciz "Zero"
        .size .L.str, 5

        .type .L.str1,%object               @ @.str1
        .align 2
    .L.str1:
        .asciz "Greater"
        .size .L.str1, 8

        .type .L.str2,%object               @ @.str2
        .align 2
    .L.str2:
        .asciz "Smaller"
        .size .L.str2, 8

At the same time, my optimization produces

        .syntax unified
        .cpu cortex-a8
        .eabi_attribute 6, 10
        .eabi_attribute 7, 65
        .eabi_attribute 8, 1
        .eabi_attribute 9, 2
        .eabi_attribute 10, 2
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .file "test.bc"
        .text
        .globl main
        .align 2
        .type main,%function
    main:                                   @ @main
    @ BB#0:                                 @ %entry
        push {r4, r5, r11, lr}
        mov r0, #0
        bl time
        bl srand
        bl rand
        mov r4, r0
        bl rand
        mov r5, r0
        muls r0, r5, r4
        bne .LBB0_2
    @ BB#1:                                 @ %bb
        movw r0, :lower16:.L.str
        movt r0, :upper16:.L.str
        bl printf
    .LBB0_2:                                @ %bb1
        orrs r0, r5, r4
        ble .LBB0_5
    @ BB#3:                                 @ %bb2
        movw r0, :lower16:.L.str1
        movt r0, :upper16:.L.str1
    .LBB0_4:                                @ %bb2
        bl printf
        mov r0, #0
        ldmia sp!, {r4, r5, r11, pc}
    .LBB0_5:                                @ %bb3
        movw r0, :lower16:.L.str2
        movt r0, :upper16:.L.str2
        b .LBB0_4
    .Ltmp0:
        .size main, .Ltmp0-main

        .type .L.str,%object                @ @.str
        .section .rodata,"a",%progbits
        .align 2
    .L.str:
        .asciz "Zero"
        .size .L.str, 5

        .type .L.str1,%object               @ @.str1
        .align 2
    .L.str1:
        .asciz "Greater"
        .size .L.str1, 8

        .type .L.str2,%object               @ @.str2
        .align 2
    .L.str2:
        .asciz "Smaller"
        .size .L.str2, 8

Note the "muls" in place of "mul" followed by "cmp" (lack of support today) and the "orrs" in place of "orr" followed by "cmp" (advanced analysis).

On 18 February 2011 at 21:49, Bob Wilson <bob.wilson at apple.com> wrote:

> On Feb 17, 2011, at 10:35 PM, Вадим Марковцев wrote:
>
> > Hello everyone,
> >
> > I've added the "S" suffixed versions of ARM and Thumb2 instructions to tablegen. Those are, for example, "movs" or "muls".
> > Of course, some instructions have already had their twins, such as add/adds, and I left them untouched.
>
> Adding separate "s" instructions is not the right thing to do. We've been trying hard to avoid adding those "twins". The instructions that can optionally set the condition codes have an "optional def" operand. For example, look at the "cc_out" operand in the "sI" class defined in ARMInstrFormats.td. If that operand is set to the CPSR register, then the instruction becomes the "s" variant.
>
> There are some existing peephole optimizations to make use of this, but there are some unresolved issues as well. Do you have some example testcases that show where we're missing opportunities?
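On the question of why the existing comment mentions only the "zero" bit, one possible factor is the overflow flag. The sketch below is a toy model, not LLVM code, and it reflects my reading of the ARM flag rules: "cmp Rn, #0" writes all of N, Z, C, V (with V = 0), while "orrs" with a plain register operand writes N and Z and leaves C and V as they were. The Flags struct and all function names are hypothetical. Whether V can actually be stale at a given point depends on the preceding code.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the four ARM condition flags.
struct Flags { bool n, z, c, v; };

// cmp Rn, #0 computes Rn - 0: N and Z from the value, C = 1 (no borrow), V = 0.
Flags cmpZero(int32_t rn) { return {rn < 0, rn == 0, true, false}; }

// orrs Rd, Rn, Rm (no shift): N and Z from the result, C and V carried over.
Flags orrs(int32_t a, int32_t b, Flags before) {
    int32_t r = a | b;
    return {r < 0, r == 0, before.c, before.v};
}

// Condition codes as defined by the architecture.
bool condNE(Flags f) { return !f.z; }
bool condMI(Flags f) { return f.n; }
bool condLE(Flags f) { return f.z || (f.n != f.v); }

// With a stale V = 1, "le" after orrs can disagree with "le" after cmp #0,
// while "ne" and "mi" do not depend on V and still agree.
void checkStaleOverflow() {
    Flags stale{false, false, false, true};
    Flags afterOrrs = orrs(4, 1, stale);   // result is 5, a positive value
    Flags afterCmp = cmpZero(5);
    assert(condNE(afterOrrs) == condNE(afterCmp));
    assert(condMI(afterOrrs) == condMI(afterCmp));
    assert(condLE(afterCmp) == false);
    assert(condLE(afterOrrs) == true);
}
```

Under this model, conditions that read only Z or N (eq, ne, mi, pl) survive dropping the cmp, whereas the signed conditions (lt, le, gt, ge) also read V and would need V to be known.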