Hi,

I would like to transform an LLVM function containing a load and an add of the base address inside a loop into a post-incremented load. In DAGCombiner.cpp, CombineToPostIndexedLoadStore() says it cannot fold the add if, for instance, it is a predecessor or successor of the load. I find this odd, as this is exactly what I would like to handle: a simple loop with an address that is incremented in each iteration.

I am considering using a target intrinsic for this purpose, as the SCEV interface is available on the LLVM IR. That way, I could get a DAG with a post-inc-load node instead of the load and add nodes.

Is this a work in progress? Please explain why these constraints are imposed in the above-mentioned method, as they do not seem to facilitate post-inc instruction combining.

Best regards,
Jonas Paulsson
Hello,

> Is this a work in progress? Please explain why these constraints are put in
> the above mentioned method as they do not seem to facilitate post-inc
> instruction combining.

Have you looked at how this is implemented in the existing backends? At least ARM and MSP430 have post-inc working.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
On Jan 27, 2011, at 11:13 PM, Jonas Paulsson wrote:

> Hi,
>
> I would like to transform a LLVM function containing a load and an add of the base address inside a loop to a post-incremented load. In DAGCombiner.cpp::CombineToPostIndexedLoadStore(), it says it cannot fold the add for instance if it is a predecessor/successor of the load. I find this odd, as this is exactly what I would like to handle: a simple loop with an address that is incremented in each iteration.
>
> I am considering using a target intrinsic for this purpose, as the SCEV interface is available on the LLVM I/R. In this way, I could get a DAG with a post-inc-load node instead of the load and add nodes.
>
> Is this a work in progress? Please explain why these constraints are put in the above mentioned method as they do not seem to facilitate post-inc instruction combining.

The "predecessor" and "successor" terminology used there refers to the DAG, not to the order of the operations in the LLVM IR. For example, if the result of the ADD is the value being stored to memory, then you couldn't fold that into a post-inc STORE:

    %x = add i32 %addr, 4
    store i32 %x, i32* %addr

In the DAG for that, the ADD is a predecessor of the STORE. If the result of the add is used for some other memory reference, then it would not be a predecessor and could be folded.
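The difference between DAG order and IR order described in the reply above can be made concrete with a toy model. What follows is a minimal sketch, not LLVM code: `Node`, `isPredecessor` and `canFoldIntoPostInc` are hypothetical names invented for illustration, and the fold rule is reduced to the single condition discussed in this thread (neither node may depend on the other).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy node, NOT LLVM's SDNode. `operands` are the values a node consumes.
struct Node {
    std::string name;
    std::vector<const Node*> operands;
};

// True if `user` consumes `pred`, directly or through a chain of operands.
bool isPredecessor(const Node& pred, const Node& user) {
    std::unordered_set<const Node*> seen;
    std::vector<const Node*> work{&user};
    while (!work.empty()) {
        const Node* n = work.back();
        work.pop_back();
        for (const Node* op : n->operands) {
            if (op == &pred) return true;
            if (seen.insert(op).second) work.push_back(op);
        }
    }
    return false;
}

// Simplified rule (an assumption of this sketch): the add and the memory
// operation may only be merged when neither depends on the other.
bool canFoldIntoPostInc(const Node& add, const Node& mem) {
    return !isPredecessor(add, mem) && !isPredecessor(mem, add);
}

// %x = add %addr, 4 ; store %x -> %addr : the store consumes the add.
bool storeOfAddResultFolds() {
    Node addr{"addr", {}};
    Node four{"4", {}};
    Node add{"add", {&addr, &four}};
    Node store{"store", {&add, &addr}};
    return canFoldIntoPostInc(add, store);
}

// load from %addr, plus a separate add %addr, 4 used elsewhere:
// both nodes consume %addr, but neither consumes the other.
bool independentAddFolds() {
    Node addr{"addr", {}};
    Node four{"4", {}};
    Node add{"add", {&addr, &four}};
    Node load{"load", {&addr}};
    return canFoldIntoPostInc(add, load);
}
```

In this model `storeOfAddResultFolds()` is false and `independentAddFolds()` is true, which matches the two cases in the reply: an add whose result is the stored value is a predecessor of the store, while an add that merely shares the base address with a load is not.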
When I compile the following program (for ARM):

    for(i=0;i<n2;i+=n3)
    {
      s+=a[i];
    }

with GCC, I get the following loop body, with a post-modify load:

    .L4:
        add r1, r1, r3
        ldr r4, [ip], r6
        rsb r5, r3, r1
        cmp r2, r5
        add r0, r0, r4
        bgt .L4

With LLVM, however, I get:

    .LBB0_3:                    @ %for.body
                                @ =>This Inner Loop Header: Depth=1
        add r12, lr, r3
        ldr lr, [r0, lr, lsl #2]
        add r1, lr, r1
        cmp r12, r2
        mov lr, r12
        blt .LBB0_3

which does not appear to use auto-increment addressing. I wonder what I should do to get auto-increment in loops generally, for instance in this simple loop:

    for(i=0;i<256;i++)
    {
      s+=a[i];
    }

which now yields

    .LBB0_1:                    @ %for.body
                                @ =>This Inner Loop Header: Depth=1
        ldr r3, [r0, r2]
        add r2, r2, #4
        add r1, r3, r1
        cmp r2, #1, 22          @ 1024
        bne .LBB0_1

which uses r0 as the base address with r2 as the offset. On my target, auto-increment is much preferred in cases like this.

I repeat my question, as I don't quite understand why ARM uses ldr/add here instead of post-inc. I would like the DAG combiner to handle cases like this, but it does not seem to do so.

Thank you,
Jonas

Subject: Re: [LLVMdev] Post-inc combining
From: bob.wilson at apple.com
Date: Fri, 28 Jan 2011 08:56:09 -0800
CC: llvmdev at cs.uiuc.edu
To: jnspaulsson at hotmail.com

> The "predecessor" and "successor" terminology used there refers to the DAG, not to the order of the operations in the llvm IR. For example, if the result of the ADD is the value being stored to memory, then you couldn't fold that into a post-inc STORE:
>
>     %x = add i32 %addr, 4
>     store i32 %x, i32* %addr
>
> In the DAG for that, the ADD is a predecessor of the STORE. If the result of the add is used for some other memory reference, then it would not be a predecessor and could be folded.
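For reference, the strided loop from the follow-up message can be written in two source-level forms that compute the same sum. This is only a sketch of what "post-increment" means at the source level: the function names `sumIndexed` and `sumPostInc` are hypothetical, and nothing here guarantees that a particular backend will select a post-indexed load for either form.

```cpp
#include <cassert>

// Indexed form, as in the message: the address is recomputed from a base
// and an index on every iteration.
int sumIndexed(const int* a, int n2, int n3) {
    int s = 0;
    for (int i = 0; i < n2; i += n3) {
        s += a[i];
    }
    return s;
}

// Pointer-bump form: load through p, then advance p by the stride. This is
// the load-then-update shape that a post-indexed load such as
// "ldr r4, [ip], r6" performs in one instruction.
// Assumes n3 > 0 and that `a` has at least n2 + n3 - 1 elements, so the
// final bump of p stays inside (or one past the end of) the array.
int sumPostInc(const int* a, int n2, int n3) {
    int s = 0;
    const int* p = a;
    for (int i = 0; i < n2; i += n3) {
        s += *p;   // load from the current address
        p += n3;   // post-increment the address by the stride
    }
    return s;
}
```

Both functions visit elements 0, n3, 2*n3, ... below n2. The second form keeps the address itself as the induction variable, so within one loop body the load and the pointer add share the same base but neither consumes the other's result.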