David Conrad
2010-Jan-15 06:13 UTC
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
Hi,

On ARMv6T2 this turns cttz into rbit, clz instead of the four-instruction sequence it is lowered to now.

I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of was to add a bitreverse intrinsic to LLVM IR, which itself might not be the best option since bitreverse probably isn't too common.

Other targets that I know of that could benefit from this optimization being global (they have clz and bitreverse instructions but no ctz) are AVR32 and C64x, neither of which LLVM has a backend for yet.

[Attachment scrubbed by the list archive: llvm-ctz-arm.diff, application/octet-stream, 5160 bytes. URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20100115/9da482e2/attachment.obj>]
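The identity behind the patch is that counting trailing zeros equals counting leading zeros of the bit-reversed value. A minimal sketch in portable C follows; it is not taken from the attached diff, and the helper names bitreverse32, clz32 and ctz32 are hypothetical stand-ins for the rbit and clz instructions and for __builtin_ctz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bit-reverse instruction such as ARM rbit:
 * swap adjacent bits, then pairs, nibbles, bytes and half-words. */
static uint32_t bitreverse32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

/* Hypothetical stand-in for clz; returns 32 for zero, as ARM clz does. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros as clz(rbit(x)). __builtin_ctz is undefined for 0,
 * so this sketch rejects that input. */
static unsigned ctz32(uint32_t x)
{
    assert(x != 0);
    return clz32(bitreverse32(x));
}
```

For example, ctz32(8) reverses 0x00000008 to 0x10000000, which has three leading zeros.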
Chris Lattner
2010-Jan-15 18:03 UTC
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Jan 14, 2010, at 10:13 PM, David Conrad wrote:

> Hi,
>
> On ARMv6T2 this turns cttz into rbit, clz instead of the 4
> instruction sequence it is now.
>
> I'm not sure if adding RBIT to ARMISD and doing this optimization in
> the legalize pass is the best option, but the only better way I
> could think of doing it was to add a bitreverse intrinsic to llvm
> ir, which itself might not be the best option since bitreverse
> probably isn't too common.

I haven't looked at the patch in detail, but this approach makes sense to me.

> Other targets that I know of that could potentially benefit from
> this optimization being global (that have a clz and bitreverse
> instruction but not ctz) are AVR32 and C64x, neither of which llvm
> has backends for yet.

When/if another target wants this, we could add an ISD::RBIT operation; it doesn't need to be added at the LLVM IR level.

-Chris
Richard Osborne
2010-Jan-15 19:37 UTC
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On 15 Jan 2010, at 18:03, Chris Lattner wrote:

> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>
>> Other targets that I know of that could potentially benefit from
>> this optimization being global (that have a clz and bitreverse
>> instruction but not ctz) are AVR32 and C64x, neither of which llvm
>> has backends for yet.
>
> When/if another target wants this, we could add a ISD::RBIT operation,
> it doesn't need to be added at the llvm ir level,

The XCore also has ctlz and bitreverse instructions but no cttz. At the moment, the XCore backend marks cttz as legal and expands it to this pair of instructions with a pattern in its InstrInfo.td.

--
Richard Osborne | XMOS
xmos.com
Sandeep Patel
2010-Jan-15 20:04 UTC
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Fri, Jan 15, 2010 at 6:03 PM, Chris Lattner <clattner at apple.com> wrote:

> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>
>> Hi,
>>
>> On ARMv6T2 this turns cttz into rbit, clz instead of the 4
>> instruction sequence it is now.
>>
>> I'm not sure if adding RBIT to ARMISD and doing this optimization in
>> the legalize pass is the best option, but the only better way I
>> could think of doing it was to add a bitreverse intrinsic to llvm
>> ir, which itself might not be the best option since bitreverse
>> probably isn't too common.
>
> I haven't looked at the patch in detail, but this approach makes sense
> to me.
>
>> Other targets that I know of that could potentially benefit from
>> this optimization being global (that have a clz and bitreverse
>> instruction but not ctz) are AVR32 and C64x, neither of which llvm
>> has backends for yet.
>
> When/if another target wants this, we could add a ISD::RBIT operation,
> it doesn't need to be added at the llvm ir level,

Bit reversal turns up in most FFT algorithms, so it wouldn't hurt to be able to add an instcombine that recognizes it, etc.

deep
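For context on where bit reversal shows up in FFT code: radix-2 implementations commonly reorder their data into bit-reversed index order. The sketch below illustrates that permutation in portable C. It is not code from this thread, and the names reverse_bits and bit_reverse_permute are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the low `bits` bits of index i, one bit at a time. */
static size_t reverse_bits(size_t i, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}

/* Reorder data[0 .. 2^bits) into bit-reversed index order, in place.
 * Each pair (i, j) is swapped once, when i < j. */
static void bit_reverse_permute(double *data, unsigned bits)
{
    assert(bits < sizeof(size_t) * 8);
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        size_t j = reverse_bits(i, bits);
        if (i < j) {
            double tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

The loop in reverse_bits is the kind of shift-and-or idiom a recognizer would have to match before it could be replaced by a single bit-reverse operation.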
Jakob Stoklund Olesen
2010-Jan-19 01:10 UTC
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Jan 15, 2010, at 10:03 AM, Chris Lattner wrote:

> When/if another target wants this, we could add a ISD::RBIT operation,
> it doesn't need to be added at the llvm ir level,

Blackfin can add with backwards carry, essentially doing

(rbit (add (rbit a), (rbit b)))

This is used for FFTs. I wasn't hoping to be able to pattern-match something so complicated.
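The expression above can be written out in portable C to show what an add with backwards carry computes. This is only a sketch of the arithmetic, not a description of the Blackfin instruction's exact semantics. The names bitreverse32, rev_n and rev_add_n are hypothetical, and the restriction to the low `bits` bits is an assumption added so the example can walk a small FFT index range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit bit-reverse instruction. */
static uint32_t bitreverse32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

/* Reverse the low `bits` bits of x; x is assumed to fit in `bits` bits. */
static uint32_t rev_n(uint32_t x, unsigned bits)
{
    return bitreverse32(x) >> (32 - bits);
}

/* rbit(add(rbit a, rbit b)) on `bits`-wide values: the carry propagates
 * from the most significant bit toward the least significant one. */
static uint32_t rev_add_n(uint32_t a, uint32_t b, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t sum = (rev_n(a, bits) + rev_n(b, bits)) & mask;
    return rev_n(sum, bits);
}
```

With bits = 3, starting from 0 and repeatedly adding 4 (half the table size) with rev_add_n visits 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed index order of an 8-point FFT.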