search for: bitreverse

Displaying 20 results from an estimated 26 matches for "bitreverse".

2010 Jan 15
4
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
Hi, On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now. I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which itself might not be the best option since bitreverse probably isn't too common. Other targets that I know of that could potentially benefit from this optimization being global (that have a clz and bitreverse instruction but not ctz) are AVR32 and C64x, neither of wh...
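The identity behind the patch above is that counting trailing zeros equals counting leading zeros of the bit-reversed value. A minimal portable sketch in C follows; the helper names `bitreverse32`, `clz32` and `cttz32` are hypothetical stand-ins for the ARM RBIT and CLZ instructions and are not taken from the patch itself.

```c
#include <stdint.h>

/* Hypothetical software stand-in for ARM RBIT: reverse all 32 bits
 * by swapping progressively larger groups. */
static uint32_t bitreverse32(uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Hypothetical software stand-in for ARM CLZ: number of leading zero
 * bits, with the convention that an input of 0 yields 32. */
static unsigned clz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* cttz expressed as "rbit, clz": the lowest set bit of x becomes the
 * highest set bit of the reversed value. */
static unsigned cttz32(uint32_t x) {
    return clz32(bitreverse32(x));
}
```

With this convention `cttz32(0)` is 32, because reversing zero gives zero and `clz32` returns 32 for it.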
2010 Jan 15
0
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
...> On ARMv6T2 this turns cttz into rbit, clz instead of the 4 > instruction sequence it is now. > > I'm not sure if adding RBIT to ARMISD and doing this optimization in > the legalize pass is the best option, but the only better way I > could think of doing it was to add a bitreverse intrinsic to llvm > ir, which itself might not be the best option since bitreverse > probably isn't too common. I haven't looked at the patch in detail, but this approach makes sense to me. > Other targets that I know of that could potentially benefit from > this optim...
2010 Jan 15
1
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
...this turns cttz into rbit, clz instead of the 4 >> instruction sequence it is now. >> >> I'm not sure if adding RBIT to ARMISD and doing this optimization in >> the legalize pass is the best option, but the only better way I >> could think of doing it was to add a bitreverse intrinsic to llvm >> ir, which itself might not be the best option since bitreverse >> probably isn't too common. > > I haven't looked at the patch in detail, but this approach makes sense > to me. > >> Other targets that I know of that could potentially benefi...
2010 Jan 15
2
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On 15 Jan 2010, at 18:03, Chris Lattner wrote: > On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > >> Other targets that I know of that could potentially benefit from >> this optimization being global (that have a clz and bitreverse >> instruction but not ctz) are AVR32 and C64x, neither of which llvm >> has backends for yet. > > When/if another target wants this, we could add a ISD::RBIT operation, > it doesn't need to be added at the llvm ir level, The XCore also has ctlz and bitreverse instr...
2010 Jan 15
0
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
...M, Richard Osborne wrote: > > On 15 Jan 2010, at 18:03, Chris Lattner wrote: > >> On Jan 14, 2010, at 10:13 PM, David Conrad wrote: >> >>> Other targets that I know of that could potentially benefit from >>> this optimization being global (that have a clz and bitreverse >>> instruction but not ctz) are AVR32 and C64x, neither of which llvm >>> has backends for yet. >> >> When/if another target wants this, we could add a ISD::RBIT >> operation, >> it doesn't need to be added at the llvm ir level, > > The XCore a...
2010 Jan 18
1
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
...> >> On 15 Jan 2010, at 18:03, Chris Lattner wrote: >> >>> On Jan 14, 2010, at 10:13 PM, David Conrad wrote: >>> >>>> Other targets that I know of that could potentially benefit from >>>> this optimization being global (that have a clz and bitreverse >>>> instruction but not ctz) are AVR32 and C64x, neither of which llvm >>>> has backends for yet. >>> >>> When/if another target wants this, we could add a ISD::RBIT >>> operation, >>> it doesn't need to be added at the llvm ir lev...
2015 Nov 16
2
LLVM Weekly - #98, Nov 16th 2015
...e+execute+minimize the corpus, 2) choose a random unit, 3) reset the coverage, 4) start fuzzing as if the chosen unit was the only element of the corpus, 5) reset the coverage again when done, 6) merge the newly created corpus into the original one. [r252838](http://reviews.llvm.org/rL252838). * A BITREVERSE SelectionDAG node and a set of `llvm.bitreverse.*` intrinsics have been introduced. The intention is that backends should no longer have to reimplement similar code to match instruction patterns to their own ISA's bitreverse instruction. See also the patch to the ARM backend that replaces ARMIS...
2020 Jul 05
8
[RFC] carry-less multiplication instruction
...ltiply can also be used to implement Erasure code efficiently. [14] ==clmul lowering without hardware support== A 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3]. [1] https://en.wikipedia.org/wiki/Carry-less_product https://en.wikipedia.org/wiki/Carry-less_product...
2016 Sep 28
3
Load combine pass
...or the rest of the optimizer. I can't find any backstory for this pass, why was it chosen to optimize the pattern in question in this way? What is the current status of this pass? I have an alternative implementation for it locally. I implemented an instcombine rule similar to recognise bswap/bitreverse idiom. It relies on collectBitParts (Local.cpp) to determine the origin of the bits in a given or value. If all the bits are happen to be loaded from adjacent locations it replaces the or with a single load or a load plus bswap. If the alternative approach sounds reasonable I'll post my patche...
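The pattern this message describes, an `or` of shifted values loaded from adjacent bytes, can be illustrated in C. This is only a sketch of the source-level idiom; the function names are hypothetical, and it does not reproduce the instcombine rule or `collectBitParts`.

```c
#include <stdint.h>

/* Bytes loaded from adjacent locations and or-ed together in
 * little-endian order; on a little-endian target this is the shape
 * that could become a single 32-bit load. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* The same bytes combined in the opposite order; on a little-endian
 * target this is the "load plus bswap" case. */
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}

/* Portable byte swap, written out so the relation between the two
 * loads can be checked without compiler builtins. */
static uint32_t bswap32(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

For any four bytes, `load_be32(p)` equals `bswap32(load_le32(p))`, which is why one of the two forms always reduces to a plain load and the other to a load followed by a byte swap.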
2016 Sep 28
4
Load combine pass
...> >> I can't find any backstory for this pass, why was it chosen to optimize the pattern in question in this way? What is the current status of this pass? >> >> I have an alternative implementation for it locally. I implemented an instcombine rule similar to recognise bswap/bitreverse idiom. It relies on collectBitParts (Local.cpp) to determine the origin of the bits in a given or value. If all the bits are happen to be loaded from adjacent locations it replaces the or with a single load or a load plus bswap. >> >> If the alternative approach sounds reasonable I'...
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
On 5/18/20 8:24 PM, Craig Topper wrote: > I can tell you that your avx512 issue is that v64i8 gfni instructions also > require avx512bw to be enabled to make v64i8 a supported type. The C > intrinsics handling in the front end know this rule. But since you > generated your own intrinsics you bypassed that. Indeed that's the issue... I was stick with what Intel announces here
2020 Jul 09
2
[RFC] carry-less multiplication instruction
...ltiply can also be used to implement Erasure code efficiently. [14] >> >>  ==clmul lowering without hardware support== >>  A 8x8=>16 clmul can also be lowered to a 32x32=>64 multiplication when there is no specialized instruction (also 15x15=>30, to a 60x60=>120, or if bitreverse is available 16x16=>32 to TWO 64x64=>64 multiplications)[3]. >> >>  [1] https://en.wikipedia.org/wiki/Carry-less_product >>  [2] (page 30) https://raw.githubusercontent.com/riscv/riscv-bitmanip/master/bitmanip-0.92.pdf >>  [3] https://www.bearssl.org/constanttime.html...
2015 Dec 01
10
[RFC] Intrinsic naming convention (words with dots)
...fix): @llvm.gcroot @llvm.gcread @llvm.gcwrite @llvm.experimental.stackmap @llvm.experimental.patchpoint @llvm.experimental.gc.statepoint @llvm.returnaddress @llvm.frameaddress @llvm.localescape @llvm.localrecover @llvm.stacksave @llvm.stackrestore @llvm.pcmarker @llvm.readcyclecounter @llvm.bitreverse @llvm.eh.begincatch @llvm.eh.endcatch @llvm.eh.padparam @llvm.stackprotector @llvm.stackprotectorcheck @llvm.objectsize @llvm.donothing Words with dots: @llvm.sadd.with.overflow @llvm.uadd.with.overflow @llvm.ssub.with.overflow @llvm.usub.with.overflow @llvm.smul.with.overflow @llvm.umul.with...
2016 Sep 29
2
Load combine pass
...n't find any backstory for this pass, why was it chosen to optimize the pattern in question in this way? What is the current status of this pass? > >>> > >>> I have an alternative implementation for it locally. I implemented an instcombine rule similar to recognise bswap/bitreverse idiom. It relies on collectBitParts (Local.cpp) to determine the origin of the bits in a given or value. If all the bits are happen to be loaded from adjacent locations it replaces the or with a single load or a load plus bswap. > >>> > >>> If the alternative approach sounds...
2003 Jan 23
4
SIMD instructions
Vorbis does not appear to use any SIMD instructions. A short look around in the source code indicates that it would be possible and might even yield big performance improvements. Why has nobody done it yet? I am currently trying to learn using these instructions and would be willing to rewrite a few functions in SIMD instructions, if I understand how to vectorize them and if they make a
2019 Sep 11
2
Load combine pass
...y backstory for this pass, why was it chosen to > optimize the pattern in question in this way? What is the current status of > this pass? > >>> > >>> I have an alternative implementation for it locally. I implemented an > instcombine rule similar to recognise bswap/bitreverse idiom. It relies on > collectBitParts (Local.cpp) to determine the origin of the bits in a given > or value. If all the bits are happen to be loaded from adjacent locations > it replaces the or with a single load or a load plus bswap. > >>> > >>> If the alternative...
2019 Sep 12
2
Load combine pass
...pass, why was it chosen to >> optimize the pattern in question in this way? What is the current status of >> this pass? >> >>> >> >>> I have an alternative implementation for it locally. I implemented an >> instcombine rule similar to recognise bswap/bitreverse idiom. It relies on >> collectBitParts (Local.cpp) to determine the origin of the bits in a given >> or value. If all the bits are happen to be loaded from adjacent locations >> it replaces the or with a single load or a load plus bswap. >> >>> >> >>>...
2013 Jul 16
1
Query regarding FFT in Opus
Dear Experts, I want to know if the KISS FFT in opus can be replaced with fast FFT. What are the implications & things to be taken care if this is done. Also, I saw in FFT code comments that in-line FFT not supported. Please let me know the reason for this. Thanks in advance. Regards, Mahantesh
2019 Sep 25
2
Load combine pass
...in this way? What >>> is the current status of this pass? >>> >>> >>> >>> I have an alternative implementation for it locally. I >>> implemented an instcombine rule similar to recognise >>> bswap/bitreverse idiom. It relies on collectBitParts >>> (Local.cpp) to determine the origin of the bits in a given >>> or value. If all the bits are happen to be loaded from >>> adjacent locations it replaces the or with a single load or >>> a loa...
2018 May 16
0
Rotates, once again
...broken down into multiple IR instructions? As noted here, it's unlikely for rotate. If it is possible, then adding folds to instcombine for this intrinsic isn't hard. Are any other passes affected? For reference, these are the current target-independent bit-manipulation intrinsics - bswap, bitreverse, ctpop, ctlz, cttz: http://llvm.org/docs/LangRef.html#bit-manipulation-intrinsics The LLVM cost for the proposed rotate intrinsic should be in the same range as those? Note that we would not just be adding code to support an intrinsic. There are already ~200 lines of DAG matching code for rotate,...