similar to: SIMD instructions

Displaying 20 results from an estimated 1000 matches similar to: "SIMD instructions"

2010 Jan 15
4
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
Hi, On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now. I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which itself might not be the best option since bitreverse probably isn't too common. Other
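The identity behind this patch can be sketched in a few lines: counting trailing zeros is the same as counting leading zeros of the bit-reversed value. The helper names below (`bit_reverse32`, `clz32`, `cttz32`) are hypothetical and only model the 32-bit behaviour of `rbit` and `clz`; this is an illustration, not the code from the patch.

```python
def bit_reverse32(x: int) -> int:
    """Reverse the order of the 32 bits of x (models what ARM's rbit computes)."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)  # move the lowest bit of x into result
        x >>= 1
    return result


def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit value; gives 32 for x == 0."""
    return 32 - x.bit_length()


def cttz32(x: int) -> int:
    """Count trailing zeros as clz(rbit(x)), the two-step sequence described above."""
    x &= 0xFFFFFFFF  # assumption: treat the input as an unsigned 32-bit value
    return clz32(bit_reverse32(x))


if __name__ == "__main__":
    for value in (1, 8, 12, 0x80000000, 0):
        print(hex(value), cttz32(value))
```

The lowest set bit of the input becomes the highest set bit after reversal, which is why the leading-zero count of the reversed word equals the trailing-zero count of the original.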
2010 Jan 15
0
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > Hi, > > On ARMv6T2 this turns cttz into rbit, clz instead of the 4 > instruction sequence it is now. > > I'm not sure if adding RBIT to ARMISD and doing this optimization in > the legalize pass is the best option, but the only better way I > could think of doing it was to add a bitreverse intrinsic to llvm
2010 Jan 15
2
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On 15 Jan 2010, at 18:03, Chris Lattner wrote: > On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > >> Other targets that I know of that could potentially benefit from >> this optimization being global (that have a clz and bitreverse >> instruction but not ctz) are AVR32 and C64x, neither of which llvm >> has backends for yet. > > When/if another
2010 Jan 15
1
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Fri, Jan 15, 2010 at 6:03 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > >> Hi, >> >> On ARMv6T2 this turns cttz into rbit, clz instead of the 4 >> instruction sequence it is now. >> >> I'm not sure if adding RBIT to ARMISD and doing this optimization in >> the legalize pass is
2010 Jan 15
0
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Jan 15, 2010, at 11:37 AM, Richard Osborne wrote: > > On 15 Jan 2010, at 18:03, Chris Lattner wrote: > >> On Jan 14, 2010, at 10:13 PM, David Conrad wrote: >> >>> Other targets that I know of that could potentially benefit from >>> this optimization being global (that have a clz and bitreverse >>> instruction but not ctz) are AVR32 and C64x,
2010 Jan 18
1
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Jan 15, 2010, at 2:52 PM, Jim Grosbach wrote: > > On Jan 15, 2010, at 11:37 AM, Richard Osborne wrote: > >> >> On 15 Jan 2010, at 18:03, Chris Lattner wrote: >> >>> On Jan 14, 2010, at 10:13 PM, David Conrad wrote: >>> >>>> Other targets that I know of that could potentially benefit from >>>> this optimization being
2020 Jul 05
8
[RFC] carry-less multiplication instruction
Carry-less multiplication[1] instructions exist (at least optionally) on many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly more. This proposal is to add a llvm.clmul instruction. Or if that is contentious, llvm.experimental.bitmanip.clmul instruction.
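For readers unfamiliar with the operation, here is a minimal sketch of carry-less multiplication on non-negative Python integers. The function name `clmul` is borrowed from the proposal for readability only; the operand width and any truncation rules of the proposed intrinsic are not given in this excerpt, so the sketch assumes unbounded integers.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product: shift-and-XOR instead of shift-and-add.

    Assumes a and b are non-negative; no width truncation is applied.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR replaces addition, so carries never propagate
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    print(bin(clmul(0b11, 0b11)))  # ordinary multiplication would give 0b1001
```

This is the same as multiplying two polynomials whose coefficients are the bits of each operand, with coefficient arithmetic done modulo 2.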
2020 May 18
2
Use Galois field New Instructions (GFNI) to combine affine instructions
On 5/18/20 8:24 PM, Craig Topper wrote: > I can tell you that your avx512 issue is that v64i8 gfni instructions also > require avx512bw to be enabled to make v64i8 a supported type. The C > intrinsics handling in the front end know this rule. But since you > generated your own intrinsics you bypassed that. Indeed that's the issue... I was stick with what Intel announces here
2016 Sep 28
4
Load combine pass
One of the arguments for doing this earlier is inline cost perception of the original pattern. Reading i32/i64 by bytes looks much more expensive than it is and can prevent inlining of interesting functions. The concern about inhibiting other optimizations can be addressed by careful selection of the pattern we’d like to match. I limit the transformation to the case when all the individual have no uses other
2016 Sep 28
3
Load combine pass
Hi, I'm trying to optimize a pattern like this into a single i16 load:
  %1 = bitcast i16* %pData to i8*
  %2 = load i8, i8* %1, align 1
  %3 = zext i8 %2 to i16
  %4 = shl nuw i16 %3, 8
  %5 = getelementptr inbounds i8, i8* %1, i16 1
  %6 = load i8, i8* %5, align 1
  %7 = zext i8 %6 to i16
  %8 = shl nuw nsw i16 %7, 0
  %9 = or i16 %8, %4
I came across load combine pass which is motivated
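To make the equivalence concrete, the sketch below mirrors the byte-wise pattern from the IR and compares it with one 16-bit read. The function names are hypothetical. The pattern places the byte at offset 0 in the high half, so the combined read is interpreted as big-endian here; on a little-endian target a byte swap would be needed after the single load.

```python
import struct


def load_i16_bytewise(data: bytes, offset: int = 0) -> int:
    """Mirror the IR: two i8 loads, zero-extend, shift, then OR together."""
    high = data[offset] << 8      # %2, %3, %4: byte 0 shifted into the high half
    low = data[offset + 1] << 0   # %6, %7, %8: byte 1 stays in the low half
    return low | high             # %9


def load_i16_combined(data: bytes, offset: int = 0) -> int:
    """One 16-bit read, interpreted as big-endian to match the pattern above."""
    (value,) = struct.unpack_from(">H", data, offset)
    return value


if __name__ == "__main__":
    buffer = bytes([0x12, 0x34, 0xAB, 0xCD])
    print(hex(load_i16_bytewise(buffer)), hex(load_i16_combined(buffer)))
```

Both functions return the same value for any two readable bytes, which is the property a load-combining transformation relies on.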
2015 Dec 01
10
[RFC] Intrinsic naming convention (words with dots)
Hi everyone, We seem to have allowed our documented target-independent intrinsics to acquire a somewhat-haphazard naming system, and I think we should standardize on one convention. All of the intrinsics have 'llvm.' as a prefix, and some also have some additional prefix 'llvm.dbg.', 'llvm.eh.', 'llvm.experimental.', etc., but after that we lose consistency. When
2020 Jul 09
2
[RFC] carry-less multiplication instruction
05.07.2020, 05:22, "Roman Lebedev" <lebedev.ri at gmail.com>: > On Sun, Jul 5, 2020 at 12:18 PM Shawn Landden via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >>  Carry-less multiplication[1] instructions exist (at least optionally) on many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly more. >> >>  This proposal is to add a
2005 Feb 20
1
Well decomposed mdct
I did composition of butterfly8 and butterfly16 and I found that these functions are well decomposed - decomposition doesn't lower computational speed. On the other hand the same can be done with butterfly8 - decomposition to butterfly4 (further decomposition is not possible) but there's no reason to do this. I think little improvement can be done by inlining them. Compiler and processor
2014 Sep 10
4
[RFC PATCH v1 0/3] Introducing ARM SIMD Support
libvorbis does not currently have any simd/vectorization. The following patches add a generic framework for simd/vectorization and, on top of it, add ARM-NEON simd vectorization using intrinsics. I was able to get over 34% performance improvement on my Beaglebone Black, which has a single Cortex-A8 based CPU. You can find more information on metrics and procedure I used to measure at
2000 Aug 28
3
optimization patches
Well, here you are. 24k; sorry if I'm not supposed to put this size things in your mailbox, didn't know where else to put it. And you all are subscribed to vorbis-dev, after all. I'm not that good at breaking patches apart, so it's one big patch. Sorry. Overview:
  configure.in make profiling easier & more useful
  decoder-example.c (#if 0'ed) dither output;
2003 Mar 12
2
encoder block diagram
I've made a block diagram of the encoder because I was trying to find out how it works: http://stoffke.freeshell.65535.net/ogg/block.html Although there are specification docs that give very detailed information about single aspects of the encoding (or decoding), I'm missing documentation that gives a more general overview of how the encoder works. (Vorbis Illuminated seems a bit
2016 Sep 29
2
Load combine pass
> On 29 Sep 2016, at 03:23, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Artur, > > Artur Pilipenko via llvm-dev wrote: > > One of the arguments for doing this earlier is inline cost > > perception of the original pattern. Reading i32/i64 by bytes look much > > more expensive than it is and can prevent inlining of interesting > >
2003 Sep 10
1
A new introduction attempt.
I have been using libvorbis for the past few weeks and have been asked to summarise what I have discovered about the codec. There is an early draft of the document at http://www.geocities.com/gatewaystation/vorbis/vorbis.htm - Please forgive the dodgy formatting (it was formerly a MS word document that got converted with their 'save as html' feature). I still have some additions to
2006 Feb 09
2
Speex Command line, Changing the LPC order and modifying the codebook
>There's plenty of areas for improvements that don't require incompatible changes like this one. Can you please tell me what I should do to make it a more exact waveform coder for music rather than speech? I understand that it's meant for speech, but I was just using it for music... I am interested in getting the residual as small as possible using speex. Can you please tell me the areas to
2013 Jul 16
1
Query regarding FFT in Opus
Dear Experts, I want to know if the KISS FFT in opus can be replaced with fast FFT. What are the implications, and what needs to be taken care of, if this is done? Also, I saw in FFT code comments that in-line FFT is not supported. Please let me know the reason for this. Thanks in advance. Regards, Mahantesh