thr3ads.net - similar to: "SIMD instructions"

Displaying 20 results from an estimated 1000 matches similar to: "SIMD instructions"

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

Hi, On ARMv6T2 this turns cttz into rbit, clz instead of the 4-instruction sequence it is now. I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to LLVM IR, which itself might not be the best option since bitreverse probably isn't very common. Other
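As a rough illustration of the identity the patch relies on, ctz(x) = clz(rbit(x)): reversing the bits moves the lowest set bit to the top, so counting the leading zeros of the reversed word counts the trailing zeros of the original. The sketch below is a portable C model, not the patch itself; bit_reverse32 and count_leading_zeros32 are hypothetical stand-ins for the ARM RBIT and CLZ instructions, and 32-bit operands are assumed.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for ARM RBIT: reverse the bit order of a 32-bit word
   by swapping adjacent bits, then pairs, nibbles, bytes and half-words. */
uint32_t bit_reverse32(uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Hypothetical portable stand-in for ARM CLZ; like the instruction, it returns 32 for 0. */
unsigned count_leading_zeros32(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz(x) == clz(rbit(x)): the lowest set bit becomes the highest set bit. */
unsigned ctz_via_rbit_clz(uint32_t x) {
    return count_leading_zeros32(bit_reverse32(x));
}
```

For example, ctz_via_rbit_clz(0x80u) returns 7. Note that __builtin_ctz(0) is undefined in GCC and Clang, whereas this sequence (like RBIT followed by CLZ) yields 32 for a zero input.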

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > Hi, > > On ARMv6T2 this turns cttz into rbit, clz instead of the 4 > instruction sequence it is now. > > I'm not sure if adding RBIT to ARMISD and doing this optimization in > the legalize pass is the best option, but the only better way I > could think of doing it was to add a bitreverse intrinsic to llvm

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On 15 Jan 2010, at 18:03, Chris Lattner wrote: > On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > >> Other targets that I know of that could potentially benefit from >> this optimization being global (that have a clz and bitreverse >> instruction but not ctz) are AVR32 and C64x, neither of which llvm >> has backends for yet. > > When/if another

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On Fri, Jan 15, 2010 at 6:03 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jan 14, 2010, at 10:13 PM, David Conrad wrote: > >> Hi, >> >> On ARMv6T2 this turns cttz into rbit, clz instead of the 4 >> instruction sequence it is now. >> >> I'm not sure if adding RBIT to ARMISD and doing this optimization in >> the legalize pass is

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 15

On Jan 15, 2010, at 11:37 AM, Richard Osborne wrote: > > On 15 Jan 2010, at 18:03, Chris Lattner wrote: > >> On Jan 14, 2010, at 10:13 PM, David Conrad wrote: >> >>> Other targets that I know of that could potentially benefit from >>> this optimization being global (that have a clz and bitreverse >>> instruction but not ctz) are AVR32 and C64x,

[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz

2010 Jan 18

On Jan 15, 2010, at 2:52 PM, Jim Grosbach wrote: > > On Jan 15, 2010, at 11:37 AM, Richard Osborne wrote: > >> >> On 15 Jan 2010, at 18:03, Chris Lattner wrote: >> >>> On Jan 14, 2010, at 10:13 PM, David Conrad wrote: >>> >>>> Other targets that I know of that could potentially benefit from >>>> this optimization being

[RFC] carry-less multiplication instruction

2020 Jul 05

Carry-less multiplication[1] instructions exist (at least optionally) on many architectures: ARMv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly more. This proposal is to add an llvm.clmul instruction or, if that is contentious, an llvm.experimental.bitmanip.clmul instruction.
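For readers unfamiliar with the operation, carry-less multiplication is ordinary shift-and-add multiplication with XOR in place of addition, i.e. polynomial multiplication over GF(2). A minimal scalar sketch in C follows; clmul32 is a hypothetical name and implies nothing about the signature the proposed llvm.clmul intrinsic would have.

```c
#include <stdint.h>

/* Hypothetical scalar reference: carry-less product of two 32-bit operands.
   Each set bit i of b contributes (a << i); partial products are combined
   with XOR, so no carries propagate. The result fits in 63 bits. */
uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t result = 0;
    for (int i = 0; i < 32; i++) {
        if ((b >> i) & 1u)
            result ^= (uint64_t)a << i;
    }
    return result;
}
```

For example, clmul32(3, 3) is 5 (binary 101) rather than 9, because the two middle partial products cancel under XOR instead of carrying.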

Use Galois field New Instructions (GFNI) to combine affine instructions

2020 May 18

On 5/18/20 8:24 PM, Craig Topper wrote: > I can tell you that your avx512 issue is that v64i8 gfni instructions also > require avx512bw to be enabled to make v64i8 a supported type. The C > intrinsics handling in the front end know this rule. But since you > generated your own intrinsics you bypassed that. Indeed, that's the issue... I stuck with what Intel announces here

Load combine pass

2016 Sep 28

One of the arguments for doing this earlier is the inline cost perception of the original pattern. Reading an i32/i64 byte by byte looks much more expensive than it is and can prevent inlining of an interesting function. The concern about inhibiting other optimizations can be addressed by carefully selecting the pattern we'd like to match. I limit the transformation to the case when all the individual loads have no uses other

Load combine pass

2016 Sep 28

Hi, I'm trying to optimize a pattern like this into a single i16 load:

    %1 = bitcast i16* %pData to i8*
    %2 = load i8, i8* %1, align 1
    %3 = zext i8 %2 to i16
    %4 = shl nuw i16 %3, 8
    %5 = getelementptr inbounds i8, i8* %1, i16 1
    %6 = load i8, i8* %5, align 1
    %7 = zext i8 %6 to i16
    %8 = shl nuw nsw i16 %7, 0
    %9 = or i16 %8, %4

I came across the load combine pass, which is motivated
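In C terms, the IR above reads a 16-bit big-endian value byte by byte: p[0] becomes the high byte and p[1] the low byte (the shift by 0 is a no-op). A sketch of the byte-wise form next to the single-load form a load-combining transformation aims for follows; both function names are hypothetical. On a little-endian host the single load must be followed by a byte swap to produce the same value.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form, mirroring the IR above: p[0] is the high byte. */
uint16_t read_be16_bytewise(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Runtime endianness probe: the first byte of 1 is 1 on little-endian hosts. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Combined form: one 16-bit load (memcpy avoids alignment and aliasing
   problems, matching the "align 1" in the IR), plus a byte swap when the
   host is little-endian. */
uint16_t read_be16_combined(const uint8_t *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    if (host_is_little_endian())
        v = (uint16_t)((v >> 8) | (v << 8));
    return v;
}
```

Both functions return 0x1234 for the bytes {0x12, 0x34}; optimizing compilers typically lower the memcpy to a single load.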

[RFC] Intrinsic naming convention (words with dots)

2015 Dec 01

Hi everyone, We seem to have allowed our documented target-independent intrinsics to acquire a somewhat-haphazard naming system, and I think we should standardize on one convention. All of the intrinsics have 'llvm.' as a prefix, and some also have some additional prefix 'llvm.dbg.', 'llvm.eh.', 'llvm.experimental.', etc., but after that we lose consistency. When

[RFC] carry-less multiplication instruction

2020 Jul 09

05.07.2020, 05:22, "Roman Lebedev" <lebedev.ri at gmail.com>: > On Sun, Jul 5, 2020 at 12:18 PM Shawn Landden via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Carry-less multiplication[1] instructions exist (at least optionally) on many architectures: armv8, RISC-V, x86_64, POWER, SPARC, C64x, and possibly more. >> >> This proposal is to add a

Well decomposed mdct

2005 Feb 20

I examined the composition of butterfly8 and butterfly16 and found that these functions are well decomposed: the decomposition doesn't lower computational speed. On the other hand, the same could be done with butterfly8, decomposing it into butterfly4 (further decomposition is not possible), but there's no reason to do this. I think a small improvement could be gained by inlining them. Compiler and processor

[RFC PATCH v1 0/3] Introducing ARM SIMD Support

2014 Sep 10

libvorbis does not currently have any SIMD/vectorization support. The following patches add a generic framework for SIMD/vectorization and, on top of it, ARM NEON vectorization using intrinsics. I was able to get over a 34% performance improvement on my BeagleBone Black, which has a single Cortex-A8 CPU. You can find more information on the metrics and the procedure I used to measure at
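The patch series itself is not reproduced here, but a common shape for such a framework is a NEON path guarded by a preprocessor check, with a scalar loop that handles both non-NEON builds and the leftover tail. A minimal sketch under that assumption follows; vec_mul_f32 is a hypothetical helper, not a libvorbis function, while vld1q_f32, vmulq_f32 and vst1q_f32 are standard NEON intrinsics from arm_neon.h.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] * b[i] for i in [0, n). */
void vec_mul_f32(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vmulq_f32(va, vb));
    }
#endif
    /* Scalar path: the whole array on non-NEON builds,
       only the remaining n % 4 elements on NEON builds. */
    for (; i < n; i++)
        dst[i] = a[i] * b[i];
}
```

Keeping the scalar loop as the fallback means the same source builds on targets without NEON, and the vector path can be tested against it.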

optimization patches

2000 Aug 28

Well, here you are. 24k; sorry if I'm not supposed to put things of this size in your mailbox, but I didn't know where else to put it. And you all are subscribed to vorbis-dev, after all. I'm not that good at breaking patches apart, so it's one big patch. Sorry. Overview: configure.in make profiling easier & more useful decoder-example.c (#if 0'ed) dither output;

encoder block diagram

2003 Mar 12

I've made a block diagram of the encoder because I was trying to find out how it works: http://stoffke.freeshell.65535.net/ogg/block.html Although there are specification docs that give very detailed information about single aspects of the encoding (or decoding), I'm missing documentation that gives a more general overview of how the encoder works. (Vorbis Illuminated seems a bit

Load combine pass

2016 Sep 29

> On 29 Sep 2016, at 03:23, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Artur, > > Artur Pilipenko via llvm-dev wrote: > > One of the arguments for doing this earlier is inline cost > > perception of the original pattern. Reading i32/i64 by bytes look much > > more expensive than it is and can prevent inlining of interesting > >

A new introduction attempt.

2003 Sep 10

I have been using libvorbis for the past few weeks and have been asked to summarise what I have discovered about the codec. There is an early draft of the document at http://www.geocities.com/gatewaystation/vorbis/vorbis.htm - Please forgive the dodgy formatting (it was formerly an MS Word document that got converted with their 'save as html' feature). I still have some additions to

Speex Command line, Changing the LPC order and modifying the codebook

2006 Feb 09

> There's plenty of areas for improvements that don't require incompatible changes like this one. Can you please tell me what I should do to make it a more exact waveform coder for music rather than speech? I understand that it's meant for speech, but I was just using it for music... I am interested in getting the residual as small as possible using Speex. Can you please tell me the areas to

Query regarding FFT in Opus

2013 Jul 16

Dear Experts, I want to know if the KISS FFT in Opus can be replaced with a fast FFT. What are the implications, and what needs to be taken care of if this is done? Also, I saw in the FFT code comments that an in-line FFT is not supported. Please let me know the reason for this. Thanks in advance. Regards, Mahantesh