Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] convert llvm ir to selection Dag"

2010 Oct 01
2
[LLVMdev] convert llvm ir to selection Dag
Hi, Can anyone please tell me how I can scalarize or de-vectorize the LLVM vector IR? In this thread (http://old.nabble.com/Re%3A-Thoughts-about-the-llvm-architecture---p29617203.html) I found that LegalizeTypes will do this while generating machine code from LLVM IR. How do I convert LLVM IR to a SelectionDAG, and how can I scalarize the vector IR and then get back LLVM IR? Thanks &
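
For reference, a minimal sketch of what IR-level scalarization looks like, written against a recent LLVM C++ API (an assumption on my part; it is not from the thread). SelectionDAG type legalization scalarizes illegal vector types during instruction selection, but there is no path from the DAG back to LLVM IR; to scalarize while staying in IR, current LLVM has an IR-level Scalarizer pass (added after this 2010 thread), or one can rewrite vector operations by hand roughly like this:

    // Minimal sketch (assumed recent LLVM headers/API): split one vector add
    // into per-element extract/add/insert operations at the IR level.
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    static Value *scalarizeVectorAdd(BinaryOperator *VecAdd) {
      auto *VecTy = cast<FixedVectorType>(VecAdd->getType());
      IRBuilder<> B(VecAdd);
      Value *Result = PoisonValue::get(VecTy);
      for (unsigned i = 0, e = VecTy->getNumElements(); i != e; ++i) {
        Value *L = B.CreateExtractElement(VecAdd->getOperand(0), i);
        Value *R = B.CreateExtractElement(VecAdd->getOperand(1), i);
        Result = B.CreateInsertElement(Result, B.CreateAdd(L, R), i);
      }
      VecAdd->replaceAllUsesWith(Result); // the original vector add can then be erased
      return Result;
    }
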
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng, On 06/02/17 02:51 PM, Linfeng Zhang wrote: > However, the critical thing is that all the states in each stage when > processing input[i] are reused by the next input[i+1]. That is, > input[i+1] must trail input[i] by 1 stage, and input[i+2] must trail > input[i+1] by 1 stage, etc. That is indeed the tricky part... and the one I think you could do slightly differently. If
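
To make the dependency being discussed concrete, here is a toy sketch (an assumed simplification, not the actual SILK kernel): each sample flows through `order` cascaded stages, and stage i for input[n] reads the state that input[n-1] left behind in that stage, which is why a pipelined input[n+1] can only ever be one stage behind input[n].

    // Toy cascade with the same state-reuse pattern (not the opus code).
    #include <stdint.h>

    #define TOY_MAX_ORDER 24

    static void toy_cascade(const int16_t *input, int64_t *corr,
                            int length, int order, int32_t warping_q16) {
      int32_t state[TOY_MAX_ORDER + 1] = {0};
      for (int n = 0; n < length; n++) {
        int32_t x = (int32_t)input[n];
        for (int i = 0; i < order; i++) {
          // Stage i uses the state written when input[n-1] passed through it...
          int32_t y = state[i] +
                      (int32_t)(((int64_t)(state[i + 1] - x) * warping_q16) >> 16);
          // ...and overwrites it with the value that input[n+1] will need here.
          state[i] = x;
          corr[i] += (int64_t)x * state[0];
          x = y; // feeds stage i+1 for the same input[n]
        }
        state[order] = x;
        corr[order] += (int64_t)x * state[0];
      }
    }
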
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng, On 06/02/17 07:18 PM, Linfeng Zhang wrote: > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > configured to 12, 14, 16, 20 and 24 according to the complexity parameter. > > It's hard to get a universal function to handle all these orders > efficiently. Any suggestions? I can think of two ways of handling larger orders. The obvious one is
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Attached is silk_warped_autocorrelation_FIX_neon(), which implements your idea. Speed improvement vs. the previous optimization:
    Complexity 0-4: doesn't call this function
    Complexity 5:  2.1% (order = 16)
    Complexity 6:  1.0% (order = 20)
    Complexity 8:  0.1% (order = 24)
    Complexity 10: 0.1% (order = 24)
Code size of silk_warped_autocorrelation_FIX_neon() changes from 2,644
2018 Apr 07
0
SCEV and LoopStrengthReduction Formulae
> > I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less > instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition > to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass? This seems reasonable to me so long as rbx has no other uses that
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng, Thanks for the updated patch. I'll have a look and get back to you. When you report speedup percentages, is that relative to the entire encoder or relative to just that function in C? Also, what's the speedup compared to master? Cheers, Jean-Marc On 05/04/17 12:14 PM, Linfeng Zhang wrote: > I attached a new patch with small cleanup (disassembly is identical to > the
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to the complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Linfeng, I had a closer look at your patch and the code looks good -- and slightly simpler than I had anticipated, so that's good. I did some profiling on a Cortex A57 and I've been seeing slightly less improvement than you're reporting, more like 3.5% at complexity 8. It appears that the warped autocorrelation function itself is only faster by a factor of about 1.35. That's a
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is

    for( n = 0; n < length; n++ ) {
        tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
        /* Loop over allpass sections */
        for( i = 0; i < order; i++ ) {
            /* Output of allpass section */
            tmp2_QS = silk_SMLAWB(
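
For readers unfamiliar with the SILK fixed-point helpers used in that kernel, simplified stand-ins behave roughly as below (an approximation from memory, not the opus definitions; the real macros also have saturating and platform-specific variants):

    // Approximate, simplified equivalents of the two helpers quoted above.
    #include <stdint.h>

    // silk_LSHIFT32(a, shift): left shift with a 32-bit result.
    static inline int32_t approx_LSHIFT32(int32_t a, int shift) {
      return (int32_t)((uint32_t)a << shift);
    }

    // silk_SMLAWB(a, b, c): a + ((b * (low 16 bits of c, sign-extended)) >> 16),
    // i.e. a fractional 32x16 multiply-accumulate.
    static inline int32_t approx_SMLAWB(int32_t a, int32_t b, int32_t c) {
      return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
    }
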
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with small cleanup (disassembly is identical to the last patch). We have done the same internal testing as usual. Also attached are 2 failed temporary versions that try to reduce code size (just for code review reference purposes). The new patch's silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone, I think I have found a GVN / alias analysis related bug, but before opening an issue on the tracker I wanted to see if I am missing something. I have the following testcase:

    define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) {
    entry:
      ; Just some temporary storage
      %tmp.0 = alloca i32
      %tmp.1 = alloca i32
      %tmp.i =
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows:
    Complexity 5:  6.1%
    Complexity 6:  5.8%
    Complexity 8:  5.5%
    Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5. Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
This is definitely a bug in AA.

    225    for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; ++I) {
    226      const Value *Arg = *I;
    227      if (!Arg->getType()->isPointerTy())
 -> 228        continue;
    229      unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I);
    230      auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, CS2ArgIdx, TLI);
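
A hypothetical sketch of the direction this points at (an assumption, not the patch that was actually committed): the same argument loop, but using Type::isPtrOrPtrVectorTy() so that vector-of-pointer arguments such as the <2 x i32*> operands of llvm.masked.gather/scatter are no longer skipped. Whether MemoryLocation can then describe such an argument sensibly is the harder question raised elsewhere in the thread.

    for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; ++I) {
      const Value *Arg = *I;
      // Accepts both i32* and <2 x i32*>, unlike plain isPointerTy().
      if (!Arg->getType()->isPtrOrPtrVectorTy())
        continue;
      unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I);
      auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, CS2ArgIdx, TLI);
      // ... rest of the loop unchanged ...
    }
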
2013 Feb 15
0
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hey Eli, On Thu, Feb 14, 2013 at 5:45 PM, Eli Bendersky <eliben at google.com> wrote: > Hello, > > While investigating one of the existing tests > (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some > interesting code. The IR is very straightforward:
> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
> entry:
>
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
+ a few others. After following this rabbit hole a bit, there are a lot of mutually recursive calls, etc., that may or may not do the right thing with vectors of pointers. I can fix *this* particular bug with the attached patch. However, it's mostly papering over stuff. Nothing seems to know what to do with a MemoryLocation that is a vector of pointers. They all expect a MemoryLocation to be a
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Okay, so then it sounds like, for now, the right fix is to stop marking masked.gather and masked.scatter with intrarg* options. On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> wrote: > We might have a specification bug here, but we appear to implement what we > specified. argmemonly is specified as only considering pointer typed > arguments. It's
2013 Feb 14
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hello, While investigating one of the existing tests (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some interesting code. The IR is very straightforward:

    define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
    entry:
      ret i32 %a3
    }

    define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
    entry:
      %tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32
2016 Aug 30
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
----- Original Message ----- > From: "Daniel Berlin" <dberlin at dberlin.org> > To: "Philip Reames" <listmail at philipreames.com>, "Davide Italiano" > <davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> > Cc: "Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" >
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Thank you for the quick fix, I can no longer reproduce the issue. As far as releases go, I am guessing that this is going to be in 4.0? Best, Chris On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > Yeah, I just hope it doesn't regress scatter/gather vector code badly. > But at least it's correct now? > > On Tue, Aug 30, 2016 at 1:11