similar to: SCEV and LoopStrengthReduction Formulae

Displaying 20 results from an estimated 2000 matches similar to: "SCEV and LoopStrengthReduction Formulae"

2018 Apr 07
0
SCEV and LoopStrengthReduction Formulae
> > I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less > instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition > to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass? This seems reasonable to me so long as rbx has no other uses that
2018 Apr 04
0
SCEV and LoopStrengthReduction Formulae
> cmpq %rbx, %r14 > jne .LBB0_1 > > LLVM can perform compare-jump fusion, it already does in certain cases, but > not in the case above. We can remove the cmp above if we were to perform > the following transformation: Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)? Is there any more limitation why these two or not fused? > -----Original
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with small cleanup (disassembly is identical as the last patch). We have done the same internal testing as usual. Also, attached 2 failed temporary versions which try to reduce code size (just for code review reference purpose). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone, I think I have found an gvn / alias analysis related bug, but before opening an issue on the tracker I wanted to see if I am missing something. I have the following testcase: define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) { > entry: > ; Just some temporary storage > %tmp.0 = alloca i32 > %tmp.1 = alloca i32 > %tmp.i =
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thank Jean-Marc! The speedup percentages are all relative to the entire encoder. Comparing to master, this optimization patch speeds up fixed-point SILK encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
this is definitely a bug in AA. 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; ++I) { 226 const Value *Arg = *I; 227 if (!Arg->getType()->isPointerTy()) -> 228 continue; 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I); 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2, CS2ArgIdx, TLI);
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi, Attached is a patch with arm neon optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name:
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
+ a few others. After following this rabbit hole a bit, there are a lot of mutually recursive calls, etc, that may or may not do the right thing with vectors of pointers. I can fix *this* particular bug with the attached patch. However, it's mostly papering over stuff. Nothing seems to know what to do with a memorylocation that is a vector of pointers. They all expect memorylocation to be a
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Okay, so then it sounds like, for now, the right fix is to stop marking masked.gather and masked.scatter with intrarg* options. On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com> wrote: > We might have specification bug here, but we appear to implement what we > specified. argmemonly is specified as only considering pointer typed > arguments. It's
2013 Feb 14
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hello, While investigating one of the existing tests (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some interesting code. The IR is very straightforward: define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) { entry: ret i32 %a3 } define fastcc i32 @tailcaller(i32 %in1, i32 %in2) { entry: %tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32
2016 Aug 30
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
----- Original Message ----- > From: "Daniel Berlin" <dberlin at dberlin.org> > To: "Philip Reames" <listmail at philipreames.com>, "Davide Italiano" > <davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com> > Cc: "Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" >
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Thank you for the quick fix, I can no longer reproduce the issue. As far a releases go, I am guessing that this is going to be in 4.0? Best, Chris On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> wrote: > Yeah, i just hope it doesn't regress scatter/gather vector code badly. > But at least it's correct now? > > > On Tue, Aug 30, 2016 at 1:11
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Great, thank you! On Wed, Aug 31, 2016 at 2:07 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > ------------------------------ > > *From: *"Chris Sakalis" <chrissakalis at gmail.com> > *To: *"Daniel Berlin" <dberlin at dberlin.org> > *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "David Majnemer" < > david.majnemer
2015 Dec 01
11
[PATCH 1/6] x86: Add VMWare Host Communication Macros
These macros will be used by multiple VMWare modules for handling host communication. v2: * Keeping only the minimal common platform defines * added vmware_platform() check function v3: * Added new field to handle different hypervisor magic values Signed-off-by: Sinclair Yeh <syeh at vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom at vmware.com> Reviewed-by: Alok N Kataria
2015 Dec 01
11
[PATCH 1/6] x86: Add VMWare Host Communication Macros
These macros will be used by multiple VMWare modules for handling host communication. v2: * Keeping only the minimal common platform defines * added vmware_platform() check function v3: * Added new field to handle different hypervisor magic values Signed-off-by: Sinclair Yeh <syeh at vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom at vmware.com> Reviewed-by: Alok N Kataria
2010 Jan 13
4
merging issue.........
hi, I have a question about merging two files. For example, I have two files, the first file is like the following: id trait1 1 10.2 2 11.1 3 9.7 6 10.2 7 8.9 10 9.7 11 10.2 The second file is like the following: id trait2 1 9.8 2 10.8 4 7.8 5 9.8 6 10.1 12 10.2 13 10.1 now I want to merge the two files by the variable "id", I only want
2013 Feb 15
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
>> While investigating one of the existing tests >> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some >> interesting code. The IR is very straightforward: >> >> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 >> %a4) { >> entry: >> ret i32 %a3 >> } >> >> define fastcc i32 @tailcaller(i32