Displaying 20 results from an estimated 2000 matches similar to: "SCEV and LoopStrengthReduction Formulae"
2018 Apr 07
0
SCEV and LoopStrengthReduction Formulae
>
> I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less
> instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition
> to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass?
This seems reasonable to me so long as rbx has no other uses that
2018 Apr 04
0
SCEV and LoopStrengthReduction Formulae
> cmpq %rbx, %r14
> jne .LBB0_1
>
> LLVM can perform compare-jump fusion, it already does in certain cases, but
> not in the case above. We can remove the cmp above if we were to perform
> the following transformation:
Do you mean branch-fusion (https://en.wikichip.org/wiki/macro-operation_fusion)?
Is there any more limitation why these two or not fused?
> -----Original
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks a lot for reviewing this huge assembly function!
silk_warped_autocorrelation_FIX_c()'s kernel part is
for( n = 0; n < length; n++ ) {
tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
/* Loop over allpass sections */
for( i = 0; i < order; i++ ) {
/* Output of allpass section */
tmp2_QS = silk_SMLAWB(
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be
configured to 12, 14, 16, 20 and 24 according to complexity parameter.
It's hard to get a universal function to handle all these orders
efficiently. Any suggestions?
Thanks,
Linfeng
On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks for your suggestions. Will get back to you once we have some updates.
Linfeng
On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> > This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with small cleanup (disassembly is identical as the
last patch). We have done the same internal testing as usual.
Also, attached 2 failed temporary versions which try to reduce code size
(just for code review reference purpose).
The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Hello everyone,
I think I have found an gvn / alias analysis related bug, but before
opening an issue on the tracker I wanted to see if I am missing something.
I have the following testcase:
define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2, i32* %out) {
> entry:
> ; Just some temporary storage
> %tmp.0 = alloca i32
> %tmp.1 = alloca i32
> %tmp.i =
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thank Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Comparing to master, this optimization patch speeds up fixed-point SILK
encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8%
Complexity 8: 5.5% Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
this is definitely a bug in AA.
225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E; ++I) {
226 const Value *Arg = *I;
227 if (!Arg->getType()->isPointerTy())
-> 228 continue;
229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I);
230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2,
CS2ArgIdx, TLI);
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi,
Attached is a patch with arm neon optimizations for
silk_warped_autocorrelation_FIX(). Please review.
Thanks,
Felicia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name:
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
+ a few others.
After following this rabbit hole a bit, there are a lot of mutually
recursive calls, etc, that may or may not do the right thing with vectors
of pointers.
I can fix *this* particular bug with the attached patch.
However, it's mostly papering over stuff. Nothing seems to know what to do
with a memorylocation that is a vector of pointers. They all expect
memorylocation to be a
2016 Aug 29
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Okay, so then it sounds like, for now, the right fix is to stop marking
masked.gather and masked.scatter with intrarg* options.
On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com>
wrote:
> We might have specification bug here, but we appear to implement what we
> specified. argmemonly is specified as only considering pointer typed
> arguments. It's
2013 Feb 14
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
Hello,
While investigating one of the existing tests
(test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
interesting code. The IR is very straightforward:
define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
ret i32 %a3
}
define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
%tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32
2016 Aug 30
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
----- Original Message -----
> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Philip Reames" <listmail at philipreames.com>, "Davide Italiano"
> <davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com>
> Cc: "Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer"
>
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Thank you for the quick fix, I can no longer reproduce the issue. As far a
releases go, I am guessing that this is going to be in 4.0?
Best,
Chris
On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> Yeah, i just hope it doesn't regress scatter/gather vector code badly.
> But at least it's correct now?
>
>
> On Tue, Aug 30, 2016 at 1:11
2016 Aug 31
2
GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Great, thank you!
On Wed, Aug 31, 2016 at 2:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ------------------------------
>
> *From: *"Chris Sakalis" <chrissakalis at gmail.com>
> *To: *"Daniel Berlin" <dberlin at dberlin.org>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "David Majnemer" <
> david.majnemer
2015 Dec 01
11
[PATCH 1/6] x86: Add VMWare Host Communication Macros
These macros will be used by multiple VMWare modules for handling
host communication.
v2:
* Keeping only the minimal common platform defines
* added vmware_platform() check function
v3:
* Added new field to handle different hypervisor magic values
Signed-off-by: Sinclair Yeh <syeh at vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom at vmware.com>
Reviewed-by: Alok N Kataria
2015 Dec 01
11
[PATCH 1/6] x86: Add VMWare Host Communication Macros
These macros will be used by multiple VMWare modules for handling
host communication.
v2:
* Keeping only the minimal common platform defines
* added vmware_platform() check function
v3:
* Added new field to handle different hypervisor magic values
Signed-off-by: Sinclair Yeh <syeh at vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom at vmware.com>
Reviewed-by: Alok N Kataria
2010 Jan 13
4
merging issue.........
hi, I have a question about merging two files.
For example, I have two files, the first file is like the following:
id trait1
1 10.2
2 11.1
3 9.7
6 10.2
7 8.9
10 9.7
11 10.2
The second file is like the following:
id trait2
1 9.8
2 10.8
4 7.8
5 9.8
6 10.1
12 10.2
13 10.1
now I want to merge the two files by the variable "id", I only want
2013 Feb 15
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
>> While investigating one of the existing tests
>> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some
>> interesting code. The IR is very straightforward:
>>
>> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32
>> %a4) {
>> entry:
>> ret i32 %a3
>> }
>>
>> define fastcc i32 @tailcaller(i32