Displaying 20 results from an estimated 200 matches similar to: "[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops"
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael,
Sometime back I did some experiments with interleave vectorizer and did not found any degrade,
probably my tests/benchmarks are not extensive enough to cover much.
Elina is the right person to comment on it as she already experienced cases where it hinders performance.
For interleave vectorizer on X86 we do not have any specific costing, it goes to BasicTTI where the costing is not
2015 Nov 19
5
[RFC] Introducing a vector reduction add instruction.
After some attempt to implement reduce-add in LLVM, I found out a
easier way to detect reduce-add without introducing new IR operations.
The basic idea is annotating phi node instead of add (so that it is
easier to handle other reduction operations). In PHINode class, we can
add a flag indicating if the phi node is a reduction one (the flag can
be set in loop vectorizer for vectorized phi nodes).
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved
access optimization ought to have a positive impact even without target
support.
Case in point - Hal enabled it on PPC last September. An important
difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC,
but, as I said below, I hope we can enable it on x86 with a conservative
cost function, and still get
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
On Wed, Nov 25, 2015 at 2:32 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> Hi Cong,
>
> After reading the original RFC and this update, I'm still not entirely sure I understand the semantics of the flag you're proposing to add. Does it having something to do with the ordering of the reduction operations?
The flag is only useful for vectorized reduction for now. I'll give
2015 Nov 25
2
[RFC] Introducing a vector reduction add instruction.
----- Original Message -----
> From: "Xinliang David Li" <davidxl at google.com>
> To: "Cong Hou" <congh at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, November 25, 2015 5:17:58 PM
> Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add
2004 Sep 10
2
An assembly optimization and fix
I have optimized FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov
function and fixed bug when data_len == 0. Now the function is about
50% faster and flac -5 is about 5% faster on my box. I have tested it
thoroughly, I think it can go to flac 1.0.4.
--
Miroslav Lichvar
-------------- next part --------------
--- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100
+++
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet.
We looked at this feature and got into conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost which depends on number of shuffles. Number of shuffles depends on permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote:
> Hello. This is my 1st post.
ようこそ!
> I have tried SSE execution domain fixup pass.
> But I am not able to see any improvements.
Did you actually measure runtime, or did you look at assembly?
> I expect for the example below to use MOVDQA, PAND &c.
> (On nehalem, ANDPS is extremely slower than PAND)
Are you sure? The
2016 May 26
0
enabling interleaved access loop vectorization
On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for
2006 May 25
2
Compilation issues with s390
Hi all,
I'm trying to compile asterisk on the mainframe (s390 / s390x) and I am
running into issues. I was wondering if somebody could give a hand?
I'm thinking that I should be able to do this. I have noticed that Debian
even has binary RPM's out for Asterisk now. I'm trying to do this on SuSE
SLES8 (with the 2.4 kernel).
What I see is, an issue that arch=s390 isn't
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed.
attached the updated patch to apply to svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post.
I have tried SSE execution domain fixup pass.
But I am not able to see any improvements.
I expect for the example below to use MOVDQA, PAND &c.
(On nehalem, ANDPS is extremely slower than PAND)
Please tell me if something would be wrong for me.
Thank you.
Takumi
Host: i386-mingw32
Build: trunk at 103373
foo.ll:
define <4 x i32> @foo(<4 x i32> %x,
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello,
Depending on how I extract integer lanes from an x86_64 xmm register, the
backend may spill that register in order to load scalars. The effect was
observed on two targets: corei7-avx and btver1 (I haven't checked other
targets).
Here's a test case with spilling/no-spilling code put on conditional
compile:
#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include
2006 Mar 22
9
Display problem
hey guys,
Does anyone know why the french letter "?" is displayed as a question
mark "?" inside an <h1> tag ?
--
Posted via http://www.ruby-forum.com/.
2016 Apr 01
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
RFC: A proposal for vectorizing loops with calls to math functions using SVML (short
vector math library).
=========
Overview
=========
Very simply, SVML (Intel short vector math library) functions are vector variants of
scalar math functions that take vector arguments, apply an operation to each
element, and store the result in a vector register. These vector variants can be
generated by the
2009 Aug 30
3
experimental patch for libtheora1.1beta3
Good morning in the Lord
Regarding the port of libtheora1.1beta3 for OpenBSD for amd64 and the
problem I described at:
http://lists.xiph.org/pipermail/theora/2009-August/002640.html
Attached is a patch for
libtheora/patches/patch-lib_x86_mmxencfrag_c
I can play videos with it. ?Does it work for you?
Best regards
--
Dios, gracias por tu amor infinito.
2016 Apr 04
2
RFC: A proposal for vectorizing loops with calls to math functions using SVML
Hi Sanjay,
For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow.
From: Sanjay
2011 Jan 04
1
isolinux, extlinux and accented characters
Hello,
First of all, thanks very much for this piece of software, it's working
like a charm.
I'm using it to boot a Linux distro I'm working on.
What I want to deal with is the display of accented characters in ISOLINUX.
First, I set EXTLINUX up to boot an ext2-formatted USB stick. No problem
so far.
I wrote a config file ordering to load a font (lat-9w) to display french
2008 Mar 15
0
Naissance de fr.centos.org / Birth of fr.centos.org
(For non native french speakers : that will be my only announce here in
another language than english ;-) )
Le projet CentOS est heureux de vous annoncer la naissance du site
http://fr.centos.org .
En r?ponse ? la demande croissante de la communaut? des utilisateurs
francophones de CentOS, le forum fr.centos.org a vu le jour.
Nous profitons de cette annonce pour relancer l'appel aux
2012 Nov 28
0
[LLVMdev] [llvm-commits] [dragonegg] r168787 - in /dragonegg/trunk: src/x86/Target.cpp src/x86/x86_builtins test/validator/c/copysignp.c
Hi Pawel, can you please pull this dragonegg patch into 3.2. I am the
code owner for dragonegg.
Thanks a lot, Duncan.
On 28/11/12 13:44, Duncan Sands wrote:
> Author: baldrick
> Date: Wed Nov 28 06:44:50 2012
> New Revision: 168787
>
> URL: http://llvm.org/viewvc/llvm-project?rev=168787&view=rev
> Log:
> Add support for GCC's vector copysign builtins, fixing