thr3ads.net - similar to: "[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops"

Displaying 20 results from an estimated 200 matches similar to: "[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops"

enabling interleaved access loop vectorization

2016 Aug 05

enabling interleaved access loop vectorization

Hi Michael, Sometime back I did some experiments with interleave vectorizer and did not found any degrade, probably my tests/benchmarks are not extensive enough to cover much. Elina is the right person to comment on it as she already experienced cases where it hinders performance. For interleave vectorizer on X86 we do not have any specific costing, it goes to BasicTTI where the costing is not

[RFC] Introducing a vector reduction add instruction.

2015 Nov 19

[RFC] Introducing a vector reduction add instruction.

After some attempt to implement reduce-add in LLVM, I found out a easier way to detect reduce-add without introducing new IR operations. The basic idea is annotating phi node instead of add (so that it is easier to handle other reduction operations). In PHINode class, we can add a flag indicating if the phi node is a reduction one (the flag can be set in loop vectorizer for vectorized phi nodes).

enabling interleaved access loop vectorization

2016 Aug 05

enabling interleaved access loop vectorization

Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get

[RFC] Introducing a vector reduction add instruction.

2015 Nov 25

[RFC] Introducing a vector reduction add instruction.

On Wed, Nov 25, 2015 at 2:32 PM, Hal Finkel <hfinkel at anl.gov> wrote: > Hi Cong, > > After reading the original RFC and this update, I'm still not entirely sure I understand the semantics of the flag you're proposing to add. Does it having something to do with the ordering of the reduction operations? The flag is only useful for vectorized reduction for now. I'll give

[RFC] Introducing a vector reduction add instruction.

2015 Nov 25

[RFC] Introducing a vector reduction add instruction.

----- Original Message ----- > From: "Xinliang David Li" <davidxl at google.com> > To: "Cong Hou" <congh at google.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, November 25, 2015 5:17:58 PM > Subject: Re: [llvm-dev] [RFC] Introducing a vector reduction add

An assembly optimization and fix

2004 Sep 10

An assembly optimization and fix

I have optimized FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov function and fixed bug when data_len == 0. Now the function is about 50% faster and flac -5 is about 5% faster on my box. I have tested it thoroughly, I think it can go to flac 1.0.4. -- Miroslav Lichvar -------------- next part -------------- --- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100 +++

enabling interleaved access loop vectorization

2016 May 26

enabling interleaved access loop vectorization

Interleaved access is not enabled on X86 yet. We looked at this feature and got into conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost which depends on number of shuffles. Number of shuffles depends on permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer

[LLVMdev] How does SSEDomainFix work?

2010 May 11

[LLVMdev] How does SSEDomainFix work?

On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my 1st post. ようこそ！ > I have tried SSE execution domain fixup pass. > But I am not able to see any improvements. Did you actually measure runtime, or did you look at assembly? > I expect for the example below to use MOVDQA, PAND &c. > (On nehalem, ANDPS is extremely slower than PAND) Are you sure? The

enabling interleaved access loop vectorization

2016 May 26

enabling interleaved access loop vectorization

On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Is there a compile-time and/or potential runtime cost that makes > enableInterleavedAccessVectorization() default to 'false'? > > I notice that this is set to true for ARM, AArch64, and PPC. > > In particular, I'm wondering if there's a reason it's not enabled for

Compilation issues with s390

2006 May 25

Compilation issues with s390

Hi all, I'm trying to compile asterisk on the mainframe (s390 / s390x) and I am running into issues. I was wondering if somebody could give a hand? I'm thinking that I should be able to do this. I have noticed that Debian even has binary RPM's out for Asterisk now. I'm trying to do this on SuSE SLES8 (with the 2.4 kernel). What I see is, an issue that arch=s390 isn't

MMX/mmxext optimisations

2004 Aug 24

MMX/mmxext optimisations

quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin

[LLVMdev] How does SSEDomainFix work?

2010 May 11

[LLVMdev] How does SSEDomainFix work?

Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x,

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

Display problem

2006 Mar 22

Display problem

hey guys, Does anyone know why the french letter "?" is displayed as a question mark "?" inside an <h1> tag ? -- Posted via http://www.ruby-forum.com/.

RFC: A proposal for vectorizing loops with calls to math functions using SVML

2016 Apr 01

RFC: A proposal for vectorizing loops with calls to math functions using SVML

RFC: A proposal for vectorizing loops with calls to math functions using SVML (short vector math library). ========= Overview ========= Very simply, SVML (Intel short vector math library) functions are vector variants of scalar math functions that take vector arguments, apply an operation to each element, and store the result in a vector register. These vector variants can be generated by the

experimental patch for libtheora1.1beta3

2009 Aug 30

experimental patch for libtheora1.1beta3

Good morning in the Lord Regarding the port of libtheora1.1beta3 for OpenBSD for amd64 and the problem I described at: http://lists.xiph.org/pipermail/theora/2009-August/002640.html Attached is a patch for libtheora/patches/patch-lib_x86_mmxencfrag_c I can play videos with it. ?Does it work for you? Best regards -- Dios, gracias por tu amor infinito.

RFC: A proposal for vectorizing loops with calls to math functions using SVML

2016 Apr 04

RFC: A proposal for vectorizing loops with calls to math functions using SVML

Hi Sanjay, For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow. From: Sanjay

isolinux, extlinux and accented characters

2011 Jan 04

isolinux, extlinux and accented characters

Hello, First of all, thanks very much for this piece of software, it's working like a charm. I'm using it to boot a Linux distro I'm working on. What I want to deal with is the display of accented characters in ISOLINUX. First, I set EXTLINUX up to boot an ext2-formatted USB stick. No problem so far. I wrote a config file ordering to load a font (lat-9w) to display french

Naissance de fr.centos.org / Birth of fr.centos.org

2008 Mar 15

Naissance de fr.centos.org / Birth of fr.centos.org

(For non native french speakers : that will be my only announce here in another language than english ;-) ) Le projet CentOS est heureux de vous annoncer la naissance du site http://fr.centos.org . En r?ponse ? la demande croissante de la communaut? des utilisateurs francophones de CentOS, le forum fr.centos.org a vu le jour. Nous profitons de cette annonce pour relancer l'appel aux

[LLVMdev] [llvm-commits] [dragonegg] r168787 - in /dragonegg/trunk: src/x86/Target.cpp src/x86/x86_builtins test/validator/c/copysignp.c

2012 Nov 28

[LLVMdev] [llvm-commits] [dragonegg] r168787 - in /dragonegg/trunk: src/x86/Target.cpp src/x86/x86_builtins test/validator/c/copysignp.c

Hi Pawel, can you please pull this dragonegg patch into 3.2. I am the code owner for dragonegg. Thanks a lot, Duncan. On 28/11/12 13:44, Duncan Sands wrote: > Author: baldrick > Date: Wed Nov 28 06:44:50 2012 > New Revision: 168787 > > URL: http://llvm.org/viewvc/llvm-project?rev=168787&view=rev > Log: > Add support for GCC's vector copysign builtins, fixing

similar to: [LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops