thr3ads.net - similar to: "[LLVMdev] Optimized code analysis problems"

Displaying 20 results from an estimated 600 matches similar to: "[LLVMdev] Optimized code analysis problems"

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

[PATCH] Make SSE Run Time option. Add Win32 SSE code

All, Attached is a patch that does two things. First it makes the use of the current SSE code a run time option through the use of speex_decoder_ctl() and speex_encoder_ctl It does this twofold. First there is a modification to the configure.in script which introduces a check based upon platform. It will compile in the sse assembly if you are on an i?86 based platform by making a

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

(Changing subject line as diagnosis has changed) I'm attaching the compiled code that I've been getting, both with CodeGenOpt::Default and CodeGenOpt::None . The crash isn't occurring with CodeGenOpt::None, but that seems to be because ECX isn't being used - it still gets set to 0x7fffffff by one of the calls to 76719BA1 I notice that X86::SQRTPD[m|r] appear in

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

libFLAC have three SSE-accelerated functions FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order less than N. The best compression preset (flac -8) uses lpc_order up to 12; it means that during encoding FLAC also uses unaccelerated C function. I'm not very familiar with asm so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it and

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

[LLVMdev] Can LLVM vectorize <2 x i32> type

For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19, -1 %21 = zext i32 %20 to i64 %22 =

[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!

2010 Aug 02

[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!

Hi all, My Machine environment is Clang-2.8-svn on Linux-x86. When I build ffmpeg-0.6 using Clang, error output: CC libavcodec/x86/mpegvideo_mmx.o fatal error: error in backend: Ran out of registers during register allocation! Please check your inline asm statement for invalid constraints: INLINEASM <es:movd %eax, %xmm3 pshuflw $$0, %xmm3, %xmm3 punpcklwd %xmm3, %xmm3

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > I've noticed that LLVM tends to generate suboptimal code and spill an > excessive amount of registers in large functions, such as in those > that are automatically generated by FFTW. One problem might be that we're forcing the 16 stores to the out array to happen in source order, which

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>>

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: > > On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > >> I've noticed that LLVM tends to generate suboptimal code and spill an >> excessive amount of registers in large functions, such as in those >> that are automatically generated by FFTW. >

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

I may get one or two in the next month, but not more than that. Focused on the pass manager for now. If none get there first, I'll eventually circle back though, so they won't rot forever. On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote: > I filed a couple more, in case they're actually different issues: > -

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > Hi Chandler, > > I've been looking at the regressions Quentin mentioned, and filed a PR > for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377 > > As for the others, I'm working on reducing them, but for now, here are > some raw observations, in case any of

Invoke loop vectorizer

2016 Aug 12

Invoke loop vectorizer

I'm not compiling it to x86. Should loop optimizer something independent of the target? If so, should the vectorized code on IR level? On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > cat > test.c > > #define SIZE 128 > > void bar(int *restrict A, int* restrict B,int K) { > > #pragma clang loop vectorize(enable)

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

I filed a couple more, in case they're actually different issues: - http://llvm.org/bugs/show_bug.cgi?id=22412 - http://llvm.org/bugs/show_bug.cgi?id=22413 And that's pretty much it for internal changes. I'm fine with flipping the switch; Quentin, are you? Also, just to have an idea, do you (or someone else!) plan to tackle these in the near future? -Ahmed On Thu, Jan 29, 2015 at

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

Hi, I've noticed that LLVM tends to generate suboptimal code and spill an excessive amount of registers in large functions, such as in those that are automatically generated by FFTW. LLVM generates good code for a function that computes an 8-point complex FFT, but from 16-point upwards, icc or gcc generates much better code. Here is an example of a sequence of instructions from a 32-point

enabling interleaved access loop vectorization

2016 Aug 05

enabling interleaved access loop vectorization

Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

avx512 JIT backend generates wrong code on <4 x float>

Hi Frank, I recommend trying trunk LLVM. AVX-512 development has been very active recently. -Hal ----- Original Message ----- > From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org> > To: "LLVM Dev" <llvm-dev at lists.llvm.org> > Sent: Wednesday, June 29, 2016 2:41:39 PM > Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

avx512 JIT backend generates wrong code on <4 x float>

Hi Hal! Thanks, but unfortunately it didn't help. The exact same assembler instructions are generated for both 3.8 (yesterday) and trunk (from today). So, this really looks like a bug. Best, Frank On 06/29/2016 03:48 PM, Hal Finkel wrote: > Hi Frank, > > I recommend trying trunk LLVM. AVX-512 development has been very active recently. > > -Hal > > ----- Original

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

enabling interleaved access loop vectorization

2016 Aug 05

enabling interleaved access loop vectorization

Hi Michael, Sometime back I did some experiments with interleave vectorizer and did not found any degrade, probably my tests/benchmarks are not extensive enough to cover much. Elina is the right person to comment on it as she already experienced cases where it hinders performance. For interleave vectorizer on X86 we do not have any specific costing, it goes to BasicTTI where the costing is not

[LLVMdev] X86 Tablegen Description and VEX.W

2012 Nov 08

[LLVMdev] X86 Tablegen Description and VEX.W

On Wed, Nov 7, 2012 at 10:52 PM, Anitha Boyapati <anitha.boyapati at gmail.com>wrote: ... > For the multiclass "fma4s", why is "mr" not inherited from "VEX_W" and > "MemOp4" like those of "rm" or "rr" ? > Hey Anitha, The VEX.W bit is used to denote operand order. In other words, this bit allows for a memop to be used as

similar to: [LLVMdev] Optimized code analysis problems