similar to: [LLVMdev] How does SSEDomainFix work?

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] How does SSEDomainFix work?"

2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my 1st post. Welcome! > I have tried the SSE execution domain fixup pass. > But I am not able to see any improvements. Did you actually measure runtime, or did you look at assembly? > I expect the example below to use MOVDQA, PAND, etc. > (On Nehalem, ANDPS is much slower than PAND) Are you sure? The
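For context, a minimal sketch (my own, not code from the thread) of the kind of integer-vector operation the execution-domain fixup is about: the load and AND below stay in the integer domain (MOVDQA/PAND); if the backend instead picks the floating-point forms (MOVAPS/ANDPS) on integer data, some microarchitectures pay a domain-crossing penalty.

#include <emmintrin.h>  // SSE2

// Illustrative only: an integer-domain masking loop. One would expect
// MOVDQA for the aligned loads and PAND for the AND.
void mask_buffer(__m128i *dst, const __m128i *src, __m128i mask, int n) {
  for (int i = 0; i < n; ++i) {
    __m128i v = _mm_load_si128(&src[i]);                // MOVDQA (aligned load)
    _mm_store_si128(&dst[i], _mm_and_si128(v, mask));   // PAND + aligned store
  }
}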
2017 Aug 18
1
[PATCH] fix alignment exceptions
Jonathan, Here's the code difference we see with the recent change -- what amounts to reverting your change from a couple of years back. It doesn't look like we're getting superfluous instructions from clang now. The bad behavior for us was the alignment exception on the MOVDQA instructions when the input data wasn't 128-bit aligned. We had to change something because the code as-is
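A minimal sketch (assumptions mine: the input is only guaranteed element-aligned, not 16-byte aligned) of the trade-off behind this kind of fix: MOVDQA faults on a misaligned address, while MOVDQU accepts it.

#include <emmintrin.h>

// Illustrative only: choosing between aligned and unaligned 128-bit loads.
// _mm_load_si128 lowers to MOVDQA and raises an exception if the address is
// not 16-byte aligned; _mm_loadu_si128 lowers to MOVDQU and does not.
__m128i load_vec(const void *p, bool known_16B_aligned) {
  if (known_16B_aligned)
    return _mm_load_si128((const __m128i *)p);   // MOVDQA
  return _mm_loadu_si128((const __m128i *)p);    // MOVDQU
}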
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based on the target triple?) then I don't think
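For reference, a minimal sketch (mine, not one of the .ll files from the thread) of the kind of double-precision loop that typically lowers to SQRTPD when SSE2 is available; with SSE disabled the same source has to take a scalar/x87 path instead.

#include <cmath>

// Illustrative only: with SSE2 enabled and optimization on, this loop is
// usually vectorized and lowered to SQRTPD; with SSE disabled it cannot be.
void sqrt_all(double *out, const double *in, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = std::sqrt(in[i]);
}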
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael, Some time back I did some experiments with the interleave vectorizer and did not find any degradation; probably my tests/benchmarks are not extensive enough to cover much. Elina is the right person to comment on it, as she has already seen cases where it hinders performance. For the interleave vectorizer on X86 we do not have any specific costing; it goes to BasicTTI, where the costing is not
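For context, a minimal sketch (not from the thread) of the interleaved-access pattern being discussed: a stride-2 loop that the vectorizer can either leave scalar or lower to wide loads plus de-interleaving shuffles, which is exactly where the X86 cost model matters.

// Illustrative only: stride-2 (interleaved) loads. Vectorizing this means
// loading wide vectors of consecutive elements and separating even and odd
// lanes with shuffles; profitability depends on the shuffle cost.
void sum_pairs(float *out, const float *in, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = in[2 * i] + in[2 * i + 1];
}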
2017 Aug 18
2
[PATCH] fix alignment exceptions
We see the MOVQ instruction, but this patch deliberately uses it rather than MOVDQA (128-bit aligned load). We were seeing that with the trace below, the final invocation is not 128-bit aligned, but MOVDQA insists on it (the calling function was pitch_sse4_1.c:90, in the 4-way N - i >= 4 loop). 07-31 11:00:13.469 210 2540 D opus_sse1: RBE celt_inner_prod_sse4_1: x
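A rough sketch (illustrative only, not the actual Opus kernel) of a 4-way inner-product loop in the style described above; using unaligned MOVDQU loads keeps a final, possibly misaligned block from faulting the way an aligned MOVDQA load would.

#include <smmintrin.h>  // SSE4.1 (_mm_mullo_epi32)

// Hypothetical kernel, not celt_inner_prod_sse4_1 itself.
int inner_prod_4way(const int *x, const int *y, int n) {
  __m128i acc = _mm_setzero_si128();
  int i = 0;
  for (; n - i >= 4; i += 4) {
    __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));  // MOVDQU
    __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));  // MOVDQU
    acc = _mm_add_epi32(acc, _mm_mullo_epi32(vx, vy));
  }
  int lanes[4];
  _mm_storeu_si128((__m128i *)lanes, acc);
  int sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < n; ++i)  // scalar tail
    sum += x[i] * y[i];
  return sum;
}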
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
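A minimal sketch (my own, for illustration) of the "loads + shuffles" lowering whose cost is at issue: two wide loads of interleaved data followed by shuffles that separate the even and odd lanes.

#include <xmmintrin.h>

// Illustrative only: de-interleaving 8 floats {a0,b0,a1,b1,a2,b2,a3,b3}
// into a-lanes and b-lanes. Stride-2 interleaved access vectorizes to this
// pattern; its profitability on X86 hinges on how the shuffles are costed.
void deinterleave2(const float *in, __m128 *a, __m128 *b) {
  __m128 lo = _mm_loadu_ps(in);       // a0 b0 a1 b1
  __m128 hi = _mm_loadu_ps(in + 4);   // a2 b2 a3 b3
  *a = _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(2, 0, 2, 0));  // a0 a1 a2 a3
  *b = _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(3, 1, 3, 1));  // b0 b1 b2 b3
}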
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
The clang 3.5 loop optimizer seems to jump in unintentionally for simple loops. The very simple example ----

const int SIZE = 3;

int the_func(int* p_array)
{
    int dummy = 0;
#if defined(ITER)
    for(int* p = &p_array[0]; p < &p_array[SIZE]; ++p) dummy += *p;
#else
    for(int i = 0; i < SIZE; ++i) dummy += p_array[i];
#endif
    return dummy;
}

int main(int argc, char** argv) {
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: > > On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > >> I've noticed that LLVM tends to generate suboptimal code and spill an >> excessive amount of registers in large functions, such as in those >> that are automatically generated by FFTW. >
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get
2016 May 26
0
enabling interleaved access loop vectorization
On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Is there a compile-time and/or potential runtime cost that makes > enableInterleavedAccessVectorization() default to 'false'? > > I notice that this is set to true for ARM, AArch64, and PPC. > > In particular, I'm wondering if there's a reason it's not enabled for
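For context, a rough sketch (paraphrased, not the exact upstream code) of what is being toggled: the target's TargetTransformInfo implementation overrides a hook that otherwise defaults to false.

// Sketch only: the shape of the TTI hook named in the question. Targets
// that opt in (ARM, AArch64, PPC at the time of this thread) override it
// in their <Target>TTIImpl to return true.
class X86TTIImpl /* : public BasicTTIImplBase<X86TTIImpl> */ {
public:
  bool enableInterleavedAccessVectorization() { return true; }
};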
2009 Mar 11
4
[LLVMdev] Bug in X86CompilationCallback_SSE
I don't know how to file a PR, but I have a patch (see below) that should work regardless of ABI differences, since it relies on the compiler to do the tough job.

void X86CompilationCallback_SSE(void) {
    char * SAVEBUF = (char*) alloca(64+12); // alloca is 16-byte aligned
    asm volatile (
        "movl %%eax,(%0)\n"
        "movl %%edx,4(%0)\n"  // Save EAX/EDX/ECX
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>>
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > I've noticed that LLVM tends to generate suboptimal code and spill an > excessive amount of registers in large functions, such as in those > that are automatically generated by FFTW. One problem might be that we're forcing the 16 stores to the out array to happen in source order, which
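As an aside, a toy sketch (mine, deliberately not a correct FFT) of one common reason stores get pinned in source order: if 'out' may alias 'in', every store to out[] must stay ordered with respect to later loads from in[]. Declaring the pointers __restrict__ is the usual way to promise the compiler the arrays do not overlap, which gives the scheduler and register allocator more freedom.

// Illustrative only: the arithmetic is arbitrary; the point is the aliasing.
void kernel4(float *__restrict__ out, const float *__restrict__ in) {
  float a = in[0] + in[2];
  float b = in[0] - in[2];
  float c = in[1] + in[3];
  float d = in[1] - in[3];
  out[0] = a + c;
  out[1] = b + d;
  out[2] = a - c;
  out[3] = b - d;
}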
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
Hi, I've noticed that LLVM tends to generate suboptimal code and spill an excessive number of registers in large functions, such as those that are automatically generated by FFTW. LLVM generates good code for a function that computes an 8-point complex FFT, but from 16-point upwards, icc or gcc generates much better code. Here is an example of a sequence of instructions from a 32-point
2009 Mar 10
2
[LLVMdev] Bug in X86CompilationCallback_SSE
Hello. I found that the X86CompilationCallback_SSE wrapper for X86CompilationCallback2 does not set things up properly for the PIC invocation. Before you can correctly invoke a function via the Procedure Linkage Table (PLT), the ABI mandates that %ebx point to the GOT (Global Offset Table) (see http://www.greyhat.ch/lab/downloads/pic.html). Dump of assembler code for function
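To illustrate the idea of letting the compiler handle the ABI details, a sketch with a hypothetical callee (not LLVM's actual callback signature): when a wrapper is plain C/C++ compiled as i386 PIC, the compiler emits the %ebx/GOT setup that a call through the PLT requires; a hand-written assembly wrapper has to do that itself.

extern "C" void do_compilation(void);  // hypothetical external function

extern "C" void compilation_callback_wrapper(void) {
  // Under i386 PIC code generation, the compiler materializes the GOT
  // address in %ebx before a call that goes through the PLT.
  do_compilation();
}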
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > Hi Chandler, > > I've been looking at the regressions Quentin mentioned, and filed a PR > for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377 > > As for the others, I'm working on reducing them, but for now, here are > some raw observations, in case any of
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code:

for.cond.preheader:                  ; preds = %if.end18
  %mul = mul i32 %12, %3
  %cmp21128 = icmp sgt i32 %mul, 0
  br i1 %cmp21128, label %for.body.preheader, label %return

for.body.preheader:                  ; preds = %for.cond.preheader
  %19 = mul i32 %12, %3
  %20 = add i32 %19, -1
  %21 = zext i32 %20 to i64
  %22 =
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I filed a couple more, in case they're actually different issues: - http://llvm.org/bugs/show_bug.cgi?id=22412 - http://llvm.org/bugs/show_bug.cgi?id=22413 And that's pretty much it for internal changes. I'm fine with flipping the switch; Quentin, are you? Also, just to have an idea, do you (or someone else!) plan to tackle these in the near future? -Ahmed On Thu, Jan 29, 2015 at
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with the spilling/no-spilling code selected by conditional compilation:

#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include
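For context, a minimal sketch (mine, not the reporter's test case) of the two usual ways to extract an integer lane; the codegen question in the report is whether the backend keeps the vector in an xmm register for these or spills it and reloads scalars from memory.

#include <emmintrin.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

// Illustrative only: extracting lane 1 of a 4 x i32 vector.
int extract_lane1(__m128i v) {
#if defined(__SSE4_1__)
  return _mm_extract_epi32(v, 1);  // PEXTRD: extract directly to a GPR
#else
  // SSE2 fallback: broadcast lane 1 to lane 0, then move lane 0 to a GPR.
  return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1)));
#endif
}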