similar to: Vectorization of math function failed?

Displaying 20 results from an estimated 300 matches similar to: "Vectorization of math function failed?"

2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, While doing the performance measurement on a Ivy Bridge, I ran into compile time errors. I saw a bunch of “cannot select" in the LLVM test suite with -march=core-avx-i. E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3 -march=core-avx-i with: fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, Thanks for fixing the problem with the insertps mask. Generally the new shuffle lowering looks promising, however there are some cases where the codegen is now worse causing runtime performance regressions in some of our internal codebase. You have already mentioned how the new shuffle lowering is missing some features; for example, you explicitly said that we currently lack of
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
I've run the SingleSource test suite for core-avx-i and have no failures here so a preprocessed file + commandline would be very useful if this reproduces for you still. On Sat, Sep 6, 2014 at 4:07 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > I'm having trouble reproducing this. I'm trying to get LNT to actually > run, but manually compiling the given source
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 7, 2014, at 8:49 PM, Quentin Colombet <qcolombet at apple.com> wrote: > > Sure, > > Here is the command line: > clang -cc1 -triple x86_64-apple-macosx -S -disable-free -disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core-avx-i -O3 -ferror-limit 19 -fmessage-length 114
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote: > Unfortunately, another team, while doing internal testing has seen the > new path generating illegal insertps masks. A sample here: > > vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] > vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > Hi Chandler, > > I've been looking at the regressions Quentin mentioned, and filed a PR > for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377 > > As for the others, I'm working on reducing them, but for now, here are > some raw observations, in case any of
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I filed a couple more, in case they're actually different issues: - http://llvm.org/bugs/show_bug.cgi?id=22412 - http://llvm.org/bugs/show_bug.cgi?id=22413 And that's pretty much it for internal changes. I'm fine with flipping the switch; Quentin, are you? Also, just to have an idea, do you (or someone else!) plan to tackle these in the near future? -Ahmed On Thu, Jan 29, 2015 at
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I may get one or two in the next month, but not more than that. Focused on the pass manager for now. If none get there first, I'll eventually circle back though, so they won't rot forever. On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote: > I filed a couple more, in case they're actually different issues: > -
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 9, 2014, at 1:47 PM, Sean Silva <chisophugis at gmail.com> wrote: > > > > On Tue, Sep 9, 2014 at 12:53 PM, Quentin Colombet <qcolombet at apple.com <mailto:qcolombet at apple.com>> wrote: > Hi Chandler, > > I had observed some improvements and regressions with the new lowering. > > Here are the numbers for an Ivy Bridge machine fixed at
2020 Aug 31
2
Should llvm optimize 1.0 / x ?
Hi, Here is a small C++ program: vec.cc: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { return 1.0 / x; } v4f32 fct2(v4f32 x) { return __builtin_ia32_rcpps(x); } Which is compiled to: vec.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <_Z4fct1Dv4_f>: 0: c4 e2 79 18 0d 00 00 vbroadcastss
2014 Sep 10
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
> On Sep 9, 2014, at 3:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com <mailto:andrea.dibiagio at gmail.com>> wrote: > You have already mentioned how the new shuffle lowering is missing > some
2016 Nov 23
4
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
Hi All. This is an RFC for a proposed target specific X86 optimization for reducing code size in the encoding of AVX-512 instructions when possible. When the AVX512F instruction set was introduced in X86 it included additional 32 registers of 512bit size each ZMM0 - ZMM31, as well as additional 16 XMM registers XMM16-XMM31 and 16 YMM registers YMM16-YMM31. In order to encode the new registers of
2014 Sep 30
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Wow. Somehow, I forgot about vbroadcast and vpbroadcast. =[ Sorry about that. I'll fix those. On Fri, Sep 26, 2014 at 3:39 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com > wrote: > Hi Chandler, > > Here is another test. > > When looking at the AVX codegen, I noticed that, when using the new > shuffle lowering, we no longer emit a single vbroadcastss in the case
2020 Sep 01
2
Should llvm optimize 1.0 / x ?
Hi Quentin, You are correct, I could manage to get clang to use vrcpps, but not in a satisfying way: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize \ -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \ -c -o vec.o vec.cc 0000000000000140 <_Z4fct4Dv4_f>: 140: c5 f8 53 c8
2016 Nov 23
2
RFC: code size reduction in X86 by replacing EVEX with VEX encoding
I would like a command line option to disable this optimization. That way tests can still verify that EVEX instructions came out of isel by using -show-mc-encoding. On Wed, Nov 23, 2016 at 5:01 AM Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > ------------------------------ > > *From: *"Gadi via llvm-dev Haber" <llvm-dev at lists.llvm.org> >
2014 Sep 19
4
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, I have tested the new shuffle lowering on a AMD Jaguar cpu (which is AVX but not AVX2). On this particular target, there is a delay when output data from an execution unit is used as input to another execution unit of a different cluster. For example, There are 6 executions units which are divided into 3 execution clusters of Float(FPM,FPA), Vector Integer (MMXA,MMXB,IMM), and Store
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to