thr3ads.net - similar to: "Status of llvm.experimental.vector.reduce.* intrinsics"

Displaying 20 results from an estimated 3000 matches similar to: "Status of llvm.experimental.vector.reduce.* intrinsics"

Status of llvm.experimental.vector.reduce.* intrinsics

2017 Aug 04

I assume smaller types like <4 x i1> are getting zero-extended to, e.g., i8? On 04.08.2017 at 15:58, Amara Emerson wrote: > Actually for mask vectors of i1 values, you don't need to use reductions > at all (although for SVE this is what we'll do). You can instead bitcast > the vector value to an i8/i16/whatever and then compare against zero. > > Amara > > On
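The mask trick described in this snippet (treat the whole vector of i1 lanes as one integer, then compare it against zero) can be sketched in scalar C. This is only an illustration under assumptions: the function names `pack_mask8` and `any_lane_set` are hypothetical, the lane-to-bit order is chosen arbitrarily (it does not affect a comparison against zero), and the explicit packing loop stands in for what would be a single bitcast of an <8 x i1> value to i8 in LLVM IR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack 8 boolean lanes into one byte, lane i -> bit i.
 * This is a scalar stand-in for bitcasting an <8 x i1> mask to i8; the bit
 * order used here is an assumption made for the illustration. */
static uint8_t pack_mask8(const bool lanes[8]) {
    uint8_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= (uint8_t)((lanes[i] ? 1u : 0u) << i);
    }
    return bits;
}

/* "Is any lane set?" becomes a single integer comparison against zero,
 * so no per-lane OR reduction is needed. */
static bool any_lane_set(const bool lanes[8]) {
    return pack_mask8(lanes) != 0;
}
```

Because an IR bitcast requires source and destination to have the same bit width, a <4 x i1> mask would presumably correspond to an i4 rather than an i8, with any widening done as a separate step.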

Status of llvm.experimental.vector.reduce.* intrinsics

2017 Aug 03

Hi Amara, thank you for the clarification. I tested the intrinsics on x86_64 and they seemed to work pretty well. Looking forward to trying these intrinsics with the AArch64 backend. Maybe I will find the time to look into codegen to get these intrinsics out of the experimental stage. They seem pretty useful. Cheers, Michael -----Original Message----- From: Amara Emerson [amara.emerson at gmail.com] Received:

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > Hi Chandler, > > I've been looking at the regressions Quentin mentioned, and filed a PR > for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377 > > As for the others, I'm working on reducing them, but for now, here are > some raw observations, in case any of

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

I filed a couple more, in case they're actually different issues: - http://llvm.org/bugs/show_bug.cgi?id=22412 - http://llvm.org/bugs/show_bug.cgi?id=22413 And that's pretty much it for internal changes. I'm fine with flipping the switch; Quentin, are you? Also, just to have an idea, do you (or someone else!) plan to tackle these in the near future? -Ahmed On Thu, Jan 29, 2015 at

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

I may get one or two in the next month, but not more than that. Focused on the pass manager for now. If none get there first, I'll eventually circle back though, so they won't rot forever. On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote: > I filed a couple more, in case they're actually different issues: > -

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 23

Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 19

Hi Chandler, I have tested the new shuffle lowering on an AMD Jaguar CPU (which is AVX but not AVX2). On this particular target, there is a delay when output data from an execution unit is used as input to another execution unit of a different cluster. For example, there are 6 execution units which are divided into 3 execution clusters of Float (FPM, FPA), Vector Integer (MMXA, MMXB, IMM), and Store

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 10

On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 05

On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote: > Unfortunately, another team, while doing internal testing has seen the > new path generating illegal insertps masks. A sample here: > > vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] > vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 05

Hi Chandler, While doing the performance measurement on an Ivy Bridge, I ran into compile-time errors. I saw a bunch of "cannot select" in the LLVM test suite with -march=core-avx-i. E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3 -march=core-avx-i with: fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based from the target triple?) then I don't think

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 08

> On Sep 7, 2014, at 8:49 PM, Quentin Colombet <qcolombet at apple.com> wrote: > > Sure, > > Here is the command line: > clang -cc1 -triple x86_64-apple-macosx -S -disable-free -disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core-avx-i -O3 -ferror-limit 19 -fmessage-length 114

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

Hi Chandler, Thanks for fixing the problem with the insertps mask. Generally the new shuffle lowering looks promising, however there are some cases where the codegen is now worse causing runtime performance regressions in some of our internal codebase. You have already mentioned how the new shuffle lowering is missing some features; for example, you explicitly said that we currently lack of

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 06

I've run the SingleSource test suite for core-avx-i and have no failures here so a preprocessed file + commandline would be very useful if this reproduces for you still. On Sat, Sep 6, 2014 at 4:07 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > I'm having trouble reproducing this. I'm trying to get LNT to actually > run, but manually compiling the given source

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Hi, I've attached 2 .ll files which are supposed to be equivalent, but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash: when I run the crashing test in isolation it passes, but when I run the full suite it crashes; it boggles the mind. Below I provide the optimized asm that is produced from

[LLVMdev] Shuffle regression

2008 Jul 12

Hi all, I think I found a regression in the shuffle instruction. I've attached a replacement of fibonacci.cpp to reproduce the issue. It runs fine on release 2.3 but revision 52648 fails, and I suspect that the issue is still present. 2.3 generates the following x86 code: 03A10010 push ebp 03A10011 mov ebp,esp 03A10013 and esp,0FFFFFFF0h 03A10019

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: > > On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > >> I've noticed that LLVM tends to generate suboptimal code and spill an >> excessive amount of registers in large functions, such as in those >> that are automatically generated by FFTW. >

[Codegen bug in LLVM 3.8?] br following `fcmp une` is present in ll, absent in asm

2017 Mar 01

Hi, We seem to have found a bug in the LLVM 3.8 code generator. We are using MCJIT and have isolated working.ll and broken.ll after middle-end optimizations -- in the block merge128, notice that broken.ll has a fcmp une comparison to zero and a jump based on that branch: merge128: ; preds = %true71, %false72 %_rtB_724 = load %B_repro_T*, %B_repro_T**
