thr3ads.net - similar to: "[LLVMdev] SIMD for sdiv <2 x i64>"

Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] SIMD for sdiv <2 x i64>"

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: > > It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing also happens to zext <2 x i32> -> <2 x

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

------------------------------------ IR ------------------------------------------------------------------ if.then.i.i.i.i.i.i: ; preds = %if.then4 %S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64>

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

This snippet of IR is interesting: %sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24, i64 24> %cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D, %splatInsMapS1_D.splat %zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64> %BCS39_D = bitcast <2 x i64> %zextS39_D to i128 %mskS39_D = icmp ne i128 %BCS39_D, 0 br i1 %mskS39_D,

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 26

[LLVMdev] Can LLVM vectorize <2 x i32> type

For example, I have the following IR code, for.cond.preheader: ; preds = %if.end18 %mul = mul i32 %12, %3 %cmp21128 = icmp sgt i32 %mul, 0 br i1 %cmp21128, label %for.body.preheader, label %return for.body.preheader: ; preds = %for.cond.preheader %19 = mul i32 %12, %3 %20 = add i32 %19, -1 %21 = zext i32 %20 to i64 %22 =

[LLVMdev] Can LLVM vectorize <2 x i32> type

2015 Jun 24

[LLVMdev] Can LLVM vectorize <2 x i32> type

Hi, Is LLVM be able to generate code for the following code? %mul = mul <2 x i32> %1, %2, where %1 and %2 are <2 x i32> type. I am running it on a Haswell processor with LLVM-3.4.2. It seems that it will generates really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > Hi Chandler, > > I've been looking at the regressions Quentin mentioned, and filed a PR > for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377 > > As for the others, I'm working on reducing them, but for now, here are > some raw observations, in case any of

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

I filed a couple more, in case they're actually different issues: - http://llvm.org/bugs/show_bug.cgi?id=22412 - http://llvm.org/bugs/show_bug.cgi?id=22413 And that's pretty much it for internal changes. I'm fine with flipping the switch; Quentin, are you? Also, just to have an idea, do you (or someone else!) plan to tackle these in the near future? -Ahmed On Thu, Jan 29, 2015 at

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 29

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote: > > On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > >> Hi Chandler, >> >> I've been looking at the regressions Quentin mentioned, and filed a PR >> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 30

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

I may get one or two in the next month, but not more than that. Focused on the pass manager for now. If none get there first, I'll eventually circle back though, so they won't rot forever. On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote: > I filed a couple more, in case they're actually different issues: > -

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

[LLVMdev] SIMD instructions and memory alignment on X86

Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions(including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based from the target triple?) then I don't think

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

2010 Nov 03

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

Dear LLVMdev, I've noticed an unusual behavior of the LLVM x86 code generator (with default options) that results in nearly a 4x slow-down in floating-point throughput for my microbenchmark. I've written a compute-intensive microbenchmark to approach theoretical peak throughput of the target processor by issuing a large number of independent floating-point multiplies. The distance

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

2015 Jan 23

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Greetings LLVM hackers and x86 vector shufflers! I would like to flip on another chunk of the new vector shuffling, specifically the logic to mark ~all shuffles as "legal". This can be tested today with the flag "-x86-experimental-vector-shuffle-legality". I would essentially like to make this the default (by removing the "false" path). Doing this will allow me to

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

[LLVMdev] llvm register reload/spilling around calls

On 20.10.2010 05:00, Jakob Stoklund Olesen wrote: > On Oct 19, 2010, at 6:37 PM, Roland Scheidegger wrote: > >> Thanks for giving it a look! >> >> On 19.10.2010 23:21, Jakob Stoklund Olesen wrote: >>> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote: >>> >>>> So I saw that the code is doing lots of register >>>>

[LLVMdev] Suboptimal code due to excessive spilling

2012 Mar 28

[LLVMdev] Suboptimal code due to excessive spilling

Hi, I have run into the following strange behavior and wanted to ask for some advice. For the C program below, function sum() gets inlined in foo() but the code generated looks very suboptimal (the code is an extract from a larger program). Below I show the 32-bit x86 assembly as produced by the demo page on the llvm home page ("Output A"). As you can see from the assembly, after

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

[LLVMdev] Poor code generation for odd sized vectors

2011 Sep 27

[LLVMdev] Poor code generation for odd sized vectors

Hi all, I'm compiling LLCM IR code like this on x86-64: define linkonce ccc <16 x float> @vector_add_float(<16 x float> %a.78, <16 x float> %a.79) align 8 { entry: %result.80 = fadd <16 x float> %a.78, %a.79 ret <18 x float> %result.80 } This works really well when the vector length (16 in the above) is an integer multiple of the SSE vector

[LLVMdev] Poor register allocation (constants causing spilling)

2015 Jul 14

[LLVMdev] Poor register allocation (constants causing spilling)

Hi, While investigating a performance issue with an internal codebase I came across what looks to be poor register allocation. I have constructed a small(ish) reproducible which demonstrates the issue (see test.ll attached). I have spent some time going through the register allocator to understand what is happening. I have also experimented with some small changes to try and improve the

[LLVMdev] llvm register reload/spilling around calls

2010 Oct 20

[LLVMdev] llvm register reload/spilling around calls

On Oct 20, 2010, at 7:46 AM, Roland Scheidegger wrote: > On 20.10.2010 05:00, Jakob Stoklund Olesen wrote: >> Look in X86InstrControl.td. The call instructions are all prefixed >> by: >> >> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2, >> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, >> XMM0, XMM1, XMM2, XMM3,

[LLVMdev] Suboptimal code due to excessive spilling

2012 Apr 05

[LLVMdev] Suboptimal code due to excessive spilling

I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround? /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker Sent: den 28 mars 2012 03:18 To: llvmdev Subject: [LLVMdev] Suboptimal code due to excessive spilling Hi, I have run into the following strange behavior

similar to: [LLVMdev] SIMD for sdiv <2 x i64>