Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] SIMD for sdiv <2 x i64>"
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote:
>
> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing also happens to zext <2 x i32> -> <2 x
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
On 07/24/2015 03:42 AM, Benjamin Kramer wrote:
>> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote:
>>
>> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
------------------------------------ IR
------------------------------------------------------------------
if.then.i.i.i.i.i.i: ; preds = %if.then4
%S25_D = zext <2 x i32> %splatLDS17_D.splat to <2 x i64>
%umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3>
%extumul_with_overflow.i.iS26_D = extractelement <2 x i64>
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
This snippet of IR is interesting:
%sub.ptr.div.iS37_D = sdiv <2 x i64> %sub.ptr.sub.iS36_D, <i64 24,
i64 24>
%cmp10S38_D = icmp ugt <2 x i64> %sub.ptr.div.iS37_D,
%splatInsMapS1_D.splat
%zextS39_D = sext <2 x i1> %cmp10S38_D to <2 x i64>
%BCS39_D = bitcast <2 x i64> %zextS39_D to i128
%mskS39_D = icmp ne i128 %BCS39_D, 0
br i1 %mskS39_D,
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
For example, I have the following IR code,
for.cond.preheader: ; preds = %if.end18
%mul = mul i32 %12, %3
%cmp21128 = icmp sgt i32 %mul, 0
br i1 %cmp21128, label %for.body.preheader, label %return
for.body.preheader: ; preds =
%for.cond.preheader
%19 = mul i32 %12, %3
%20 = add i32 %19, -1
%21 = zext i32 %20 to i64
%22 =
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM be able to generate code for the following code?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are <2 x i32> type.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
will generates really complicated code with vpaddq, vpmuludq, vpsllq,
vpsrlq.
Thanks,
Zhi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
wrote:
> Hi Chandler,
>
> I've been looking at the regressions Quentin mentioned, and filed a PR
> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
>
> As for the others, I'm working on reducing them, but for now, here are
> some raw observations, in case any of
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I filed a couple more, in case they're actually different issues:
- http://llvm.org/bugs/show_bug.cgi?id=22412
- http://llvm.org/bugs/show_bug.cgi?id=22413
And that's pretty much it for internal changes. I'm fine with flipping the
switch; Quentin, are you?
Also, just to have an idea, do you (or someone else!) plan to tackle these
in the near future?
-Ahmed
On Thu, Jan 29, 2015 at
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:
>
> On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com>
> wrote:
>
>> Hi Chandler,
>>
>> I've been looking at the regressions Quentin mentioned, and filed a PR
>> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I may get one or two in the next month, but not more than that. Focused on
the pass manager for now. If none get there first, I'll eventually circle
back though, so they won't rot forever.
On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote:
> I filed a couple more, in case they're actually different issues:
> -
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE and I
end up with SSE instructions(including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
Dear LLVMdev,
I've noticed an unusual behavior of the LLVM x86 code generator (with default options)
that results in nearly a 4x slow-down in floating-point throughput for my microbenchmark.
I've written a compute-intensive microbenchmark to approach theoretical peak
throughput of the target processor by issuing a large number of independent
floating-point multiplies. The distance
2015 Jan 23
5
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Greetings LLVM hackers and x86 vector shufflers!
I would like to flip on another chunk of the new vector shuffling,
specifically the logic to mark ~all shuffles as "legal".
This can be tested today with the flag
"-x86-experimental-vector-shuffle-legality". I would essentially like to
make this the default (by removing the "false" path). Doing this will allow
me to
2010 Oct 20
2
[LLVMdev] llvm register reload/spilling around calls
On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
> On Oct 19, 2010, at 6:37 PM, Roland Scheidegger wrote:
>
>> Thanks for giving it a look!
>>
>> On 19.10.2010 23:21, Jakob Stoklund Olesen wrote:
>>> On Oct 19, 2010, at 11:40 AM, Roland Scheidegger wrote:
>>>
>>>> So I saw that the code is doing lots of register
>>>>
2012 Mar 28
2
[LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior and wanted to ask for
some advice. For the C program below, function sum() gets inlined in
foo() but the code generated looks very suboptimal (the code is an
extract from a larger program).
Below I show the 32-bit x86 assembly as produced by the demo page on
the llvm home page ("Output A"). As you can see from the assembly,
after
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
Hello,
Depending on how I extract integer lanes from an x86_64 xmm register, the
backend may spill that register in order to load scalars. The effect was
observed on two targets: corei7-avx and btver1 (I haven't checked other
targets).
Here's a test case with spilling/no-spilling code put on conditional
compile:
#if __SSE4_1__ != 0
#include <smmintrin.h>
#else
#include
2011 Sep 27
2
[LLVMdev] Poor code generation for odd sized vectors
Hi all,
I'm compiling LLCM IR code like this on x86-64:
define linkonce ccc <16 x float> @vector_add_float(<16 x float> %a.78, <16 x float> %a.79) align 8
{
entry:
%result.80 = fadd <16 x float> %a.78, %a.79
ret <18 x float> %result.80
}
This works really well when the vector length (16 in the above) is
an integer multiple of the SSE vector
2015 Jul 14
4
[LLVMdev] Poor register allocation (constants causing spilling)
Hi,
While investigating a performance issue with an internal codebase I
came across what looks to be poor register allocation. I have
constructed a small(ish) reproducible which demonstrates the issue
(see test.ll attached).
I have spent some time going through the register allocator to
understand what is happening. I have also experimented with some
small changes to try and improve the
2010 Oct 20
0
[LLVMdev] llvm register reload/spilling around calls
On Oct 20, 2010, at 7:46 AM, Roland Scheidegger wrote:
> On 20.10.2010 05:00, Jakob Stoklund Olesen wrote:
>> Look in X86InstrControl.td. The call instructions are all prefixed
>> by:
>>
>> let Defs = [RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, FP0, FP1, FP2,
>> FP3, FP4, FP5, FP6, ST0, ST1, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
>> XMM0, XMM1, XMM2, XMM3,
2012 Apr 05
0
[LLVMdev] Suboptimal code due to excessive spilling
I don't know much about this, but maybe -mllvm -unroll-count=1 can be used as a workaround?
/Patrik Hägglund
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Brent Walker
Sent: den 28 mars 2012 03:18
To: llvmdev
Subject: [LLVMdev] Suboptimal code due to excessive spilling
Hi,
I have run into the following strange behavior