Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] Generate scalar SSE instructions instead of packed instructions"
2013 Feb 21
0
[LLVMdev] Generate scalar SSE instructions instead of packed instructions
You can change the input LLVM-IR.
On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote:
> Hi,
>
> I am interested in evaluating the performance of packed vs. scalar double-precision floating-point instructions on x86-atom, and I was wondering if anyone knows more precisely where to modify LLVM to use one or the other. I know I probably need
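The snippet above is cut off by the archive, but the comparison it sets up can be sketched directly with SSE2 intrinsics. This is a hypothetical micro-benchmark, not code from the thread; the function names are made up, and the thread's own suggestion is to control this at the LLVM-IR level rather than in source. The plain loop lowers to scalar SSE instructions (addsd style) when it is not vectorized, while the intrinsic loop forces packed ones (addpd):
{code}
/* Hypothetical sketch: scalar vs. packed double-precision adds on x86 SSE2.
   Names are illustrative; the thread suggests steering this from LLVM IR. */
#include <emmintrin.h>

double sum_scalar(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];                      /* scalar adds (addsd) when not vectorized */
    return s;
}

double sum_packed(const double *a, int n)
{
    __m128d acc = _mm_setzero_pd();
    int i = 0;
    for (; i + 1 < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));   /* packed adds (addpd) */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1];
    for (; i < n; i++)                  /* scalar tail for odd n */
        s += a[i];
    return s;
}
{code}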
2013 Feb 21
2
[LLVMdev] Generate scalar SSE instructions instead of packed instructions
On Thu, Feb 21, 2013 at 12:14 PM, Nadav Rotem <nrotem at apple.com> wrote:
> You can change the input LLVM-IR.
>
> On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com>
> wrote:
>
> Hi,
>
> I am interested in evaluating the performance of packed vs scalar
> double-precision floating point instructions on
2013 Feb 26
0
[LLVMdev] Generate scalar SSE instructions instead of packed instructions
Thanks for the replies; they were very helpful.
Is it enough to prevent BBVectorize from packing together double-precision instructions? If a non-Clang frontend is used, such as ISPC, is it possible that the IR may contain packed double instructions?
Tyler
From: Cameron McInally [mailto:cameron.mcinally at nyu.edu]
Sent: Thursday, February 21, 2013 6:39 PM
To: Nowicki, Tyler
Cc: Nadav Rotem; LLVM
2013 Apr 04
1
[LLVMdev] Packed instructions generated by LoopVectorize?
Thanks, that did it!
Are there any plans to enable the loop vectorizer by default?
From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Wednesday, April 03, 2013 13:33 PM
To: Nowicki, Tyler
Cc: LLVM Developers Mailing List
Subject: Re: Packed instructions generated by LoopVectorize?
Hi Tyler,
Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
Hi,
I have a question about LoopVectorize. I wrote a simple test case, a dot-product loop, and found that packed instructions are generated when the input arrays are integer, but not when they are float or double.
If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays, packed instructions are generated. Although it should not be required, I tried
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
Hi Tyler,
Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations.
Thanks,
Nadav
On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote:
> Hi,
>
> I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are
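For context, here is a minimal float dot-product kernel of the kind this thread describes; it is a reconstruction from the message text, not the original test case. With integer elements the loop vectorizer can packetize the reduction as-is; for float it additionally needs -ffast-math (so the reduction may be reordered) and benefits from restrict (so the loads are known not to alias):
{code}
/* Assumed reconstruction of the dot-product test case discussed above.
   Build with something like -O3 -ffast-math; 'restrict' lets the vectorizer
   assume the arrays do not alias, and -ffast-math allows it to reorder the
   floating-point reduction into packed adds and multiplies. */
float dot(const float *restrict a, const float *restrict b, int n)
{
    float total = 0.0f;
    for (int i = 0; i < n; i++)
        total += a[i] * b[i];
    return total;
}
{code}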
2012 Jun 27
2
[LLVMdev] 8-bit DIV IR irregularities
Hi,
I noticed that when dividing signed 8-bit values the IR uses a 32-bit signed divide; however, when unsigned 8-bit values are used, the IR uses an 8-bit unsigned divide. Why not use an 8-bit signed divide when using 8-bit signed values?
Here is the C code and IR:
char idiv8(char a, char b)
{
    char c = a / b;
    return c;
}
define signext i8 @idiv8(i8 signext %a, i8 signext %b) nounwind
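A short illustration of the asymmetry being asked about, based on my reading of the reply elsewhere in this listing that begins with "sdiv i8 -128"; the reasoning is spelled out here as an assumption, not quoted from the thread. Both operands are promoted to int by C, and the optimizer can only shrink the divide back to 8 bits when the narrow result is guaranteed to match the wide one:
{code}
/* Assumed reasoning, sketched for clarity (not code from the thread).
   Unsigned: both values lie in [0, 255], so an 8-bit udiv always matches the
   promoted 32-bit udiv and the divide can be narrowed.
   Signed: -128 / -1 gives 128, which does not fit in i8; narrowing the sdiv
   would turn a well-defined 32-bit divide into an overflowing 8-bit one, so
   the 32-bit sdiv is kept. */
unsigned char udiv8(unsigned char a, unsigned char b)
{
    return a / b;                  /* can be emitted as an 8-bit unsigned divide */
}

signed char sdiv8_edge(void)
{
    signed char a = -128, b = -1;
    return (signed char)(a / b);   /* promoted: (int)-128 / (int)-1 == 128 */
}
{code}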
2012 Sep 28
2
[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom
Hi,
Here is an update on our proposal to improve the uses of LEA on Atom processors.
1. Disable current generation of LEAs
Due to a 3-cycle stall between the ALU and the AGU, any address generation done using math instructions will cause a stall on loads and stores that are within 3 cycles of the address generation. Consequently, the heuristics for using LEAs efficiently must know how many
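As a small illustration of what "address generation" means here (a hypothetical example, not code from the proposal): an indexed access like the one below needs base + index*scale + displacement, which can be formed either by ALU math feeding the load or by an LEA/addressing mode, and on Atom the choice interacts with the ALU-to-AGU stall described above:
{code}
/* Illustrative only: the address arithmetic the LEA heuristics must place
   carefully on Atom. The address base + i*4 + 16 can be computed with ALU
   instructions (shl/add) feeding the load, or folded into an LEA or the
   load's addressing mode; the profitable choice depends on the 3-cycle
   ALU-to-AGU stall described in the proposal. */
int load_elem(const int *base, long i)
{
    return base[i + 4];    /* address = base + i*4 + 16 on a 32-bit int array */
}
{code}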
2012 Jun 28
2
[LLVMdev] 8-bit DIV IR irregularities
I understand, but this sounds like legalization. Does every architecture trigger an overflow exception, as opposed to setting a bit? Perhaps it makes more sense to do this in the backends that trigger an overflow exception?
I'm working on a modification for DIV right now in the x86 backend for Intel Atom that will improve performance; however, because the *actual* operation has been replaced
2012 Jun 27
0
[LLVMdev] 8-bit DIV IR irregularities
On Wed, Jun 27, 2012 at 4:02 PM, Nowicki, Tyler <tyler.nowicki at intel.com> wrote:
> Hi,
>
>
>
> I noticed that when dividing with signed 8-bit values the IR uses a 32-bit
> signed divide, however, when unsigned 8-bit values are used the IR uses an
> 8-bit unsigned divide. Why not use an 8-bit signed divide when using 8-bit
> signed values?
"sdiv i8 -128,
2013 Sep 30
0
[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom
Was there any development on this? I noticed that clang still produces
a lea for the testcase in llvm.org/pr13320.
On 28 September 2012 11:36, Nowicki, Tyler <tyler.nowicki at intel.com> wrote:
> Hi,
>
>
>
> Here is an update on our proposal to improve the uses of LEA on Atom
> processors.
>
>
>
> 1. Disable current generation of LEAs
>
>
>
> Due to
2011 Nov 24
2
[LLVMdev] x86 backend assembly - mov esp->reg
Hi,
I've noticed an inconsistency with the x86 backend assembly output in how it treats arguments of a function. Here is a simple test to illustrate the inconsistency:
<from test.c>
void test()
{
    char ac, bc, cc, dc, fc;
    ac = (char)Rand();
    bc = (char)Rand();
    cc = (char)Rand();
    dc = (char)Rand();
    fc = PartialRegisterOperationsTestChar(ac, bc, cc, dc);
}
<from
2012 Jun 28
0
[LLVMdev] 8-bit DIV IR irregularities
On Wed, Jun 27, 2012 at 5:22 PM, Nowicki, Tyler <tyler.nowicki at intel.com> wrote:
> I understand, but this sounds like legalization. Does every architecture trigger an overflow exception, as opposed to setting a bit? Perhaps it makes more sense to do this in the backends that trigger an overflow exception?
The IR instruction has undefined behavior on overflow. This has
nothing to do
2011 Nov 24
0
[LLVMdev] x86 backend assembly - mov esp->reg
On Thu, Nov 24, 2011 at 11:39:32AM -0700, Nowicki, Tyler wrote:
> When compiled for atom with clang in 32-bit mode the 8-bit variables
> in test use 32-bit registers:
That's fine since it can avoid partial register stalls and the value of the
padding is undefined.
> However, the 8-bit variables in PartialRegisterOperationsTestChar use
> 8-bit registers:
Same argument. It wants to use the
2012 Jun 18
2
[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?
Hi,
Does anyone know where a backend-specific optimization can be added to replace an instruction with code containing control flow?
I'm interested in adding an optimization for the DIV instruction (x86-atom) which replaces the IDIV/DIV with code containing control flow to select between the intended IDIV/DIV and an 8-bit DIV with movzx, as described in the Intel Atom Optimization Guide. My
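The shape of that transformation can be sketched in C as a hypothetical illustration only; the real change is made in the x86 backend during lowering, not in source. The idea is to branch to an 8-bit divide when both operands fit in 8 bits and fall back to the full-width divide otherwise:
{code}
/* Hypothetical C-level sketch of the control flow described above. The real
   optimization emits this branch in the x86 backend, selecting between the
   original 32-bit DIV and a movzx + 8-bit DIV; plain C like this only shows
   the intended shape, since the compiler may still promote the narrow divide. */
unsigned int div_select(unsigned int a, unsigned int b)
{
    if ((a | b) < 256u)                   /* both operands fit in 8 bits */
        return (unsigned char)a / (unsigned char)b;   /* fast 8-bit path */
    return a / b;                         /* general 32-bit divide */
}
{code}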
2016 May 17
2
Working on FP SCEV Analysis
> On May 16, 2016, at 5:35 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> ----- Original Message -----
>> From: "Sanjoy Das via llvm-dev" <llvm-dev at lists.llvm.org>
>> To: escha at apple.com
>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Michael V Zolotukhin" <michael.v.zolotukhin at
2012 Jun 19
0
[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?
Hi Tyler,
> -Does anyone know where a backend-specific optimization can be added to replace
> an instruction with code containing control flow?
I think the backend lowering of atomic intrinsics generates control flow
(loops), so that may give you a clue.
Ciao, Duncan.
2013 Apr 15
1
[LLVMdev] State of Loop Unrolling and Vectorization in LLVM
Hi, I have a test case (and a micro-benchmark made out of the test case) to check whether loop unrolling and loop vectorization are done efficiently in LLVM. Here is the test case (credits: Tyler Nowicki):
{code}
extern float * array;
extern int array_size;
float g()
{
    int i;
    float total = 0;
    for (i = 0; i < array_size; i++)
    {
        total += array[i];
    }
    return total;
}
{code}
When
2015 Aug 22
2
SSE return w/ elf64 ABI
Hi,
LLVM made a change a few months ago and started erroring out when a float
is returned in x64 and SSE is disabled. This makes sense, really, since
it's specified by the ABI that the return value must be put in a register
you were told to disable, but it's breaking soft floats in Rust on x64. It
seems there are two options: LLVM could break the ABI spec and have working
soft floats on
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Hi,
Here are some numbers for my version -- also attached is the test code.
I found that booting big machines is tediously slow so I lifted the
whole lot to userspace.
I measure the cycles spent in arch_spin_lock() + arch_spin_unlock().
The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node
(2 socket) Intel Westmere-EP.
AMD (ticket) AMD (qspinlock + pending + opt)
Local: