thr3ads.net - similar to: "[LLVMdev] Generate scalar SSE instructions instead of packed instructions"

Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] Generate scalar SSE instructions instead of packed instructions"

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

2013 Feb 21

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

You can change the input LLVM-IR. On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote: > Hi, > > I am interested in evaluating the performance of packed vs scalar double-precision floating point instructions on x86-atom and I was wondering if anyone knows more precisely where to modify llvm to use one or the other. I know I probably need

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

2013 Feb 21

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

On Thu, Feb 21, 2013 at 12:14 PM, Nadav Rotem <nrotem at apple.com> wrote: > You can change the input LLVM-IR. > > On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> > wrote: > > Hi, > > I am interested in evaluating the performance of packed vs scalar > double-precision floating point instructions on

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

2013 Feb 26

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

Thanks for the replies; they were very helpful. Is it enough to prevent BBVectorize from packing together double precision instructions? If a non-clang frontend is used, such as ISPC, is it possible that the IR may contain packed double instructions? Tyler From: Cameron McInally [mailto:cameron.mcinally at nyu.edu] Sent: Thursday, February 21, 2013 6:39 PM To: Nowicki, Tyler Cc: Nadav Rotem; LLVM

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 04

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Thanks, that did it! Are there any plans to enable the loop vectorizer by default? From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, April 03, 2013 13:33 PM To: Nowicki, Tyler Cc: LLVM Developers Mailing List Subject: Re: Packed instructions generaetd by LoopVectorize? Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop, and found that packed instructions are generated when the input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays, packed instructions are generated. Although it should not be required I tried

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations. Thanks, Nadav On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote: > Hi, > > I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are
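The reordering constraint in this reply can be illustrated with a minimal C sketch. The function names `dot`, `sum_left`, and `sum_right` are hypothetical and do not come from the thread. The sketch only shows that floating-point addition is not associative, which is why a vectorizer needs permission (such as the -ffast-math flag suggested above) before it may regroup the additions of a float reduction.

```c
#include <stddef.h>

/* Dot-product reduction of the kind described in the thread.
   The accumulator `total` is the reduction variable. */
float dot(const float *a, const float *b, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}

/* The same three values summed with two different groupings. */
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }
```

With the inputs 1e20f, -1e20f and 1.0f, `sum_left` returns 1.0f while `sum_right` returns 0.0f, because 1.0f is rounded away when it is added to -1e20f first. A vectorized reduction changes the grouping in a similar way, so the result may differ from the scalar loop.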

[LLVMdev] 8-bit DIV IR irregularities

2012 Jun 27

[LLVMdev] 8-bit DIV IR irregularities

Hi, I noticed that when dividing with signed 8-bit values the IR uses a 32-bit signed divide, however, when unsigned 8-bit values are used the IR uses an 8-bit unsigned divide. Why not use an 8-bit signed divide when using 8-bit signed values? Here is the C code and IR: char idiv8(char a, char b) { char c = a / b; return c; } define signext i8 @idiv8(i8 signext %a, i8 signext %b) nounwind

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

2012 Sep 28

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Hi, Here is an update on our proposal to improve the uses of LEA on Atom processors. 1. Disable current generation of LEAs Due to a 3 cycle stall between the ALU and the AGU any address generation done using math instructions will cause a stall on loads and stores which are within 3 cycles of the address generation. Consequently, the heuristics for using LEAs efficiently must know how many

[LLVMdev] 8-bit DIV IR irregularities

2012 Jun 28

[LLVMdev] 8-bit DIV IR irregularities

I understand, but this sounds like legalization. Does every architecture trigger an overflow exception, as opposed to setting a bit? Perhaps it makes more sense to do this in the backends that trigger an overflow exception? I'm working on a modification for DIV right now in the x86 backend for Intel Atom that will improve performance, however because the *actual* operation has been replaced

[LLVMdev] 8-bit DIV IR irregularities

2012 Jun 27

[LLVMdev] 8-bit DIV IR irregularities

On Wed, Jun 27, 2012 at 4:02 PM, Nowicki, Tyler <tyler.nowicki at intel.com> wrote: > Hi, > > > > I noticed that when dividing with signed 8-bit values the IR uses a 32-bit > signed divide, however, when unsigned 8-bit values are used the IR uses an > 8-bit unsigned divide. Why not use an 8-bit signed divide when using 8-bit > signed values? "sdiv i8 -128,

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

2013 Sep 30

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Was there any development on this? I noticed that clang still produces a lea for the testcase in llvm.org/pr13320. On 28 September 2012 11:36, Nowicki, Tyler <tyler.nowicki at intel.com> wrote: > Hi, > > > > Here is an update on our proposal to improve the uses of LEA on Atom > processors. > > > > 1. Disable current generation of LEAs > > > > Due to

[LLVMdev] x86 backend assembly - mov esp->reg

2011 Nov 24

[LLVMdev] x86 backend assembly - mov esp->reg

Hi, I've noticed an inconsistency with the x86 backend assembly output in how it treats arguments of a function. Here is a simple test to illustrate the inconsistency: <from test.c> void test() { char ac, bc, cc, dc, fc; ac = (char)Rand(); bc = (char)Rand(); cc = (char)Rand(); dc = (char)Rand(); fc = PartialRegisterOperationsTestChar(ac, bc, cc, dc); } <from

[LLVMdev] 8-bit DIV IR irregularities

2012 Jun 28

[LLVMdev] 8-bit DIV IR irregularities

On Wed, Jun 27, 2012 at 5:22 PM, Nowicki, Tyler <tyler.nowicki at intel.com> wrote: > I understand, but this sounds like legalization. Does every architecture trigger an overflow exception, as opposed to setting a bit? Perhaps it makes more sense to do this in the backends that trigger an overflow exception? The IR instruction has undefined behavior on overflow. This has nothing to do
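The overflow case behind this exchange can be sketched in C. This is only an illustration under stated assumptions: `promoted_div` and `fits_in_i8` are hypothetical helper names, and `signed char` is assumed to be an 8-bit two's-complement type, as it is on x86. C promotes both operands to `int` before dividing, so the quotient of -128 and -1 is computed as 128, a value that has no 8-bit signed representation.

```c
#include <limits.h>

/* Divide two signed 8-bit values the way C does: both operands are
   promoted to int first, so the division itself is at least 32 bits wide.
   The caller must pass a nonzero divisor. */
int promoted_div(signed char a, signed char b) {
    return a / b;
}

/* Report whether a value can be stored in a signed 8-bit integer. */
int fits_in_i8(int v) {
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}
```

For most inputs the quotient fits back into 8 bits, but `promoted_div(-128, -1)` yields 128, for which `fits_in_i8` returns 0. An 8-bit signed divide of the same operands would overflow, which is the case the reply describes as undefined behavior in the IR.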

[LLVMdev] x86 backend assembly - mov esp->reg

2011 Nov 24

[LLVMdev] x86 backend assembly - mov esp->reg

On Thu, Nov 24, 2011 at 11:39:32AM -0700, Nowicki, Tyler wrote: > When compiled for atom with clang in 32-bit mode the 8-bit variables > in test use 32-bit registers: That's fine since it can avoid partial stalls and the value of the padding is undefined. > However, the 8-bit variables in PartialRegisterOperationsTestChar use > 8-bit registers: Same argument. It wants to use the

[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?

2012 Jun 18

[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?

Hi, Does anyone know where a backend-specific optimization can be added to replace an instruction with code containing control flow? I'm interested in adding an optimization for the DIV instruction (x86-atom) which replaces the IDIV/DIV with code containing control flow to select between the intended IDIV/DIV and an 8-bit DIV with movzx, as described in the Intel Atom Optimization Guide. My
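A rough C sketch of the kind of control flow this post describes, for the unsigned case only. The function name `div_with_fast_path` and the exact test (both operands fit in 8 bits) are assumptions made for illustration. They are not taken from the thread or from the Intel guide, and a C compiler is free to emit different instructions for either branch.

```c
#include <stdint.h>

/* Pick a narrow or a wide division at run time.
   The caller must pass a nonzero divisor. */
uint32_t div_with_fast_path(uint32_t dividend, uint32_t divisor) {
    /* Both values fit in 8 bits when no bit above bit 7 is set. */
    if (((dividend | divisor) & ~UINT32_C(0xFF)) == 0) {
        /* Narrow path: stands in for the 8-bit DIV on zero-extended inputs. */
        return (uint32_t)((uint8_t)dividend / (uint8_t)divisor);
    }
    /* Wide path: the originally intended 32-bit division. */
    return dividend / divisor;
}
```

Both branches return the same quotient; the branch exists only so that small operands can take the cheaper division.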

Working on FP SCEV Analysis

2016 May 17

Working on FP SCEV Analysis

> On May 16, 2016, at 5:35 PM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > ----- Original Message ----- >> From: "Sanjoy Das via llvm-dev" <llvm-dev at lists.llvm.org> >> To: escha at apple.com >> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Michael V Zolotukhin" <michael.v.zolotukhin at

[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?

2012 Jun 19

[LLVMdev] Best way to replace LLVM IR operation with code containing control flow?

Hi Tyler, > -Does anyone know where a backend-specific optimization can be added to replace > an instruction with code containing control flow? I think the backend lowering of atomic intrinsics generates control flow (loops), so that may give you a clue. Ciao, Duncan.

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

2013 Apr 15

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

Hi, I have a test case (and a micro benchmark made out of the test case) to check if loop unrolling and loop vectorization are done efficiently in LLVM. Here is the test case (credits: Tyler Nowicki) {code} extern float * array; extern int array_size; float g() { int i; float total = 0; for(i = 0; i < array_size; i++) { total += array[i]; } return total; } {code} When

SSE return w/ elf64 ABI

2015 Aug 22

SSE return w/ elf64 ABI

Hi, LLVM made a change a few months ago and started erroring out when a float is returned in x64 and SSE is disabled. This makes sense, really, since it's specified by the ABI that the return value must be put in a register you were told to disable, but it's breaking soft floats in Rust on x64. It seems there are two options: LLVM could break the ABI spec and have working soft floats on

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 03

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spent in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node (2 socket) Intel Westmere-EP. AMD (ticket) AMD (qspinlock + pending + opt) Local:
