thr3ads.net - similar to: "[LLVMdev] Lowering to MMX"

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] Lowering to MMX"

2011 Oct 25

[LLVMdev] Lowering to MMX

On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote: > Hi all, > > I'm working on a graphics project which uses LLVM for dynamic code > generation, and I noticed a major performance regression when upgrading > from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I skipped it > entirely). > > I found out that the performance regression is due to removing

[LLVMdev] Lowering to MMX

2011 Oct 26

[LLVMdev] Lowering to MMX

Hi Bill, Comments inline: On 24/10/2011 9:50 PM, Bill Wendling wrote: > On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote: > >> Hi all, >> >> I'm working on a graphics project which uses LLVM for dynamic code >> generation, and I noticed a major performance regression when upgrading >> from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I

[LLVMdev] Lowering to MMX

2011 Oct 26

[LLVMdev] Lowering to MMX

On Oct 26, 2011, at 1:18 PM, Nicolas Capens wrote: > On 24/10/2011 9:50 PM, Bill Wendling wrote: >> On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote: >> >>> Hi all, >>> >>> I'm working on a graphics project which uses LLVM for dynamic code >>> generation, and I noticed a major performance regression when upgrading >>> from LLVM

[LLVMdev] Lowering to MMX

2011 Oct 25

[LLVMdev] Lowering to MMX

Hi Nicolas, > I found out that the performance regression is due to removing support > for lowering 64-bit vector operations to MMX, and using SSE2 instead. My > code uses a mix of MMX intrinsics and v4i16 operations, so it ping-pongs > back and forth between MMX and SSE2 instructions in the generated code. > > To get more optimal code, I see three options, and I was wondering

[PATCH] promised MMX patches rc1

2005 Mar 23

[PATCH] promised MMX patches rc1

Hello, Here is my first speedup patch. Like 10-11%. No IDCT yet. Please feel free to comment my code or even better think about improvements. :) I belive my routines are not so bad, maybe one day they will be even more faster. What needs to be optimized is the loop filter fuction. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from lot of

MMX/mmxext optimisations

2004 Aug 24

MMX/mmxext optimisations

quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

Hi, I've attached 2 .ll files which are supposed to be equivalent but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash, when I run the crashing test it in isolation it passes, when I run the full suite it crashes; it boggles the mind. Below I provide the optimized asm that is produced from

MMX loop filter for theora-exp

2005 Aug 17

MMX loop filter for theora-exp

Hello, I would like to announce the semi-optimized oc_state_loop_filter_frag_rows It gains like 7% speedup. Unfortunately it has some issues: 1) wont compile on 64bit (I will fix it later hopefully) 2) is not yet fully optimized (instruction stalls) Here are the results. CPU: Athlon, speed 1466.91 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

Here's the optimized versions: $ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o - [...] define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 { %roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;

[LLVMdev] LLVM 2.8 and MMX

2010 Sep 07

[LLVMdev] LLVM 2.8 and MMX

On Sep 7, 2010, at 7:45 AM, Nicolas Capens wrote: > Hi all, > > I've tested a recent revision and noticed that using 64-bit vectors became very slow. It looks like they are expanded to non-MMX instructions to avoid breaking code which does not clear the MMX state using emms? > > For my project I'm already manually inserting emms instructions in the right places, so

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

Using MM registers is wrong unless the user has specifically asked for it, which doesn't seem to be the case here. In the awesome MMX architecture, touching an MM register makes subsequent x87 operations fail unless an EMMS instruction is issued first; none of the compilers here are smart enough to insert EMMS instructions in the right places, so the only safe thing is not to use

[LLVMdev] LLVM 2.8 and MMX

2010 Sep 08

[LLVMdev] LLVM 2.8 and MMX

On Sep 8, 2010, at 7:24 AM, Eli Friedman wrote: > On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens > <nicolas.capens at gmail.com> wrote: >> Hi Chris, >> >> It's not broken, but the performance is crippled. >> >> I noticed that the code still contains some MMX instructions, but several >> operations get expanded (apparently swizzling and such

Legalizing vector types

2020 Jan 03

Legalizing vector types

Hi all, I am working on a target that has support for v4i16 vectors, and no support for v4i8 / v8i8 / v8i16 V4i8 is promoted to v4i16 which is nice V8i16 is split to 2 x v4i16 which is nice as well Now v8i8 is scalarized, which is not so nice. Ideally I would like v8i8 to be first promoted to v8i16 then split to 2xv4i16 (or split to 2xV4i8 then promoted to 2xv4i16) Is there a way to achieve

Proposal to remove MMX support.

2020 Aug 31

Proposal to remove MMX support.

On Mon, Aug 31, 2020 at 3:02 PM Eli Friedman <efriedma at quicinc.com> wrote: > Broadly speaking, I see two problems with implicitly enabling MMX > emulation on a target that has SSE2: > > > > 1. The interaction with inline asm. Inline asm can still have MMX > operands/results/clobbers, and can still put the processor in MMX mode. If > code is mixing MMX

Proposal to remove MMX support.

2020 Aug 30

Proposal to remove MMX support.

I recently diagnosed a bug in someone else's software, which turned out to be due to incorrect MMX intrinsics usage: if you use any of the x86 intrinsics that accept or return __m64 values, then you, the *programmer* are required to call _mm_empty() before using any x87 floating point instructions or leaving the function. I was aware that this was required at the assembly-level, but not that

[LLVMdev] LLVM 2.8 and MMX

2010 Sep 08

[LLVMdev] LLVM 2.8 and MMX

On Wed, Sep 8, 2010 at 12:35 AM, Nicolas Capens <nicolas.capens at gmail.com> wrote: > Hi Chris, > > It's not broken, but the performance is crippled. > > I noticed that the code still contains some MMX instructions, but several > operations get expanded (apparently swizzling and such get expanded to a > large number of byte moves). I think some changes related to

MMX IDCT for theora-exp

2005 Jul 20

MMX IDCT for theora-exp

Hello, I'm attaching IDCT MMX patch. I reused IDCT from theora-a3-MMXd.zip. It should work on 64bit X86 platform too. Here is most used functions when playing video with jet aircrafts (gripen) Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset I can play this video with like 200-300 frame drops on Athlon XP 1700+ CPU load (with

[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.

2011 Apr 14

[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.

I finally got all of the 3DNow! instruction intrinsics and builtins into LLVM and Clang, however, while testing them, I've noticed that they produce incorrect results. For example: typedef float V2f __attribute__((vector_size(8))); int main() { V2f dest, a = {1.0, 3.0}, b = {10.0, 3.5}; dest = __builtin_ia32_pfadd(a, b); printf("(%f, %f)\n", dest[0], dest[1]); } Should

[LLVMdev] LLVM 2.2 Release Notes

2008 Feb 11

[LLVMdev] LLVM 2.2 Release Notes

> This is a matter of presentation, but some of the "GCC extensions" are > standard C99 (now, at least). I noticed long long, C++-style comments > and designated initializers. > > I have plenty of complaints about the GCC documentation you're > pointing at, but this probably isn't the right forum for that. I do > think dropping "as fast as

[LLVMdev] X86 disassembler is quite broken on handling REX

2014 Dec 24

[LLVMdev] X86 disassembler is quite broken on handling REX

On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <craig.topper at gmail.com> wrote: > I believe this particular error is caused by this. That seems easy enough > to just drop the bit. Do you have other non-mmx examples? > > case TYPE_MM: \ > if (index > 7) \ > *valid = 0;

similar to: [LLVMdev] Lowering to MMX