thr3ads.net - similar to: "[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected."

Displaying 20 results from an estimated 400 matches similar to: "[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected."

[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.

2011 Apr 14

On Thu, Apr 14, 2011 at 12:16 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> I finally got all of the 3DNow! instruction intrinsics and builtins
> into LLVM and Clang, however, while testing them, I've noticed that
> they produce incorrect results.
>
> For example:
>
> typedef float V2f __attribute__((vector_size(8)));
>
> int main() {
> V2f dest,

[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.

2011 Apr 14

On Apr 14, 2011, at 12:47 PM, Eli Friedman wrote:
>> I looked at the program using a debugger, and the pfadd instruction is
>> executed correctly and the MMX register contains the correct values.
>> The code that prepares the stack for the printf call seems to be
>> messing it up.
>
> I would call that "user error"; basically, using MMX instructions
>

[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.

2011 Apr 14

On Thu, Apr 14, 2011 at 5:37 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Apr 14, 2011, at 12:47 PM, Eli Friedman wrote:
>
>>> I looked at the program using a debugger, and the pfadd instruction is
>>> executed correctly and the MMX register contains the correct values.
>>> The code that prepares the stack for the printf call seems to be

Ref Classes: bug with using '.self' within initialize methods?

2011 Jun 29

Dear list, I'm wondering if the following error I'm getting is a small bug in the Reference Class paradigm or if it makes perfect sense. When you write an explicit initialize method for a Ref Class, can you then make use of '.self' WITHIN this initialize method just as you would once an object of the class has actually been initialized? Because it seems to me that you can not.

MMX loop filter for theora-exp

2005 Aug 17

Hello, I would like to announce the semi-optimized oc_state_loop_filter_frag_rows. It gains about a 7% speedup. Unfortunately it has some issues:
1) won't compile on 64-bit (I will fix it later, hopefully)
2) is not yet fully optimized (instruction stalls)
Here are the results.
CPU: Athlon, speed 1466.91 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask

[LLVMdev] Lowering to MMX

2011 Oct 26

On Oct 26, 2011, at 1:18 PM, Nicolas Capens wrote:
> On 24/10/2011 9:50 PM, Bill Wendling wrote:
>> On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote:
>>
>>> Hi all,
>>>
>>> I'm working on a graphics project which uses LLVM for dynamic code
>>> generation, and I noticed a major performance regression when upgrading
>>> from LLVM

An assembly optimization and fix

2004 Sep 10

I have optimized the FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov function and fixed a bug when data_len == 0. Now the function is about 50% faster and flac -5 is about 5% faster on my box. I have tested it thoroughly; I think it can go into flac 1.0.4.
--
Miroslav Lichvar
-------------- next part --------------
--- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100
+++

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

On Aug 31, 2010, at 1:21 PM PDT, Argyrios Kyrtzidis wrote:
>
> Just to be clear, are you saying that the fact that, after using llc
> on the second IR, the produced asm is using MM registers, indicates
> a bug?
Yes. It's not immediately obvious whether it's in opt or llc, though. Chris was doing work involving <2 x float> and may know about this.
>

[PATCH]

2005 Mar 23

Hello, Here is my first speedup patch. Like 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no idea now how to do it. It does not leave much space for parallel stuff, copying memory from lot of

[LLVMdev] Using MSVC _ftol2 runtime function for fptoui on Win32

2013 Feb 13

Hi Joe & Michael,
In rev. 151382 you have changed the fptoui implementation of the x86 codegen for win32.
Before the change fptoui was lowered to
    flds 16(%esp)
    fisttpll 8(%esp)
    movl 8(%esp), %eax
After the change fptoui is lowered to
    flds 40(%esp)
    calll _ftol2
Please note that the assumption that _ftol2 doesn't modify ECX isn't true on sandybridge platform.

[PATCH] promised MMX patches rc1

2005 Mar 23

PATCH: Add fields argument to installed.packages and available.packages

2006 Aug 29

Hi all, The write_PACKAGES function has a 'fields' argument that allows a user generating a PACKAGES file to specify additional fields to include. For symmetry, it would be nice for the available.packages function to be able to read those extra fields when specified. Similarly, it would be useful for installed.packages to have a 'fields' argument. This would allow a user to

experimental patch for libtheora1.1beta3

2009 Aug 30

Good morning in the Lord. Regarding the port of libtheora1.1beta3 for OpenBSD for amd64 and the problem I described at: http://lists.xiph.org/pipermail/theora/2009-August/002640.html Attached is a patch for libtheora/patches/patch-lib_x86_mmxencfrag_c I can play videos with it. Does it work for you? Best regards -- God, thank you for your infinite love.

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Using MM registers is wrong unless the user has specifically asked for it, which doesn't seem to be the case here. In the awesome MMX architecture, touching an MM register makes subsequent x87 operations fail unless an EMMS instruction is issued first; none of the compilers here are smart enough to insert EMMS instructions in the right places, so the only safe thing is not to use

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Here are the optimized versions:
$ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o -
[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
%roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;

Diference in results from doBy::popMeans, multcomp::glht and contrast::contrast for a lme model

2012 Nov 05

Hello R users, I'm analyzing an experiment in a balanced incomplete block design (BIB). The effects of blocks are assumed to be random, so I'm using nlme::lme for this. I'm analysing other, more complex experiments and I noticed some differences from doBy::popMeans() compared to multcomp::glht() and contrast::contrast(). In my example, glht() and contrast() were equal. I suspect popMeans()

[LLVMdev] Lowering to MMX

2011 Oct 26

Hi Bill,
Comments inline:
On 24/10/2011 9:50 PM, Bill Wendling wrote:
> On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote:
>
>> Hi all,
>>
>> I'm working on a graphics project which uses LLVM for dynamic code
>> generation, and I noticed a major performance regression when upgrading
>> from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I

[LLVMdev] X86 disassembler is quite broken on handling REX

2014 Dec 24

hi, i think the current X86 disassembler is quite broken and fails badly on handling REX for x86_64 code. below are some examples:
$ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
por %mm3, %mm0
$ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
por %mm3, %mm0
$ echo

[LLVMdev] X86 disassembler is quite broken on handling REX

2014 Dec 24

On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <craig.topper at gmail.com> wrote:
> I believe this particular error is caused by this. That seems easy enough
> to just drop the bit. Do you have other non-mmx examples?
>
> case TYPE_MM: \
> if (index > 7) \
> *valid = 0;

[LLVMdev] "equivalent" .ll files diverge after optimizations are applied

2010 Aug 31

Hi, I've attached 2 .ll files which are supposed to be equivalent, but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash: when I run the crashing test in isolation it passes, when I run the full suite it crashes; it boggles the mind. Below I provide the optimized asm that is produced from
