Displaying 20 results from an estimated 400 matches similar to: "[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected."
2011 Apr 14
0
[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.
On Thu, Apr 14, 2011 at 12:16 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> I finally got all of the 3DNow! instruction intrinsics and builtins
> into LLVM and Clang, however, while testing them, I've noticed that
> they produce incorrect results.
>
> For example:
>
> typedef float V2f __attribute__((vector_size(8)));
>
> int main() {
> V2f dest,
2011 Apr 14
2
[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.
On Apr 14, 2011, at 12:47 PM, Eli Friedman wrote:
>> I looked at the program using a debugger, and the pfadd instruction is
>> executed correctly and the MMX register contains the correct values.
>> The code that prepares the stack for the printf call seems to be
>> messing it up.
>
> I would call that "user error"; basically, using MMX instructions
>
2011 Apr 14
0
[LLVMdev] [x86 codegen] 3DNow! intrinsics not behaving as expected.
On Thu, Apr 14, 2011 at 5:37 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Apr 14, 2011, at 12:47 PM, Eli Friedman wrote:
>
>>> I looked at the program using a debugger, and the pfadd instruction is
>>> executed correctly and the MMX register contains the correct values.
>>> The code that prepares the stack for the printf call seems to be
2011 Jun 29
1
Ref Classes: bug with using '.self' within initialize methods?
Dear list,
I'm wondering if the following error I'm getting is a small bug in the
Reference Class paradigm or if it makes perfect sense.
When you write an explicit initialize method for a Ref Class, can you
then make use of '.self' WITHIN this initialize method just as you would
once an object of the class has actually been initialized?
Because it seems to me that you can not.
2005 Aug 17
2
MMX loop filter for theora-exp
Hello,
I would like to announce the semi-optimized oc_state_loop_filter_frag_rows
It gains like 7% speedup. Unfortunately it has some issues:
1) wont compile on 64bit (I will fix it later hopefully)
2) is not yet fully optimized (instruction stalls)
Here are the results.
CPU: Athlon, speed 1466.91 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
2011 Oct 26
0
[LLVMdev] Lowering to MMX
On Oct 26, 2011, at 1:18 PM, Nicolas Capens wrote:
> On 24/10/2011 9:50 PM, Bill Wendling wrote:
>> On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote:
>>
>>> Hi all,
>>>
>>> I'm working on a graphics project which uses LLVM for dynamic code
>>> generation, and I noticed a major performance regression when upgrading
>>> from LLVM
2004 Sep 10
2
An assembly optimization and fix
I have optimized FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov
function and fixed bug when data_len == 0. Now the function is about
50% faster and flac -5 is about 5% faster on my box. I have tested it
thoroughly, I think it can go to flac 1.0.4.
--
Miroslav Lichvar
-------------- next part --------------
--- src/libFLAC/ia32/fixed_asm.nasm.orig 2002-01-26 19:05:12.000000000 +0100
+++
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
On Aug 31, 2010, at 1:21 PMPDT, Argyrios Kyrtzidis wrote:
>
> Just to be clear, are you saying that the fact that, after using llc
> on the second IR, the produced asm is using MM registers, indicates
> a bug ?
Yes. It's not immediately obvious whether it's in the opt or llc,
though.
Chris was doing work involving <2 x float> and may know about this.
>
2005 Mar 23
0
[PATCH]
Hello,
Here is my first speedup patch. Like 10-11%. No IDCT yet.
Please feel free to comment my code or even better think about
improvements. :) I belive my routines are not so bad, maybe
one day they will be even more faster.
What needs to be optimized is the loop filter fuction. I have
no ideas now how to do it. It does not leave much space for parallel
stuff, copying memory from lot of
2013 Feb 13
1
[LLVMdev] Using MSVC _ftol2 runtime function for fptoui on Win32
Hi Joe & Michael,
In rev. 151382 you have changed the fptoui implementation of the x86 codegen for win32.
Before the change fptoui was lowered to
flds 16(%esp)
fisttpll 8(%esp)
movl 8(%esp), %eax
After the change fptoui is lowered to
flds 40(%esp)
calll _ftol2
Please note that the assumption that _ftol2 doesn't modify ECX isn't true on sandybridge platform.
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello,
Here is my first speedup patch. Like 10-11%. No IDCT yet.
Please feel free to comment my code or even better think about
improvements. :) I belive my routines are not so bad, maybe
one day they will be even more faster.
What needs to be optimized is the loop filter fuction. I have
no ideas now how to do it. It does not leave much space for parallel
stuff, copying memory from lot of
2006 Aug 29
1
PATCH: Add fields argument to installed.packages and available.packages
Hi all,
The write_PACKAGES function has a 'fields' argument that allows a user
generating a PACKAGES file to specify additional fields to include.
For symmetry, it would be nice for the available.packages function to
be able to read those extra fields when specified.
Similarly, it would be useful for installed.packages to have a
'fields' argument. This would allow a user to
2009 Aug 30
3
experimental patch for libtheora1.1beta3
Good morning in the Lord
Regarding the port of libtheora1.1beta3 for OpenBSD for amd64 and the
problem I described at:
http://lists.xiph.org/pipermail/theora/2009-August/002640.html
Attached is a patch for
libtheora/patches/patch-lib_x86_mmxencfrag_c
I can play videos with it. ?Does it work for you?
Best regards
--
Dios, gracias por tu amor infinito.
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Using MM registers is wrong unless the user has specifically asked for
it, which doesn't seem to be the case here.
In the awesome MMX architecture, touching an MM register makes
subsequent x87 operations fail unless an EMMS instruction is issued
first; none of the compilers here are smart enough to insert EMMS
instructions in the right places, so the only safe thing is not to use
2010 Aug 31
2
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Here's the optimized versions:
$ opt -std-compile-opts unopt-pass.ll -o - | llvm-dis -o -
[...]
define %3 @_ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE(%"class.WebCore::GraphicsContext"* %this, %"struct.WebCore::FloatRect"* %rect) nounwind ssp align 2 {
%roundedOrigin = alloca %"class.WebCore::FloatSize", align 4 ;
2012 Nov 05
0
Diference in results from doBy::popMeans, multcomp::glht and contrast::contrast for a lme model
Hello R users,
I'm analyzing an experiment in a balanced incomplet block design (BIB). The
effect of blocks are assumed to be random, so I'm using nlme::lme for this.
I'm analysing another more complex experiments and I notice some diferences
from doBy::popMeans() compared multcomp::glht() and contrast::contrast().
In my example, glht() and contrast() were equal I suspect popMeans()
2011 Oct 26
2
[LLVMdev] Lowering to MMX
Hi Bill,
Comments inline:
On 24/10/2011 9:50 PM, Bill Wendling wrote:
> On Oct 20, 2011, at 8:42 AM, Nicolas Capens wrote:
>
>> Hi all,
>>
>> I'm working on a graphics project which uses LLVM for dynamic code
>> generation, and I noticed a major performance regression when upgrading
>> from LLVM 2.8 to 3.0-rc1 (LLVM 2.9 didn't support Win64 so I
2014 Dec 24
2
[LLVMdev] X86 disassembler is quite broken on handling REX
hi,
i think the current X86 disassembler is quite broken and fails badly on
handling REX for x86_64 code.
below are some examples:
$ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble
-triple=x86_64
.text
por %mm3, %mm0
$ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble
-triple=x86_64
.text
por %mm3, %mm0
$ echo
2014 Dec 24
2
[LLVMdev] X86 disassembler is quite broken on handling REX
On Wed, Dec 24, 2014 at 2:43 PM, Craig Topper <craig.topper at gmail.com>
wrote:
> I believe this particular error is caused by this. That seems easy enough
> to just drop the bit. Do you have other non-mmx examples?
>
> case TYPE_MM: \
> if (index > 7) \
> *valid = 0;
2010 Aug 31
5
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
Hi,
I've attached 2 .ll files which are supposed to be equivalent but 'unopt-fail.ll' causes a crash in webkit's test suite while 'unopt-pass.ll' does not. I can't give more details about the crash, when I run the crashing test it in isolation it passes, when I run the full suite it crashes; it boggles the mind.
Below I provide the optimized asm that is produced from