thr3ads.net - similar to: "[Fwd: Re: libtheora MMX patch]"

Displaying 20 results from an estimated 400 matches similar to: "[Fwd: Re: libtheora MMX patch]"

Patch: fragment reconstruction MMX for GCC

2007 Dec 30

Hi again, I measured my fragment reconstruction routines against the compiler output from GCC, and the new code performs better, so I brushed up my GCC inline assembler skills and made a port. Code is here: http://torus.untergrund.net/code/mmxfrag.c All routines perform much better now. Inter2 alone got a speedup of a factor of 5 on Pentium-M. Athlon CPUs execute roughly 3 times faster.
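For readers comparing against the MMX port, here is a minimal scalar sketch of what an "inter2" fragment reconstruction computes. It assumes the usual Theora layout (an 8x8 fragment, two 8-bit predictors averaged with truncation, a signed 16-bit residual, result clamped to 0..255). The function and parameter names are hypothetical and are not taken from mmxfrag.c.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clamp255(int v) {
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Scalar reference for "inter2" reconstruction of one 8x8 fragment:
   average the two predictors (rounding down), add the residual, clamp.
   Assumption: dst, src1 and src2 share the same row stride. */
void frag_recon_inter2(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
                       ptrdiff_t ystride, const int16_t residue[64]) {
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int pred = (src1[x] + src2[x]) >> 1;
            dst[x] = clamp255(pred + residue[y * 8 + x]);
        }
        dst += ystride;
        src1 += ystride;
        src2 += ystride;
    }
}
```

An MMX version handles a whole 8-pixel row per step; a plain C reference like this is mainly useful for checking the assembly output bit for bit.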

Issues with Win32 MMX code

2009 Feb 03

Hi folks. Mozilla had some issues with the MMX-optimized frag_recon functions over the last few days, and I was able to track the problem down. The code itself is fine, but unfortunately it tends to trigger a non-deterministic compiler bug. The whole discussion is here: https://bugzilla.mozilla.org/show_bug.cgi?id=474937 After thinking about the problem I've suggested to

VC2005 MMX patch.

2007 Dec 25

Here is the patch with my changes. Most work went into the decoder; I only changed the encoder where something was necessary to build the library. You can find the patch here (quite big): http://torus.untergrund.net/code/theora_mmx_vc2005.diff Please let me know if the encoder works without problems. I only did very brief testing of it. The decoder has been tested against the test

libtheora-mmx-1.0alpha5 release

2005 Aug 25

Along with libtheora-1.0alpha5, this is a release of theora-mmx, a drop-in replacement that uses MMX assembly to speed up some of the most demanding routines in Theora encoding/decoding. Right now it only works on 32-bit x86 CPUs. Thanks to everyone whose work made this release possible! Download links: http://downloads.xiph.org/releases/theora/libtheora-mmx-1.0alpha5.tar.bz2

Benchmarks Inline-ASM vs. Intrinsics

2009 Feb 11

Hi folks, FYI: I've finally made some benchmarks for inline-assembler versus intrinsic-based MMX code. I've only applied the changes to the fragment reconstruction functions, as the IDCT and loop filter have not been ported yet. Nevertheless, here are some numbers: As a baseline I'll take the current version from the trunk with all inline assembler functions enabled. Lower

Can't compile libtheora vs2010

2011 Apr 22

I'm getting errors like so on initial build of libtheora - 1>c1 : fatal error C1083: Cannot open source file: '..\lib\dec\x86_vc\x86stat.c': No such file or directory 1> mmxstate.c (TaskId:16) 1>c1 : fatal error C1083: Cannot open source file: '..\lib\dec\x86_vc\mmxstate.c': No such file or directory 1> mmxloopfilter.c (TaskId:16) 1>c1 : fatal error C1083:

MMX IDCT for theora-exp

2005 Jul 20

Hello, I'm attaching an IDCT MMX patch. I reused the IDCT from theora-a3-MMXd.zip. It should work on the 64-bit x86 platform too. Here are the most used functions when playing a video with jet aircraft (Gripen): Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset I can play this video with about 200-300 frame drops on Athlon XP 1700+ CPU load (with

MMX patch to speed up Theora decoding

2007 Mar 25

Hi, Attached is a patch against 1.0alpha7 to speed up Theora decoding. It is about 15~20% faster in my test. It consists of following things: * MMX loop filter based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html * MMX IDCT based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html and the code in

Theora, MMX and optimisation

2005 Apr 11

Hi everyone, I just landed on the Theora planet. As a game programmer, I searched for a free video format/codec and Theora became the obvious choice. However, I experienced rather bad performance (at least from a game programming point of view). After some profiling, I discovered, as previously discussed in a post found via Google, that the bottleneck is in the ogg library. An insane part of the

[PATCH] promised MMX patches rc1

2005 Mar 23

Hello, Here is my first speedup patch. Like 10-11%. No IDCT yet. Please feel free to comment on my code, or even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas right now how to do it. It does not leave much room for parallelism, copying memory from a lot of
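The data-dependent limit is the part of the loop filter that resists straightforward SIMD. A minimal scalar sketch of one filtered edge position follows, assuming the filter and bounding function as described in the Theora specification: four pixels p[0], p[1] | p[2], p[3] straddle the edge and L is the limit. Function names are hypothetical and this is not the code from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clamp to the 0..255 range of an 8-bit pixel. */
static uint8_t clamp255(int v) {
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Bounding function: values with |r| < l pass through, values between
   l and 2l taper back toward zero, and |r| >= 2l yields zero. */
static int lflim(int r, int l) {
    int a = abs(r);
    if (a >= 2 * l) return 0;
    if (a < l) return r;
    return r > 0 ? 2 * l - r : -2 * l - r;
}

/* Filter four pixels straddling a block edge; only p[1] and p[2] change.
   Assumes >> on a negative int is an arithmetic shift (true on GCC/Clang). */
void loop_filter_4(uint8_t p[4], int l) {
    int r = (p[0] - 3 * p[1] + 3 * p[2] - p[3] + 4) >> 3;
    int f = lflim(r, l);
    p[1] = clamp255(p[1] + f);
    p[2] = clamp255(p[2] - f);
}
```

The branches in lflim can be rewritten as clamp(R, min(-2L - R, 0), max(2L - R, 0)), which gives the same result and fits SIMD compare/select or min/max operations better, one possible route to filtering several edge positions at once.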

What goes to Hardware ?

2006 Jul 02

Hi people, As I said before: I implemented the IDCT to run on the FPGA. My friends from university implemented the reconstruction routines on the FPGA. I'm helping with the LoopFilter, and it is almost there (all VHDL). I did a small profiling of libTheora running on an Altera Stratix II device: The processor used was the NIOS II with 8Kb of data and instruction cache, branch prediction and

MMX and extended-MMX acceleration patch for encoding

2003 May 08

Hello, attached is a gzipped patch file to the lib/mcomp.c source file of theora (as of AnonCVS current version) that implements MMX and extended-MMX optimizations in the most frequently used functions of the encoder (as shown by gprof). This is more a proof of concept than a real request for inclusion into the source tree. My personal intent was more to look deeper into the MMX instruction set
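The patch itself is not reproduced here, but block-matching error metrics are a typical hot spot in motion-compensation code. Assuming that is the kind of function involved, a scalar 8x8 sum of absolute differences shows the loop shape MMX accelerates; extended MMX adds `psadbw`, which sums eight byte-wise absolute differences in one instruction. The function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Sum of absolute differences between two 8x8 blocks,
   each addressed with its own row stride. */
unsigned sad8x8(const uint8_t *a, ptrdiff_t astride,
                const uint8_t *b, ptrdiff_t bstride) {
    unsigned sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += astride;
        b += bstride;
    }
    return sum;
}
```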

bitpack.c oddity

2008 Dec 16

While browsing the code I came across line 83 of bitpack.c: *_ret=((ret&0xFFFFFFFFUL)>>(m>>1))>>(m+1>>1); Is there any reason why this is so convoluted? Maybe endianness or 64-bit issues? If I'm not mistaken it does exactly the same as: *_ret = ret >> m; Cheers, Nils
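One plausible reason for the split shift, assuming `m` can reach 32 at that point: in C, shifting by a count greater than or equal to the width of the left operand is undefined behaviour, so a single `ret >> 32` on a 32-bit type is not guaranteed to give 0 (x86 masks 32-bit shift counts to 5 bits, so it typically returns `ret` unchanged). Splitting the shift keeps each half at most 16, and the `0xFFFFFFFFUL` mask drops any bits above 32 when `unsigned long` is 64 bits wide. The sketch below is illustrative, with a hypothetical function name, and is not the actual bitpack.c code.

```c
#include <stdint.h>

/* Right-shift a 32-bit value by m, where 0 <= m <= 32.
   A single v >> 32 on a 32-bit type is undefined behaviour in C,
   so the shift is split into two halves that are each at most 16. */
uint32_t shr_upto32(uint32_t v, unsigned m) {
    /* (m >> 1) + ((m + 1) >> 1) == m for every non-negative m */
    return (v >> (m >> 1)) >> ((m + 1) >> 1);
}
```

Here `uint32_t` already discards high bits, which is the job the mask does in the original expression.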

configure option --with-ogg broken?

2008 Dec 18

Hi there. I'm trying to cross-compile the Theora libraries to test my ARMv6 optimizations (almost done, time to do some benchmarking and testing). While doing so I found out that --with-ogg seems to do nothing. If I simply run: ./configure --with-ogg=$HOME the build succeeds although it shouldn't (I don't have ogg installed at $HOME). As an example here is the last line that make executes.

MMX version of Theora

2010 Jul 20

Hi all, I am trying to build the MMX version of Theora and the encoderwin project is throwing the following errors. 1>------ Build started: Project: encoderwin, Configuration: Debug Win32 ------ 1>Linking... 1> Creating library encoderwin.lib and object encoderwin.exp 1>LINK : warning LNK4098: defaultlib 'LIBCMTD' conflicts with use of other libs; use /NODEFAULTLIB:library

Optimizing on AMD Geode (MMX, no SSE)

2015 Jan 07

I'm trying to improve Opus performance on an AMD Geode CPU, which has limited SSE support (called 3DNow!) but does have MMX. Without optimizations I can only encode 16-bit audio at 16 kHz with complexity up to 2-3 without underruns. I tried compiling with SSE2/4 optimizations, but all I got was a crash with SIGILL, so I looked into the optimized code and found that a good starting point was the dot product, so I
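The dot product is a natural first target because MMX's `pmaddwd` multiplies four pairs of signed 16-bit values and adds adjacent products, giving two 32-bit sums per instruction. A minimal scalar reference is useful for validating any MMX version; it assumes 16-bit input samples and a 32-bit accumulator, and the function name is hypothetical rather than Opus's actual API.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar dot product of two int16 vectors with a 32-bit accumulator.
   Elements are taken in pairs, mirroring how pmaddwd forms
   a[i]*b[i] + a[i+1]*b[i+1] in each 32-bit lane.
   Assumes inputs are small enough that the sum fits in 32 bits. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n) {
    int32_t sum = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        sum += (int32_t)a[i] * b[i] + (int32_t)a[i + 1] * b[i + 1];
    for (; i < n; i++)          /* odd tail element */
        sum += (int32_t)a[i] * b[i];
    return sum;
}
```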

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

Might you instead consider just adding a -disable-mmx option? Preston On Thu, 2008-20-11 at 02:57 -0500, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get

[LLVMdev] changing -mattr behavior with mmx and sse

2008 Nov 20

On Nov 19, 2008, at 11:57 PM PST, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while > llc -march=x86 -mattr=mmx

[LLVMdev] Implementing MMX and SSE shifts

2009 Mar 19

Hi all, Recently some great work has been done to implement vector shifts as described in the language reference, and I'd like to contribute by attempting to match these operations on x86 to MMX and SSE instructions whenever possible. I'm experienced in writing MMX and SSE assembly but I'm unfamiliar with how LLVM performs instruction selection. So every bit of information to
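For context, the kind of vector shift under discussion can be written in C with the GCC/Clang vector extensions. Clang turns `v << n` on a four-lane 16-bit vector into an LLVM vector `shl`, which an x86 backend could in principle select as MMX `psllw` or its SSE2 counterpart. This sketch shows only the source-level construct, not LLVM's instruction selection; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Four unsigned 16-bit lanes in 64 bits: the shape of one MMX register. */
typedef uint16_t v4u16 __attribute__((vector_size(8)));

/* Logical left shift of every lane by the same count (psllw-like). */
v4u16 shl_lanes(v4u16 v, int count) {
    return v << count;
}

/* Logical right shift of every lane by the same count (psrlw-like). */
v4u16 shr_lanes(v4u16 v, int count) {
    return v >> count;
}
```

Unsigned lanes are used so that both shifts are logical and well defined lane by lane.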

[LLVMdev] LLVM 2.8 and MMX

2010 Sep 07

Hi all, I've tested a recent revision and noticed that using 64-bit vectors became very slow. It looks like they are expanded to non-MMX instructions to avoid breaking code which does not clear the MMX state using emms? For my project I'm already manually inserting emms instructions in the right places, so I'd really like 64-bit vector operations to be lowered to MMX
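MMX registers alias the x87 floating-point registers, so code that uses them must execute `emms` before any x87 floating-point code runs. A minimal sketch of the manual approach mentioned above uses the standard `<mmintrin.h>` intrinsics, where `_mm_empty()` emits `emms`. The function name and the scalar fallback for targets without MMX are assumptions added so the example builds anywhere.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Add two arrays of four int16 values with signed saturation. */
void add4_sat_i16(const int16_t *a, const int16_t *b, int16_t *out) {
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    vr = _mm_adds_pi16(va, vb);   /* paddsw: signed saturating add */
    memcpy(out, &vr, 8);
    _mm_empty();                  /* emms: release the registers for x87 */
#else
    for (int i = 0; i < 4; i++) { /* portable fallback, same results */
        int s = a[i] + b[i];
        out[i] = (int16_t)(s > 32767 ? 32767 : s < -32768 ? -32768 : s);
    }
#endif
}
```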
