similar to: [Fwd: Re: libtheora MMX patch]

Displaying 20 results from an estimated 400 matches similar to: "[Fwd: Re: libtheora MMX patch]"

2007 Dec 30
2
Patch: fragment reconstruction MMX for GCC
Hi again, I measured my fragment reconstruction routines against the compiler output from GCC and, well, the new code performs better, so I brushed up my GCC inline assembler skills and made a port. Code is here: http://torus.untergrund.net/code/mmxfrag.c All routines perform much better now. Inter2 alone got a speedup of a factor of 5 on Pentium-M. Athlon CPUs execute it roughly 3 times faster.
2009 Feb 03
3
Issues with Win32 MMX code
Hi folks. Mozilla had some issues with the MMX-optimized frag_recon functions over the last few days, and I was able to track the problem down. The code itself is fine, but unfortunately it tends to trigger a non-deterministic compiler bug. The whole discussion is here: https://bugzilla.mozilla.org/show_bug.cgi?id=474937 After thinking about the problem I've suggested to
2007 Dec 25
2
VC2005 MMX patch.
Here is the patch with my changes. Most of the work went into the decoder. In the encoder I only changed what was necessary to build the library. You can find the patch here (quite big): http://torus.untergrund.net/code/theora_mmx_vc2005.diff Please let me know if the encoder works without problems. I only tested it very briefly. The decoder has been tested against the test
2005 Aug 25
0
libtheora-mmx-1.0alpha5 release
Along with libtheora-1.0alpha5 this is a release of theora-mmx, a drop-in replacement that uses MMX assembly to speed up some of the most demanding routines in Theora encoding/decoding. Right now it only works on 32-bit x86 CPUs. Thanks to everyone whose work made this release possible! Download links: http://downloads.xiph.org/releases/theora/libtheora-mmx-1.0alpha5.tar.bz2
2009 Feb 11
4
Benchmarks Inline-ASM vs. Intrinsics
Hi folks, FYI: I've finally made some benchmarks for inline-assembler versus intrinsic-based MMX code. I've only applied the changes to the fragment reconstruction functions, as the IDCT and loop filter have not been ported yet. Nevertheless, here are some numbers: As a baseline I'll take the current version from the trunk with all inline assembler functions enabled. Lower
2011 Apr 22
2
Can't compile libtheora vs2010
I'm getting errors like so on initial build of libtheora - 1>c1 : fatal error C1083: Cannot open source file: '..\lib\dec\x86_vc\x86stat.c': No such file or directory 1> mmxstate.c (TaskId:16) 1>c1 : fatal error C1083: Cannot open source file: '..\lib\dec\x86_vc\mmxstate.c': No such file or directory 1> mmxloopfilter.c (TaskId:16) 1>c1 : fatal error C1083:
2005 Jul 20
1
MMX IDCT for theora-exp
Hello, I'm attaching an IDCT MMX patch. I reused the IDCT from theora-a3-MMXd.zip. It should work on 64-bit x86 platforms too. Here are the most used functions when playing a video with jet aircraft (gripen) Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset I can play this video with like 200-300 frame drops on Athlon XP 1700+ CPU load (with
2007 Mar 25
3
MMX patch to speed up Theora decoding
Hi, Attached is a patch against 1.0alpha7 to speed up Theora decoding. It is about 15~20% faster in my tests. It consists of the following things: * MMX loop filter based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html * MMX IDCT based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html and the code in
2005 Apr 11
2
Theora, MMX and optimisation
Hi everyone, I just landed on planet Theora. As a game programmer, I searched for a free video format/codec and Theora became the obvious choice. However, I experienced rather bad performance (at least from a game programming point of view). After a bit of profiling I discovered, as previously discussed in a post found via Google, that the bottleneck is in the ogg library. An insane part of the
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello, Here is my first speedup patch. About 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no idea right now how to do it. It does not leave much room for parallel stuff, copying memory from lot of
2006 Jul 02
5
What goes to Hardware ?
Hi people, As I said before: I did the IDCT to run on the FPGA. My friends from university did the Reconstruction routines running on the FPGA. I'm helping with the LoopFilter, and it is almost there. (all VHDL) I did some brief profiling of libTheora running on an Altera Stratix II device: The processor used was the NIOS II with 8Kb of data and instruction cache, branch prediction and
2003 May 08
3
MMX and extended-MMX acceleration patch for encoding
Hello, attached is a gzipped patch file to the lib/mcomp.c source file of theora (as of AnonCVS current version) that implements MMX and extended-MMX optimizations in the most frequently used functions of the encoder (as shown by gprof). This is more a proof of concept than a real request for inclusion into the source tree. My personal intent was more to look deeper into the MMX instruction set
2008 Dec 16
1
bitpack.c oddity
While browsing the code I came across line 83 of bitpack.c: *_ret=((ret&0xFFFFFFFFUL)>>(m>>1))>>(m+1>>1); Is there any reason why this is so convoluted? Maybe endianness or 64-bit issues? If I'm not mistaken it does exactly the same as: *_ret = ret >> m; Cheers, Nils
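One possible reason for the two-step shift, not confirmed anywhere in this excerpt, is that m may reach the full width of the value: in C, shifting a 32-bit operand by 32 is undefined behaviour, while two shifts of 16 are well defined and yield 0. The sketch below only illustrates that difference. The helper names shift_split and shift_direct are hypothetical and do not come from bitpack.c, and the 32-bit window type is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from bitpack.c): shift a 32-bit window right by
   m bits, 0 <= m <= 32, in two halves. floor(m/2) + ceil(m/2) == m, and
   neither half ever reaches 32, so every individual shift is well defined. */
static uint32_t shift_split(uint32_t window, int m) {
    assert(m >= 0 && m <= 32);
    return (window >> (m >> 1)) >> ((m + 1) >> 1);
}

/* Direct form. Only valid for 0 <= m <= 31: a shift count equal to the
   width of the (promoted) operand is undefined behaviour in C. */
static uint32_t shift_direct(uint32_t window, int m) {
    assert(m >= 0 && m < 32);
    return window >> m;
}
```

For every m from 0 to 31 the two helpers agree; they differ only in that shift_split also accepts m == 32 and returns 0 there.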
2008 Dec 18
1
configure option --with-ogg broken?
Hi there. I'm trying to cross-compile the Theora libraries to test my ARMv6 optimizations (almost done, time to do some benchmarking and testing). While doing so I found out that --with-ogg seems to do nothing. If I simply run: ./configure --with-ogg=$HOME the build succeeds although it shouldn't (I don't have ogg installed at $HOME). As an example, here is the last line that make executes.
2010 Jul 20
0
MMX version of Theora
Hi all, I am trying to build the MMX version of Theora and the encoderwin project is throwing the following errors. 1>------ Build started: Project: encoderwin, Configuration: Debug Win32 ------ 1>Linking... 1> Creating library encoderwin.lib and object encoderwin.exp 1>LINK : warning LNK4098: defaultlib 'LIBCMTD' conflicts with use of other libs; use /NODEFAULTLIB:library
2015 Jan 07
1
Optimizing on AMD Geode (MMX, no SSE)
I'm trying to improve Opus on an AMD Geode CPU, which has limited SSE support (called 3DNow!), but MMX. Without optimizations I can only encode 16-bit audio at 16 kHz with complexity up to 2-3 without underruns. I tried compiling with SSE2/4 optimizations, but all I got was a crash with SIGILL, so I looked into the optimized code and found that a good starting point was the dot product, so I
2008 Nov 20
0
[LLVMdev] changing -mattr behavior with mmx and sse
Might you instead consider just adding a -disable-mmx option? Preston On Thu, 2008-20-11 at 02:57 -0500, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get
2008 Nov 20
0
[LLVMdev] changing -mattr behavior with mmx and sse
On Nov 19, 2008, at 11:57 PM PST, Mon Ping Wang wrote: > Hi, > > When setting -mattr option on X86, I would like to treat MMX > separately from SSE levels. This would allow a client who sets the > attributes directly to set the SSE level independent of MMX, e.g., llc > -march=x86 -mattr=sse41, one would get sse4.1 with mmx disabled while > llc -march=x86 -mattr=mmx
2009 Mar 19
1
[LLVMdev] Implementing MMX and SSE shifts
Hi all, Recently some great work has been done to implement vector shifts as described in the language reference, and I'd like to contribute by attempting to match these operations on x86 to MMX and SSE instructions whenever possible. I'm experienced in writing MMX and SSE assembly but I'm unfamiliar with how LLVM performs instruction selection. So every bit of information to
2010 Sep 07
0
[LLVMdev] LLVM 2.8 and MMX
Hi all, I've tested a recent revision and noticed that using 64-bit vectors became very slow. It looks like they are expanded to non-MMX instructions to avoid breaking code which does not clear the MMX state using emms? For my project I'm already manually inserting emms instructions in the right places, so I'd really like 64-bit vector operations to be lowered to MMX