Hi,

Attached is a patch against 1.0alpha7 to speed up Theora decoding. It is
about 15-20% faster in my tests. It consists of the following changes:

* MMX loop filter based on Rudolf Marek's patch in
  http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html
* MMX IDCT based on Rudolf Marek's patch in
  http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html
  and the code in
  http://svn.xiph.org/trunk/vp32/CoreLibs/CDXV/Vp31/dx/win32/
* Change FiltBoundingValue to ogg_int16_t and reduce it to 256 entries.
  (It is safe if I read the spec correctly.)
* Comment out the unused idct_short__c and remove the unused
  LoopFilterLimitValuesV2.

An --enable-mmx option is added to configure (you need to run autogen.sh
to enable it after applying the patch). A compile-time switch is used
instead of a runtime one because it seems easier to me :)

Regards,
Chih-Chung Chang

-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora.patch.20070326.gz
Type: application/x-gzip
Size: 14900 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20070326/5fc3c925/theora.patch.20070326.bin
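A rough sketch of why 256 entries can be enough for the bounding table mentioned in the list above. It assumes the loop filter argument is computed from four 8-bit pixels as (a - 3b + 3c - d + 4) >> 3, which puts it in the range [-127, 128], exactly 256 distinct values. All names here are hypothetical, and the table centre at index 127 and the skipping of out-of-range entries are assumptions of this sketch, not details confirmed from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE   256
#define TABLE_CENTER 127   /* index holding R == 0; covers R in [-127, 128] */

/* Floor division by 8 that does not rely on right-shifting negative ints. */
static int floor_shift3(int v) {
  return (v >= 0) ? (v >> 3) : -((-v + 7) >> 3);
}

/* Loop filter argument for four neighbouring 8-bit pixels (assumed formula). */
static int filter_arg(int a, int b, int c, int d) {
  return floor_shift3(a - 3 * b + 3 * c - d + 4);
}

/* Store one entry, ignoring positions that can never be looked up. */
static void set_entry(int16_t *table, int r, int value) {
  int idx = r + TABLE_CENTER;
  if (idx >= 0 && idx < TABLE_SIZE) table[idx] = (int16_t)value;
}

/* Fill the table for a given filter limit: identity near zero, then a
   ramp back down to zero, and zero everywhere else. */
static void build_bounding_table(int16_t *table, int limit) {
  memset(table, 0, TABLE_SIZE * sizeof(table[0]));
  for (int i = 0; i < limit; i++) {
    set_entry(table, -i - limit, -limit + i);
    set_entry(table, -i, -i);
    set_entry(table, i, i);
    set_entry(table, i + limit, limit - i);
  }
}

/* Look up the bounded value for a filter argument r. */
static int lookup(const int16_t *table, int r) {
  assert(r >= -TABLE_CENTER && r <= TABLE_SIZE - 1 - TABLE_CENTER);
  return table[r + TABLE_CENTER];
}
```

The extreme inputs are filter_arg(255, 0, 255, 0) and filter_arg(0, 255, 0, 255), so a lookup never leaves the 256-entry window as long as the pixels are 8-bit.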
Hi,

Great news, a faster Theora decoder is always welcome.

> An --enable-mmx option is added to configure (you need to run
> autogen.sh to enable it after patch). Compile time switch is used
> instead of a runtime one because it seems easier to me :)

Why add another check and condition? USE_ASM should be the one used, and
runtime detection of MMX code is the only sensible way.

Right now it does not compile on Mac OS X:

ld: .libs/libtheora_la-dct_decode.o has external relocation entries in non-writable section (__TEXT,__text) for symbols:
V804
V3
ld: .libs/libtheora_la-idct.o has external relocation entries in non-writable section (__TEXT,__text) for symbols:
idctconstants

Does it work on 64-bit?

j
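The runtime detection asked for above can be sketched as follows. This is a minimal illustration, not code from either patch: it assumes a GCC- or Clang-compatible compiler, where <cpuid.h> provides __get_cpuid on x86, and the function name has_mmx is hypothetical. CPUID leaf 1 reports MMX support in bit 23 of EDX.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Returns 1 if the CPU reports MMX support, 0 otherwise.
   On non-x86 targets this always returns 0. */
static int has_mmx(void) {
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;
  /* __get_cpuid returns 0 if the requested leaf is not supported. */
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
  return (int)((edx >> 23) & 1u);   /* EDX bit 23: MMX */
#else
  return 0;
#endif
}
```

A decoder would call a check like this once at startup and cache the result, rather than querying the CPU in every filter or IDCT call.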
On Mar 28, 2007, at 15:06:54, Chih-Chung Chang wrote:
> I found access to a MacBook, so here is how I build the patched Theora,
> hope this works for you: (Using XCode 2.4.1)
>
> tar xzvf libtheora-1.0alpha7.tar.gz
> gzip -cd theora.patch.20070327.gz | patch -p0
> cd libtheora-1.0alpha7
> ./configure --with-ogg=/Volumes/USB/build/
> vi config.h
> # add #define USE_MMX
> vi lib/Makefile
> # add "-Wl,-read_only_relocs,suppress" just before -o in the following line:
> # LINK = $(LIBTOOL) --tag=CC --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
> # $(AM_LDFLAGS) $(LDFLAGS) -Wl,-read_only_relocs,suppress -o $@

Works great now. I would still prefer to have only one way of detecting
and activating MMX code. I agree that for development it's better to do
it inline, but in terms of maintainability it is better to have just one
way of adding MMX/asm code. I put a patch that moves your code into
x86_32 and adds the functions for the respective dsp calls at:

http://people.xiph.org/~j/theora_idct_and_dctdecode_filter_mmx.patch

If there are no bigger problems with the patch, I will commit it to svn
in the next few days.

j
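The single mechanism described above, one table of dsp function pointers filled in once at startup, might look roughly like this. All names (sketch_dsp_t, sketch_dsp_init, SKETCH_CPU_MMX, sum8_c, sum8_fast) are hypothetical and are not taken from libtheora or from the patch; sum8_fast is plain C standing in for an MMX routine so that the sketch builds on any platform.

```c
#include <stddef.h>

#define SKETCH_CPU_MMX 0x1u   /* hypothetical CPU feature flag */

/* Table of dsp entry points; callers only ever go through these pointers. */
typedef struct {
  int (*sum8)(const unsigned char *p);   /* sum of 8 pixels */
} sketch_dsp_t;

/* Portable reference implementation. */
static int sum8_c(const unsigned char *p) {
  int s = 0;
  for (size_t i = 0; i < 8; i++) s += p[i];
  return s;
}

/* Stand-in for an accelerated version (a real one would use MMX asm). */
static int sum8_fast(const unsigned char *p) {
  return (p[0] + p[1]) + (p[2] + p[3]) + (p[4] + p[5]) + (p[6] + p[7]);
}

/* Start from the C versions, then override whatever the CPU supports. */
static void sketch_dsp_init(sketch_dsp_t *dsp, unsigned int cpu_flags) {
  dsp->sum8 = sum8_c;
  if (cpu_flags & SKETCH_CPU_MMX) dsp->sum8 = sum8_fast;
}
```

With this layout, adding another optimized routine means adding one pointer to the table and one override in the init function, so there is a single place where CPU features are detected and acted on.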
On 3/29/07, j@v2v.cc <j@v2v.cc> wrote:
> On Mar 28, 2007, at 15:06:54, Chih-Chung Chang wrote:
> > I found access to a MacBook, so here is how I build the patched Theora,
> > hope this works for you: (Using XCode 2.4.1)
> >
> > tar xzvf libtheora-1.0alpha7.tar.gz
> > gzip -cd theora.patch.20070327.gz | patch -p0
> > cd libtheora-1.0alpha7
> > ./configure --with-ogg=/Volumes/USB/build/
> > vi config.h
> > # add #define USE_MMX
> > vi lib/Makefile
> > # add "-Wl,-read_only_relocs,suppress" just before -o in the following line:
> > # LINK = $(LIBTOOL) --tag=CC --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
> > # $(AM_LDFLAGS) $(LDFLAGS) -Wl,-read_only_relocs,suppress -o $@
>
> Works great now. I would still prefer to have only one way of detecting
> and activating MMX code. I agree that for development it's better to do
> it inline, but in terms of maintainability it is better to have just one
> way of adding MMX/asm code. I put a patch that moves your code into
> x86_32 and adds the functions for the respective dsp calls at:
>
> http://people.xiph.org/~j/theora_idct_and_dctdecode_filter_mmx.patch
>
> If there are no bigger problems with the patch, I will commit it to svn
> in the next few days.
>
> j

Hi,

Thanks a lot for cleaning up the patch! It looks good. Attached is an
incremental patch which removes some unnecessary code in idct.c (now
idct_mmx.c).

Thanks,
Chih-Chung Chang

-------------- next part --------------
A non-text attachment was scrubbed...
Name: idct.more
Type: application/octet-stream
Size: 15311 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20070329/5293cf51/idct.obj