thr3ads.net - similar to: "idct8x8 C version in libtheora1.1 release"

2005 Jul 20

1

MMX IDCT for theora-exp

Hello, I'm attaching an IDCT MMX patch. I reused the IDCT from theora-a3-MMXd.zip. It should work on 64-bit x86 platforms too. Here are the most used functions when playing a video with jet aircraft (gripen): Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset I can play this video with about 200-300 frame drops on an Athlon XP 1700+ CPU load (with

2005 Aug 20

0

[PATCH] remove some FZIGZAG

Hello, As we discussed with derf some time ago, it seems it is not necessary to enforce "forward" order of dct_coeffs. This patch gains 0.99366902855226196000%, so approximately a 1% speedup. Measurement method: time nice -n -19 ./dump /mnt/disc4/theora/unix/gripen.ogg > /dev/null Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset

2008 Mar 07

1

Bug in reference idct.

Hi The Theora specification states, in section 7.9.3 ("The 1D Inverse DCT") steps 14-16: 14. Assign T[5] the value T[4] - T[5]. 15. Truncate T[5] to a 16-bit representation by dropping any higher-order bits. 16. Assign T[5] the value C4 * (-T[5]) >> 16. However, the relevant section of code in the reference decoder (lib/dec/idct.c line 50) is:
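Steps 14-16 as quoted in the message above can be sketched in C roughly as follows. This is only an illustration of the quoted spec text, not the reference decoder's code: the function names are hypothetical, and C4 is assumed to be 46341, the 16.16 fixed-point approximation of cos(pi/4).

```c
#include <stdint.h>

/* Assumed value: round(cos(pi/4) * 65536). */
#define C4 46341

/* Step 15: keep the low 16 bits of x and reinterpret them as a signed
 * 16-bit value, done with explicit arithmetic so it does not rely on
 * implementation-defined narrowing casts. */
static int32_t truncate_to_16_bits(int32_t x) {
    uint32_t low = (uint32_t)x & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Hypothetical helper applying steps 14-16 to the pair (T[4], T[5])
 * and returning the new T[5]. */
static int32_t idct_steps_14_to_16(int32_t t4, int32_t t5) {
    int32_t d = t4 - t5;          /* step 14: T[5] = T[4] - T[5]        */
    d = truncate_to_16_bits(d);   /* step 15: drop higher-order bits    */
    /* step 16: T[5] = C4 * (-T[5]) >> 16. The product fits in 32 bits
     * because |d| <= 32768. Right-shifting a negative product is
     * implementation-defined in C; an arithmetic shift is assumed. */
    return (C4 * -d) >> 16;
}
```

The second check in a quick test, `idct_steps_14_to_16(40000, 0)`, shows the effect of step 15: the difference 40000 does not fit in 16 signed bits and wraps to -25536 before the multiply.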

2006 May 30

2

16 bits, cast on idct function

Hi all, Just a stupid question. The IDctSlow function in idct.c has this line: ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0); ip[0], _Gd and _Cd are of type ogg_int32_t. My question is: can the result of (_Gd + _Cd) be a number with more than 16 bits? (Yes, it can, because they are int32, but the algorithm could guarantee something about that... I don't know...) If
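One way to investigate the question in the message above is to check the 32-bit sum against the 16-bit range before narrowing it. The sketch below is a hypothetical debugging helper, not code from libtheora: the name `add_and_narrow` is made up, and `int32_t`/`int16_t` stand in for `ogg_int32_t`/`ogg_int16_t`.

```c
#include <stdint.h>

/* Adds two 32-bit intermediates and narrows the sum to 16 bits.
 * Returns 1 and stores the result in *out if the sum is representable
 * in 16 signed bits; returns 0 and leaves *out untouched otherwise.
 * Assumes gd + cd itself does not overflow 32 bits. */
static int add_and_narrow(int32_t gd, int32_t cd, int16_t *out) {
    int32_t sum = gd + cd;
    if (sum < INT16_MIN || sum > INT16_MAX) {
        return 0;  /* a plain (int16_t) cast would lose information here */
    }
    *out = (int16_t)sum;
    return 1;
}
```

Calling such a helper on the intermediate values while decoding test streams would show whether the cast ever discards bits in practice; it does not prove anything about inputs that were not tested.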

2007 Mar 25

3

MMX patch to speed up Theora decoding

Hi, Attached is a patch against 1.0alpha7 to speed up Theora decoding. It is about 15~20% faster in my test. It consists of the following things: * MMX loop filter based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html * MMX IDCT based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html and the code in

2011 Mar 28

1

idct/fdct.c function calls

Hi. I am trying to find calls of idct/fdct.c functions by tracing png2theora.c calls, but found only: analyze.c:oc_dct_cost2() Where and when are the idct/fdct/mmxidct/mmxfdct.c functions used? Mentions of the word "dct": ==== pacify@optima-amd64:/usr/src/libtheora-1.2.0alpha1/lib$ grep dct *.c | cut -f1 -d":" | uniq -c 19 analyze.c 28 decode.c 22 encode.c 4

2005 Aug 17

2

MMX loop filter for theora-exp

Hello, I would like to announce the semi-optimized oc_state_loop_filter_frag_rows. It gains about a 7% speedup. Unfortunately it has some issues: 1) won't compile on 64-bit (I will fix it later, hopefully) 2) is not yet fully optimized (instruction stalls) Here are the results. CPU: Athlon, speed 1466.91 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask

2007 Sep 26

1

Theora decoding problem on PowerPC

Hi, I'm attempting to decode Theora videos on a PowerPC running a Linux 2.6.19 kernel. The version of GCC I'm cross-compiling from is 3.4.4. The software versions I'm running are: libogg-1.1.3 libpng-1.2.20 libtheora-1.0beta1 libvorbis-1.2.0 These are all the latest I was able to download. Here's a back trace I got while running "dump_video" under

2005 Mar 23

3

[PATCH] promised MMX patches rc1

Hello, Here is my first speedup patch. About 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from a lot of

2003 Mar 05

5

VP3 IDCT

Hi, Is there anything special I need to know about VP3's IDCT? I mean besides the fact that there are separate IDCTs to handle sparse coefficient matrices. Are the IDCT functions mathematically equivalent to any textbook IDCT functions? Thanks... -- -Mike Melanson

2006 Jul 02

5

What goes to Hardware ?

Hi people, As I said before: I did the IDCT to run on the FPGA. My friends from university did the Reconstruction routines running on the FPGA. I'm helping with the LoopFilter, and it is almost there. (all VHDL) I did a small profiling of libTheora running on an Altera Stratix II device: The processor used was the NIOS II with 8Kb of data and instruction cache, branch prediction and

2005 Mar 23

0

[PATCH]

Hello, Here is my first speedup patch. About 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from a lot of

2005 Feb 11

1

Changing the IDCT spec

So, in preparation for some decoder optimization work planned by Rudolf Marek, the subject of the size of the registers needed in the IDCT came up. The current spec language ensures that the result is exactly compatible with the C code for VP3. This language requires that some of the arguments to the multiplies be 17 or 18 bits, because they need to hold the sum or difference of two 16-bit

2010 Jul 24

2

theorarm build

Hi all-- I tried building the ARM-optimized theora codec from the theorarm-merge-branch, and encountered the following compile and runtime problems before getting something to run. If there is another way to build it, it would be nice to know, but I got the sense that its current state in svn is incomplete. I'm using a gcc cross-compiler for ARM on an x86 Linux PC. After running

2008 Apr 10

2

Delay occurred when the makefile change

I have tried to add a plugin to "libtheora-1.0beta2" (a network bandwidth measuring component was added) and it has worked so far. The problem is that once it is added, the encoding process gets extremely slow (around 20 seconds delay). I think the problem is with my modified Makefile (some flag may be missing). The following is my modified Makefile.am, which is in the

2009 Feb 11

4

Benchmarks Inline-ASM vs. Intrinsics

Hi folks, FYI: I've finally made some benchmarks for inline-assembler versus intrinsic-based MMX code. I've only applied the changes to the fragment reconstruction functions, as the IDCT and loop filter have not been ported yet. Nevertheless, here are some numbers: As a baseline I'll take the current version from the trunk with all inline assembler functions enabled. Lower

2007 Oct 09

1

VC6 Patch

Here is a patch that gets the theora_static.dsp project for VC6 building again. Aaron -------------- next part -------------- Index: win32/theora_static.dsp =================================================================== --- win32/theora_static.dsp (revision 13945) +++ win32/theora_static.dsp (working copy) @@ -41,7 +41,7 @@ # PROP Intermediate_Dir "Static_Release" # PROP

2002 Aug 03

7

theora MMX decoder

I try to merge VP3's mmx decoder into theora. http://kyoto.cool.ne.jp/vp3/developers/theora-alpha3-MMXd-src.zip You can see the change by searching the keyword "_UsingMMX_" in all lib folder's file. From VP3???YO vp3@go8.enjoy.ne.jp

2005 Apr 11

2

Theora, MMX and optimisation

Hi everyone, I just landed on planet Theora. As a game programmer, I searched for a free video format/codec, and Theora became the obvious choice. However, I experienced rather bad performance (at least from a game programming point of view). After a bit of profiling, I discovered, as previously discussed in a post found via Google, that the bottleneck is in the ogg library. An insane part of the

2008 Apr 23

1

Theora got extreamly slow (Makefile.am was changed)

I have tried to add a plugin to "libtheora-1.0beta2" (a network bandwidth measuring component was added) and it has worked so far. The problem is that once it is added, the encoding process gets extremely slow (around 20 seconds delay). I think the problem is with my modified Makefile (some flag may be missing). The following is my modified Makefile.am, which is in the
