similar to: idct8x8 C version in libtheora1.1 release

Displaying 20 results from an estimated 300 matches similar to: "idct8x8 C version in libtheora1.1 release"

2005 Jul 20
1
MMX IDCT for theora-exp
Hello, I'm attaching an IDCT MMX patch. I reused the IDCT from theora-a3-MMXd.zip. It should work on 64-bit x86 platforms too. Here are the most-used functions when playing a video of jet aircraft (gripen): Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset. I can play this video with about 200-300 frame drops on an Athlon XP 1700+. CPU load (with
2005 Aug 20
0
[PATCH] remove some FZIGZAG
Hello, as we discussed with derf some time ago, it seems it is not necessary to enforce the "forward" order of dct_coeffs. This patch gains 0.99366902855226196000%, so approximately a 1% speedup. Measurement method: time nice -n -19 ./dump /mnt/disc4/theora/unix/gripen.ogg > /dev/null Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset
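The message does not show the FZIGZAG macro itself. As general background (an assumption about what "forward" order refers to here), DCT coefficients are usually coded in zig-zag scan order and scattered back into row-major order before the IDCT. The sketch below builds the conventional JPEG-style 8x8 zig-zag table by walking the anti-diagonals; `build_zigzag` and `dezigzag` are hypothetical names, and whether Theora's own tables use exactly this order should be checked against the spec.

```c
/* Hypothetical helper: zz[i] = row-major index of the i-th coefficient in
 * conventional zig-zag scan order, built by walking anti-diagonals s = row + col. */
static void build_zigzag(int zz[64]) {
  int i = 0;
  for (int s = 0; s < 15; s++) {
    if (s % 2 == 0) {
      /* even diagonals run from bottom-left up to top-right */
      for (int r = s < 8 ? s : 7; r >= 0 && s - r < 8; r--)
        zz[i++] = r * 8 + (s - r);
    } else {
      /* odd diagonals run from top-right down to bottom-left */
      for (int c = s < 8 ? s : 7; c >= 0 && s - c < 8; c--)
        zz[i++] = (s - c) * 8 + c;
    }
  }
}

/* Hypothetical helper: scatter coefficients stored in scan order into
 * row-major (raster) order, the layout a separable IDCT works on. */
static void dezigzag(const int zz[64], const short scan[64], short raster[64]) {
  for (int i = 0; i < 64; i++)
    raster[zz[i]] = scan[i];
}
```

The table starts 0, 1, 8, 16, 9, 2, ... and ends at 63. Skipping the reorder step, as the patch proposes, only works if every consumer of the coefficients agrees on one layout.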
2008 Mar 07
1
Bug in reference idct.
Hi, the Theora specification states, in section 7.9.3 ("The 1D Inverse DCT"), steps 14-16:
14. Assign T[5] the value T[4] - T[5].
15. Truncate T[5] to a 16-bit representation by dropping any higher-order bits.
16. Assign T[5] the value C4 * (-T[5]) >> 16.
However, the relevant section of code in the reference decoder (lib/dec/idct.c line 50) is:
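To make steps 14-16 concrete, here is a direct transcription of the three quoted steps into C. It is only a sketch. The constant C4 = 46341 (round(65536 * cos(pi/4))) is assumed from the spec's constant table and is not stated in the message. The helper names are hypothetical. The code does not reproduce lib/dec/idct.c. The 16-bit truncation and the floored `>> 16` are written out explicitly so the result does not depend on implementation-defined narrowing conversions or on right shifts of negative values.

```c
#include <stdint.h>

/* Assumed spec constant: C4 = round(65536 * cos(4 * pi / 16)) = 46341. */
#define OC_C4 46341

/* Step 15: keep only the low 16 bits and reinterpret them as a signed value.
 * Masking avoids relying on an implementation-defined (int16_t) cast. */
static int32_t trunc16(int32_t v) {
  return (int32_t)(((uint32_t)v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* (C4 * t) >> 16 with the shift done as floor division, which is what an
 * arithmetic right shift produces for negative products. */
static int32_t mul_c4_shr16(int32_t t) {
  int64_t p = (int64_t)OC_C4 * t;
  return (int32_t)(p >= 0 ? p / 65536 : -((-p + 65535) / 65536));
}

/* Steps 14-16 as quoted from section 7.9.3; returns the new T[5]. */
static int32_t idct_steps_14_16(int32_t t4, int32_t t5) {
  t5 = t4 - t5;              /* step 14: T[5] = T[4] - T[5] */
  t5 = trunc16(t5);          /* step 15: drop bits above 16 */
  return mul_c4_shr16(-t5);  /* step 16: T[5] = C4 * (-T[5]) >> 16 */
}
```

For example, `idct_steps_14_16(100, 0)` computes (46341 * -100) >> 16, which floors to -71. Running the same inputs through the decoder's code is a quick way to see whether the two disagree.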
2006 May 30
2
16 bits, cast on idct function
Hi all, just a quick question. The IDctSlow function in idct.c has this line: ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0); ip[0], _Gd and _Cd are of type ogg_int32_t. My question is: can the result of (_Gd + _Cd) have more than 16 bits? (It can in principle, because they are int32, but the algorithm might guarantee something about that... I don't know...) If
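Whether the sum can exceed 16 bits depends on the bounds of _Gd and _Cd, which the message does not give. Two general points: the `>> 0` in the quoted line shifts by zero and so does not change the value, and in C, converting an out-of-range 32-bit value to a 16-bit signed type is implementation-defined (C99 6.3.1.3); common two's-complement compilers keep the low 16 bits. If the range is not guaranteed, a checked or saturating store is one option. The sketch uses `int16_t`/`int32_t` from `<stdint.h>` as stand-ins for ogg_int16_t/ogg_int32_t; `fits_int16` and `clamp16` are hypothetical helpers, not libtheora functions.

```c
#include <stdint.h>

/* Hypothetical helper: does a 32-bit intermediate such as (_Gd + _Cd)
 * fit in a signed 16-bit value? */
static int fits_int16(int32_t v) {
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical helper: saturating narrowing, an alternative to a plain
 * (int16_t) cast for values that might fall outside [-32768, 32767]. */
static int16_t clamp16(int32_t v) {
  if (v < INT16_MIN) return INT16_MIN;
  if (v > INT16_MAX) return INT16_MAX;
  return (int16_t)v;
}
```

Saturation changes the output for out-of-range inputs, so it is only an option where exact bit-compatibility with an existing decoder is not required.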
2007 Mar 25
3
MMX patch to speed up Theora decoding
Hi, attached is a patch against 1.0alpha7 to speed up Theora decoding. It is about 15-20% faster in my tests. It consists of the following:
* MMX loop filter based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html
* MMX IDCT based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html and the code in
2011 Mar 28
1
idct/fdct.c function calls
Hi. I am trying to find calls to the functions in idct.c/fdct.c by tracing the calls made by png2theora.c, but I found only analyze.c:oc_dct_cost2(). Where and when are the functions in idct.c, fdct.c, mmxidct.c and mmxfdct.c used? Mentions of the word "dct":
====
pacify at optima-amd64:/usr/src/libtheora-1.2.0alpha1/lib$ grep dct *.c | cut -f1 -d":" | uniq -c
     19 analyze.c
     28 decode.c
     22 encode.c
      4
2005 Aug 17
2
MMX loop filter for theora-exp
Hello, I would like to announce a semi-optimized oc_state_loop_filter_frag_rows. It gives about a 7% speedup. Unfortunately, it has some issues: 1) it won't compile on 64-bit (I will hopefully fix that later); 2) it is not yet fully optimized (instruction stalls). Here are the results. CPU: Athlon, speed 1466.91 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
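The patch itself is not shown in the message. For reference, here is a plain scalar sketch of the per-edge filter that a routine like oc_state_loop_filter_frag_rows applies to each row or column of pixels across a block edge. The filter value R = (p0 - p3 + 3*(p2 - p1) + 4) >> 3 and the limiting function lflim are written as I understand them from the Theora specification's loop-filter section; treat both as assumptions to check against the spec. `filter_edge` and `clamp255` are hypothetical names. The data dependencies visible here (R feeds the limiter, which updates two pixels) are what make the routine hard to parallelize.

```c
#include <stdint.h>

/* Assumed spec limiter: passes small R through, tapers to 0 between L and 2L. */
static int lflim(int r, int L) {
  if (r <= -2 * L || r >= 2 * L) return 0;
  if (r <= -L) return -r - 2 * L;
  if (r >= L) return -r + 2 * L;
  return r;
}

/* Hypothetical helper: clamp to the 8-bit pixel range. */
static uint8_t clamp255(int v) {
  return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar filter for one line of four pixels p[0..3] straddling an
 * edge that lies between p[1] and p[2]; L is the loop-filter limit. */
static void filter_edge(uint8_t p[4], int L) {
  int num = p[0] - p[3] + 3 * (p[2] - p[1]) + 4;
  /* >> 3 on a possibly negative value, done as floor division */
  int r = num >= 0 ? num / 8 : -((-num + 7) / 8);
  int f = lflim(r, L);
  p[1] = clamp255(p[1] + f);
  p[2] = clamp255(p[2] - f);
}
```

A small step such as {10, 10, 30, 30} with L = 10 is smoothed to {10, 15, 25, 30}, while a large step like {0, 0, 100, 100} with L = 2 is left alone, since it is treated as a real edge rather than a blocking artifact.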
2007 Sep 26
1
Theora decoding problem on PowerPC
Hi, I'm attempting to decode Theora videos on a PowerPC running a Linux 2.6.19 kernel. I'm cross-compiling with GCC 3.4.4. The software versions I'm running are: libogg-1.1.3, libpng-1.2.20, libtheora-1.0beta1, libvorbis-1.2.0. These are the latest versions I was able to download. Here's a backtrace I got while running "dump_video" under
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello, here is my first speedup patch: about 10-11%. No IDCT yet. Please feel free to comment on my code, or even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What still needs to be optimized is the loop filter function. I have no idea how to do it right now. It does not leave much room for parallelism, copying memory from lots of
2003 Mar 05
5
VP3 IDCT
Hi, is there anything special I need to know about VP3's IDCT? I mean, besides the fact that there are separate IDCTs to handle sparse coefficient matrices. Are the IDCT functions mathematically equivalent to any textbook IDCT functions? Thanks... -- -Mike Melanson --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To
2006 Jul 02
5
What goes to Hardware ?
Hi people, as I said before, I implemented the IDCT to run on the FPGA. My friends from university implemented the reconstruction routines on the FPGA. I'm helping with the loop filter, and it is almost done (all in VHDL). I did some quick profiling of libTheora running on an Altera Stratix II device. The processor used was the NIOS II with 8Kb of data and instruction cache, branch prediction and
2005 Mar 23
0
[PATCH]
Hello, here is my first speedup patch: about 10-11%. No IDCT yet. Please feel free to comment on my code, or even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What still needs to be optimized is the loop filter function. I have no idea how to do it right now. It does not leave much room for parallelism, copying memory from lots of
2005 Feb 11
1
Changing the IDCT spec
So, in preparation for some decoder optimization work planned by Rudolf Marek, the subject of the size of the registers needed in the IDCT came up. The current spec language ensures that the result is exactly compatible with the C code for VP3. This language requires that some of the arguments to the multiplies be 17 or 18 bits, because they need to hold the sum or difference of two 16-bit
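The register-width argument can be checked with a little arithmetic. Signed 16-bit values span [-32768, 32767], so a sum of two spans [-65536, 65534] and a difference spans [-65535, 65535]; both need 17 bits in two's complement. The helper below (hypothetical, not part of libtheora or the spec) computes the minimum signed width for a range. The 18-bit case assumes a further sum of two such 17-bit intermediates, because the message is cut off before it says where 18 bits arise.

```c
#include <stdint.h>

/* Hypothetical helper: minimum two's-complement width that can hold
 * every value in [lo, hi]. */
static int signed_bits(int64_t lo, int64_t hi) {
  int bits = 1;
  while (lo < -((int64_t)1 << (bits - 1)) || hi > ((int64_t)1 << (bits - 1)) - 1)
    bits++;
  return bits;
}
```

This is why exact VP3 compatibility conflicts with 16-bit SIMD multiplies such as MMX's pmulhw: those operands must fit in 16 bits, so either the spec's intermediate widths change or the SIMD code needs extra work.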
2010 Jul 24
2
theorarm build
Hi all-- I tried building the ARM-optimized theora codec from the theorarm-merge-branch, and encountered the following compile and runtime problems before getting something to run. If there is another way to build it, it would be nice to know, but I got the sense that its current state in svn is incomplete. I'm using a gcc cross-compiler for ARM on an x86 Linux PC. After running
2008 Apr 10
2
Delay occurred when the Makefile changed
I have tried to add a plugin to "libtheora-1.0beta2" (a network-bandwidth measuring component) and it has worked so far. Now the problem is that when it is added, the encoding process gets extremely slow (around 20 seconds of delay). I think the problem is with my modified Makefile (some flag may be missing). The following is my modified Makefile.am, which is in the
2009 Feb 11
4
Benchmarks Inline-ASM vs. Intrinsics
Hi folks, FYI: I've finally run some benchmarks of inline-assembler versus intrinsic-based MMX code. I've only applied the changes to the fragment reconstruction functions, as the IDCT and loop filter have not been ported yet. Nevertheless, here are some numbers. As a baseline, I'll take the current version from the trunk with all inline-assembler functions enabled. Lower
2007 Oct 09
1
VC6 Patch
Here is a patch that gets the theora_static.dsp project for VC6 building again.
Aaron
-------------- next part --------------
Index: win32/theora_static.dsp
===================================================================
--- win32/theora_static.dsp (revision 13945)
+++ win32/theora_static.dsp (working copy)
@@ -41,7 +41,7 @@
 # PROP Intermediate_Dir "Static_Release"
 # PROP
2002 Aug 03
7
theora MMX decoder
I am trying to merge VP3's MMX decoder into Theora. http://kyoto.cool.ne.jp/vp3/developers/theora-alpha3-MMXd-src.zip You can see the changes by searching for the keyword "_UsingMMX_" in all files in the lib folder. From VP3???YO vp3@go8.enjoy.ne.jp
2005 Apr 11
2
Theora, MMX and optimisation
Hi everyone, I just landed on planet Theora. As a game programmer, I was searching for a free video format/codec, and Theora was the obvious choice. However, I experienced rather bad performance (at least from a game-programming point of view). After some profiling, I discovered, as previously discussed in a post I found via Google, that the bottleneck is in the Ogg library. An unsane part of the
2008 Apr 23
1
Theora got extremely slow (Makefile.am was changed)
I have tried to add a plugin to "libtheora-1.0beta2" (a network-bandwidth measuring component) and it has worked so far. Now the problem is that when it is added, the encoding process gets extremely slow (around 20 seconds of delay). I think the problem is with my modified Makefile (some flag may be missing). The following is my modified Makefile.am, which is in the