similar to: Bug in reference idct.

Displaying 20 results from an estimated 800 matches similar to: "Bug in reference idct."

2010 May 18
2
idct8x8 C version in libtheora1.1 release
When using the IDCT routines, the C version [ lib/idct.c: oc_idct8x8_c(ogg_int16_t _y[64],int _last_zzi)] in libtheora 1.1.1, the decoded image is garbled. Is it functionally equivalent to the MMX optimized version [lib/x86/mmxidct.c: oc_idct8x8_mmx(ogg_int16_t _y[64],int _last_zzi)] ? I used some of the Theora video files from here: http://wiki.xiph.org/index.php/List_of_Theora_videos for
2006 May 30
2
16 bits, cast on idct function
Hi all, Just a stupid question The IDctSlow function on file idct.c has this line : ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0); The ip[0] , _Gd and _Cd are of type ogg_int32_t My question is: The result of (_Gd + _Cd) can be a number with more than 16 bits ? (yes, it can be because they are int32, but the algorithm could guarantee something about that... I dont know...) If
2005 Jul 20
1
MMX IDCT for theora-exp
Hello, I'm attaching IDCT MMX patch. I reused IDCT from theora-a3-MMXd.zip. It should work on 64bit X86 platform too. Here is most used functions when playing video with jet aircrafts (gripen) Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset I can play this video with like 200-300 frame drops on Athlon XP 1700+ CPU load (with
2003 Mar 05
5
VP3 IDCT
Hi, Is there anything special I need to know about VP3's IDCT? I mean besides the fact that there are separate IDCTs to handle sparse coefficient matrices. Are the IDCT functions mathematically equivalent to any textbook IDCT functions? Thanks... -- -Mike Melanson --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To
2005 Feb 11
1
Changing the IDCT spec
So, in preparation for some decoder optimization work planned by Rudolf Marek, the subject of the size of the registers needed in the IDCT came up. The current spec language ensures that the result is exactly compatible with the C code for VP3. This language requires that some of the arguments to the multiplies be 17 or 18 bits, because they need to hold the sum or difference of two 16-bit
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello, Here is my first speedup patch. Like 10-11%. No IDCT yet. Please feel free to comment my code or even better think about improvements. :) I belive my routines are not so bad, maybe one day they will be even more faster. What needs to be optimized is the loop filter fuction. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from lot of
2011 Mar 28
1
idct/fdct.c function calls
Hi. I am trying to find calls of idct/fdct.c functions by tracing png2theora.c calls. But found only: analyze.c:oc_dct_cost2() Where and when idct/fdct/mmxidct/mmxfdct.c functions are used? Mentions of "dct" word: ==== pacify at optima-amd64:/usr/src/libtheora-1.2.0alpha1/lib$ grep dct *.c | cut -f1 -d":" | uniq -c ???? 19 analyze.c ???? 28 decode.c ???? 22 encode.c ????? 4
2007 Mar 25
3
MMX patch to speed up Theora decoding
Hi, Attached is a patch against 1.0alpha7 to speed up Theora decoding. It is about 15~20% faster in my test. It consists of following things: * MMX loop filter based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html * MMX IDCT based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html and the code in
2005 Aug 20
0
[PATCH] remove some FZIGZAG
Hello, As we discussed with derf some time ago, it seems it is not neccessary to enforce "forward" order of dct_coeffs. This patch gains .99366902855226196000% so approx 1% speedup. Meausurement method: time nice -n -19 ./dump /mnt/disc4/theora/unix/gripen.ogg > /dev/null Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset
2004 Oct 22
5
theora-mmx_on_win32?
Hi. Has anyone tried http://svn.xiph.org/branches/theora-mmx this code on Win32 ? I can compile it with very small modification, 304c304 < ogg_int16_t *const temp= (ogg_int16_t*)align_tmp; --- > ogg_int16_t *const temp= (int16_t*)align_tmp; but outputs seem terribly broken. -> ex. http://mycomputer.cc/temp/mmx-out.ogg GCC version is 3.4.2. $ gcc --version gcc.exe (GCC) 3.4.2
2005 Mar 23
0
[PATCH]
Hello, Here is my first speedup patch. Like 10-11%. No IDCT yet. Please feel free to comment my code or even better think about improvements. :) I belive my routines are not so bad, maybe one day they will be even more faster. What needs to be optimized is the loop filter fuction. I have no ideas now how to do it. It does not leave much space for parallel stuff, copying memory from lot of
2007 Oct 09
1
VC6 Patch
Here is a patch that gets the theora_static.dsp project for VC6 building again. Aaron -------------- next part -------------- Index: win32/theora_static.dsp =================================================================== --- win32/theora_static.dsp (revision 13945) +++ win32/theora_static.dsp (working copy) @@ -41,7 +41,7 @@ # PROP Intermediate_Dir "Static_Release" # PROP
2006 Jul 02
5
What goes to Hardware ?
Hi people, As I said before: I did the IDCT to run on the FPGA. My friends from university did the Reconstruction routines running on the FPGA. I'm helping with the LoopFilter, and it is almost there. (all VHDL) I did a small profiling of the libTheora running on a Altera Stratix II device: The processor used was the NIOS II with 8Kb of data and instruction cache, branch prediction and
2002 Dec 10
2
mingw compiling problem for libogg
(i hope this is correct m.list) Hi, there is a small compiling problem for mingw when compiling on libogg.. in include/ogg/os_types.h : ogg_int64_t, ogg_int32_t, etc are defined correctly on cygwin and MSVC/Borland but not on mingw... i have attached a patch that will fix this problem (i hope it attaches correctly) thx, Nehal --- os_types.h.old Fri Jul 19 02:25:52 2002 +++ os_types.h Tue
2011 Apr 05
0
quantize after fdct, _dequant table, and idct
1) What are you doing "mathematically" in a procedure x86/x86enquant: oc_enc_quantize_sse2()? This - the assembler code, and I do not understand mathematically - that's going on there. --- A: 120 121 28 73 -20 -99 -98 -100 123 122 112 108 73 -32 -102 -98 123 123 117 121 100
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed. attached the updated patch to apply to svn/trunk. j -------------- next part -------------- A non-text attachment was scrubbed... Name: theora-mmx.patch.gz Type: application/x-gzip Size: 8648 bytes Desc: not available Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2008 Apr 10
2
Delay occurred when the makefile change
I have tried to add a plunging to the "libtheora-1.0beta2" (network bandwidth measuring component was added) and Got it success for some far now the problem is when it is added the encoding process get extremely slow (around 20 seconds delay). I think that the problem is with my modified Makefile (some flag may have missed). the following is my modified Makefile.am which is in the
2009 Feb 11
4
Benchmarks Inline-ASM vs. Intrinsics
Hi folks, FYI: I've finally made some benchmarks for inline-assembler versus intrinsic based mmx code. I've just applied the changes to the fragment reconstruction functions as writing the IDCT and loopfilter have not been ported yet. Nevertheless here are some numbers: As a baseline I'll take the current version from the trunk with all inline assembler functions enabled. Lower
2008 Apr 23
1
Theora got extreamly slow (Makefile.am was changed)
I have tried to add a plunging to the "libtheora-1.0beta2" (network bandwidth measuring component was added) and Got it success for some far now the problem is when it is added the encoding process get extremely slow (around 20 seconds delay). I think that the problem is with my modified Makefile (some flag may have missed). the following is my modified Makefile.am which is in the
2006 Jun 05
0
Idct - fpga - improved
Good news, Working with synchrounous RAM (fpga internal SRAM blocks) the area usage drop from 20% to 5% of Logic Cells. And the clock frequency from 30 Mhz to 90 Mhz. Now I'm improving the latency of samples (number of clock cycles needed to decode a data sample). Report: -------------- Fitter Status : Successful - Mon Jun 5 16:38:21 2006 Quartus II Version : 5.1 Build 176 10/26/2005 SJ