Displaying 20 results from an estimated 800 matches similar to: "Bug in reference idct."
2010 May 18
2
idct8x8 C version in libtheora1.1 release
When I use the C version of the IDCT routine [lib/idct.c:
oc_idct8x8_c(ogg_int16_t _y[64],int _last_zzi)] in libtheora 1.1.1, the
decoded image is garbled. Is it functionally equivalent to the MMX-optimized
version [lib/x86/mmxidct.c: oc_idct8x8_mmx(ogg_int16_t _y[64],int
_last_zzi)]?
I used some of the Theora video files from here:
http://wiki.xiph.org/index.php/List_of_Theora_videos for
2006 May 30
2
16 bits, cast on idct function
Hi all,
Just a stupid question
The IDctSlow function in idct.c has this line:
ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0);
ip[0], _Gd and _Cd are of type ogg_int32_t.
My question is:
Can the result of (_Gd + _Cd) be a number with more than 16 bits?
(Yes, it can, because they are int32, but the algorithm could
guarantee something about that... I don't know...)
If
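The question above comes down to whether a 32-bit sum still fits in 16 bits at the point where it is narrowed. A minimal sketch of how one might check that while debugging, assuming the stdint.h types as stand-ins for ogg_int16_t and ogg_int32_t; the helper names fits_in_int16 and add_and_narrow are hypothetical and not part of libtheora:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v is representable as a signed 16-bit integer, else 0. */
int fits_in_int16(int32_t v) {
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical helper: adds two 32-bit intermediates and narrows the
 * result to 16 bits. The assert fires in debug builds if the cast would
 * drop significant bits. Assumes gd + cd itself does not overflow 32 bits. */
int16_t add_and_narrow(int32_t gd, int32_t cd) {
    int32_t sum = gd + cd;
    assert(fits_in_int16(sum));
    return (int16_t)sum;
}
```

Running a decoder with such a check in place would show whether real streams ever exceed the 16-bit range at that point; it does not prove anything about what the algorithm guarantees.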
2005 Jul 20
1
MMX IDCT for theora-exp
Hello,
I'm attaching an IDCT MMX patch. I reused the IDCT from theora-a3-MMXd.zip.
It should work on 64-bit x86 platforms too.
Here are the most-used functions when playing a video of jet aircraft (gripen)
Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video
Encoded frame content is 720x480 with 0x0 offset
I can play this video with about 200-300 frame drops on an Athlon XP 1700+
CPU load (with
2003 Mar 05
5
VP3 IDCT
Hi,
Is there anything special I need to know about VP3's IDCT? I mean
besides the fact that there are separate IDCTs to handle sparse
coefficient matrices. Are the IDCT functions mathematically equivalent to
any textbook IDCT functions?
Thanks...
--
-Mike Melanson
2005 Feb 11
1
Changing the IDCT spec
So, in preparation for some decoder optimization work planned by Rudolf
Marek, the subject of the size of the registers needed in the IDCT
came up.
The current spec language ensures that the result is exactly compatible
with the C code for VP3. This language requires that some of the
arguments to the multiplies be 17 or 18 bits, because they need to hold
the sum or difference of two 16-bit
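To illustrate the register-size concern above: the sum or difference of two signed 16-bit values can need 17 bits in two's complement. A small sketch that counts the bits a value requires; the function name signed_bits_needed is hypothetical and only serves the illustration:

```c
#include <stdint.h>

/* Number of bits needed to store v in two's complement, sign bit included. */
int signed_bits_needed(int32_t v) {
    /* For negative v, ~v is non-negative and has the same magnitude width. */
    uint32_t m = (uint32_t)(v < 0 ? ~v : v);
    int bits = 1; /* the sign bit */
    while (m != 0) {
        bits++;
        m >>= 1;
    }
    return bits;
}
```

For example, 32767 needs 16 bits, while 32767 + 32767 and 32767 - (-32768) both need 17, which is why a 16-bit register cannot hold such an operand without loss.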
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello,
Here is my first speedup patch. Like 10-11%. No IDCT yet.
Please feel free to comment on my code or, even better, think about
improvements. :) I believe my routines are not so bad; maybe
one day they will be even faster.
What needs to be optimized is the loop filter function. I have
no ideas now how to do it. It does not leave much space for parallel
stuff, copying memory from lot of
2011 Mar 28
1
idct/fdct.c function calls
Hi.
I am trying to find calls of idct/fdct.c functions by tracing png2theora.c calls.
But found only:
analyze.c:oc_dct_cost2()
Where and when are the idct/fdct/mmxidct/mmxfdct.c functions used?
Mentions of the word "dct":
====
pacify@optima-amd64:/usr/src/libtheora-1.2.0alpha1/lib$ grep dct *.c | cut -f1 -d":" | uniq -c
     19 analyze.c
     28 decode.c
     22 encode.c
      4
2007 Mar 25
3
MMX patch to speed up Theora decoding
Hi,
Attached is a patch against 1.0alpha7 to speed up Theora decoding. It
is about 15~20% faster in my test. It consists of the following things:
* MMX loop filter based on Rudolf Marek's patch in
http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html
* MMX IDCT based on Rudolf Marek's patch in
http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html
and the code in
2005 Aug 20
0
[PATCH] remove some FZIGZAG
Hello,
As we discussed with derf some time ago, it seems it is not necessary to enforce "forward" order of dct_coeffs.
This patch gains .99366902855226196000%, so approx. 1% speedup.
Measurement method:
time nice -n -19 ./dump /mnt/disc4/theora/unix/gripen.ogg > /dev/null
Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video
Encoded frame content is 720x480 with 0x0 offset
2004 Oct 22
5
theora-mmx_on_win32?
Hi. Has anyone tried http://svn.xiph.org/branches/theora-mmx this code on Win32 ?
I can compile it with very small modification,
304c304
< ogg_int16_t *const temp= (ogg_int16_t*)align_tmp;
---
> ogg_int16_t *const temp= (int16_t*)align_tmp;
but outputs seem terribly broken. -> ex. http://mycomputer.cc/temp/mmx-out.ogg
GCC version is 3.4.2.
$ gcc --version
gcc.exe (GCC) 3.4.2
2005 Mar 23
0
[PATCH]
Hello,
Here is my first speedup patch. Like 10-11%. No IDCT yet.
Please feel free to comment on my code or, even better, think about
improvements. :) I believe my routines are not so bad; maybe
one day they will be even faster.
What needs to be optimized is the loop filter function. I have
no ideas now how to do it. It does not leave much space for parallel
stuff, copying memory from lot of
2007 Oct 09
1
VC6 Patch
Here is a patch that gets the theora_static.dsp project for VC6 building
again.
Aaron
-------------- next part --------------
Index: win32/theora_static.dsp
===================================================================
--- win32/theora_static.dsp (revision 13945)
+++ win32/theora_static.dsp (working copy)
@@ -41,7 +41,7 @@
# PROP Intermediate_Dir "Static_Release"
# PROP
2006 Jul 02
5
What goes to Hardware ?
Hi people,
As I said before: I did the IDCT to run on the FPGA.
My friends from university did the Reconstruction routines running on the FPGA.
I'm helping with the LoopFilter, and it is almost there.
(all VHDL)
I did a small profiling of libTheora running on an Altera Stratix II device:
The processor used was the NIOS II with 8Kb of data and instruction
cache, branch prediction and
2002 Dec 10
2
mingw compiling problem for libogg
(i hope this is correct m.list)
Hi,
there is a small compiling problem for mingw
when compiling on libogg..
in include/ogg/os_types.h :
ogg_int64_t, ogg_int32_t, etc are defined
correctly on cygwin and MSVC/Borland
but not on mingw...
i have attached a patch that will fix
this problem (i hope it attaches
correctly)
thx, Nehal
--- os_types.h.old Fri Jul 19 02:25:52 2002
+++ os_types.h Tue
2011 Apr 05
0
quantize after fdct, _dequant table, and idct
1) What is the procedure x86/x86enquant: oc_enc_quantize_sse2() doing "mathematically"?
It is assembler code, and I do not understand the math of what is going on there.
--- A:
120 121 28 73 -20 -99 -98 -100
123 122 112 108 73 -32 -102 -98
123 123 117 121 100
2004 Aug 24
5
MMX/mmxext optimisations
quite some speed improvement indeed.
attached the updated patch to apply to svn/trunk.
j
-------------- next part --------------
A non-text attachment was scrubbed...
Name: theora-mmx.patch.gz
Type: application/x-gzip
Size: 8648 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20040824/5a5f2731/theora-mmx.patch-0001.bin
2008 Apr 10
2
Delay occurred when the makefile change
I have tried to add a plug-in to "libtheora-1.0beta2" (a network
bandwidth measuring component was added) and it has worked so far.
Now the problem is that, once it is added, the encoding process gets extremely slow
(around 20 seconds delay).
I think that the problem is with my modified Makefile (some flag may be
missing).
The following is my modified Makefile.am which is in the
2009 Feb 11
4
Benchmarks Inline-ASM vs. Intrinsics
Hi folks, FYI:
I've finally made some benchmarks for inline-assembler versus intrinsic
based mmx code.
I've only applied the changes to the fragment reconstruction functions,
as the IDCT and loop filter have not been ported yet.
Nevertheless here are some numbers:
As a baseline I'll take the current version from the trunk with all
inline assembler functions enabled. Lower
2008 Apr 23
1
Theora got extremely slow (Makefile.am was changed)
I have tried to add a plug-in to "libtheora-1.0beta2" (a network
bandwidth measuring component was added) and it has worked so far.
Now the problem is that, once it is added, the encoding process gets extremely slow
(around 20 seconds delay).
I think that the problem is with my modified Makefile (some flag may be
missing).
The following is my modified Makefile.am which is in the
2006 Jun 05
0
Idct - fpga - improved
Good news,
Working with synchronous RAM (FPGA internal SRAM blocks), the area
usage dropped from 20% to 5% of Logic Cells,
and the clock frequency rose from 30 MHz to 90 MHz.
Now I'm improving the latency of samples (number of clock cycles
needed to decode a data sample).
Report:
--------------
Fitter Status : Successful - Mon Jun 5 16:38:21 2006
Quartus II Version : 5.1 Build 176 10/26/2005 SJ