similar to: Benchmarks Inline-ASM vs. Intrinsics

Displaying 20 results from an estimated 800 matches similar to: "Benchmarks Inline-ASM vs. Intrinsics"

2007 Dec 25
2
VC2005 MMX patch.
Here is the patch with my changes. Most work went into the decoder. I changed the encoder only where necessary to build the library. You can find the patch here (quite big): http://torus.untergrund.net/code/theora_mmx_vc2005.diff Please let me know if the encoder works without problems. I only tested it very briefly. The decoder has been tested against the test
2005 Jul 20
1
MMX IDCT for theora-exp
Hello, I'm attaching an IDCT MMX patch. I reused the IDCT from theora-a3-MMXd.zip. It should work on 64-bit x86 platforms too. Here are the most used functions when playing a video with jet aircraft (gripen): Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset I can play this video with about 200-300 frame drops on Athlon XP 1700+ CPU load (with
2006 Jul 02
5
What goes to Hardware ?
Hi people, As I said before: I did the IDCT to run on the FPGA. My friends from university did the Reconstruction routines running on the FPGA. I'm helping with the LoopFilter, and it is almost there. (all VHDL) I did a small profiling run of libTheora on an Altera Stratix II device: The processor used was the NIOS II with 8Kb of data and instruction cache, branch prediction and
2007 Dec 30
2
Patch: fragment reconstruction MMX for GCC
Hi again, I measured my fragment reconstructions against the compiler output from GCC and, well, the new code performs better, so I brushed up my GCC inline assembler skills and made a port. Code is here: http://torus.untergrund.net/code/mmxfrag.c All routines perform much better now. Inter2 alone got a speedup of a factor of 5 on Pentium-M. Athlon CPUs execute roughly 3 times faster.
2009 Feb 03
3
Issues with Win32 MMX code
Hi folks. Mozilla had some issues with the MMX-optimized frag_recon functions over the last few days, and I was able to track the problem down. The code itself is fine, but unfortunately it tends to trigger a non-deterministic compiler bug. The whole discussion is here: https://bugzilla.mozilla.org/show_bug.cgi?id=474937 After thinking about the problem I've suggested to
2003 Mar 05
5
VP3 IDCT
Hi, Is there anything special I need to know about VP3's IDCT? I mean besides the fact that there are separate IDCTs to handle sparse coefficient matrices. Are the IDCT functions mathematically equivalent to any textbook IDCT functions? Thanks... -- -Mike Melanson
2008 Dec 16
1
bitpack.c oddity
While browsing the code I came across line 83 of bitpack.c: *_ret=((ret&0xFFFFFFFFUL)>>(m>>1))>>(m+1>>1); Is there any reason why this is so convoluted? Maybe endianness or 64-bit issues? If I'm not mistaken it does exactly the same as: *_ret = ret >> m; Cheers, Nils
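One plausible reason for the split shift, offered here as an assumption and not as the libtheora authors' stated rationale: in C, shifting a 32-bit value by 32 or more bits is undefined behavior, so a single `ret >> m` is unsafe if `m` can reach 32. Splitting the count into `m>>1` and `(m+1)>>1` keeps each individual shift at 16 bits or fewer while the two counts still sum to `m`. A minimal sketch, using a hypothetical helper name that is not part of bitpack.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the libtheora code): shift right by m in two
   halves. For 0 <= m <= 32 the two counts are floor(m/2) and ceil(m/2),
   so neither reaches the operand width and their sum is exactly m. */
static uint32_t shift_right_split(uint32_t ret, int m) {
    assert(m >= 0 && m <= 32);   /* assumed valid range for this sketch */
    return (ret >> (m >> 1)) >> ((m + 1) >> 1);
}
```

For `m` below 32 this gives the same result as `ret >> m`; for `m == 32` it yields 0 without relying on undefined behavior.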
2008 Dec 18
1
configure option --with-ogg broken?
Hi there. I'm trying to cross-compile the theora libraries to test my ARMv6 optimizations (almost done, time to do some benchmarking and testing). While doing so I found out that --with-ogg seems to do nothing. If I simply run: ./configure --with-ogg=$HOME The build succeeds although it shouldn't (I don't have ogg installed at $HOME). As an example here is the last line that make executes.
2008 Feb 28
1
Multi-threaded Theora Decoder
Hi all, Does the Theora community have an interest in a multi-threaded decoder implementation? I'm starting to work with multi-threading and I thought that the Theora decoder would be a good choice for me, because I have been working with it in an FPGA implementation and I have experience with the library. I'm thinking of working on the LoopFilter first. Do you think I could start with it or there is a
2008 Apr 09
1
[Fwd: Re: libtheora MMX patch]
Forwarding an email exchange that I had with Nils Pipenbrinck regarding the state of the MMX patch for Visual Studio-style assembly. I also run with the patches, and everything looks fine as far as I can tell. Is this enough for a go-ahead to put that stuff into the mainline (if it's not there already)? -------- Original Message -------- Subject: Re: libtheora MMX patch Date: Sun, 06
2007 Dec 23
1
svn access and formal things.
A couple of questions: * Do I need a named account to commit changes? How do I get such an account? I'm used to committing roughly once every 8 hours of work if I have a stable point that compiles on Linux and Win32. That's just my way of working. It has worked well in the past to protect myself from doing stupid things. * How anal are you about line endings, tabs vs. spaces, max. line length
2005 Aug 20
0
[PATCH] remove some FZIGZAG
Hello, As we discussed with derf some time ago, it seems it is not necessary to enforce "forward" order of dct_coeffs. This patch gains .99366902855226196000%, so approx. 1% speedup. Measurement method: time nice -n -19 ./dump /mnt/disc4/theora/unix/gripen.ogg > /dev/null Ogg logical stream 310b2968 is Theora 720x480 29.97 fps video Encoded frame content is 720x480 with 0x0 offset
2005 Aug 17
2
MMX loop filter for theora-exp
Hello, I would like to announce the semi-optimized oc_state_loop_filter_frag_rows. It gains about a 7% speedup. Unfortunately it has some issues: 1) won't compile on 64-bit (I will fix it later hopefully) 2) is not yet fully optimized (instruction stalls) Here are the results. CPU: Athlon, speed 1466.91 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
2010 May 18
2
idct8x8 C version in libtheora1.1 release
When using the C version of the IDCT routine [lib/idct.c: oc_idct8x8_c(ogg_int16_t _y[64],int _last_zzi)] in libtheora 1.1.1, the decoded image is garbled. Is it functionally equivalent to the MMX-optimized version [lib/x86/mmxidct.c: oc_idct8x8_mmx(ogg_int16_t _y[64],int _last_zzi)]? I used some of the Theora video files from here: http://wiki.xiph.org/index.php/List_of_Theora_videos for
2008 Mar 07
1
Bug in reference idct.
Hi The Theora specification states, in section 7.9.3 ("The 1D Inverse DCT") steps 14-16: 14. Assign T[5] the value T[4] - T[5]. 15. Truncate T[5] to a 16-bit representation by dropping any higher-order bits. 16. Assign T[5] the value C4 * (-T[5]) >> 16. However, the relevant section of code in the reference decoder (lib/dec/idct.c line 50) is:
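The three quoted specification steps can be written out directly in C. The following is only a sketch of the steps as quoted, not the reference decoder's code (which the post says differs); the helper name is hypothetical, and C4 is assumed to be 46341, that is, cos(pi/4) scaled by 2^16 and rounded.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant: cos(pi/4) * 65536, rounded to the nearest integer. */
#define C4 46341

/* Hypothetical helper: applies steps 14-16 as quoted and returns the new T[5]. */
static int32_t idct_steps_14_to_16(int32_t t4, int32_t t5) {
    /* Step 14: T[5] = T[4] - T[5]. */
    t5 = t4 - t5;
    /* Step 15: keep only the low 16 bits, then reinterpret them as a
       signed 16-bit value (done by hand to stay well-defined in C). */
    t5 = (int32_t)((uint32_t)t5 & 0xFFFFu);
    if (t5 >= 0x8000) t5 -= 0x10000;
    /* Step 16: T[5] = C4 * (-T[5]) >> 16. The product fits in 32 bits
       because |T[5]| <= 32768. Right-shifting a negative product is
       implementation-defined in C; an arithmetic shift is assumed. */
    return (C4 * -t5) >> 16;
}
```

For example, with T[4] = 4 and T[5] = 10 the difference is -6, and C4 * 6 >> 16 gives 4.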
2005 Feb 11
1
Changing the IDCT spec
So, in preparation for some decoder optimization work planned by Rudolf Marek, the subject of the size of the registers needed in the IDCT came up. The current spec language ensures that the result is exactly compatible with the C code for VP3. This language requires that some of the arguments to the multiplies be 17 or 18 bits, because they need to hold the sum or difference of two 16-bit
2007 Mar 25
3
MMX patch to speed up Theora decoding
Hi, Attached is a patch against 1.0alpha7 to speed up Theora decoding. It is about 15~20% faster in my test. It consists of the following: * MMX loop filter based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-August/002838.html * MMX IDCT based on Rudolf Marek's patch in http://lists.xiph.org/pipermail/theora-dev/2005-July/002816.html and the code in
2009 Oct 13
3
Proposal for replacing asm code with intrinsics
Hi, I'm new to Theora and would like to propose several performance optimizations using advanced instructions in x86 CPUs (SSE2-SSE4.2). There are several source files in \x86 and \x86_vc which were developed using inline assembler. However, this causes several maintenance problems: 1) Need to sync gcc & msvc versions 2) Only 32-bit environment is supported 3) No support for newer than MMX
2006 May 30
2
16 bits, cast on idct function
Hi all, Just a stupid question. The IDctSlow function in file idct.c has this line: ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0); The ip[0], _Gd and _Cd are of type ogg_int32_t. My question is: Can the result of (_Gd + _Cd) be a number with more than 16 bits? (Yes, it can be because they are int32, but the algorithm could guarantee something about that... I don't know...) If
2011 Mar 28
1
idct/fdct.c function calls
Hi. I am trying to find calls to idct/fdct.c functions by tracing png2theora.c calls. But I found only: analyze.c:oc_dct_cost2() Where and when are the idct/fdct/mmxidct/mmxfdct.c functions used? Mentions of the word "dct": ==== pacify at optima-amd64:/usr/src/libtheora-1.2.0alpha1/lib$ grep dct *.c | cut -f1 -d":" | uniq -c 19 analyze.c 28 decode.c 22 encode.c 4