Jack,

I got the asm we were working on last night committed on the beta 3 branch. It's in use in both vorbisfile:ov_read() and the lsp lookup (when using float lookups).

I'd like to have a patch of the other things you did in the past day or so, even if you don't want to commit yet.

Monty

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Jack Moffitt
2000-Oct-19 11:43 UTC
[vorbis-dev] casting/rounding ASM committed to beta3 branch
> Jack, I got the asm we were working last night committed on the beta 3 branch.
> It's in use in both vorbisfile:ov_read() and the lsp lookup (when using float
> lookups)

I just did a checkout of prebranch_beta3 and, unless I did something wrong, vorbisfile.c does NOT have the float to int assembly optimizations. It still has the (int)f stuff.

> I'd like to have a patch of the other things you did in the past day or so, even if you don't want to commit yet.

Here's the patch from a fresh checkout of branch_prebeta3. It includes my own assembly stuff, so you'll have to hand apply (you always do anyway).

Those of you on the rest of the list might like to take a peek too. If I can help make this much of a difference in speed in such little time, I'm wondering what others on this list who are better at optimization than me can do :)

There are basically 3 things going on:

1) gcc's float to int conversion is nasty. We inlined it with one 'fistp'. (HUGE speedup) There are two places where this is critical for speed: the float -> int conversion in vorbisfile.c for the sample output, and the cosine, square root, etc. lookups in lookup.c for lsp_to_curve.

2) lsp.c had two for loops that I merged into one do/while, while also changing array indexing to constant offsets and pointer math. (This was a HUGE speedup.)

3) I unrolled a loop in mdct_kernel a bit. This is also a pretty good speedup. This was my first time with loop unrolling. I would appreciate it if someone who knows more than me could look at it and see if it can be done even better.

(Andrew of the Sonique team helped with the lsp.c loop.)

mdct_kernel is now much slower than anything else in the code (it used to be a 3 way tie between the float->int sample conversion, mdct_kernel and lsp_to_curve).

jack.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: jack_opt.patch
Type: application/octet-stream
Size: 13852 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/vorbis-dev/attachments/20001019/0dc599f6/jack_opt-0001.obj