Venkatesh Srinivas
2010-Jun-29 20:51 UTC
[theora-dev] [PATCH]: PPC/Altivec implementations of SAD and SSD
Hi,

This patch adds Altivec-optimized implementations of oc_enc_frag_sad and oc_enc_frag_ssd. It is against the latest svn revision of theora-ptalarbvorm.

It speeds up encoding of a plant stop-motion clip on a 1 GHz PPC 7447 by ~3% in encode time. As reported by Shark, time spent in oc_enc_frag_sad is reduced from 4.2% to 2.3%, and time spent in oc_enc_frag_ssd from 1.2% to 1.0%.

Currently this is only integrated into the Xcode build project, and it still needs support for detecting Altivec on platforms other than OS X; on OS X it uses the sysctl hw.vectorunit.

Thanks,
-- vs

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppcenc-draft-0.diff
Type: application/octet-stream
Size: 20100 bytes
Desc: not available
URL: http://lists.xiph.org/pipermail/theora-dev/attachments/20100629/f0f421d9/attachment-0001.obj
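For readers without the attachment, the following is a minimal scalar sketch of the two metrics the patch vectorizes: the sum of absolute differences (SAD) and the sum of squared differences (SSD) over one 8x8 fragment. The names ex_frag_sad and ex_frag_ssd and their argument lists are hypothetical, chosen for illustration; they are not the libtheora signatures and not the code from the patch. A reference of this kind is mainly useful for checking a vectorized version against known results.

```c
#include <stdlib.h>

/* Theora fragments are 8x8 pixels. */
enum { EX_FRAG_SIZE = 8 };

/* Hypothetical scalar reference: sum of absolute differences over one
   8x8 fragment. ystride is the distance in bytes between rows. */
unsigned ex_frag_sad(const unsigned char *src, const unsigned char *ref,
                     int ystride) {
  unsigned sad = 0;
  int i;
  int j;
  for (i = 0; i < EX_FRAG_SIZE; i++) {
    for (j = 0; j < EX_FRAG_SIZE; j++) {
      sad += (unsigned)abs(src[j] - ref[j]);
    }
    src += ystride;
    ref += ystride;
  }
  return sad;
}

/* Hypothetical scalar reference: sum of squared differences over one
   8x8 fragment. The maximum, 64*255*255, fits in a 32-bit unsigned. */
unsigned ex_frag_ssd(const unsigned char *src, const unsigned char *ref,
                     int ystride) {
  unsigned ssd = 0;
  int i;
  int j;
  for (i = 0; i < EX_FRAG_SIZE; i++) {
    for (j = 0; j < EX_FRAG_SIZE; j++) {
      int d = src[j] - ref[j];
      ssd += (unsigned)(d * d);
    }
    src += ystride;
    ref += ystride;
  }
  return ssd;
}
```

A vectorized implementation can be validated by running both versions over the same random fragments and comparing the returned sums.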
Timothy B. Terriberry
2010-Jun-30 05:01 UTC
[theora-dev] [PATCH]: PPC/Altivec implementations of SAD and SSD
Venkatesh Srinivas wrote:
> This patch adds Altivec-optimized implementations of oc_enc_frag_sad and
> oc_enc_frag_ssd. This patch is against the latest svn revision of
> theora-ptalarbvorm.

Just some comments from a quick review:

Please follow the x86 structure and create a separate ppcint.h to be shared by the decoder and the encoder, and a ppcenc.h for the encoder-specific functionality.

All file-scope symbols should be prefixed with oc_ (for functions) or OC_ (for constants and #defines).

have_altivec() (after being properly prefixed) should be in a "ppccpu.c" and should cache its result in oc_theora_state.cpu_flags, with an appropriate OC_CPU_PPC_ALTIVEC flag defined in cpu.h. You don't need to model yourself exactly after the x86 cpu.c here (which doesn't even live in lib/x86!), as it still hasn't been properly cleaned up from when the encoder and decoder were in separate source trees. I'll fix that shortly.

> Currently this is only integrated into the Xcode build project and need
> support for detecting Altivec on platforms other than OS X; on OS X it
> uses the sysctl hw.vectorunit.

There is more complete detection code, for example, in Orc, that covers at least OS X, Linux, and the BSDs:
http://code.entropywave.com/projects/orc/
See orccpu-powerpc.c

This code is BSD licensed, so there should be no problem adapting it to libtheora (with proper attribution, of course). Note there may be problems with the SIGILL fallback method it uses:
https://bugzilla.redhat.com/show_bug.cgi?id=435771
But that's okay, because AFAICT that method wouldn't compile if it was enabled, since orc_fault_check_enable(), etc., aren't actually defined anywhere that I can see.
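As a rough illustration of the "detect once, cache in the state's cpu_flags" arrangement described above, here is a minimal sketch. Everything prefixed with ex_ or EX_ is hypothetical and stands in for the real libtheora names (oc_theora_state, OC_CPU_PPC_ALTIVEC), whose actual definitions are not shown in this thread. Only the OS X path through the hw.vectorunit sysctl mentioned in the original mail is covered; on any other platform this sketch simply reports no Altivec, which is an assumption made to keep it short, not a recommendation.

```c
#include <stddef.h>

#if defined(__APPLE__) && (defined(__ppc__) || defined(__ppc64__))
# include <sys/types.h>
# include <sys/sysctl.h>
# define EX_HAVE_VECTORUNIT_SYSCTL 1
#endif

/* Hypothetical flag bit, standing in for OC_CPU_PPC_ALTIVEC. */
#define EX_CPU_PPC_ALTIVEC (1u << 0)

/* Hypothetical stand-in for the part of the codec state that holds the
   cached CPU flags. */
typedef struct ex_state {
  unsigned cpu_flags;
} ex_state;

/* Query the platform once and return a flag mask. */
unsigned ex_cpu_flags_get(void) {
  unsigned flags = 0;
#if defined(EX_HAVE_VECTORUNIT_SYSCTL)
  int has_vec = 0;
  size_t len = sizeof(has_vec);
  /* hw.vectorunit is nonzero when a vector unit is present. */
  if (sysctlbyname("hw.vectorunit", &has_vec, &len, NULL, 0) == 0
      && has_vec != 0) {
    flags |= EX_CPU_PPC_ALTIVEC;
  }
#endif
  /* Other platforms (Linux, the BSDs) would need their own detection
     here; this sketch conservatively reports nothing for them. */
  return flags;
}

/* Cache the result in the state so later code tests a bit instead of
   repeating the system call. */
void ex_state_init(ex_state *state) {
  state->cpu_flags = ex_cpu_flags_get();
}

/* Convenience test used when choosing between C and Altivec routines. */
int ex_state_has_altivec(const ex_state *state) {
  return (state->cpu_flags & EX_CPU_PPC_ALTIVEC) != 0;
}
```

With the flags cached at initialization, the encoder setup code can select the Altivec or plain C versions of the SAD and SSD routines by checking the bit once, rather than calling the detection function on every use.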