Hi all,

I've patched theora-mmx to build on x86_64. The patch against SVN is attached.

Basically all I did was to copy lib/i386 to lib/x86_64 and tweak the assembler code a bit:
 * added to each file: typedef unsigned long int ogg_uint64_t
 * converted all asm inputs to 64-bit in: dsp_mmxext.c, fdct_mmx.c, recon_mmx.c
 * left all asm outputs at 32-bit
 * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus there's no need for the old mmx version

I also made minor modifications to cpu.c to get CPUID working correctly on x86_64.

I've tested the patch using dump_video and encoder_example on a short 16-second clip which I grabbed off Google Videos. It's a screenshot from some 80s video game, and I've posted it at http://tonquil.homeip.net:888/~dlenski/TeeterTortureVerySho.ogg if anyone wants to compare results.

Decoding with either the baseline C dsp routines or my version of mmxext gives *identical* md5sums. Encoding doesn't give the same md5sum (derf_ on IRC told me this was okay), but the video looks identical to the original as far as I can tell. Either way, things seem to run more than 2x as fast (on my 2.2 GHz Athlon 64 w/512 kB L2 cache).

A couple things that need improvement:
 * I don't know automake/autoconf that well, so I don't know how to make it automatically choose i386/x86_64 in the Makefile. I did make it so that cpu.c will choose the appropriate cpuid routine based on #if defined(__x86_64__)
 * Lots of 32-bit integers still get passed back and forth to the DSP routines, which is of course inefficient on a 64-bit system.
 * I can't figure out how to build a shared lib. I get errors about un-relocatable symbols. Maybe some shared lib guru can help me with this.

Please let me know if you get this patch working! Any suggestions/questions/hate mail would be appreciated.

Dan Lenski

On 3/30/06, Stefan de Konink <skinkie@xs4all.nl> wrote:
> Dan Lenski wrote:
> > I've googled reports of theora-mmx on AMD64... and have found some
> > hazy results suggesting that people have got theora-mmx working on
> > AMD64. Can anyone give me any tips on getting it to work? Thanks a
> > lot,
>
> From what I understand of it, the idea is to make a linux32 bit chroot
> environment and compile theora-mmx in there. So you actually are in
> 32bit mode and mmx will just work fine :)
>
> Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: drl_x86_64.diff.gz
Type: application/x-gzip
Size: 7749 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20060503/e24f3a76/drl_x86_64.diff.bin
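For readers without the attachment, the cpu.c change described above (choosing a CPUID routine with #if defined(__x86_64__)) might look roughly like the sketch below. It is not taken from the patch: cpuid_query, cpu_has_mmx, cpu_has_sse and cpuid_self_check are hypothetical names, and the 32-bit branch skips the EFLAGS test for CPUID support that real 32-bit code needs on very old processors. The only facts relied on are that CPUID leaf 1 reports MMX in EDX bit 23 and SSE in EDX bit 25.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the routine from the patch): run CPUID for
   leaf `op` and store EAX, EBX, ECX, EDX in regs[0..3].
   Returns 1 if CPUID was executed, 0 on non-x86 build targets. */
static int cpuid_query(uint32_t op, uint32_t regs[4]) {
  assert(regs != NULL);
#if defined(__x86_64__)
  /* 64-bit mode: CPUID is always present, so no availability test. */
  __asm__ __volatile__("cpuid"
                       : "=a"(regs[0]), "=b"(regs[1]),
                         "=c"(regs[2]), "=d"(regs[3])
                       : "0"(op), "2"(0));
  return 1;
#elif defined(__i386__)
  /* 32-bit mode: %ebx may be the PIC register, so swap it out by hand.
     The check that CPUID exists at all is omitted in this sketch. */
  __asm__ __volatile__("xchgl %%ebx, %1\n\t"
                       "cpuid\n\t"
                       "xchgl %%ebx, %1\n\t"
                       : "=a"(regs[0]), "=&r"(regs[1]),
                         "=c"(regs[2]), "=d"(regs[3])
                       : "0"(op), "2"(0));
  return 1;
#else
  /* Any other architecture: report "no features". */
  (void)op;
  regs[0] = regs[1] = regs[2] = regs[3] = 0;
  return 0;
#endif
}

/* CPUID leaf 1, EDX bit 23: MMX. */
static int cpu_has_mmx(void) {
  uint32_t r[4];
  if (!cpuid_query(1, r)) return 0;
  return (int)((r[3] >> 23) & 1u);
}

/* CPUID leaf 1, EDX bit 25: SSE. */
static int cpu_has_sse(void) {
  uint32_t r[4];
  if (!cpuid_query(1, r)) return 0;
  return (int)((r[3] >> 25) & 1u);
}

/* Small self-check: any CPU that reports SSE should also report MMX. */
static void cpuid_self_check(void) {
  assert(!cpu_has_sse() || cpu_has_mmx());
}
```

Calling cpuid_self_check() from a test program exercises whichever branch the compiler selected; on a non-x86 build both feature queries simply return 0.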
On Wed, May 03, 2006 at 12:48:45AM -0400, Dan Lenski wrote:
> Hi all, I've patched theora-mmx to build on x86_64. The patch against
> SVN is attached.

Excellent, thanks for doing this! Some comments inline.

> Basically all I did was to copy lib/i386 to lib/x86_64 and tweak the
> assembler code a bit:
> * added to each file: typedef unsigned long int ogg_uint64_t

This is reasonable, since the file will only be compiled with gcc on x86_64. But really we should make libogg provide this type if people are wanting it.

> * converted all asm inputs to 64-bit in: dsp_mmxext.c, fdct_mmx.c,
> recon_mmx.c
> * left all asm outputs at 32-bit
> * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
> there's no need for the old mmx version

Sure.

> A couple things that need improvement:
> * I don't know automake/autoconf that well, so I don't know how to
> make it automatically choose i386/x86_64 in the Makefile. I did make
> it so that cpu.c will choose the appropriate cpuid routine based on
> #if defined(__x86_64__)

Right. Unfortunately there needs to be more of this before I can apply the patch, since it breaks the x86_32 build as is. I can make the configure script tell the makefile which directory to compile, but for the sake of those using other build systems, it would really be better if it was always safe to compile both sets on any arch and have the inappropriate code #ifdef'd out. It may also make sense to add an arch to the asm function names to distinguish the two sets.

You might also look at what ruik did in the theora-exp branch. He kept unified source files and just used a few conditionals to make the same code work on both. Perhaps that's less helpful if you're intending to rewrite everything to use 64 bit integers, but for examples see http://svn.xiph.org/experimental/derf/theora-exp/lib/x86/

Anyway, thanks for getting the ball rolling here, this has been an oft-requested feature.

-r
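One way to read the "safe to compile both sets on any arch" suggestion above is sketched below: each arch-specific body is wrapped in a guard, and a common entry point falls back to the portable C code when neither guard matches. Everything here is an assumption for illustration (DspSketch, dsp_init_x86_32, dsp_init_x86_64, dsp_sketch_init and dsp_selected_arch are made-up names, and the "function table" is reduced to a label); it is not how the patch or the tree is actually organised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real function table: it only records
   which set of routines was installed. */
typedef struct {
  const char *arch;
} DspSketch;

/* Would live in the x86_64 directory. On any other target the guard
   leaves an empty translation unit, so compiling the file is harmless. */
#if defined(__x86_64__)
static void dsp_init_x86_64(DspSketch *f) { f->arch = "x86_64"; }
#endif

/* Would live in the i386 directory, guarded the same way. */
#if defined(__i386__)
static void dsp_init_x86_32(DspSketch *f) { f->arch = "x86_32"; }
#endif

/* Common entry point: start from the portable C routines, then let the
   matching arch-specific init (if any) override them. */
static void dsp_sketch_init(DspSketch *f) {
  assert(f != NULL);
  f->arch = "c";
#if defined(__x86_64__)
  dsp_init_x86_64(f);
#elif defined(__i386__)
  dsp_init_x86_32(f);
#endif
}

/* Convenience wrapper: report which set this build selected. */
static const char *dsp_selected_arch(void) {
  DspSketch f;
  dsp_sketch_init(&f);
  return f.arch;
}
```

With this layout a build system that knows nothing about the target can compile every file; only the guarded bodies that match the target contribute code.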
On Wed, 2006-05-03 at 00:48 -0400, Dan Lenski wrote:
> * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
> there's no need for the old mmx version

There might be no need for the old versions if there were a replacement for each one in mmxext, but looking at dsp_i386_mmx_init and dsp_i386_mmxext_init, this is not the case for:

  funcs->restore_fpu = restore_fpu;
  funcs->sub8x8 = sub8x8__mmx;
  funcs->sub8x8_128 = sub8x8_128__mmx;
  funcs->sub8x8avg2 = sub8x8avg2__mmx;
  funcs->intra8x8_err = intra8x8_err__mmx;
  funcs->inter8x8_err = inter8x8_err__mmx;

j
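The point about the two init functions can be illustrated with a toy function table. This is only a sketch under assumptions: the table layout, the stub routines and the order in which the inits are applied are invented for the example (only the idea that the mmxext init covers a subset of the entries comes from the message above), and sad8x8 is used purely as a placeholder for "an entry that does have an mmxext version".

```c
#include <assert.h>
#include <stddef.h>

/* Each stub just reports which implementation it stands for. */
enum impl { IMPL_C, IMPL_MMX, IMPL_MMXEXT };
typedef enum impl (*dsp_fn)(void);

/* Toy table with two entries; the real table has many more. */
typedef struct {
  dsp_fn sub8x8;
  dsp_fn sad8x8;
} DspTable;

static enum impl sub8x8_c(void)      { return IMPL_C; }
static enum impl sad8x8_c(void)      { return IMPL_C; }
static enum impl sub8x8_mmx(void)    { return IMPL_MMX; }
static enum impl sad8x8_mmx(void)    { return IMPL_MMX; }
static enum impl sad8x8_mmxext(void) { return IMPL_MMXEXT; }

static void table_init_c(DspTable *t) {
  t->sub8x8 = sub8x8_c;
  t->sad8x8 = sad8x8_c;
}

/* The plain MMX init covers every entry it has a routine for. */
static void table_init_mmx(DspTable *t) {
  t->sub8x8 = sub8x8_mmx;
  t->sad8x8 = sad8x8_mmx;
}

/* The mmxext init overrides only a subset and leaves sub8x8 untouched. */
static void table_init_mmxext(DspTable *t) {
  t->sad8x8 = sad8x8_mmxext;
}

/* Assumed layering: C first, then MMX, then mmxext on top. */
static void table_select(DspTable *t, int have_mmx, int have_mmxext) {
  assert(t != NULL);
  table_init_c(t);
  if (have_mmx) table_init_mmx(t);
  if (have_mmxext) table_init_mmxext(t);
}

static enum impl chosen_sub8x8(int have_mmx, int have_mmxext) {
  DspTable t;
  table_select(&t, have_mmx, have_mmxext);
  return t.sub8x8();
}

static enum impl chosen_sad8x8(int have_mmx, int have_mmxext) {
  DspTable t;
  table_select(&t, have_mmx, have_mmxext);
  return t.sad8x8();
}
```

If the MMX layer is skipped, chosen_sub8x8(0, 1) falls all the way back to the C routine, which is the situation described above for the entries that have no mmxext counterpart.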
On 5/3/06, j@kein.org <j@kein.org> wrote:
> On Wed, 2006-05-03 at 00:48 -0400, Dan Lenski wrote:
> > * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
> > there's no need for the old mmx version
> there might be no need for the old versions if there is one in mmxext,
> looking at dsp_i386_mmx_init and dsp_i386_mmxext_init this is not the
> case for
> funcs->restore_fpu = restore_fpu;
> funcs->sub8x8 = sub8x8__mmx;
> funcs->sub8x8_128 = sub8x8_128__mmx;
> funcs->sub8x8avg2 = sub8x8avg2__mmx;
> funcs->intra8x8_err = intra8x8_err__mmx;
> funcs->inter8x8_err = inter8x8_err__mmx;

I think the old mmx code is needed for x86_32 processors such as Pentium MMX, P2, K6-2, K6-III etc., where there is MMX support but no SSE support.

Dan
On 5/3/06, Michael Smith <msmith@xiph.org> wrote:
> >
> > I had the exact same problem. I found this on Usenet:
> > http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/bbef7633760b5472/a28c7c5bfb46c85f%23a28c7c5bfb46c85f
> > It seems that the deal is that PIC and non-PIC code don't play nice on
> > x86_64, but you can get away w/it on x86_32 (I need easier
> > abbreviations!!!) so lots of libs have this problem.
>
> The problem is the global variables in a few files (e.g. V128 in
> recon_mmx.c), I'm pretty sure.
>
> Unfortunately, I don't know what the solution is. Possibly loading the
> constants into local variables (in the surrounding C code) and only
> using the locals would be a workable solution; there might be better
> ones.

Thanks for explaining that. Fortunately there's a pretty easy fix: we can load 64-bit immediates into general purpose registers on x86_64, so I'm just going to clobber %rax and use it to transfer immediates into the %mm regs.

Dan
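The fix described above (a 64-bit immediate goes into %rax and is then moved into an MMX register, so the asm never references a global such as V128) could be sketched as follows. This is an illustration, not the committed code: load_const_via_rax and SKETCH_CONST are hypothetical names, and 0x8080808080808080 is an arbitrary stand-in, since the thread does not say what value V128 holds. The non-x86_64 branch exists only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary illustrative constant; NOT claimed to be the real V128. */
#define SKETCH_CONST 0x8080808080808080ULL

/* Load a 64-bit constant into %mm0 without touching memory, then read
   it back so the caller can observe the result. */
static uint64_t load_const_via_rax(void) {
#if defined(__x86_64__)
  uint64_t out;
  __asm__ __volatile__(
      "movabsq $0x8080808080808080, %%rax\n\t" /* imm64 -> %rax          */
      "movq    %%rax, %%mm0\n\t"               /* %rax  -> %mm0          */
      "movq    %%mm0, %0\n\t"                  /* %mm0  -> output GPR    */
      "emms\n\t"                               /* leave MMX state clean  */
      : "=r"(out)
      :
      : "rax", "mm0");
  return out;
#else
  /* No MMX here; return the constant so the sketch still builds. */
  return SKETCH_CONST;
#endif
}

/* Self-check: the value read back must match the immediate. */
static void load_const_self_check(void) {
  assert(load_const_via_rax() == SKETCH_CONST);
}
```

Because the constant is encoded in the instruction stream, no relocation against a data symbol is needed, which is why this approach sidesteps the shared-library errors. Passing the constant through an "r" input operand instead would let the compiler choose the scratch register rather than hard-coding %rax.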
On 5/4/06, Dan Lenski <dlenski@gmail.com> wrote:
>
> Thanks for explaining that. Fortunately there's a pretty easy fix: we
> can load 64-bit immediates into general purpose registers on x86_64,
> so I'm just going to clobber %rax and use it to transfer immediates
> into the %mm regs.

Great! Though I was able to help out a bit with an explanation, I can neither program in assembly, nor (since I don't have one) can I test x86-64 - so I'm glad it was enough information for you to fix things up!

It's excellent to see some forward progress on this code once again. Are you interested in doing ongoing work with theora (more optimisations, or other)? If you are, we should set you up with an SVN account.

I've committed your patch.

Mike
For those not following along, Dan's patch is in svn now.

Dan, a couple of things I noticed:

You've got an _i386_ infix in your DspFunctions initializer, but the 32-bit versions don't. We should probably fix both of those to have a more appropriate marker. x86_32 and x86_64 seem appropriate to me. That ok with you?

None of your initializers are actually getting called from the source, so your work isn't being used in the svn version. Did we forget a commit?

-r