thr3ads.net - theora dev - [Theora-dev] patch to build theora-mmx on AMD64 [May 2006]

Dan Lenski

2006-May-02 21:48 UTC

[Theora-dev] patch to build theora-mmx on AMD64

Hi all, I've patched theora-mmx to build on x86_64.  The patch against
SVN is attached.

Basically all I did was to copy lib/i386 to lib/x86_64 and tweak the
assembler code a bit:
* added to each file: typedef unsigned long int ogg_uint64_t
* converted all asm inputs to 64-bit in: dsp_mmxext.c,  fdct_mmx.c,  recon_mmx.c
* left all asm outputs at 32-bit
* I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
there's no need for the old mmx version
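
To illustrate the "64-bit inputs, 32-bit outputs" convention from the list above, here is a minimal sketch. It is not taken from the patch: the function name sad8x2, its signature, and the choice of psadbw are assumptions made for illustration. It shows why a stride used in an address expression has to be 64 bits wide on x86_64, while a small result can stay 32-bit.

```c
#include <stdint.h>

/* Sum of absolute differences over two 8-byte rows (hypothetical helper).
 * `stride` is 64 bits wide because it appears in the address expression
 * (%1,%3): a 32-bit operand would be printed as e.g. %esi, and mixing it
 * with a 64-bit base register does not assemble in 64-bit mode. */
uint32_t sad8x2(const unsigned char *a, const unsigned char *b, uint64_t stride)
{
#if defined(__x86_64__) && defined(__GNUC__)
  uint32_t sad;                       /* output stays 32-bit */
  __asm__ __volatile__(
    "movq   (%1), %%mm0\n\t"          /* row 0 of a */
    "movq   (%1,%3), %%mm1\n\t"       /* row 1 of a */
    "psadbw (%2), %%mm0\n\t"          /* SAD against row 0 of b */
    "psadbw (%2,%3), %%mm1\n\t"       /* SAD against row 1 of b */
    "paddw  %%mm1, %%mm0\n\t"         /* add the two row sums */
    "movd   %%mm0, %0\n\t"            /* low 32 bits -> result */
    "emms\n\t"                        /* leave MMX state clean */
    : "=&r"(sad)
    : "r"(a), "r"(b), "r"(stride)
    : "mm0", "mm1", "memory");
  return sad;
#else
  /* Portable fallback so the sketch builds on other targets. */
  uint32_t sad = 0;
  int row, i;
  for (row = 0; row < 2; row++)
    for (i = 0; i < 8; i++) {
      int d = a[row * stride + i] - b[row * stride + i];
      sad += (uint32_t)(d < 0 ? -d : d);
    }
  return sad;
#endif
}

/* Fixed-input demo: all 16 bytes differ by 3, so the result is 16 * 3. */
uint32_t sad8x2_demo(void)
{
  unsigned char a[16], b[16];
  int i;
  for (i = 0; i < 16; i++) { a[i] = 10; b[i] = 7; }
  return sad8x2(a, b, 8);
}
```

The real routines in dsp_mmxext.c operate on full 8x8 blocks and may be structured quite differently; only the operand-width point is meant to carry over.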

I also made minor modifications to cpu.c to get CPUID working
correctly on x86_64.
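
For readers wondering what "getting CPUID working" might involve, the following is a rough sketch, not the code from the patch. It assumes the 32-bit routine has to work around %ebx being reserved in PIC builds, which the 64-bit path does not need to do in the same way; the names cpuid_sketch and cpu_flags_sketch are hypothetical.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_CPUID_SKETCH 1
static void cpuid_sketch(uint32_t op, uint32_t *a, uint32_t *b,
                         uint32_t *c, uint32_t *d)
{
#if defined(__x86_64__)
  /* On x86_64 the compiler can be told directly that cpuid writes ebx. */
  __asm__ __volatile__("cpuid"
    : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
    : "0"(op), "2"((uint32_t)0));
#else
  /* On i386, ebx may hold the PIC base, so swap it out around cpuid.
   * (Very old CPUs also need an EFLAGS.ID check first; omitted here.) */
  __asm__ __volatile__(
    "xchgl %%ebx, %1\n\t"
    "cpuid\n\t"
    "xchgl %%ebx, %1\n\t"
    : "=a"(*a), "=&r"(*b), "=c"(*c), "=d"(*d)
    : "0"(op), "2"((uint32_t)0));
#endif
}
#endif

/* Returns a bitmask: bit 0 = MMX, bit 1 = SSE.
 * Returns 0 on targets where this sketch has no CPUID path. */
unsigned cpu_flags_sketch(void)
{
  unsigned flags = 0;
#ifdef HAVE_CPUID_SKETCH
  uint32_t a, b, c, d;
  cpuid_sketch(1, &a, &b, &c, &d);
  if (d & (1u << 23)) flags |= 1u;    /* CPUID.1:EDX bit 23 = MMX */
  if (d & (1u << 25)) flags |= 2u;    /* CPUID.1:EDX bit 25 = SSE */
#endif
  return flags;
}
```

On any x86_64 machine both bits should come back set, which is the basis for skipping a separate MMX-only code path there.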

I've tested the patch using dump_video and encoder_example on a short
16-second clip which I grabbed off Google Videos.  It's a screenshot
from some 80s video game, and I've posted it at
http://tonquil.homeip.net:888/~dlenski/TeeterTortureVerySho.ogg if
anyone wants to compare results.

Decoding with either the baseline C dsp routines or my version of
mmxext gives *identical* md5sums.  Encoding doesn't give the same
md5sum (derf_ on IRC told me this was okay), but the video looks
identical to the original as far as I can tell.  Either way, things
seem to run more than 2x as fast (on my 2.2 GHz Athlon 64 w/512 kB L2
cache).

A couple things that need improvement:
* I don't know automake/autoconf that well, so I don't know how to
make it automatically choose i386/x86_64 in the Makefile.  I did make
it so that cpu.c will choose the appropriate cpuid routine based on
#if defined(__x86_64__)
* Lots of 32-bit integers still get passed back and forth to the DSP
routines, which is of course inefficient on a 64-bit system.
* I can't figure out how to build a shared lib.  I get errors about
un-relocatable symbols.  Maybe some shared lib guru can help me with
this.

Please let me know if you get this patch working!  Any
suggestions/questions/hate mail would be appreciated.

Dan Lenski

On 3/30/06, Stefan de Konink <skinkie@xs4all.nl> wrote:
> Dan Lenski wrote:
> > I've googled reports of theora-mmx on AMD64... and have found some
> > hazy results suggesting that people have got theora-mmx working on
> > AMD64.  Can anyone give me any tips on getting it to work?  Thanks a
> > lot,
>
>  From what I understand of it, the idea is to make a 32-bit Linux chroot
> environment and compile theora-mmx in there. So you actually are in
> 32-bit mode and mmx will just work fine :)
>
>
> Stefan
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drl_x86_64.diff.gz
Type: application/x-gzip
Size: 7749 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/theora-dev/attachments/20060503/e24f3a76/drl_x86_64.diff.bin

Ralph Giles

2006-May-02 22:58 UTC

[Theora-dev] patch to build theora-mmx on AMD64

On Wed, May 03, 2006 at 12:48:45AM -0400, Dan Lenski wrote:
> Hi all, I've patched theora-mmx to build on x86_64.  The patch against
> SVN is attached.
Excellent, thanks for doing this! Some comments inline.
> Basically all I did was to copy lib/i386 to lib/x86_64 and tweak the
> assembler code a bit:
> * added to each file: typedef unsigned long int ogg_uint64_t
This is reasonable, since the file will only be compiled with gcc on 
x86_64. But really we should make libogg provide this type if people
are wanting it.
> * converted all asm inputs to 64-bit in: dsp_mmxext.c,  fdct_mmx.c,  
> recon_mmx.c
> * left all asm outputs at 32-bit
> * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
> there's no need for the old mmx version
Sure.
> A couple things that need improvement:
> * I don't know automake/autoconf that well, so I don't know how to
> make it automatically choose i386/x86_64 in the Makefile.  I did make
> it so that cpu.c will choose the appropriate cpuid routine based on
> #if defined(__x86_64__)
Right. Unfortunately there needs to be more of this before I can apply 
since it breaks the x86_32 build as is. I can make the configure script 
tell the makefile which directory to compile, but for the sake of those 
using other build systems, it would really be better if it was always 
safe to compile both sets on any arch and have the inappropriate code 
#ifdef'd out.
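
The "safe to compile both sets on any arch" idea can be sketched roughly as follows. All identifiers here are hypothetical and exist only for this illustration; the point is that each arch-specific file guards its own body, so a build system that compiles every file still produces a working library.

```c
/* Hypothetical identifiers, for this sketch only. */
enum dsp_arch { DSP_ARCH_C = 0, DSP_ARCH_X86_32 = 1, DSP_ARCH_X86_64 = 2 };

/* What a file under lib/x86_64 might contain: on any other target the
 * body is preprocessed away and the file compiles to nothing. */
#if defined(__x86_64__)
static enum dsp_arch dsp_x86_64_init(void) { return DSP_ARCH_X86_64; }
#endif

/* Likewise for a file under lib/i386. */
#if defined(__i386__)
static enum dsp_arch dsp_x86_32_init(void) { return DSP_ARCH_X86_32; }
#endif

/* The caller uses the same guards, so it never references a symbol
 * that was compiled out. */
enum dsp_arch dsp_init_sketch(void)
{
#if defined(__x86_64__)
  return dsp_x86_64_init();
#elif defined(__i386__)
  return dsp_x86_32_init();
#else
  return DSP_ARCH_C;                  /* plain C routines everywhere else */
#endif
}
```

Giving the two init functions distinct arch markers in their names, as suggested below, also keeps them from colliding if both files end up in one link.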

It may also make sense to add an arch marker to the names of the asm 
functions to distinguish the two sets.

You might also look at what ruik did in the theora-exp branch. He kept 
unified source files and just used a few conditionals to make the same 
code work on both. Perhaps that's less helpful if you're intending to 
rewrite everything to use 64 bit integers, but for examples see

  http://svn.xiph.org/experimental/derf/theora-exp/lib/x86/

Anyway, thanks for getting the ball rolling here, this has been an 
oft-requested feature.

 -r

j@v2v.cc

2006-May-03 05:08 UTC

[Theora-dev] patch to build theora-mmx on AMD64

On Wed, 2006-05-03 at 00:48 -0400, Dan Lenski wrote:
> * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
> there's no need for the old mmx version
there might be no need for the old versions if there is one in mmxext,
looking at dsp_i386_mmx_init and dsp_i386_mmxext_init this is not the
case for
  funcs->restore_fpu = restore_fpu;
  funcs->sub8x8 = sub8x8__mmx;
  funcs->sub8x8_128 = sub8x8_128__mmx;
  funcs->sub8x8avg2 = sub8x8avg2__mmx;
  funcs->intra8x8_err = intra8x8_err__mmx;
  funcs->inter8x8_err = inter8x8_err__mmx;
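
The layering being described can be sketched as below. This is a simplified model, not the libtheora source: the struct, the int-returning stand-in functions, and the sad8x8 slot are assumptions for illustration. It shows why skipping the MMX init leaves some slots on the C versions even when the MMXEXT init runs.

```c
/* Simplified stand-in for the DSP function table. Each stub returns
 * which implementation level it represents: 0 = C, 1 = MMX, 2 = MMXEXT. */
typedef struct {
  int (*sub8x8)(void);
  int (*sad8x8)(void);
} DspFunctionsSketch;

static int sub8x8__c(void)      { return 0; }
static int sad8x8__c(void)      { return 0; }
static int sub8x8__mmx(void)    { return 1; }
static int sad8x8__mmx(void)    { return 1; }
static int sad8x8__mmxext(void) { return 2; }

static void init_c(DspFunctionsSketch *f)
{
  f->sub8x8 = sub8x8__c;
  f->sad8x8 = sad8x8__c;
}

static void init_mmx(DspFunctionsSketch *f)
{
  f->sub8x8 = sub8x8__mmx;
  f->sad8x8 = sad8x8__mmx;
}

/* The MMXEXT init only overrides the slots it has better code for;
 * there is no sub8x8 here, so that slot keeps whatever ran before. */
static void init_mmxext(DspFunctionsSketch *f)
{
  f->sad8x8 = sad8x8__mmxext;
}

DspFunctionsSketch dsp_select_sketch(int use_mmx, int use_mmxext)
{
  DspFunctionsSketch f;
  init_c(&f);
  if (use_mmx)    init_mmx(&f);
  if (use_mmxext) init_mmxext(&f);
  return f;
}
```

With use_mmx set to 0 and use_mmxext set to 1, sub8x8 stays at the C version, which is the situation on x86_64 if dsp_mmx.c is left unported.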


j

Dan Lenski

2006-May-03 08:37 UTC

[Theora-dev] patch to build theora-mmx on AMD64

On 5/3/06, j@kein.org <j@kein.org> wrote:
> On Wed, 2006-05-03 at 00:48 -0400, Dan Lenski wrote:
> > * I didn't patch dsp_mmx.c, since all x86_64 processors have SSE, thus
> > there's no need for the old mmx version
> there might be no need for the old versions if there is one in mmxext,
> looking at dsp_i386_mmx_init and dsp_i386_mmxext_init this is not the
> case for
>   funcs->restore_fpu = restore_fpu;
>   funcs->sub8x8 = sub8x8__mmx;
>   funcs->sub8x8_128 = sub8x8_128__mmx;
>   funcs->sub8x8avg2 = sub8x8avg2__mmx;
>   funcs->intra8x8_err = intra8x8_err__mmx;
>   funcs->inter8x8_err = inter8x8_err__mmx;
I think the old mmx code is needed for x86_32 processors such as
Pentium MMX, P2, K6-2, K6-III etc., where there is MMX support but no SSE
support.

Dan

Dan Lenski

2006-May-03 16:11 UTC

[Theora-dev] patch to build theora-mmx on AMD64

On 5/3/06, Michael Smith <msmith@xiph.org> wrote:
> > I had the exact same problem.  I found this on Usenet:
> > http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/bbef7633760b5472/a28c7c5bfb46c85f%23a28c7c5bfb46c85f
> > It seems that the deal is that PIC and non-PIC code don't play nice on
> > x86_64, but you can get away w/it on x86_32 (I need easier
> > abbreviations!!!) so lots of libs have this problem.
>
> The problem is the global variables in a few files (e.g. V128 in
> recon_mmx.c), I'm pretty sure.
>
> Unfortunately, I don't know what the solution is. Possibly loading the
> constants into local variables (in the surrounding C code) and only
> using the locals would be a workable solution; there might be better
> ones.
Thanks for explaining that.  Fortunately there's a pretty easy fix: we
can load 64-bit immediates into general purpose registers on x86_64,
so I'm just going to clobber %rax and use it to transfer immediates
into the %mm regs.
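
A minimal sketch of that fix, under stated assumptions: the constant is taken to be the byte value 128 replicated eight times (as the name V128 suggests), and the function name is hypothetical. Because the value is encoded as an immediate, the asm no longer references a global symbol, so no absolute relocation is emitted and the object can go into a shared library.

```c
#include <stdint.h>

/* Loads the assumed V128 pattern into %mm0 via %rax, then copies it
 * back out so the result can be checked from C. */
uint64_t load_v128_sketch(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
  uint64_t out;
  __asm__ __volatile__(
    "movabsq $0x8080808080808080, %%rax\n\t"  /* 64-bit immediate -> GPR */
    "movq    %%rax, %%mm0\n\t"                /* GPR -> MMX register */
    "movq    %%mm0, %0\n\t"                   /* read back for the caller */
    "emms\n\t"                                /* leave MMX state clean */
    : "=r"(out)
    :
    : "rax", "mm0");
  return out;
#else
  /* Fallback so the sketch builds on other targets. */
  return 0x8080808080808080ULL;
#endif
}
```

The alternative mentioned above, holding the constant in a local C variable and passing it as an asm input operand, would work as well and lets the compiler pick the register instead of hard-coding %rax.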

Dan

Michael Smith

2006-May-04 01:25 UTC

[Theora-dev] patch to build theora-mmx on AMD64

On 5/4/06, Dan Lenski <dlenski@gmail.com> wrote:
>
> Thanks for explaining that.  Fortunately there's a pretty easy fix: we
> can load 64-bit immediates into general purpose registers on x86_64,
> so I'm just going to clobber %rax and use it to transfer immediates
> into the %mm regs.
Great!

Though I was able to help out a bit with an explanation, I can neither
program in assembly, nor (since I don't have one) can I test x86-64 -
so I'm glad it was enough information for you to fix things up!

It's excellent to see some forward progress on this code once again.
Are you interested in doing ongoing work with theora (more
optimisations, or other)? If you are, we should set you up with an SVN
account.

I've committed your patch.

Mike

Ralph Giles

2006-May-05 18:31 UTC

[Theora-dev] patch to build theora-mmx on AMD64

For those not following along, Dan's patch is in svn now.

Dan, a couple of things I noticed:

You've got an _i386_ infix in your DspFunctions initializer, but the 
32-bit versions don't. We should probably fix both of those to have a 
more appropriate marker. x86_32 and x86_64 seem appropriate to me. That 
ok with you?

None of your initializers are actually getting called from the source, 
so your work isn't being used in the svn version. Did we forget a 
commit?

 -r
