Howdy.

I'm working on Altivec versions of some of the libFLAC functions. I figured
the best candidates would be those that had MMX/SSE/3dnow versions, and I
picked FLAC__lpc_restore_signal() to do first, since it's relatively simple.

In stepping through some runs, it appears that 'order' mod 4 is always 0. Is
that guaranteed, either by the format or by higher functions in the reference
decoder?

Also, what assumptions can I make about the alignment of 'data' and
'qlp_coeff'? It would be really nice if these were both doubleword-aligned.

Finally, in a more general context, is there an easy way to build for
profiling, or do I have to edit the makefiles? I'm using gcc and gprof.

Thanks in advance,
-Brady

-- 
Brady Patterson (brady@spaceship.com)
Do you know Old Kentucky Shark?
brady@spaceship.com said:
> Finally, in a more general context, is there an easy way to build for
> profiling, or do I have to edit the makefiles? I'm using gcc and
> gprof.

Try oprofile. It's more accurate than gprof, and it doesn't require you to
recompile the source you're profiling.

Jason
On Fri, 21 Feb 2003, Jason Lunz wrote:
> Try oprofile. It's more accurate than gprof, and it doesn't require you
> to recompile the source you're profiling.

That looks like a nice piece of software, but it appears to be specific to
Linux-x86, so I don't think it will be much help in profiling Altivec code. :)

-- 
Brady Patterson (brady@spaceship.com)
Do you know Old Kentucky Shark?
On Thu, Feb 20, 2003 at 05:46:50PM -0600, Brady Patterson wrote:
> Finally, in a more general context, is there an easy way to build for
> profiling, or do I have to edit the makefiles? I'm using gcc and gprof.

If you want to profile an optimized build, you will need to tweak
configure.in, since it always adds -fomit-frame-pointer when not debugging
(maybe this should be changed...).

If a debugging build is OK, you should be able to do:

    CFLAGS="-pg" ./configure --enable-debug

(likewise for whatever other CFLAGS you want)

-- 
 - mdz
On Thu, Feb 20, 2003 at 05:46:50PM -0600, Brady Patterson wrote:
> I'm working on Altivec versions of some of the libFLAC functions. I figured
> the best candidates would be those that had MMX/SSE/3dnow versions, and I
> picked FLAC__lpc_restore_signal() to do first, since it's relatively simple.
>
> In stepping through some runs, it appears that 'order' mod 4 is always 0. Is
> that guaranteed, either by the format or by higher functions in the reference
> decoder?

No, 1 <= order <= 32. There is the -l option :).

> Also, what assumptions can I make about the alignment of 'data' and
> 'qlp_coeff'? It would be really nice if these were both doubleword-aligned.

Everything should be 4-byte aligned; residual is 8-byte aligned on GNU
libc based systems. If this isn't good enough (and it isn't for SSE2),
we will have to replace the appropriate malloc calls.

However, you can copy qlp_coeffs onto the stack for better alignment.

> Finally, in a more general context, is there an easy way to build for
> profiling, or do I have to edit the makefiles? I'm using gcc and gprof.

IIRC powerpc has performance counters; if you want the best code, use them.

For gprof profiling, patch configure.in as below and build flac with
--enable-static --enable-profile

186a187,194
> AC_ARG_ENABLE(profile,
> [  --enable-profile        Turn on profiling],
> [case "${enableval}" in
>   yes) profile=true ;;
>   no)  profile=false ;;
>   *) AC_MSG_ERROR(bad value ${enableval} for --enable-profile) ;;
> esac],[profile=false])
>
368a377,384
> if test x$profile = xtrue; then
>   if test x$GCC = xyes; then
>     CFLAGS="`echo $CFLAGS | sed -e 's/-fomit-frame-pointer//g'` -pg"
>   else
>     CFLAGS="$CFLAGS -p"
>   fi
> fi
>

-- 
Miroslav Lichvar
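[Editorial sketch] The alignment figures above (4 bytes in general, 8 bytes for
malloc'd buffers on glibc) can be checked at run time rather than assumed. The
helper below is a minimal sketch of such a check; is_aligned() is a
hypothetical name, not a libFLAC function, and the pointer-to-integer cast is
implementation-defined, although it behaves as expected on mainstream
flat-address platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libFLAC): returns 1 if p sits on an
 * `alignment`-byte boundary, 0 otherwise. `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    /* A power of two has exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Low bits of the address are zero exactly when p is aligned. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For example, calling is_aligned(data, 16) and is_aligned(qlp_coeff, 16) at the
top of a vectorized routine would show whether the buffers it receives are
already suitable for 16-byte vector loads, or whether a fallback path is
needed.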
lunz@falooley.org said:
>> Finally, in a more general context, is there an easy way to build for
>> profiling, or do I have to edit the makefiles? I'm using gcc and
>> gprof.
>
> Try oprofile. It's more accurate than gprof, and it doesn't require you
> to recompile the source you're profiling.

sorry, that doesn't make any sense. you obviously don't need an i386 linux
profiler if you're doing altivec optimizations. :)

Jason
On Fri, 21 Feb 2003, Miroslav Lichvar wrote:
> No, 1 <= order <= 32. There is -l option :).

Indeed. For some reason I decided to debug an optimized build, which for some
reason was showing order being off by a factor of 4. Thus it appeared that
order%4 == 0 in that case, when actually it wasn't. Needless to say I've
turned off optimization for the time being, which I should have done earlier.

> Everything should be 4 byte aligned, residual is 8 byte aligned on GNU
> libc based system. If this isn't good enough (and it isn't for SSE2),
> we will have to replace appropriate malloc calls.

It isn't, and I'm currently doing just that.

> However, you can copy qlp_coeffs on stack for better alignment.

You mean copying them into a local array in read_subframe_lpc_(), right? I'd
still have to manually align that array, though, or am I missing something?

> IIRC powerpc has performance counters, if you want the best code, use them.

Good idea. Cpocuba, or however that would approximate in Roman letters :) .

-- 
Brady Patterson (brady@spaceship.com)
Do you know Old Kentucky Shark?
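[Editorial sketch] One way the "copy the coefficients into an aligned local
array" idea might look in C. This is only a sketch under stated assumptions:
aligned_coeffs, load_coeffs() and MAX_LPC_ORDER are hypothetical names that do
not exist in libFLAC, the 16-byte figure is assumed from AltiVec's 128-bit
vectors, and _Alignas needs a C11 compiler (older GCC would use
__attribute__((aligned(16))) instead). The copy is also zero-padded up to a
multiple of 4, since order can be anything from 1 to 32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LPC_ORDER 32  /* from the thread: 1 <= order <= 32 */

/* Hypothetical container: the _Alignas member makes every object of this
 * type, including a local variable on the stack, 16-byte aligned. */
typedef struct {
    _Alignas(16) int32_t c[MAX_LPC_ORDER];
    unsigned padded_order;  /* order rounded up to a multiple of 4 */
} aligned_coeffs;

/* Copy `order` coefficients into dst and zero-fill the remainder. */
static void load_coeffs(aligned_coeffs *dst, const int32_t *qlp_coeff,
                        unsigned order)
{
    assert(order >= 1 && order <= MAX_LPC_ORDER);
    memset(dst->c, 0, sizeof dst->c);                        /* zero padding */
    memcpy(dst->c, qlp_coeff, order * sizeof qlp_coeff[0]);  /* real taps    */
    dst->padded_order = (order + 3u) & ~3u;                  /* round up     */
}
```

With a declaration such as "aligned_coeffs ac;" inside the decoding function,
the compiler handles the alignment, so no manual pointer arithmetic is needed.
Note that the padded taps contribute zero to the prediction, but a loop that
runs over padded_order taps still reads that many history samples, so those
reads have to stay inside valid memory.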