Brady Patterson
2005-Jan-29 06:41 UTC
[Flac-dev] A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote:
> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and my time is quite limited (my last patch to a free
> software project took almost three months to get into decent shape!).

Is this still my code? IIRC I commented it extensively, but the structure is
certainly non-intuitive.

I'll take a look at it. At the time, I thought I wanted control logic that was
impossible in C, but that may not be the case. It didn't occur to me that Linux
and Apple would use different assemblers; elsewhere Apple uses the GNU tools.
I'm also a bit surprised that people are using flac on an Altivecful Linux/PPC
system (but I did attempt for such a system to fall back to the non-altivec C
code). End digression.

Can you point me to a good reference on altivec.h?

--
Brady Patterson (brady@spaceship.com)
RLRR LRLL RLLR LRRL RRLR LLRL
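For readers who have not used altivec.h, here is a minimal sketch of what "C functions which map directly to Altivec machine instructions" can look like. The helper name add4_s32 is hypothetical and not part of flac; vec_ld, vec_add and vec_st are standard altivec.h intrinsics. The scalar branch is an assumption added so the sketch also builds on machines without AltiVec, and all three pointers are assumed to be 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __ALTIVEC__
#include <altivec.h>   /* GCC's AltiVec intrinsics; build with -maltivec */
#endif

/* Hypothetical helper (not from flac): out[0..3] = a[0..3] + b[0..3].
 * Assumes a, b and out are all 16-byte aligned. */
void add4_s32(const int *a, const int *b, int *out)
{
#ifdef __ALTIVEC__
    vector signed int va, vb;
#else
    int i;
#endif

    assert(((uintptr_t)a & 15) == 0);
    assert(((uintptr_t)b & 15) == 0);
    assert(((uintptr_t)out & 15) == 0);

#ifdef __ALTIVEC__
    va = vec_ld(0, a);                 /* one aligned vector load (lvx)  */
    vb = vec_ld(0, b);                 /* one aligned vector load (lvx)  */
    vec_st(vec_add(va, vb), 0, out);   /* vadduwm, then stvx             */
#else
    /* Scalar fallback so the sketch still works without AltiVec. */
    for (i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Each intrinsic call corresponds to roughly one vector instruction, while register allocation and scheduling are left to the compiler.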
Josh Coalson
2005-Jan-29 11:25 UTC
[Flac-dev] A couple of points about flac 1.1.1 on ppc/linux/altivec
--- Brady Patterson <brady@spaceship.com> wrote:
> On Thu, 27 Jan 2005, John Steele Scott wrote:
> > That looks fine to me as well. However, the best solution is
> > something which Luca suggested a few months ago, which is to use
> > the functions defined in altivec.h. These are C functions which
> > map directly to Altivec machine instructions. I am willing to help
> > out, but I don't find the current lpc_asm.s very easy to follow,
> > and my time is quite limited (my last patch to a free software
> > project took almost three months to get into decent shape!).
>
> Is this still my code? IIRC I commented it extensively, but the
> structure is certainly non-intuitive.

I think this was about the rewritten-for-gas-2.x version of your code.
your original routine is still as is and is what I use to build on my
ibook.

> I'll take a look at it. At the time, I thought I wanted control logic
> that was impossible in C, but that may not be the case.

actually I think you're right, there are times when you want/need
everything to be in assembler for the best performance.

also, the altivec.h stuff will work only with gcc I think, so the
portability problem just shifts somewhere else. but it does look
useful when you just want to inline parts of a function.

> It didn't occur to me that Linux and Apple would use different
> assemblers; elsewhere Apple uses the GNU tools.

when I looked back at what's on my ibook (OS X 10.1), as -v gives:

  Apple Computer, Inc. version cctools-384.obj~11, GNU assembler version 1.38

so I am kind of losing hope that any simple configure test is going to
get it right. it is GNU assembler (old) but also from apple's developer
docs that came with it, they have tweaked it a lot (but funny, there
are no sources for it, even though the docs give a path to them).

since it is going to take me a while to sort this out, and since we
need to get a release out to fix the sonames problem, I have disabled
PPC asm functions for now and will get back to it after the release.
the static binaries I make for the darwin binary release will have them
though, because it's easy for me to compile the right one.

> I'm also a bit surprised that people are using flac on an Altivecful
> Linux/PPC system (but I did attempt for such a system to fall back
> to the non-altivec C code). End digression.

I think all of that works, it's just the different assembler syntaxes
causing problems.

> Can you point me to a good reference on altivec.h?

http://developer.apple.com/documentation/DeveloperTools/gcc-3.3/gcc/PowerPC-AltiVec-Built-in-Functions.html

although this is significantly newer than the docs that came with my
version of OS X

Josh
Brady Patterson
2005-Jan-29 13:49 UTC
[Flac-dev] A couple of points about flac 1.1.1 on ppc/linux/altivec
On Sat, 29 Jan 2005, Josh Coalson wrote:
> actually I think you're right, there are times when you want/
> need everything to be in assembler for the best performance.

I'll try to compare my assembly to compiled code. I went through several
iterations of code structure, some of which were not doable in C. But IIRC
what I ended up with is close to a switch statement without breaks (a
construct I have never expected to use!). If that's true, and assuming
altivec.h gives you direct access to the vector registers, I may be able to
get similar performance from C code. Plus gcc is probably smarter about
instruction ordering than I.

> also, the altivec.h stuff will work only with gcc I think,
> so the portability problem just shifts somewhere else. but
> it does look useful when you just want to inline parts of a
> function.

Good point; however, does anybody use a non-gcc compiler for PPC?

> since it is going to take me a while to sort this out, and since
> we need to get a release out to fix the sonames problem, I have
> disabled PPC asm functions for now and will get back to it after
> the release. the static binaries I make for the darwin binary
> release will have them though, because it's easy for me to
> compile the right one.

That sounds prudent. I don't think it would hurt to have a configure option
to enable it though.

--
Brady Patterson (brady@spaceship.com)
RLRR LRLL RLLR LRRL RRLR LLRL
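As an illustration of the "switch statement without breaks" idea, the sketch below enters a switch at the highest tap and falls through to the lower ones, so one straight-line body serves every predictor order. The function name dot_fallthrough and the four-tap limit are assumptions made to keep the example short; this is not taken from lpc_asm.s.

```c
#include <assert.h>

/* Hypothetical helper (not from flac): dot product of the first `order`
 * coefficients with the matching history samples, for order 1..4.
 * Control enters at the case for `order` and deliberately falls through
 * every lower case, so there are no breaks. */
int dot_fallthrough(const int *coeff, const int *hist, unsigned order)
{
    int sum = 0;

    assert(order >= 1 && order <= 4);

    switch (order) {
    case 4: sum += coeff[3] * hist[3]; /* fall through */
    case 3: sum += coeff[2] * hist[2]; /* fall through */
    case 2: sum += coeff[1] * hist[1]; /* fall through */
    case 1: sum += coeff[0] * hist[0];
    }
    return sum;
}
```

With coeff = {1, 2, 3, 4} and hist = {10, 20, 30, 40}, order 2 sums only the first two products and order 4 sums all four.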
Brian Willoughby
2005-Jan-29 16:18 UTC
[Flac-dev] A couple of points about flac 1.1.1 on ppc/linux/altivec
[ as -v gives:
[
[ Apple Computer, Inc. version cctools-384.obj~11, GNU assembler version 1.38
[
[ so I am kind of losing hope that any simple configure test is
[ going to get it right. it is GNU assembler (old) but also from
[ apple's developer docs that came with it, they have tweaked it
[ a lot (but funny, there are no sources for it, even though the
[ docs give a path to them).

I believe that the sources are part of (one of) the Developer Install
packages, but you have to enable one of the optional checkboxes during
install. The sources take up a respectable amount of space, so they're not
installed by default. Not sure which package, but you can probably hunt
around on your Developer CD for the open source package, rather than loading
the whole Developer multi-package.

Brian Willoughby
Sound Consulting
Chris Csanady
2005-Jan-29 19:17 UTC
[Flac-dev] A couple of points about flac 1.1.1 on ppc/linux/altivec
I originally did some altivec assembly, but it seems C altivec can be nearly
optimal using carefully constructed loops, and the occasional gcc extension
(labels as values). Considering the various ABI issues, VRsave, and
gratuitous gnu/apple differences, I have since re-implemented everything
in C.

For comparison, I'm appending a 16 bit C restore function; though the setup
and unaligned logic is typically not nice, the core algorithms are (I hope)
somewhat clear and reasonably small. I have not done much comparison between
the C and asm functions, but I believe that the C function is very nearly
optimal in most cases.

Chris

On 2005/01/29, at 8:40, Brady Patterson wrote:
>
> On Thu, 27 Jan 2005, John Steele Scott wrote:
>> That looks fine to me as well. However, the best solution is
>> something which Luca suggested a few months ago, which is to use the
>> functions defined in altivec.h. These are C functions which map
>> directly to Altivec machine instructions. I am willing to help out,
>> but I don't find the current lpc_asm.s very easy to follow, and my
>> time is quite limited (my last patch to a free software project took
>> almost three months to get into decent shape!).
>
> Is this still my code? IIRC I commented it extensively, but the
> structure is certainly non-intuitive.
>
> I'll take a look at it. At the time, I thought I wanted control logic
> that was impossible in C, but that may not be the case. It didn't
> occur to me that Linux and Apple would use different assemblers;
> elsewhere Apple uses the GNU tools. I'm also a bit surprised that
> people are using flac on an Altivecful Linux/PPC system (but I did
> attempt for such a system to fall back to the non-altivec C code).
> End digression.
>
> Can you point me to a good reference on altivec.h?
>
> --
> Brady Patterson (brady@spaceship.com)
> RLRR LRLL RLLR LRRL RRLR LLRL

void FLAC__lpc_restore_signal_16bit_altivec(const FLAC__int32 residual[],
    unsigned data_len, const FLAC__int32 qlp_coeff[], unsigned order,
    int lp_quantization, FLAC__int32 data[])
{
    int i, j, *r, *end = (int *)residual + data_len, FLAC__align16 qc[16];
    intptr_t do0;
    vu8 p;
    vs16 qF8, q70, hF8, h70, t;
    vs32 r03, s, zero = vec_splat_s32(0);
    vu32 lpq;

    FLAC__ASSERT(order > 0);
    FLAC__ASSERT(VecRelAligned(data, residual));

    if (order < 2 || order > 16) {
        FLAC__lpc_restore_signal(residual, data_len, qlp_coeff, order,
            lp_quantization, data);
        return;
    }

    /* Load lp_quantization into all elements of lpq */
    VecLoad4(lpq, (unsigned int *)&lp_quantization);

    /* qc[] = qlp_coeff[] reversed, aligned, and padded with enough
     * zeros to complete the vector.
     */
    j = order;
    i = 16;
    r = (int *)qlp_coeff;
    do {
        qc[--i] = *(r++);
    } while (--j);
    while (i & 3)
        qc[--i] = 0;

    /* This switch loads the necessary qlp coefficients and data history
     * into the q* and h* vectors. They are arranged like so:
     *   qF8 = qlp[15] - qlp[8],    q70 = qlp[7] - qlp[0]
     *   hF8 = data[-16] - data[-9], h70 = data[-8] - data[-1]
     * Loading the data is complicated by the fact that it may not be vector
     * aligned. First, the loads are implicitly rounded down one vector. Then,
     * the packed vectors need to be shifted so that the actual data is
     * aligned at the right. That is the purpose of p here.
     */
    p = vec_lvsr(0, (short *)((-(intptr_t)data & 15) >> 1));
    r03 = s = zero;
    switch (order + 3 & ~3) {
    case 16:
        r03 = vec_ld(0, qc);
        s = vec_ld(-49, data);
    case 12:
        qF8 = vec_pack(r03, vec_ld(16, qc));
        t = vec_pack(s, vec_ld(-33, data));
        hF8 = vec_perm(t, t, p);
    case 8:
        r03 = vec_ld(32, qc);
        s = vec_ld(-17, data);
    case 4:
        q70 = vec_pack(r03, vec_ld(48, qc));
        h70 = vec_pack(s, vec_ld(-1, data));
        h70 = vec_perm(t, h70, p);
    }

    /* p is used to shift the history vector to the left one element, and
     * to insert the recently calculated data element s. Keep in mind,
     * restore*() only computes one data element at a time: the vec_sums()
     * leaves the sum in the high word, and the remaining calculation of s
     * is entirely serial.
     */
    p = (vu8)AVV( 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,30,31);

    do0 = (intptr_t)data - (intptr_t)residual - 16; /* -16 for preincrement */
    r = (int *)residual;
    r03 = vec_ld(0, residual);

    if (order > 8) {
#define restore16(r) \
    s = vec_sums(vec_msum(q70, h70, vec_msum(qF8, hF8, zero)), zero); \
    s = vec_add(r, vec_sra(s, lpq)); \
    hF8 = vec_sld(hF8, h70, 2); h70 = vec_perm(h70, (vs16)s, p);

        do {
            restore16(vec_perm(r03, r03, vec_lvsl(0, ++r)));
        } while (!VecAligned(r));
        vec_st(vec_unpackl(h70), 0, data);

        while (r < end) {
            r03 = vec_ld(0, r);
            r += 4;
            restore16(vec_splat(r03, 0));
            restore16(vec_splat(r03, 1));
            restore16(vec_splat(r03, 2));
            restore16(vec_splat(r03, 3));
            vec_st(vec_unpackl(h70), do0, r);
        }
#undef restore16
    } else {
#define restore8(r) \
    s = vec_sums(vec_msum(q70, h70, zero), zero); \
    s = vec_add(r, vec_sra(s, lpq)); \
    h70 = vec_perm(h70, (vs16)s, p);

        do {
            restore8(vec_perm(r03, r03, vec_lvsl(0, ++r)));
        } while (!VecAligned(r));
        vec_st(vec_unpackl(h70), 0, data);

        while (r < end) {
            r03 = vec_ld(0, r);
            r += 4;
            restore8(vec_splat(r03, 0));
            restore8(vec_splat(r03, 1));
            restore8(vec_splat(r03, 2));
            restore8(vec_splat(r03, 3));
            vec_st(vec_unpackl(h70), do0, r);
        }
#undef restore8
    }
}
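To make the vector routine easier to follow, here is a plain scalar sketch of the computation it appears to perform: each output sample is the residual plus the quantized prediction from the previous `order` samples. The names restore_signal_scalar and reverse_pad_coeffs are hypothetical, and this is a reading of the posted code rather than libFLAC's own source. The second helper mirrors the qc[] setup above (coefficients reversed, right-aligned in 16 slots, zero-padded down to a multiple of 4). The sketch assumes that right-shifting a negative int is an arithmetic shift, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not libFLAC source).
 * data[-order..-1] must already hold the warm-up samples.
 * qlp_coeff[0] multiplies the most recent sample, data[i-1]. */
void restore_signal_scalar(const int32_t *residual, unsigned data_len,
                           const int32_t *qlp_coeff, unsigned order,
                           int lp_quantization, int32_t *data)
{
    unsigned i, j;

    assert(order > 0);

    for (i = 0; i < data_len; i++) {
        const int32_t *history = data + i;   /* one past data[i-1] */
        int32_t sum = 0;

        for (j = 0; j < order; j++)
            sum += qlp_coeff[j] * *--history;

        /* Assumes an arithmetic right shift for negative sums. */
        data[i] = residual[i] + (sum >> lp_quantization);
    }
}

/* Hypothetical mirror of the qc[] setup: reverse the coefficients into
 * the top of a 16-entry array, then zero-pad down to a multiple of 4.
 * Returns the index of the first slot that was written. */
unsigned reverse_pad_coeffs(const int32_t *qlp_coeff, unsigned order,
                            int32_t qc[16])
{
    unsigned i = 16, j;

    assert(order > 0 && order <= 16);

    for (j = 0; j < order; j++)
        qc[--i] = qlp_coeff[j];
    while (i & 3)
        qc[--i] = 0;
    return i;
}
```

For order 3 the helper fills qc[13..15] with the reversed coefficients and zeroes qc[12], so 16 minus the returned index equals (order + 3) & ~3, the value the switch in the vector code dispatches on.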