mdct.c, lines 436 onward (part of mdct_backward), read:
{
DATA_TYPE *oX1=out+n2+n4;
DATA_TYPE *oX2=out+n2+n4;
DATA_TYPE *iX =out;
T =init->trig+n2;
do{
oX1-=4;
oX1[3] = MULT_NORM (iX[0] * T[1] - iX[1] * T[0]);
oX2[0] = -MULT_NORM (iX[0] * T[0] + iX[1] * T[1]);
oX1[2] = MULT_NORM (iX[2] * T[3] - iX[3] * T[2]);
oX2[1] = -MULT_NORM (iX[2] * T[2] + iX[3] * T[3]);
oX1[1] = MULT_NORM (iX[4] * T[5] - iX[5] * T[4]);
oX2[2] = -MULT_NORM (iX[4] * T[4] + iX[5] * T[5]);
oX1[0] = MULT_NORM (iX[6] * T[7] - iX[7] * T[6]);
oX2[3] = -MULT_NORM (iX[6] * T[6] + iX[7] * T[7]);
oX2+=4;
iX += 8;
T += 8;
}while(iX<oX1);
A couple of questions:
(1) Why is there an opening brace for a new code block here? The variables
iX, oX and T from the previous lines are not being used.
(2) Assuming n2+n4 is the initial offset into out (for oX1 and oX2), you can
write Handel-C code like
{
    int oX1, oX2, iX, T;

    /* initialise all four indices in a single clock cycle */
    par
    {
        oX1 = n2 + n4;
        oX2 = n2 + n4;
        iX = 0;
        T = n2;
    }
    do{
        oX1 -= 4;
        /* eight independent writes to out[], issued in parallel */
        par
        {
            out[oX1 + 3] =  MULT_NORM (out[iX+0] * trig[T+1] - out[iX+1] * trig[T+0]);
            out[oX2 + 0] = -MULT_NORM (out[iX+0] * trig[T+0] + out[iX+1] * trig[T+1]);
            out[oX1 + 2] =  MULT_NORM (out[iX+2] * trig[T+3] - out[iX+3] * trig[T+2]);
            out[oX2 + 1] = -MULT_NORM (out[iX+2] * trig[T+2] + out[iX+3] * trig[T+3]);
            out[oX1 + 1] =  MULT_NORM (out[iX+4] * trig[T+5] - out[iX+5] * trig[T+4]);
            out[oX2 + 2] = -MULT_NORM (out[iX+4] * trig[T+4] + out[iX+5] * trig[T+5]);
            out[oX1 + 0] =  MULT_NORM (out[iX+6] * trig[T+7] - out[iX+7] * trig[T+6]);
            out[oX2 + 3] = -MULT_NORM (out[iX+6] * trig[T+6] + out[iX+7] * trig[T+7]);
        }
        par
        {
            oX2 += 4;
            iX += 8;
            T += 8;
        }
    }while(iX < oX1);
}
trig here is either trig_256 or trig_2048, depending on the block size (I
haven't thought about the best way to do this selection yet).
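One possible way to do the selection, sketched in plain C rather than Handel-C: pick the table once from the block size and index through the result. The function name select_trig, the DATA_TYPE typedef as float, and passing the two tables in as arguments are all assumptions made for illustration; how this would best map onto hardware is a separate question.

```c
#include <assert.h>
#include <stddef.h>

typedef float DATA_TYPE;   /* assumption: a fixed-point type would work the same way */

/* Hypothetical helper: returns the trig table matching block size n,
   or NULL if n is not one of the two supported sizes. */
const DATA_TYPE *select_trig(int n,
                             const DATA_TYPE *trig_256,
                             const DATA_TYPE *trig_2048)
{
    assert(trig_256 != NULL && trig_2048 != NULL);
    switch (n) {
    case 256:  return trig_256;
    case 2048: return trig_2048;
    default:   return NULL;      /* unsupported block size */
    }
}
```

The loop body would then read trig[T+k] through the returned pointer, with T starting at n2 as before.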
The code above takes advantage of parallelism. However, if you consider
n = 256, n2 is 128 and n4 is 64, so oX1 and oX2 both start at 192. If the
assignments used that shared starting value you would be writing out[195],
out[192], out[194], out[193], out[193], out[194], out[192], out[195], i.e.
each element twice. Is it safe to keep these statements in parallel? (I've
actually just noticed that oX1 is decremented before the values are
assigned, so on the first pass oX1 is 188 and oX2 is 192 and the writes go
to out[188] through out[195] with no duplicates; it's therefore OK even on
the first pass of the loop.) I've done a similar exploitation of
parallelism for the oX[0-3] assignments a few lines previous.
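The index argument can also be checked mechanically rather than by eye. Below is a minimal sketch in plain C that replays only the index arithmetic of the loop (no data is touched) and confirms that, on every pass, the eight writes land on eight distinct elements and none of them falls inside out[iX..iX+7], the range read during that pass. The function name par_writes_are_safe is made up for this example.

```c
#include <assert.h>

/* Hypothetical checker: returns 1 if, for block size n, every pass of the
   loop writes eight distinct elements of out[] that do not overlap the
   eight elements read in the same pass; returns 0 otherwise. */
int par_writes_are_safe(int n)
{
    int n2, n4, oX1, oX2, iX;

    assert(n > 0 && n % 4 == 0);
    n2 = n / 2;
    n4 = n / 4;
    oX1 = n2 + n4;
    oX2 = n2 + n4;
    iX = 0;

    do {
        int w[8], i, j;

        oX1 -= 4;                      /* decrement happens before the writes */
        for (i = 0; i < 4; i++) {
            w[i]     = oX1 + i;        /* out[oX1 + 0..3] */
            w[4 + i] = oX2 + i;        /* out[oX2 + 0..3] */
        }
        for (i = 0; i < 8; i++) {
            if (w[i] >= iX && w[i] <= iX + 7)
                return 0;              /* write hits an element read this pass */
            for (j = i + 1; j < 8; j++)
                if (w[i] == w[j])
                    return 0;          /* two writes target the same element */
        }
        oX2 += 4;
        iX += 8;
    } while (iX < oX1);

    return 1;
}
```

Calling par_writes_are_safe(256) and par_writes_are_safe(2048) covers the two block sizes mentioned above. This only checks the addressing; whether the target device can actually perform eight accesses to out in one cycle depends on how out is declared.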
Govind