thr3ads.net - theora dev - [Theora-dev] 16 bits, cast on idct function [May 2006]


Felipe Portavales Goldstein

2006-May-30 23:07 UTC

[Theora-dev] 16 bits, cast on idct function

Hi all,

Just a stupid question

The IDctSlow function in idct.c has this line:

      ip[0] = (ogg_int16_t)((_Gd + _Cd )   >> 0);


ip[0], _Gd and _Cd are all of type ogg_int32_t.

My question is:

Can the result of (_Gd + _Cd) be a number with more than 16 bits?
(Yes, it can, because they are int32, but the algorithm could
guarantee something about that... I don't know...)

If it can, the cast (ogg_int16_t) will truncate the number to its 16 least
significant bits, and we will get a wrong result...

ip[0] is 32 bits, so why truncate to 16 bits?

But I'm really confused by the >> 0.
Does this shift right by zero do anything, or did someone just forget to delete it?



Thanks
-- Felipe
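
To make the two questions above concrete, here is a minimal C sketch (not libtheora code). It assumes `int32_t` from `<stdint.h>` stands in for `ogg_int32_t`, and `truncate_to_int16`, `idct_row_output` and `cast_sketch_self_check` are hypothetical helper names. Converting an out-of-range value to a signed 16-bit type is implementation-defined in C, so the helper spells out the two's-complement wrap that mainstream compilers perform for the `(ogg_int16_t)` cast.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the 16 least significant bits of x and reinterpret them as a
 * signed two's-complement value. This models the effect of the 16-bit
 * cast without relying on implementation-defined conversion. */
int32_t truncate_to_int16(int32_t x)
{
    uint32_t low = (uint32_t)x & 0xFFFFu;   /* low 16 bits, 0..65535 */
    return (low >= 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Mirrors the shape of the line in question: the result lands in a
 * 32-bit slot, but only after being reduced to 16 bits. */
int32_t idct_row_output(int32_t gd, int32_t cd)
{
    return truncate_to_int16((gd + cd) >> 0);   /* ">> 0" changes nothing */
}

/* Small self-check; call it from main() to run the examples. */
void cast_sketch_self_check(void)
{
    assert(idct_row_output(100, 200) == 300);         /* fits: unchanged */
    assert(idct_row_output(20000, 20000) == -25536);  /* 40000 wraps around */
    assert(((12345 + 678) >> 0) == (12345 + 678));    /* shift by 0 is a no-op */
}
```

So a sum that needs more than 16 bits does come out "wrong" in the arithmetic sense (40000 becomes -25536), while the shift by zero leaves the value untouched.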

Felipe Portavales Goldstein

2006-May-31 10:07 UTC

[Theora-dev] 16 bits, cast on idct function

On 5/31/06, Timothy B. Terriberry <tterribe@vt.edu> wrote:
> Remembering to CC: the list this time.

:-)
My mistake.
>
> Felipe Portavales Goldstein wrote:
> > On 5/31/06, Timothy B. Terriberry <tterribe@vt.edu> wrote:
> >
> >> Felipe Portavales Goldstein wrote:
> >> > My question is:
> >> >
> >> > The result of (_Gd + _Cd)  can be a number with more than 16 bits ?
> >> > (yes, it can be because they are int32, but the algorithm could
> >> > guarantee something about that... I dont know...)
> >>
> >> With normal input, certainly this would never occur. However, due to
> >> quantization error, rounding error, etc., it is theoretically possible
> >> to generate a number with more than 16 bits here.
> >
> >
> > Good :-)
> >
> >>
> >> > If can, the cast (ogg_int16_t) will truncate the number to the 16 less
> >> > significant bits, and will get a wrong result...
> >> >
> >> > the ip[0] is 32 bits, so, why truncate to 16 bits ?
> >>
> >> The main answer is, "To make SIMD/hardware implementations easier."
> >> These will generally use 16-bit registers, and so will automatically
> >> have done the truncation.
> >
> >
> > You're right, it's better to use 16-bit registers. And with 16-bit
> > adders and multipliers we get shorter critical paths, which allows a
> > higher clock rate.
> >
> > Then I have another question:
> >
> > If the result is truncated to 16 bits, why was IntermediateData
> > declared as 32 bits?
> >
> >  ogg_int32_t IntermediateData[64];
> >  ogg_int32_t * ip = IntermediateData;
> >
> > I think this is because the dequant_slow result is 32 bits, and it is
> > stored in IntermediateData.
> >
> > But this dequant result is multiplied by a cosine factor defined as a
> > 16-bit constant, and the new result is shifted right by 16 bits and
> > stored in IntermediateData.
> >
> > I'm wondering whether I could use a 16-bit IntermediateData array.
> >
> > The dequant specification says:
> > Output parameters:
> > DQC - integer array - size = 14 bits
> >
> > I think that I can use IntermediateData as 16-bit integers.
> > What do you think?
>
> Yes, you certainly can. On modern 32-bit CPUs, 16-bit instructions are
> very, very slow, so we avoid them when we can. The only real reason to
> use 16-bit operands on a 32-bit CPU is to save memory bandwidth, which
> is the primary bottleneck in video processing. Since IntermediateData is
> local, and likely to be entirely in cache, there's no reason to make it
> 16 bits.
>
> If you are implementing the iDCT for a different instruction
> set/architecture, I highly suggest working from Section 7.9.3 of the
> spec directly. The spec can be obtained from:
> http://www.theora.org/doc/Theora_I_spec.pdf
I'm working on a Theora decoder on an FPGA. I'm writing the hardware
directly in VHDL.

I'm preparing to put the VHDL files in SVN and to post a description of
this work to this list as soon as possible.

Yes, I'm reading the spec.
But sometimes the libtheora software can help.
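
The multiply-and-shift step mentioned in the quoted text above can be sketched in C as follows. This is an illustration, not the libtheora source: `COS_PI_4_Q16`, `mul_q16` and `mul_sketch_self_check` are hypothetical names, and the constant is simply cos(pi/4) scaled by 2^16 and rounded. The sketch assumes the input already fits in 16 bits and that `>>` on a negative `int32_t` is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* cos(pi/4) * 65536 = 46340.95..., rounded to the nearest integer.
 * The value does not fit in a signed 16-bit type, so it is used as a
 * 32-bit operand. */
#define COS_PI_4_Q16 46341

/* Multiply a coefficient by a fixed-point factor with 16 fractional bits
 * and drop those fractional bits again. For factor_q16 < 65536 and
 * |coeff| <= 32768 the product stays below 2^31 in magnitude, so the
 * 32-bit multiply cannot overflow. */
int32_t mul_q16(int32_t factor_q16, int32_t coeff)
{
    return (factor_q16 * coeff) >> 16;
}

/* Small self-check; call it from main() to run the examples. */
void mul_sketch_self_check(void)
{
    assert(mul_q16(COS_PI_4_Q16, 0) == 0);
    assert(mul_q16(COS_PI_4_Q16, 1000) == 707);   /* 1000 * 0.7071... */
    assert(mul_q16(COS_PI_4_Q16, 8191) == 5791);  /* 8191: largest 14-bit signed value */
}
```

Because the factor is below 2^16, a single multiply-and-shift never produces a result larger in magnitude than its input. The later additions and subtractions of the transform can still grow the values, which is where the question of 16-bit overflow comes from.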

>
> >> The important thing is not that the iDCT gives you valid values that
> >> make sense in such situations, but that it gives you the _same_ values
> >> across all implementations, even when the input is invalid. If that were
> >> not the case, then the decoded frame would not be the same as what the
> >> encoder _thought_ the decoded frame was going to be, and so the next
> >> subsequent frame would also be wrong, etc., all the way until the next
> >> keyframe.
> >>
> >> Think of it this way: you can never generate a _wrong_ result so long as
> >> you follow the specification. The specification tells you what result
> >> you're going to get for any input. If the encoder chose an input that
> >> caused overflow, well, that's the encoder's problem, not the decoder's.
> >>
> >> > But I'm realy confused with the >> 0 ,
> >> > This shift right zero can do something or someone just forgot to delete
> >> > it ?
> >>
> >> I assume the original author was playing around with dividing up the >>4
> >> in the op[] stage between the two. It doesn't matter; any compiler worth
> >> its salt will optimize the useless operation away.
>
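
The quoted reply above argues that 16-bit SIMD or hardware adders perform the truncation implicitly, and that all implementations must agree bit for bit. Here is a minimal C sketch of that equivalence, assuming two's-complement wrap-around. `low16_signed`, `add_wide_then_truncate`, `add_in_16bit_lane` and `both_styles_agree` are hypothetical names, not libtheora functions.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret the low 16 bits of a value as signed two's complement. */
static int32_t low16_signed(uint32_t bits)
{
    uint32_t low = bits & 0xFFFFu;
    return (low >= 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Scalar style: add in 32 bits, then truncate the result to 16 bits. */
int32_t add_wide_then_truncate(int32_t a, int32_t b)
{
    return low16_signed((uint32_t)a + (uint32_t)b);
}

/* SIMD/hardware style: each operand is held in a 16-bit lane and the
 * adder itself wraps modulo 2^16. */
int32_t add_in_16bit_lane(int32_t a, int32_t b)
{
    uint16_t lane_a = (uint16_t)a;               /* operand as stored in a lane */
    uint16_t lane_b = (uint16_t)b;
    uint16_t sum = (uint16_t)(lane_a + lane_b);  /* wraps modulo 2^16 */
    return low16_signed(sum);
}

/* Returns 1 if both styles agree for every pair on a coarse grid that
 * includes inputs and sums outside the 16-bit range. */
int both_styles_agree(void)
{
    for (int32_t a = -70000; a <= 70000; a += 997) {
        for (int32_t b = -70000; b <= 70000; b += 1009) {
            if (add_wide_then_truncate(a, b) != add_in_16bit_lane(a, b)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Both functions compute the sum modulo 2^16, so a 32-bit software decoder that truncates explicitly and a 16-bit datapath that wraps naturally produce the same value even when the true sum overflows.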

-- 
________________________________________
Felipe Portavales <portavales@gmail.com>
Undergraduate Student - IC-UNICAMP
Computer Systems Laboratory
http://www.lsc.ic.unicamp.br

Felipe Portavales Goldstein

2006-May-31 20:10 UTC

[Theora-dev] 16 bits, cast on idct function

Yes, it runs very well, with very good latency.

But I synthesized it for a Stratix FPGA, and it consumes about 20% of the
slices, because of the distributed RAM.
In this first version I'm using the RAM like an array, accessing it all
the time without restriction.
But that infers flip-flops for each memory position, and big muxes to control them.

So, to solve this problem, I will use a synchronous memory model, which
will infer block RAMs (specialized FPGA blocks). These are like small
synchronous RAMs on the FPGA chip.

I think that with this change the area can drop to 3% to 5% of the
Stratix FPGA slices.

On 5/31/06, Ralph Giles <giles@xiph.org> wrote:
> On Wed, May 31, 2006 at 04:26:50PM -0300, Felipe Portavales Goldstein wrote:
> > YEAAAAAAAAHHHHH
> >
> > The IDCT_SLOW VHDL model is working,
> > but I need to optimize it to consume fewer FPGA resources, like multipliers.
> >
> > I will send it to SVN tonight.
>
> SWEET!!!
>
> This runs in ghdl?
>
>  -r
>

-- 
________________________________________
Felipe Portavales <portavales@gmail.com>
Undergraduate Student - IC-UNICAMP
Computer Systems Laboratory
http://www.lsc.ic.unicamp.br
