Felipe Portavales Goldstein
2006-May-30 23:07 UTC
[Theora-dev] 16 bits, cast on idct function
Hi all,

Just a stupid question.

The IDctSlow function in the file idct.c has this line:

ip[0] = (ogg_int16_t)((_Gd + _Cd ) >> 0);

ip[0], _Gd and _Cd are of type ogg_int32_t.

My question is: can the result of (_Gd + _Cd) be a number with more than 16 bits? (Yes, it can, because they are int32, but the algorithm could guarantee something about that... I don't know...)

If it can, the cast (ogg_int16_t) will truncate the number to its 16 least significant bits, and we will get a wrong result... ip[0] is 32 bits, so why truncate to 16 bits?

But I'm really confused by the >> 0. Does this shift right by zero do something, or did someone just forget to delete it?

Thanks

--
Felipe
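The behavior asked about here can be reproduced in a few lines of C. This is a minimal sketch, not libtheora code: int32_t and int16_t from <stdint.h> stand in for ogg_int32_t and ogg_int16_t, and store_first_pass and low16_sign_extended are hypothetical helper names. It assumes a compiler such as GCC or Clang that wraps out-of-range conversions to a signed 16-bit type modulo 2^16; the C standard leaves that result implementation-defined.

```c
#include <stdint.h>

/* Stand-ins for libtheora's types (assumed: 32- and 16-bit signed). */
typedef int32_t ogg_int32_t;
typedef int16_t ogg_int16_t;

/* Mirrors the statement from IDctSlow:
       ip[0] = (ogg_int16_t)((_Gd + _Cd) >> 0);
   The ">> 0" changes nothing; the cast keeps only the low 16 bits,
   and the result is widened back to 32 bits when stored in ip[0].
   Out-of-range conversion to a signed type is implementation-defined;
   this sketch assumes the usual modulo-2^16 wrap. */
static ogg_int32_t store_first_pass(ogg_int32_t gd, ogg_int32_t cd)
{
    return (ogg_int16_t)((gd + cd) >> 0);
}

/* The same truncation written out explicitly (two's complement assumed):
   keep bits 0..15, then sign-extend from bit 15. */
static ogg_int32_t low16_sign_extended(ogg_int32_t x)
{
    uint32_t low = (uint32_t)x & 0xFFFFu;
    return (low & 0x8000u) ? (ogg_int32_t)low - 0x10000 : (ogg_int32_t)low;
}
```

With these helpers, 30000 + 10000 = 40000 does not fit in 16 bits and comes back as 40000 - 65536 = -25536, while the >> 0 leaves every value unchanged.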
Felipe Portavales Goldstein
2006-May-31 10:07 UTC
[Theora-dev] 16 bits, cast on idct function
On 5/31/06, Timothy B. Terriberry <tterribe@vt.edu> wrote:
> Remembering to CC: the list this time.

:-) My mistake.

> Felipe Portavales Goldstein wrote:
> > On 5/31/06, Timothy B. Terriberry <tterribe@vt.edu> wrote:
> >
> >> Felipe Portavales Goldstein wrote:
> >> > My question is:
> >> >
> >> > Can the result of (_Gd + _Cd) be a number with more than 16 bits?
> >> > (Yes, it can, because they are int32, but the algorithm could
> >> > guarantee something about that... I don't know...)
> >>
> >> With normal input, certainly this would never occur. However, due to
> >> quantization error, rounding error, etc., it is theoretically possible
> >> to generate a number with more than 16 bits here.
> >
> > Good :-)
> >
> >> > If it can, the cast (ogg_int16_t) will truncate the number to its 16
> >> > least significant bits, and we will get a wrong result...
> >> >
> >> > ip[0] is 32 bits, so why truncate to 16 bits?
> >>
> >> The main answer is, "To make SIMD/hardware implementations easier."
> >> These will generally use 16-bit registers, and so will automatically
> >> have done the truncation.
> >
> > You're right, it's better to use 16-bit registers. And with 16-bit
> > adders and multipliers we get shorter critical paths, and therefore a
> > higher clock rate.
> >
> > Then I have another question:
> >
> > If the result is truncated to 16 bits, why was IntermediateData
> > declared as 32 bits?
> >
> > ogg_int32_t IntermediateData[64];
> > ogg_int32_t * ip = IntermediateData;
> >
> > I think this is because the dequant_slow result is 32 bits, and is
> > stored in IntermediateData.
> >
> > But this dequant result is multiplied by a cosine factor defined as a
> > 16-bit constant, and this new result is shifted right 16 bits and
> > stored in IntermediateData.
> >
> > I'm wondering if I could use a 16-bit IntermediateData array.
> >
> > The dequantization specification says:
> > Output parameters:
> > DQC - integer array - size = 14 bits
> >
> > I think that I can use IntermediateData as a 16-bit integer array.
> > What do you think?
>
> Yes, you certainly can. On modern 32-bit CPUs, 16-bit instructions are
> very, very slow, so we avoid them when we can. The only real reason to
> use 16-bit operands on a 32-bit CPU is to save memory bandwidth, which
> is the primary bottleneck in video processing. Since IntermediateData is
> local, and likely to be entirely in cache, there's no reason to make it
> 16 bits.
>
> If you are implementing the iDCT for a different instruction
> set/architecture, I highly suggest working from Section 7.9.3 of the
> spec directly. The spec can be obtained from:
> http://www.theora.org/doc/Theora_I_spec.pdf

I'm working on a Theora decoder on an FPGA. I'm writing the hardware directly in VHDL. I'm preparing to put the VHDL files in SVN and to post a description of this work to this list as soon as possible.

Yes, I'm reading the spec. But sometimes the libtheora software can help.

>
> >> The important thing is not that the iDCT gives you valid values that
> >> make sense in such situations, but that it gives you the _same_ values
> >> across all implementations, even when the input is invalid. If that were
> >> not the case, then the decoded frame would not be the same as what the
> >> encoder _thought_ the decoded frame was going to be, and so the next
> >> subsequent frame would also be wrong, etc., all the way until the next
> >> keyframe.
> >>
> >> Think of it this way: you can never generate a _wrong_ result so long as
> >> you follow the specification. The specification tells you what result
> >> you're going to get for any input. If the encoder chose an input that
> >> caused overflow, well, that's the encoder's problem, not the decoder's.
> >>
> >> > But I'm really confused by the >> 0.
> >> > Does this shift right by zero do something, or did someone just
> >> > forget to delete it?
> >>
> >> I assume the original author was playing around with dividing up the >>4
> >> in the op[] stage between the two. It doesn't matter; any compiler worth
> >> its salt will optimize the useless operation away.
>

--
________________________________________
Felipe Portavales <portavales@gmail.com>
Undergraduate Student - IC-UNICAMP
Computer Systems Laboratory
http://www.lsc.ic.unicamp.br
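The claims that 16-bit hardware "will automatically have done the truncation" and that every implementation must agree even when the input overflows can be checked with a small sketch. It assumes two's-complement 16-bit registers; wrap16, add_then_truncate and add_in_16bit_registers are hypothetical names, not libtheora functions. Adding in 32 bits and truncating once at the store gives the same value as doing every step in wrapping 16-bit registers, because for addition modulo 2^16 it does not matter where the reduction happens.

```c
#include <stdint.h>

/* Low 16 bits of x, sign-extended (two's complement assumed):
   the value a 16-bit register would hold. */
static int32_t wrap16(int32_t x)
{
    uint32_t low = (uint32_t)x & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* Reference-C style: add in 32 bits, truncate once when storing,
   as the (ogg_int16_t) cast does. */
static int32_t add_then_truncate(int32_t a, int32_t b)
{
    return wrap16(a + b);
}

/* SIMD/hardware style: operands and sum all live in 16-bit registers,
   so every intermediate value is truncated. */
static int32_t add_in_16bit_registers(int32_t a, int32_t b)
{
    return wrap16(wrap16(a) + wrap16(b));
}
```

The equivalence holds for additions, subtractions and the low 16 bits of products, but not for right shifts: after the multiply by a 16-bit cosine constant, the >> 16 needs the full-width product, so a 16-bit datapath would still need a wider multiplier output at that step.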
Felipe Portavales Goldstein
2006-May-31 20:10 UTC
[Theora-dev] 16 bits, cast on idct function
Yes, it runs very well, with very good latency. But I synthesized it for a Stratix FPGA, and it consumes about 20% of the slices, because of the distributed RAM. In this first version I'm using a RAM modeled as an array that is accessed all the time, without worrying about it. But that infers flip-flops for each memory position, plus big muxes to control them.

So, to solve this problem, I will use a synchronous memory model, which will infer block RAMs (specialized FPGA blocks). These are like small synchronous SRAMs on the FPGA chip. I think that with them the area can drop to 3% to 5% of the Stratix FPGA slices.

On 5/31/06, Ralph Giles <giles@xiph.org> wrote:
> On Wed, May 31, 2006 at 04:26:50PM -0300, Felipe Portavales Goldstein wrote:
> > YEAAAAAAAAHHHHH
> >
> > IDCT_SLOW VHDL model is working
> > but I need to optimize it to consume fewer FPGA resources, like multipliers.
> >
> > I will send it to SVN tonight
>
> SWEET!!!
>
> This runs in ghdl?
>
> -r
>
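To make the array-versus-synchronous-memory distinction concrete, here is a behavioral model of a synchronous single-port RAM. It is written in C to match the other examples rather than in VHDL, so it is a simulation sketch, not synthesizable code; the names sync_ram and sync_ram_clock, the 64-entry depth and the read-before-write behavior are assumptions for illustration. The key property is that read data comes out of a register one clock after the address is presented. Block RAMs generally require that registered access, so an array read combinationally at any time, as in the first version, ends up as flip-flops and multiplexers instead.

```c
#include <stdint.h>

#define RAM_DEPTH 64   /* e.g. one 8x8 block of coefficients (assumption) */

/* Synchronous RAM: reads and writes happen only on the clock edge,
   and read data is held in an output register (q). */
typedef struct {
    int16_t mem[RAM_DEPTH];
    int16_t q;            /* registered read output */
} sync_ram;

/* One rising clock edge. Read-before-write (assumed): when we != 0,
   q receives the old contents of addr before d is stored there. */
static void sync_ram_clock(sync_ram *r, unsigned addr, int we, int16_t d)
{
    addr %= RAM_DEPTH;
    r->q = r->mem[addr];
    if (we)
        r->mem[addr] = d;
}
```

With this model, writing 42 to address 5 and reading it back takes two clock calls: the first leaves the old contents in q, and the second makes 42 available. A hardware design built around block RAM has to schedule around that one-cycle read latency.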