thr3ads.net - theora dev - [Theora-dev] Theora Decoding on FPGA [May 2006]


Felipe Portavales Goldstein

2006-May-31 23:47 UTC

[Theora-dev] Theora Decoding on FPGA

Hello people

My name is Felipe, and I submitted a proposal to the Google Summer of
Code whose goal is an FPGA-based embedded system that decodes Theora
streams in real time.
It was accepted, and my mentor is Ralph Giles.

The proposal can be viewed here:

atlas.lsc.ic.unicamp.br/~portavales/wp-content/uploads/2006/05/soc_proposal.txt

There is also a presentation with a clearer breakdown of the hardware modules:

svn.xiph.org/trunk/theora-fpga/doc/hard_theora.pdf

I'm working on it now, and today I finished a simple implementation of
the IDctSlow procedure as a VHDL module.
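
For context, the core job of IDctSlow is an 8x8 inverse DCT. The sketch below is a textbook floating-point version in Python, assuming orthonormal scaling. It is not bit-exact with libtheora's fixed-point IDctSlow; it only shows the separable row/column structure that a hardware module has to implement.

```python
import math

N = 8  # Theora transforms 8x8 blocks


def idct_1d(coeffs):
    """Orthonormal 8-point inverse DCT (DCT-III) in floating point."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            s += cu * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s)
    return out


def idct_2d(block):
    """Separable 2-D IDCT: transform every row, then every column."""
    rows = [idct_1d(r) for r in block]
    # cols[x] is the transformed column x, indexed by row y.
    cols = [idct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # Transpose back so the result is indexed as out[y][x].
    return [[cols[x][y] for x in range(N)] for y in range(N)]


if __name__ == "__main__":
    block = [[0] * N for _ in range(N)]
    block[0][0] = 8  # DC coefficient only
    print(idct_2d(block)[3][5])
```

With only a DC coefficient of 8, every output sample comes out (up to floating-point rounding) as 1.0, the flat block you would expect. The row/column split is also why hardware versions usually contain a 1-D transform unit plus a transpose memory.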

This module runs and decodes samples correctly, but it consumes a lot
of FPGA resources (logic cells, multipliers, etc.).
I will optimize this module for area to get better results.

The testbench uses the GHDL simulator and can be downloaded from SVN:

svn.xiph.org/trunk/theora-fpga/idctslow

Just run:
$ make
$ make run
$ make compare
to see the testbench run and validate the module's output data.


This IDctSlow implementation was synthesized for the Altera Stratix II
FPGA. The report is below:

------------------------------------
Analysis & Synthesis Status : Successful - Thu Jun  1 02:15:09 2006
Quartus II Version : 5.1 Build 176 10/26/2005 SJ
Revision Name : idctslow
Top-level Entity Name : IDctSlow
Family : Stratix II
Total combinational functions : 13782
Total registers : 3451
Total pins : 54
Total virtual pins : 0
Total memory bits : 2,048
DSP block 9-bit elements : 230
Total PLLs : 0
Total DLLs : 0
------------------------------------


These numbers are not good.
In this first version I model the RAM as a plain array and access it
freely, in any cycle, without restriction.
As a result, synthesis infers flip-flops for every memory position, plus
large multiplexers to select among them.

To solve this problem, I will use a synchronous memory model, which
will infer block RAMs (specialized FPGA blocks). These are like small
SRAMs inside the FPGA chip.
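
A minimal cycle-level sketch of what "synchronous memory model" means, written in Python for illustration. The class name, the depth of 64 and the width of 16 bits are hypothetical. The key property is that the write and the read both happen on the clock edge and only one address is touched per cycle; in VHDL this usually corresponds to putting both the write and the read assignment inside a `rising_edge(clk)` process, which is the pattern synthesis tools look for when mapping an array onto block RAM.

```python
class SyncRAM:
    """Cycle-level model of a single-port synchronous RAM.

    The write (if enabled) and the read both happen on the clock edge.
    The read output is registered and returns the value stored *before*
    a same-address write (read-during-write returns old data). Only one
    address is accessed per cycle, matching a single block RAM port.
    """

    def __init__(self, depth, width):
        self.depth = depth
        self.mask = (1 << width) - 1  # keep stored words `width` bits wide
        self.mem = [0] * depth
        self.q = 0  # registered read output

    def clock(self, addr, we=False, din=0):
        """Apply one rising clock edge and return the new read output."""
        if not 0 <= addr < self.depth:
            raise IndexError(f"address {addr} out of range")
        self.q = self.mem[addr]  # sample read data before the write lands
        if we:
            self.mem[addr] = din & self.mask
        return self.q


if __name__ == "__main__":
    ram = SyncRAM(depth=64, width=16)
    ram.clock(5, we=True, din=123)  # write cycle: read port shows old 0
    print(ram.clock(5))             # next cycle: prints 123
```

The price of this model is a one-cycle read latency, so the VHDL control logic must present each address a cycle before it needs the data, instead of indexing the array combinationally.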

I think that with this change the area can drop to 3% to 5% of the
Stratix FPGA slices (an estimate based on other detailed synthesis reports).

I'm also using a lot of multipliers to do all the calculations in a
single clock cycle (this is easier), but to save multipliers I can
split the operations over several clock cycles and reuse the same
multiplier across them.
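
The multiplier trade-off can be modeled in a few lines of Python. This is a hypothetical sketch, not the project's scheduler: it issues a list of fixed-point multiplications onto `num_mults` shared multipliers, one batch per clock cycle, and reports how many cycles that takes. The operands use cos(k*pi/16) scaled by 2^16 purely as example IDCT-style constants.

```python
import math

# Example operands: cos(k*pi/16) * 2^16 for k = 1..7 (illustrative only).
CONSTS = [round(math.cos(k * math.pi / 16) * 65536) for k in range(1, 8)]


def schedule_multiplies(pairs, num_mults):
    """Execute (a, b) products on `num_mults` shared multipliers.

    Each outer-loop iteration models one clock cycle in which up to
    `num_mults` products are issued. Returns (results, cycles_used).
    """
    if num_mults < 1:
        raise ValueError("need at least one multiplier")
    results = []
    cycles = 0
    for start in range(0, len(pairs), num_mults):
        for a, b in pairs[start:start + num_mults]:
            results.append((a * b) >> 16)  # Q16 fixed-point multiply
        cycles += 1
    return results, cycles


if __name__ == "__main__":
    samples = [100, -37, 250, 12, -5, 64, 900]
    pairs = list(zip(samples, CONSTS))
    for m in (7, 2, 1):
        res, cyc = schedule_multiplies(pairs, m)
        print(f"{m} multiplier(s): {cyc} cycle(s), results {res}")
```

The results are identical for every multiplier count; only the cycle count changes (ceil(7 / m) cycles here). In hardware, the sharing also costs a multiplexer in front of each multiplier and a small controller, but those are usually much cheaper than extra DSP elements.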

Now I'm working on these optimizations.


Bye
--felipe




-- 
________________________________________
Felipe Portavales <portavales@gmail.com>
Undergraduate Student - IC-UNICAMP
Computer Systems Laboratory
lsc.ic.unicamp.br
