Andrey and I were having a conversation off-list; I'm forwarding the
relevant bits of Andrey's last response for public reference.
--- begin forward ---
On Mar 31, 2005 9:22 PM, Ralph Giles <giles@xiph.org> wrote:
> On Thu, Mar 31, 2005 at 06:46:06PM -0700, Andrey Filippov wrote:
>
> > > BTW, you asked about padding theora packet data to a byte boundary.
> > > I don't see any way to do this, but if you have resources left over,
> > > can you pre-shift the packet data on the fpga and then only have to
> > > mask together the overlap byte with the header?
>
> > Still, it would make it much more suitable for hardware
> > acceleration in general - not just mine. Maybe it is possible to add
> > such a feature (preserving backward compatibility)?
> > It would very much simplify implementations with motion
> > compensation and other dynamic modes where the frame header cannot be
> > built until the whole frame is processed.
>
> There is no room to change something like that in a backward-compatible
> way. It turns out if you specify two qi values, the coded block table
> starts on a byte boundary, but only for inter frames. You can shift it
> one bit either way by using 3 or the normal 1 qi value in the frame
> header, but that still won't align an intra frame. Having more than one
> qi value means you have to insert another block table telling you which
> one to use, which is extra overhead, but you might be able to make that
> up in quality by actually using the different qi values as appropriate.
>
> However, I may have misunderstood how you're constructing things. Your
> email mentioned 'static' coded block maps, I'm not sure how that can
> work. :)
That is simple. I can build a map of coded blocks and download it to
the FPGA (because of the table size, the granularity is 32x32 pixels).
This map can be used to effectively decrease the frame rate outside of
the area of interest (I can also use 2 qi/frame with a similar map).
With this map I have 3 types of frames - INTRA, INTER-full (all blocks
coded) and INTER-map (blocks are coded according to the map). That gives
3 different frame headers that are prepared by software in advance, and
the bit size of each is calculated. Then the FPGA receives a 3-element
table (one entry for each type) - the number of bits (0..31) by which to
shift the frame data, so that the frame data can be appended to the
frame header without shifting every word of it.
If the frame header has to be built depending on the frame data, then
the only way to build it (without increasing latency and required
memory size - it is difficult too) is to build the header in the FPGA.
During the first pass (1 frame time), when data is processed in
macroblock-scan order and pretokens are saved to the sdram, all the
data for the header will be ready. The headers could then be built
just before stage 2 (starting from "pretokens" in coded order)
sends the data out. It is possible, but it is much easier to build
headers in software in parallel with the fpga processing the bulk of
the data.
> Something better discussed on the lists, anyway.
It makes sense. Maybe you can just post our conversation?
--- end forward ---
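For anyone following along, here is a rough sketch of the pre-shift
scheme described in the forward above. It is only an illustration under
stated assumptions, not Andrey's actual implementation: the 32-bit word
size is inferred from the 0..31 shift range, MSB-first bit packing is
assumed, the header lengths are made-up numbers, and the helper names
(pack_bits, shift_table, append_preshifted) are hypothetical.

```python
WORD_BITS = 32  # assumed word size; the 0..31 shift range suggests 32-bit words


def pack_bits(bits, lead=0):
    """Pack 0/1 bits MSB-first into 32-bit words, after `lead` zero bits.

    With lead > 0 this models the encoder emitting pre-shifted frame data.
    """
    padded = [0] * lead + list(bits)
    padded += [0] * (-len(padded) % WORD_BITS)  # zero-fill the last word
    words = []
    for i in range(0, len(padded), WORD_BITS):
        word = 0
        for bit in padded[i:i + WORD_BITS]:
            word = (word << 1) | bit
        words.append(word)
    return words


def shift_table(header_lengths):
    """One shift value (0..31) per frame type, from the header bit lengths."""
    return {ftype: nbits % WORD_BITS for ftype, nbits in header_lengths.items()}


def append_preshifted(header_bits, data_words):
    """Join a header and pre-shifted data by OR-ing only the overlap word."""
    header_words = pack_bits(header_bits)
    if len(header_bits) % WORD_BITS == 0 or not data_words:
        return header_words + data_words  # word-aligned: plain concatenation
    overlap = header_words[-1] | data_words[0]
    return header_words[:-1] + [overlap] + data_words[1:]


# Hypothetical header lengths in bits, one per frame type.
header_lengths = {"INTRA": 12, "INTER-full": 9, "INTER-map": 300}
shifts = shift_table(header_lengths)
```

The software side computes shifts once per header layout; the encoder
then emits each frame's data with that many leading zero bits, so only
the single overlap word has to be merged when the packet is assembled.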
We also discussed high-bitrate compression optimizing for image
quality, either with theora or PNG-based lossless compression.
If I may paraphrase Andrey's response, the Axis cpu that handles
the network stack would quickly become a bottleneck at higher
bitrates, and one wants gigabit ethernet for such an application
in any case. There's nothing pin-compatible, but a redesign based
on another family of embedded cpu would be the requisite step
toward such a camera.
-r