Hello, I am a programmer working on a product which integrates Theora. I have a question regarding the memory use of some of the internals of Theora. Is this the right forum for this question, and if not, does anyone know of an appropriate place to ask? Thanks

Sam
Thank you, Ralph (and Benjamin, who also replied).

This is going to be a little long, but the crux of my question is finding ways to have Theora malloc as little as possible. I am working on an embedded system which needs to run constantly in public places with no supervision - think arcade video game.

Due to slow flash media, and in the interest of speed, I have been loading all the OGVs needed at bootup into RAM and preprocessing them into header information and Theora frames, so they have effectively been de-containerized. I am working with all known data - movies that have been created internally here.

At run time, I recreate the ogg_packet from my RAM buffer and update the movies like so (edited for brevity):

terr = theora_decode_packetin(&zog->td, &zog->op);
terr = theora_decode_YUVout(&zog->td, &zog->yuv); // decode frame into YUV buffer

// OK, here is where we deviate from the standard example.
// A normal program would convert the YUV data to RGB data for display.
// That takes a lot of time - we just send the YUV data straight to the
// graphics card as 8-bit textures, and a special shader on the card does
// the YUV-to-RGB conversion on a per-pixel basis.
// Send yuv.y, yuv.u and yuv.v to the graphics card; it handles it from there.

This has been working well and has been vigorously tested for over a year.

The second part is that I load and initialize all movies at startup - we are now growing to have around 180 movies, many of them 1920x1080.
Pseudocode for initialization:

theora_comment_init(&zog->tc);
theora_info_init(&zog->ti);
for (i = 0; i < 3; i++) {
    if (theora_decode_header(&zog->ti, &zog->tc, &zog->head.op_for_ti[i])) {
        zerr("zog_load: theora_decode_header err %s\n", filename);
        fclose(f);
        return ERR_FILE;
    }
}
err = theora_decode_init(&zog->td, &zog->ti);
if (err) {
    zerr("zog_load: theora_decode_init err %d %s\n", err, filename);
    zerr("%08X %08X\n", (unsigned int)&zog->td, (unsigned int)&zog->ti);
    fclose(f);
    return ERR_FILE;
}

theora_decode_init is where I seem to be running out of memory. I confess that I do not understand much of the internals of Theora. Given that I have 180 (and growing) movies on tap, is there any memory that can be shared among all open movies, instead of being individually malloced for each one?

For example, I'm not quite sure what 'dct_tokens' is, but mallocing it in oc_dec_init() seems to be putting me over the edge. I notice that the size of those mallocs looks similar to what I'd expect the RGB pixel data of each movie to take. Changing

// _dec->dct_tokens=(unsigned char *)_ogg_malloc((64+64+1)*
//     _dec->state.nfrags*sizeof(_dec->dct_tokens[0]));

to

unsigned char sambuf[MB(8)]; // global
_dec->dct_tokens=(unsigned char *)&sambuf[0];

'seems' to work, but not knowing the internals of Theora makes me nervous that I have broken something.

To sum up (and thanks for reading this far):

1. Given an unusually large number of open movies and a fixed dataset, are there any memory savings to be had by sharing buffers instead of individually malloc-ing?
2. Since I am working directly with the YUV data, are there any memory savings to be had in the YUV-RGB conversion?

Thanks for any help, and I will be more than happy to provide any clarification!
Sam

-----Original Message-----
From: Ralph Giles [mailto:giles at thaumas.net]
Sent: Thursday, October 11, 2012 2:56 AM
To: Engineering
Cc: theora-dev at xiph.org
Subject: Re: [theora-dev] Theora integration question

On 12-10-10 1:16 PM, Engineering wrote:
> Hello, I am programmer working on a product which integrates Theora. I
> have a question regarding the memory use on some of the internals of
> Theora. Is this the right forum for this question, and if not, does
> anyone know where an appropriate place to ask is?

Yes, this is an appropriate forum to ask Theora programming questions.
Responders are all volunteers, but people generally try to be helpful.

-r
> -----Original Message-----
> From: Ralph Giles [mailto:giles at thaumas.net]
>
> > I have test code in so that the largest (1920x1080) movies share buffers
> > for ref_frame_data and dct_tokens
> >
> > As long as I remember to update the movie at frame 0 before use, I don't
> > see any issues, but am I inviting disaster?
>
> The important thing here is that the setup headers match between all the
> videos. Same resolution, quant matrices, huffman tables, etc.

Yes, I am trying to step on as little as possible.

It looks to me like dct_tokens is an internal working buffer, so sharing it would not be a problem as long as I'm not trying to process movies in parallel.

ref_frame_data looks like the YUV buffers, which would be bad to share, but I think I can get away with it as long as, when I switch movies, I make sure to decode frame 0 again before use.

My response to Benjamin will have more details, and thanks for responding!

Sam
Hi Benjamin, thanks for the response. You are correct, I am abusing the code a bit (a lot?). The design philosophy of Theora makes perfect sense as is; I am using it in unintended ways.

One example of where we have used Theora is a soccer game like you'd see at Chuck E Cheese. The kids are kicking balls at targets, and the video screen shows 5-10 second clips in reaction. The reaction needs to be almost instantaneous, so a few milliseconds shouldn't matter, but over 10-20 and I have some problems. We also decode to texture memory for use with 3D graphics, which is how we originally started with Theora - like a 3D environment with video monitors, each playing a different movie.

Anyway, I admit freely that I'm abusing the code ;) I also wasn't paying enough attention to the memory footprint. Our internal system does not use malloc at all - I have my own heap for that. So we do have situations where 10 small videos can be playing simultaneously in different areas of the screen, and also situations where I know there is just one video, like full-screen, and sharing is OK.

You and Ralph have given me some excellent insight on how to better integrate Theora into our engine for the future, but as a quick fix, do you foresee any issues (crash issues!) with allowing the fullscreen movies to share memory space for ref_frame_data and dct_tokens?

> On Mon, Oct 15, 2012 at 12:52 PM, Engineering <ee at athyriogames.com> wrote:
> > As long as I remember to update the movie at frame 0 before use, I
> > don't see any issues, but am I inviting disaster? Doing this gets me
> > from 1.2GB to 0.2GB of RAM usage.
>
> It's hard to see what you're doing, but I suspect your thinking here is
> 180 degrees opposite to ours. We typically expect that users will only
> have a single theora decoder in memory, and that when the user requests
> to change the movie, software that uses libtheora will (1) deallocate
> all Theora-related memory and (2) allocate and initialize a new decoder
> from scratch. Even at 1080p, the decoder shouldn't require more than
> about 20 MB of memory, and should initialize in milliseconds.
>
> The only reason I can think of to keep lots of decoders sitting around
> in memory is if you're trying to maintain seek state, so you can pick
> up where you left off in the middle of a movie ... and even then, it's
> probably not the right solution.
Thanks, Ralph. I would assume frame 0 is always a keyframe, but I will use that to double-check at load time. I am working on a known dataset of movies which does not change.

Sam

> -----Original Message-----
> From: Ralph Giles [mailto:giles at thaumas.net]
>
> Yes. The decoder needs two reference frame buffers (the keyframe and the
> previously-decoded frame) to reconstruct the current frame during
> decoding. Keyframes, by definition, are not predicted from any previous
> frame data, so as long as you feed in a keyframe first after the switch
> you should get clean output.
>
> You can use th_packet_iskeyframe() to verify this behaviour at runtime.
>
> -r
Hi Benjamin,

Just to give another example of our use (abuse?) of Theora: say we have a select screen at game start with 5 options. To make it look more interesting, I have the artists go wild with 3D animation. We end up with 13 movies - 5 looping idle movies for options A, B, C, D, and E, and 8 transitions between them (a to b, b to c, etc.). Each of these movies could be around 1 second long, and they need to be almost instantly accessible.

Knowing what I know now, I feel a bit silly, but at the time I was only looking at the RAM footprint of the data - and I turned the artists loose on stuff like this. I agree that individual files work well for this, and make programming and updating individual pieces of art easier. Currently this means that I have 13 decoders sitting in memory for roughly 13 seconds of video. I certainly acknowledge that the issues are not due to Theora's design, but rather my architecture and misuse of Theora!

Our movies' YUV data ends up in the graphics RAM on a video card. I hit that wall a long time ago, and so we have a mechanism for "mutually exclusive" movies to share the same VRAM. That's why my first instinct was to see if there were any internal Theora buffers that could be shared the same way.

Sam

> From: Benjamin Schwartz [mailto:benjamin.m.schwartz at gmail.com]
...
> A single Theora movie must have a single resolution, so you can't use
> this to combine different movies at different resolutions (unless you
> scale them all up to 1080p before encoding, which I don't recommend).
> Even if they are matched, I disagree that this is more elegant. A
> filesystem, with named files, is a great, elegant way to identify
> distinct movies.