Hi, first post on this list. I run a small game development company specializing in casual Windows/Mac games.

We've been using Theora for video playback inside our engine for a while, but we always run into performance issues. I've tracked them down to two parts: YUV to RGB conversion (done in software) and uploading the new pixel data to the video card as a texture.

For the YUV to RGB issue, I'll rewrite it in assembler. Uploading the whole 1024x768 frame is proving problematic, though - this is the operation that causes delays on one of the test machines, and it's pretty much unavoidable.

I was thinking - since videos are encoded as delta frames, is it possible to get Theora to give me the rects modified since the last frame and upload only those? I'm stuck with theora_decode_YUVout, which decodes a full frame (yep, old API, but th_decode_ycbcr_out in the new one seems to be similar).

Thanks,
--Gabriel
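Before dropping to assembler, it may help to pin down a scalar reference version of the conversion Gabriel wants to optimize. A minimal sketch of per-pixel BT.601 limited-range YCbCr-to-RGB in fixed-point C - the function name and the x1024 coefficient scaling are my own choices here, not anything from libtheora:

```c
#include <stdint.h>

static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one BT.601 limited-range YCbCr sample to 8-bit RGB.
   Coefficients 1.164, 1.596, 0.391, 0.813, 2.018 are scaled by
   1024 so the whole thing runs in integer math - the same shape
   you'd hand to a SIMD rewrite. */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = (int)y - 16;    /* luma offset for limited range */
    int d = (int)u - 128;   /* Cb offset */
    int e = (int)v - 128;   /* Cr offset */

    *r = clamp255((1192 * c            + 1634 * e + 512) >> 10);
    *g = clamp255((1192 * c -  400 * d -  833 * e + 512) >> 10);
    *b = clamp255((1192 * c + 2066 * d            + 512) >> 10);
}
```

The +512 term rounds instead of truncating; limited-range black (16, 128, 128) maps to (0, 0, 0) and white (235, 128, 128) to (255, 255, 255).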
On Thu, Aug 13, 2009 at 8:04 AM, Gabriel Gambetta <gabriel at mysterystudio.com> wrote:

> Hi, first post on this list. I run a small game development company
> specializing in casual Windows/Mac games.
>
> We've been using Theora for video playback inside our engine for a while,
> but we always run into performance issues. I've tracked them down to two
> parts: YUV to RGB conversion (done in software) and uploading the new
> pixel data to the video card as a texture.

On the Mac, I use these calls:

glTexParameteri(Target, GL_TEXTURE_STORAGE_HINT_APPLE, GL_STORAGE_SHARED_APPLE);
glTexImage2D(Target, 0, GL_RGBA, W, H, 0, GL_YCBCR_422_APPLE, GL_UNSIGNED_SHORT_8_8_REV_APPLE, pixels);

I can run 1080p at full frame rates on an iMac with these calls and an efficient cropping/4:2:2 packing algorithm. Linux is problematic; I have not found an equivalent API that works. I haven't looked yet for Windows, but if you find one, please let me know. It is ridiculous that in this day and age, OpenGL does not have standard YUV -> RGB conversion functions.

On OS X, be sure to use Shark.

> For the YUV to RGB issue, I'll rewrite it in assembler. Uploading the whole
> 1024x768 frame is proving problematic, though - this is the operation that
> causes delays on one of the test machines, and it's pretty much unavoidable.

Look at liboggplay - there are some fast conversion routines there, but on the Mac you shouldn't need them, and I'd hope that on Windows you wouldn't either.

> I was thinking - since videos are encoded as delta frames, is it possible to
> get Theora to give me the rects modified since the last frame and upload
> only those? I'm stuck with theora_decode_YUVout, which decodes a full frame
> (yep, old API, but th_decode_ycbcr_out in the new one seems to be similar).

Sounds like a cool idea, but also like an R&D project that has a good chance of failing.

Shayne Wissler
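On hardware without the Apple extension, the same conversion can be done on the GPU with a fragment shader: upload the Y, Cb and Cr planes as three single-channel textures and convert per pixel. A rough GLSL sketch of that idea - sampler names are placeholders, coefficients are the usual BT.601 limited-range ones, and this assumes you bind the three planes yourself:

```glsl
uniform sampler2D tex_y;   /* full-resolution luma plane  */
uniform sampler2D tex_cb;  /* subsampled chroma plane     */
uniform sampler2D tex_cr;  /* subsampled chroma plane     */

void main()
{
    float y  = texture2D(tex_y,  gl_TexCoord[0].st).r;
    float cb = texture2D(tex_cb, gl_TexCoord[0].st).r;
    float cr = texture2D(tex_cr, gl_TexCoord[0].st).r;

    /* BT.601 limited-range YCbCr -> RGB */
    y  = 1.164 * (y - 0.0627);   /* 0.0627 ~= 16/255 */
    cb = cb - 0.5;
    cr = cr - 0.5;

    gl_FragColor = vec4(y + 1.596 * cr,
                        y - 0.391 * cb - 0.813 * cr,
                        y + 2.018 * cb,
                        1.0);
}
```

A nice side effect is that you upload the subsampled planes, not expanded RGB, so the texture upload that Gabriel is fighting shrinks too.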
On Thu, Aug 13, 2009 at 7:04 AM, Gabriel Gambetta <gabriel at mysterystudio.com> wrote:

> I was thinking - since videos are encoded as delta frames, is it possible to
> get Theora to give me the rects modified since the last frame and upload
> only those? I'm stuck with theora_decode_YUVout, which decodes a full frame
> (yep, old API, but th_decode_ycbcr_out in the new one seems to be similar).

libtheora doesn't expose this, but you could of course hack it up to do so. Start with the list of block coding modes. Unless your video has completely static backgrounds, I'm not sure it would help much, though. If you look, for example, at the telemetry images toward the bottom of http://web.mit.edu/xiphmont/Public/theora/demo3.html, even when most of the coded blocks are in groups there are still some outliers, which makes it hard to find efficient dirty rectangles without just breaking the image into hundreds of 16x16 tiles - which will be slow in the other direction.

As you mention, doing yuv2rgb in SIMD will help a lot. Also, many video cards have support for doing the conversion on the GPU, either through texture extensions like Shayne suggested, or directly with a shader program. This can help a lot because the yuv data is subsampled in the chroma planes and is half the size of the rgb version, so there's less memory bandwidth needed to upload it to texture memory. Also, if you're doing flat playback, even very old cards often have support for yuv2rgb and scaling as a 2d graphics 'overlay' feature.

HTH,
-r
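To make the outlier problem concrete, here's a toy sketch - not libtheora API; the block list is assumed to come from your own hack of the decoder - that merges coded 16x16 block coordinates into a single bounding dirty rectangle. A tight cluster gives a small upload, but one stray coded block in the far corner inflates the rect to nearly the full frame:

```c
#include <stddef.h>

/* Hypothetical input: (bx, by) coordinates, in 16x16-block units,
   of the blocks the decoder actually coded this frame. */
typedef struct { int bx, by; } block_t;
typedef struct { int x, y, w, h; } rect_t;

/* Merge all coded blocks into one bounding dirty rectangle in
   pixel coordinates.  Returns a zero-sized rect for an empty list. */
rect_t dirty_bounds(const block_t *blocks, size_t n)
{
    rect_t r = {0, 0, 0, 0};
    if (n == 0) return r;

    int x0 = blocks[0].bx, x1 = blocks[0].bx;
    int y0 = blocks[0].by, y1 = blocks[0].by;
    for (size_t i = 1; i < n; i++) {
        if (blocks[i].bx < x0) x0 = blocks[i].bx;
        if (blocks[i].bx > x1) x1 = blocks[i].bx;
        if (blocks[i].by < y0) y0 = blocks[i].by;
        if (blocks[i].by > y1) y1 = blocks[i].by;
    }
    r.x = x0 * 16;
    r.y = y0 * 16;
    r.w = (x1 - x0 + 1) * 16;
    r.h = (y1 - y0 + 1) * 16;
    return r;
}
```

For a 1024x768 frame (64x48 blocks), three clustered blocks give a 32x32 upload, but the same cluster plus a single block at (63, 47) gives 1024x768 - exactly the point above: you either accept near-full-frame rects or fall back to many small per-tile uploads, which is slow in the other direction.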