Kenneth Arnold
2000-Sep-13 20:04 UTC
[vorbis-dev] end-user mode for a moment (side-by-side tests)
Smack my curiosity, but I encoded some songs in Vorbis mode 2 and tried to distinguish them from the uncompressed WAVs. (*smacks self*) Here's what I noted: it's actually kind of hard to tell the difference :) (and I consider myself to have a decent set of ears, though not anywhere near the best)

I got my accuracy to about 90%, but at first I couldn't pin down what the difference was. Finally I figured out that it was some of the high range in a few spots that hit with less -- brilliance is the word that comes to mind -- than the uncompressed. Yeah, this is 128k, so what should I really expect, and it's compressed against uncompressed, so almost no fault found here. But could the psychoacoustic model be tuned any? Maybe if somebody could assemble a "test kit" that a lot of people could use to tune the model to what they thought sounded best, the results could be averaged? Or do we have it on higher authority that the psychoacoustics are the best they could be? (I am reminded of Linus Torvalds' announcement for 2.4.0-test2 on l-k back when I was subscribed.)

Wow. Not bad.

Now I've gotta try it against MP3. Dang, does that mean I have to grab notlame or bladeenc? Darn... I didn't even install them when I reinstalled last, because it seems I have my audio compression needs taken care of.

Back to developer mode:

Thanks, Ralph, for the Ogg todo sent a while ago. I've only now gotten around to really studying it and looking at what to do. Looks like video is it. So a couple of questions for the list:

1. Where's the Tarkin source anyway?

2. I am aware that Tarkin uses wavelets. MPEG uses object detection and motion estimation. What other methods are out there? Does anybody know of any new, cool methods for compressing video? Or, failing that, does anybody know [of] anyone who does?

3. I have looked over the MPEG document Marshall said to look over a while ago (about varying levels of detail). I think that's a good idea (in fact that was a goal even before I read it). See what you all think about my personal codec wishlist (from a starting-from-scratch viewpoint, even though it probably won't work out that easily):

* Three levels: packet, frame, and field. A packet holds all the stuff that naturally goes together and is otherwise worthless when split up (I'm thinking streaming here). A field is a collection of packets that describes part of a frame. It may pull information from a lot of sources, e.g., raw image data, data from frames earlier / later (with an arbitrarily adjustable window), a "scratch" area, whatever. It should have the capability to embody vector graphics, arbitrary transforms, effects, etc., even if the encoder can't pick them out from a source video (if it could, that'd be great, but that gets very complex). Maybe field == packet; I need to think some more about that. But by "part of a frame", I mean a level of detail as opposed to a region (although region might be useful also). Object descriptions are hierarchical in importance by nature; the codec should take advantage of this. Coding should be done residually, i.e., take as much information about the frame as can be embodied relatively simply, and repeat with what's left over. The amount of complexity per independent block should be adjustable over a wide range. Each block iteration (hierarchical level) could be assigned a priority, and when streaming, the transport could choose to send only the blocks above priority x (see the sketch after this message). Different methods could be used to formulate these blocks, possibly even different methods for different blocks describing the same area. This would allow motion estimation to be used for entire objects, and e.g. wavelets for details about the object. The definitions and implementations of the residue and coding areas are left for later, to allow for more than enough flexibility (I hope).

* Every frame should be able to reference back to frames before it, i.e., none of MPEG's I-frames (except maybe at the beginning of the stream). Okay, so maybe there should be I-frames, but use them more carefully. Possibly a lossless compression could be made from them... but back to the main issue here: a typical viewer will be watching the video for at least 100 megabits before [s]he even starts to worry about quality as opposed to content. So I-frames can be very sparse. The tradeoff is more redundancy in the diff frames. Each diff frame should transmit the diff, plus some data that the viewer would already know if it had been watching since the last I-frame. This would allow streaming to take advantage of scene similarity without worrying too much about the consequences of lost data. Possibly the redundant data could have a temporal component attached also, so when the video is saved to disk after streaming, it could be moved to the proper place where it should have been first introduced, and then removed as much as possible to keep redundancy to a minimum on a fixed medium (key point: the stream is not the compressed video. They work together, but both can be modified to hold the same or similar data in a more optimal manner). Another key point: there's a lot you can tune here (amount of redundant data transmitted, frequency of I-frames, etc.). More flexibility.

* VBR, of course. But since streaming often works best when the bitrate is constant (TCP windows, if streaming over TCP), allow the redundant data to be filled in whenever the data size is otherwise small.

* A scratch pad to save previous data. E.g., if a scene is switching between two talking heads, the data associated with one should be saved when switching to the other. The key point is that maybe the viewer didn't catch that old data; maybe send it before the stream starts playing, or put it in the redundant frames. The first sounds nice if you're not multicasting; the second is more suited for broadcasting.

* Assume the viewer knows everything about the stream you sent; then either the viewer could ask for the missing data (unicast better again) or the streamer could just resend it anyway (multicast).

Spewing a lot to myself above, and I really didn't mean to spew that much, but chew on it and tell me what you think. That's the product of probably about 15 minutes of mostly continuous thought that is very likely disjointed and missing some key information still locked somewhere in my head, so don't take it as written in anything but sand sprinkled in tide pools. It's also 11:00 PM local time, so I may have gone insane and not known about it.

The bit of judgement in me that hasn't gone to sleep yet is telling me that this is a good place to stop.

Kenneth

PS - I'm going to really like reading that when I'm more awake. It'll be fun.
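As a rough illustration of the "priority per hierarchical level" idea above, here is a minimal C sketch. All names here (residual_block, select_blocks) are hypothetical and not from any existing Ogg or Tarkin code: each coded layer carries a priority, and the streaming transport simply filters on a threshold before sending.

    /* Hypothetical sketch of prioritized residual layers: each block refines
     * one hierarchical level of a frame and carries a priority; a transport
     * under bandwidth pressure keeps only the blocks above a threshold. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t       frame_no;  /* which frame this block refines        */
        uint8_t        level;     /* hierarchical level, 0 = coarsest      */
        uint8_t        priority;  /* higher = more important to the viewer */
        uint32_t       length;    /* payload size in bytes                 */
        const uint8_t *payload;   /* coded residual for this level         */
    } residual_block;

    /* Copy into 'out' only the blocks at or above 'min_priority', preserving
     * order; returns how many blocks were kept. */
    size_t select_blocks(const residual_block *in, size_t n,
                         uint8_t min_priority, residual_block *out)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            if (in[i].priority >= min_priority)
                out[kept++] = in[i];
        }
        return kept;
    }

A fuller version would also have to respect dependencies (a level-2 block is useless without its level-1 parent), which is exactly where the hierarchical structure described in the wishlist would come in.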
Ralph Giles
2000-Sep-13 23:14 UTC
[vorbis-dev] end-user mode for a moment (side-by-side tests)
On Wed, 13 Sep 2000, Kenneth Arnold wrote:

> Thanks, Ralph, for the Ogg todo sent a while ago. I've only now got to really
> studying it and looking at what to do. Looks like video is it. So a couple
> questions for the list:

Glad it was helpful. If anyone's interested, I've put up a version at http://snow.ashlu.bc.ca/ogg/todo.html

> 1. Where's the Tarkin source anyway?

Jack's still sitting on it? The wavelet part wouldn't be hard to re-create, I think. A good exercise anyway (where have you heard that before?) Just look for a basic tutorial on the web. The quantization is probably trickier.

> 2. I am aware that Tarkin uses wavelets. MPEG uses object detection and
> motion estimation. What other methods are out there? Does anybody know
> of any new, cool methods for compressing video? Or, failing that, does
> anybody know [of] anyone who does?

Aren't wavelets new? Especially as a 2+1 dimensional transform. If you want a larger space for the encoder to play in, Monty has pointed out Steve Mann's work on chirplets. My understanding in that direction pretty much stops at a hunch that it might be fruitful. :-)

http://wearcam.org/chirplet.html

Note that including a series of frames (time) in the transform domain is a form of motion compensation, and a sophisticated one.

Hope that helps,
 -r

--
giles@ashlu.bc.ca
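To make the "2+1 dimensional transform" remark concrete, here is a tiny C sketch (not Tarkin code; the function name and data layout are made up) of a single Haar step run along the time axis of a group of frames. Pairs of frames are replaced by their average and difference, so static regions produce near-zero difference bands, which is the sense in which putting time into the transform domain acts as a form of motion compensation.

    /* Illustrative only: one Haar lifting step along the time axis.
     * frames[t][p] holds t frames of 'pixels' luma samples each; after the
     * call, even slots hold averages (low band) and odd slots hold
     * differences (high band). 't' is assumed to be even. */
    #include <stddef.h>

    void haar_time_step(float **frames, size_t t, size_t pixels)
    {
        for (size_t i = 0; i + 1 < t; i += 2) {
            for (size_t p = 0; p < pixels; p++) {
                float a = frames[i][p], b = frames[i + 1][p];
                frames[i][p]     = 0.5f * (a + b); /* low band  */
                frames[i + 1][p] = a - b;          /* high band */
            }
        }
    }

Repeating the step on the low band, plus the usual 2-D wavelet within each frame, gives a full 2+1-D decomposition. The chirplet reference points at bases that can additionally sweep over time, though as noted above that is only a hunch at this point.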
Jelle Foks
2000-Sep-14 05:30 UTC
[vorbis-dev] end-user mode for a moment (side-by-side tests)
See my in-line comments.

Kenneth Arnold wrote:

> [...]
> * Every frame should be able to reference back to frames before it, i.e.,
> no MPEG's I frames (except maybe at the beginning of the stream).

If there are too many dependencies upon 'previous data', such as what happens when you send/store I-type image data only very occasionally, then you will have very slow or difficult seeking, channel zapping, etc.

> Okay, so maybe there should be I-frames, but use them more carefully.

If 3d transforms are used, then there is not much need for something like an I/P/B-type frame concept, because you're looking at multi-frame data coefficients in the transformed domain. Here, the depth in time of the 3d transform plays a role similar to the 'I-frame frequency' in 2d-transform coding. The I/P/B frame types are a direct result of the current 2d-transform coding methods using predictive coding in the time domain. Back in the old days, image compression methods even did predictive coding in the pixel domain, but once they moved to transform coding there was no need to keep doing that. The only place where prediction remains is on the boundaries of the transforms: in 8x8 DCT coding (MPEG, JPEG, H.26x), this is at the DC DCT coefficients plus in the time domain. In NxMxQ 3d-wavelet coding, prediction will only help at the edges of the pixel block and of the group of frames that are transformed as a whole.

> Possibly a lossless compression could be made from them...

Lossless compression can be made from any compression method where the residual entropy is sufficiently low. Lossless compression doesn't require I-frames.

> but back to the main issue here: a typical viewer will be watching the video
> for at least 100 megabits before [s]he even starts to worry about quality as
> opposed to content.

Unless the viewer is receiving the stream over a 56k POTS modem or similar. Even on 512kbit ADSL that is still more than 3 minutes.

> So I-frames can be very sparse.

I'd hate to be able to seek only at three-minute (or longer) intervals, or to wait up to three minutes after each seek while the decoder reconstructs enough 'history'. I'd also hate to zap through channels at only one channel per three minutes.

> The tradeoff is more redundancy in the diff frames.

Not completely. In your proposal all difference frames change the reference frames, because each decompressed difference frame can itself be a reference frame. In that case you have the problem of accumulated errors, especially in transform coding, where various decoder implementations may not be bit-exact (different decoding environments -- processors, hardware -- mean differences in rounding, optimizations, efficiency and available data types). After each difference frame that is used as a reference frame, the reference available in the decoder deviates a bit more from the reference available in the encoder, resulting in growing errors in the reconstructed images.

> Each diff frame should transmit the diff, plus some data that the viewer
> should know if it's been watching since the last I-frame. [...]
> * VBR of course. But since streaming often works best when bitrate is
> constant (TCP windows, if streaming over TCP), allow the redundant data to
> be filled in whenever the data size is otherwise small.

If the bitstream can occasionally have a higher bit-rate than the transmission medium, this results in latency (due to buffering). Dropping frames is not a good solution here, because that is nothing more than very bluntly reducing the VBR ceiling, which can better be done inside the coding algorithm.

> * Scratch pad to save previous data. e.g. if scene is switching between two
> talking heads, should save data associated with one when switching to other.

AFAIK, MPEG4 solves that by separating object descriptions and image structure. In other words: in MPEG4, not all known and/or previously known objects must be displayed at all times. This allows an encoder to 'keep' some objects across scene switches.

> Key point is that maybe viewer didn't catch that old data; maybe send it
> before stream starts playing, or put it in the redundant frames. First
> sounds nice if you're not multicasting; second is more suited for
> broadcasting.

It's all dependent on the application; many applications won't accept the latency and other problems you get if you trade everything off for maximum compression. The Ogg Video codec should be able to produce the perfect stream for each application, but not every Ogg Video stream has to be perfect for each application. Hence, keep all of that parametrizable, and keep as many of the details as possible outside of the standard and the codec; let the application decide which parameters tickle its sweet spot. I think a video stream format is best kept simple: KISS (keep it simple, stupid).
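The drift argument above is easy to see in a toy simulation. The C sketch below is purely illustrative and not modelled on any real codec: a slowly changing pixel value is coded as a chain of quantized differences, the encoder reconstructs in double precision while the decoder reconstructs in a coarser fixed-point format, and because every diff is coded against the encoder's reference, the small per-frame rounding mismatch is never corrected and accumulates.

    /* Toy demonstration of reference drift between a non-bit-exact encoder
     * and decoder when every diff frame becomes the next reference. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double source  = 0.0;   /* "true" pixel value                     */
        double enc_ref = 0.0;   /* encoder's reconstruction (double)      */
        int dec_ref64  = 0;     /* decoder's reconstruction in 1/64 units */

        for (int frame = 1; frame <= 60; frame++) {
            source += 0.3;      /* the scene changes a little each frame  */

            /* quantize the diff against the ENCODER's reference (0.1 steps) */
            int q = (int)lround((source - enc_ref) / 0.1);

            enc_ref   += q * 0.1;                      /* encoder side     */
            dec_ref64 += (int)lround(q * 0.1 * 64.0);  /* decoder side:
                                                          coarser rounding */
            if (frame % 15 == 0)
                printf("frame %2d: drift = %+.4f\n",
                       frame, dec_ref64 / 64.0 - enc_ref);
        }
        return 0;
    }

The printed drift grows roughly linearly with the number of diff frames; periodic I-frames, or a bit-exact reconstruction path shared by encoder and decoder, are the usual ways to bound it.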
Last week I tweaked Tim Wood's changes and created a diff that yields (nearly) out-of-the-box Mac OS X DP 4 compatibility, and announced it on the list. As far as I can tell, no one has yet applied the changes to the CVS repository. Could someone with commit access please work with me to get these changes in?

You can grab the diff from <http://www.gizzywump.com/vorbis-macosx-diffs>

(There are some configure & configure.in changes; if you prefer, you might just apply the configure.in changes and rerun autoconf.)

Thanks,
-- Richard

+-------------------------+
| Richard Kiss            |
| 140 Locksunart Way #8   |
| Sunnyvale, CA, 94087    |
| richard@homemail.com    |
| http://www.ogopogo.net/ |
+-------------------------+