Michael Vrable
2004-Jun-18 19:12 UTC
[Vorbis-dev] Patch to stop vcut from generating broken streams
Skipped content of type multipart/mixed-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://westfish.xiph.org/pipermail/vorbis-dev/attachments/20040618/e7bee431/attachment.pgp
Michael Vrable
2004-Jun-18 23:06 UTC
[Vorbis-dev] Patch to stop vcut from generating broken streams
(re-sent) Message-ID: <20040619060621.GA18322@vrable.net> [Wow, the list archiver does a great job of mangling signed messages (multipart/signed MIME type). Re-sending so that this message can make it to the archive, and just in case it was damaged before reaching other subscribers.] The attached patch is intended to fix the bug in vcut where the granulepos on each page is not always set correctly in the output stream. (See, for example, the discussion on vorbis-dev@xiph.org in August 2003, "vcut breaks song index ? XMMS search fails".) It's an issue I came up against while editing the recording of a webcast I helped organize; I created a workable fix then, and I've tried to clean it up now. The patch should be against the most recent version of vcut from svn.xiph.org (not that it seems to be undergoing much active development). This is actually a moderately large rewrite of the process_first_stream and process_second_stream functions in vcut. Now, we keep track of the granulepos associated with every packet in the input stream. This requires some computation on our part, since the Ogg later only gives us the granulepos for the last packet of each page. The old version of vcut made these calculations only for the page where the cut was actually made. (So this will be slower than the old vcut; I haven't made any measurements, but I don't expect it to be too bad. And working, slower code is better than faster broken code.) Previously, if the page boundaries shifted at all when writing, there were no granulepos values to write out for many pages, leading to a broken stream. With this new code, this should never happen. A few other minor changes: if the cut point is right at a packet boundary, then only one packet needs to be shared between the two output streams, not two. Also, I attempt to flush the output stream where necessary--at the end of the stream (maybe this isn't needed, but it seemed safe to do so just in case) and after the first two audio packets if any of the initial samples are to be discarded. I've given it some testing, and haven't noticed any broken streams created yet. But there could still be bugs. One case that I haven't tested yet, and which might be still problematic, is splitting a stream which starts at a positive granulepos (such as audio saved from the middle of a network stream). I tried to stick to the original coding style as closely as I could. (Including what appeared to be 4-space tabs(!).) I'm not sure if I should be putting myself in the copyright section or not; I'm willing to take that out, assign copyrights to Xiph, etc., if needed. It would be nice to see this patch, or some other bug fix, applied to the vcut sources. If there are issues with the patch that should be addressed, please let me know. --Michael Vrable -------------- next part -------------- --- vcut.c.orig 2004-06-18 18:12:45.000000000 -0700 +++ vcut.c 2004-06-18 18:17:21.000000000 -0700 @@ -2,6 +2,7 @@ * a copy of which is included with this program. * * (c) 2000-2001 Michael Smith <msmith@xiph.org> + * (c) 2004 Michael Vrable <vrable@cs.hmc.edu> * * * Simple application to cut an ogg at a specified frame, and produce two @@ -121,104 +122,126 @@ int eos=0; ogg_page page; ogg_packet packet; - ogg_int64_t granpos, prevgranpos; + ogg_int64_t current_granpos = 0; /* granulepos for the current packet */ + int packet_count = 0; /* Number of audio packets seen */ + int flush_needed = 0; /* Set to force a flush of the output stream */ int result; while(!eos) { while(!eos) { - int result = ogg_sync_pageout(s->sync_in, &page); + result = ogg_sync_pageout(s->sync_in, &page); if(result==0) break; else if(result<0) fprintf(stderr, _("Page error. Corrupt input.\n")); else { - granpos = ogg_page_granulepos(&page); ogg_stream_pagein(s->stream_in, &page); - if(granpos < s->cutpoint) + while(1) { - while(1) + result=ogg_stream_packetout(s->stream_in, &packet); + + if(result==0) break; + else if(result==-1) + fprintf(stderr, _("Bitstream error, continuing\n")); + else { - result=ogg_stream_packetout(s->stream_in, &packet); + /* Update current_granpos. The ogg framing layer won't + * set the granulepos for us on every packet--only + * those falling at the end of a page--but we should be + * certain to set the granulepos on every packet we + * write out, since we don't know where the page + * boundaries will end up. */ + current_granpos += get_blocksize(s,s->vi,&packet); + + /* Exception: The first packet ends at granulepos 0, + * unless we were told otherwise. So we _don't_ want + * to advance current_granpos above for the first + * packet. */ + if(packet_count == 0) + { + current_granpos = 0; + } - /* throw away result, but update state */ - get_blocksize(s,s->vi,&packet); + /* If we do have a granulepos in the packet to compare + * against, however, do so. There should only be a + * difference in two cases--when the start of the + * stream is truncated (we should have processed no + * more than two packets so far) or when the end of the + * stream is truncated (which should only occur at + * end-of-stream). If the first case occurs, we must + * flush the output stream after this packet so that + * the start-of-stream truncation is indicated properly + * in the output. */ + if(packet.granulepos != -1 + && current_granpos != packet.granulepos) + { + if(packet_count<2) + { + flush_needed = 1; + } + else if(packet.e_o_s) + { + /* No special action needed. */ + } + else + { + fprintf(stderr, "WARNING: Unexpected granulepos adjustment in input stream; output may not be correct.\n"); + } + + current_granpos = packet.granulepos; + } - if(result==0) break; - else if(result==-1) - fprintf(stderr, _("Bitstream error, continuing\n")); - else + packet.granulepos = current_granpos; + packet_count++; + + if(current_granpos >= s->cutpoint) { - /* We need to save the last packet in the first - * stream - but we don't know when we're going - * to get there. So we have to keep every packet - * just in case. - */ - if(s->packets[0]) - free_packet(s->packets[0]); - s->packets[0] = save_packet(&packet); + s->packets[1] = save_packet(&packet); + /* Set it! This 'truncates' the final packet, as + * needed. */ + packet.granulepos = s->cutpoint; + packet.e_o_s = 1; ogg_stream_packetin(stream, &packet); - if(write_pages_to_file(stream, f,0)) - return -1; + + eos = 1; + break; } + + /* We need to save the last packet in the first stream + * - but we don't know when we're going to get there. + * So we have to keep every packet just in case. + */ + if(s->packets[0]) + free_packet(s->packets[0]); + s->packets[0] = save_packet(&packet); + + ogg_stream_packetin(stream, &packet); + if(write_pages_to_file(stream, f, flush_needed)) + return -1; + flush_needed = 0; } - prevgranpos = granpos; } - else - eos=1; /* First stream ends somewhere in this page. - We break of out this loop here. */ - if(ogg_page_eos(&page)) + if(!eos && ogg_page_eos(&page)) { fprintf(stderr, _("Found EOS before cut point.\n")); - eos=1; + fprintf(stderr, + _("Cutpoint not within stream. Second file will be empty\n")); + write_pages_to_file(stream, f,1); + + return -1; } } } - if(!eos) - { - if(update_sync(s,in)==0) - { - fprintf(stderr, _("Setting eos: update sync returned 0\n")); - eos=1; - } - } - } - - /* Now, check to see if we reached a real EOS */ - if(granpos < s->cutpoint) - { - fprintf(stderr, - _("Cutpoint not within stream. Second file will be empty\n")); - write_pages_to_file(stream, f,0); - return -1; - } - - while((result = ogg_stream_packetout(s->stream_in, &packet))!=0) - { - int bs; - - bs = get_blocksize(s, s->vi, &packet); - prevgranpos += bs; - - if(prevgranpos > s->cutpoint) + if(!eos && update_sync(s,in)==0) { - s->packets[1] = save_packet(&packet); - packet.granulepos = s->cutpoint; /* Set it! This 'truncates' the - * final packet, as needed. */ - packet.e_o_s = 1; - ogg_stream_packetin(stream, &packet); - break; - } - if(s->packets[0]) - free_packet(s->packets[0]); - s->packets[0] = save_packet(&packet); - ogg_stream_packetin(stream, &packet); - if(write_pages_to_file(stream,f, 0)) + fprintf(stderr, _("ERROR: update sync returned 0\n")); return -1; + } } /* Check that we got at least two packets here, which we need later */ @@ -228,11 +251,11 @@ return -1; } - if(write_pages_to_file(stream,f, 0)) + if(write_pages_to_file(stream,f, 1)) return -1; /* Remaining samples in first packet */ - s->initialgranpos = prevgranpos - s->cutpoint; + s->initialgranpos = current_granpos - s->cutpoint; return 0; } @@ -253,15 +276,22 @@ int eos=0; int result; ogg_int64_t page_granpos, current_granpos=s->initialgranpos; - ogg_int64_t packetnum=0; /* Should this start from 0 or 3 ? */ + ogg_int64_t packetnum=1; /* First packet is header, already written. */ - packet.bytes = s->packets[0]->length; - packet.packet = s->packets[0]->packet; - packet.b_o_s = 0; - packet.e_o_s = 0; - packet.granulepos = 0; - packet.packetno = packetnum++; - ogg_stream_packetin(stream,&packet); + /* If the starting granulepos is 0, then we happened to cut right at a + * packet boundary, and we only need to overlap with one packet from the + * old stream. Otherwise, copy the previous two packets from the old + * stream. */ + if(current_granpos != 0) + { + packet.bytes = s->packets[0]->length; + packet.packet = s->packets[0]->packet; + packet.b_o_s = 0; + packet.e_o_s = 0; + packet.granulepos = 0; + packet.packetno = packetnum++; + ogg_stream_packetin(stream,&packet); + } packet.bytes = s->packets[1]->length; packet.packet = s->packets[1]->packet; @@ -271,22 +301,27 @@ packet.packetno = packetnum++; ogg_stream_packetin(stream,&packet); - if(ogg_stream_flush(stream, &page)!=0) - { - fwrite(page.header,1,page.header_len,f); - fwrite(page.body,1,page.body_len,f); - } - - while(ogg_stream_flush(stream, &page)!=0) - { - /* Might this happen for _really_ high bitrate modes, if we're - * spectacularly unlucky? Doubt it, but let's check for it just - * in case. - */ - fprintf(stderr, _("ERROR: First two audio packets did not fit into one\n" - " ogg page. File may not decode correctly.\n")); - fwrite(page.header,1,page.header_len,f); - fwrite(page.body,1,page.body_len,f); + /* We only need to flush the stream if we are starting at a non-zero + * granulepos. */ + if(current_granpos != 0) + { + if(ogg_stream_flush(stream, &page)!=0) + { + fwrite(page.header,1,page.header_len,f); + fwrite(page.body,1,page.body_len,f); + } + + while(ogg_stream_flush(stream, &page)!=0) + { + /* Might this happen for _really_ high bitrate modes, if we're + * spectacularly unlucky? Doubt it, but let's check for it just + * in case. + */ + fprintf(stderr, _("ERROR: First two audio packets did not fit into one\n" + " ogg page. File may not decode correctly.\n")); + fwrite(page.header,1,page.header_len,f); + fwrite(page.body,1,page.body_len,f); + } } while(!eos) @@ -313,6 +348,11 @@ current_granpos += bs; if(current_granpos > page_granpos) { + if(!packet.e_o_s) + { + fprintf(stderr, + _("WARNING: Adjusting granpos mid-stream.\n")); + } current_granpos = page_granpos; } @@ -335,6 +375,10 @@ } } + /* Ensure all data is flushed. */ + if(write_pages_to_file(stream,f, 1)) + return -1; + return 0; }
John Morton
2004-Jun-18 23:20 UTC
[Vorbis-dev] Patch to stop vcut from generating broken streams
On Sat, 19 Jun 2004 14:12, Michael Vrable wrote:> The attached patch is intended to fix the bug in vcut where the > granulepos on each page is not always set correctly in the output > stream. (See, for example, the discussion on vorbis-dev@xiph.org in > August 2003, "vcut breaks song index ? XMMS search fails".) It's an > issue I came up against while editing the recording of a webcast I > helped organize; I created a workable fix then, and I've tried to clean > it up now.*Applause* I know I'll be giving this patch a spin as soon as I can. Thanks, John
Michael Vrable
2004-Jun-19 10:12 UTC
[Vorbis-dev] Patch to stop vcut from generating broken streams
I also realized not long after sending the patch that this patched line in process_second_stream: ogg_int64_t packetnum=1; /* First packet is header, already written. */ should really read: ogg_int64_t packetnum=3; /* Header packets have already been written. */ (Though I actually don't think this value affects the output bitstream at all.) --Michael Vrable
Michael Smith
2004-Jun-20 18:41 UTC
[Vorbis-dev] Patch to stop vcut from generating broken streams
On Saturday 19 June 2004 12:12, Michael Vrable wrote:> The attached patch is intended to fix the bug in vcut where the > granulepos on each page is not always set correctly in the output > stream. (See, for example, the discussion on vorbis-dev@xiph.org in > August 2003, "vcut breaks song index ? XMMS search fails".) It's an > issue I came up against while editing the recording of a webcast I > helped organize; I created a workable fix then, and I've tried to clean > it up now. > > The patch should be against the most recent version of vcut from > svn.xiph.org (not that it seems to be undergoing much active > development).Well, yes. vcut isn't built by default, and I've been suggesting for the past year or so that nobody should ever use it. It was written as proof-of-concept code, and it successfully proved the concept. It isn't intended for actual use. The fact that it went into vorbis-tools was, in retrospect, a mistake.> I've given it some testing, and haven't noticed any broken streams > created yet. But there could still be bugs. One case that I haven't > tested yet, and which might be still problematic, is splitting a stream > which starts at a positive granulepos (such as audio saved from the > middle of a network stream). > > I tried to stick to the original coding style as closely as I could. > (Including what appeared to be 4-space tabs(!).)They're the only reasonable way to do tabs! :-) (oh, and you missed "80 column lines", which isn't stuck to as closely as it should be in the existing source, but is intended...)> > I'm not sure if I should be putting myself in the copyright section or > not; I'm willing to take that out, assign copyrights to Xiph, etc., if > needed. > > It would be nice to see this patch, or some other bug fix, applied to > the vcut sources. If there are issues with the patch that should be > addressed, please let me know. >Whilst I expect (without having actually looked at it yet) that this patch drastically improves things, I don't really want to do anything to vcut that might suggest to people that it's actually considered _usable_ (because I don't, even with these fixes, due to a) and b) below.) Well... unless you actually make it work properly in all cases - at the very least this requires heavy testing on streams with a) the start cut off, as you described, and b) chained streams. Neither of which currently work. Mike
Michael Vrable
2004-Jun-20 22:53 UTC
[Vorbis-dev] Patch to stop vcut from generating broken streams
<200406211141.51197.msmith@xiph.org> Message-ID: <20040621055300.GA15022@vrable.net> On Mon, Jun 21, 2004 at 11:41:51AM +1000, Michael Smith wrote:> On Saturday 19 June 2004 12:12, Michael Vrable wrote: > > The attached patch is intended to fix the bug in vcut where the > > granulepos on each page is not always set correctly in the output > > stream. (See, for example, the discussion on vorbis-dev@xiph.org in > > August 2003, "vcut breaks song index ? XMMS search fails".) It's an > > issue I came up against while editing the recording of a webcast I > > helped organize; I created a workable fix then, and I've tried to clean > > it up now. > > > > The patch should be against the most recent version of vcut from > > svn.xiph.org (not that it seems to be undergoing much active > > development). > > Well, yes. vcut isn't built by default, and I've been suggesting for the past > year or so that nobody should ever use it. It was written as proof-of-concept > code, and it successfully proved the concept. It isn't intended for actual > use. The fact that it went into vorbis-tools was, in retrospect, a mistake.That said, it _is_ a handy tool to have around. I'm not currently aware of another tool that allows for lossless cutting of a Vorbis stream (I'd love to be pointed to one). It was invaluable for some editing I did at one point, and at least until a good replacement for it exists, it would be nice to have a (mostly) working version of vcut for when it's needed.> > I tried to stick to the original coding style as closely as I could. > > (Including what appeared to be 4-space tabs(!).) > > They're the only reasonable way to do tabs! :-) (oh, and you missed "80 column > lines", which isn't stuck to as closely as it should be in the existing > source, but is intended...)(I generally stick with 4-space indentation but 8-space tabs in code I write. Your code did look a lot better once I realized tabs were supposed to be every four spaces and adjusted my editor accordingly. :-) I mostly tried to stick to 80-column lines, but gave up on some long strings, as I noticed had been done elsewhere. I could try to clean that up a bit more.)> Whilst I expect (without having actually looked at it yet) that this patch > drastically improves things, I don't really want to do anything to vcut that > might suggest to people that it's actually considered _usable_ (because I > don't, even with these fixes, due to a) and b) below.) > > Well... unless you actually make it work properly in all cases - at the very > least this requires heavy testing on streams with a) the start cut off, as > you described, and b) chained streams. Neither of which currently work.I believe, not having tested this, that the current behavior on streams that start with a positive granulepos is to cut at the indicated location as measured against the granulepos values in the stream. So cutting a stream at 35 seconds when the stream starts at 30 seconds will actually cut after only 5 seconds of audio. This may or may not be what the user expects. It shouldn't be too difficult to always measure from the first audio sample contained in the stream. (This change would be made if adding support for chained bitstreams, at least as I imagine it.) With regards to chained streams: I have thus far been treating that as a "don't do that!" case (though I don't think there are any checks for this in the code). To actually handle chained streams properly requires first defining how they _should_ be treated. I'm thinking: Imagine decoding the entire physical input bitstream, concatenating the samples decoded from each logical bitstream. Seek through the specified number of samples within this imaginary output stream, determine at which position within which logical stream in the input that sample corresponds, and make the cut there. Are there other reasonable ways to define this? Here, we are completely blind to the timing information conveyed by streams that start at a positive granulepos; we only care about the number of samples of audio actually encoded in each stream. The granulepos values in the input stream only matter when they indicate that some decoded samples are to be discarded. Complications: - We can't immediately copy the headers from the start of the input stream to the second output stream, since the cutpoint might actually be in a subsequent stream. Not too difficult. - Almost anything can change midway in a chained stream--even the sampling rate. If the cutpoint has been specified in samples, print a warning (but do allow the user sample-precise measures, since this could still be useful). If the cutpoint has been specified in seconds, more logic is needed to count the time properly (we can't simply multiply by the sampling rate at the beginning as is done currently). - There would still be some unhandled cases. Cutpoints within two packets of the start or end of a stream will cause trouble. Such a cutpoint require generating an output stream with only two packets, and I don't believe there's a way to specify how many samples to discard from the start and the end in a stream this short. In this case, print an error message and abort. Possibly permit discarding the short output stream, or recommend how to shift the cutpoint to make the cut possible. - Other issues that I'm forgetting? It might also be worth giving the user the option of adjusting the cutpoint to the nearest packet boundary. I can see this as useful when sample-precise editing isn't necessary. I'm willing to give coding this a shot. It would take a bit longer, but looks doable. (The previous patch was a one-night hack, though that was with the benefit of having previously coded then lost the sources for a similar fix.) Thanks for the comments. I'm open to more suggestions and advice (from you, or from other people with an interest in vcut). --Michael Vrable