I'm new to the mailing list, but I'd like to pick up a thread from earlier in the month that I thought had become confusing, so I am starting again. I should admit from the beginning that I am a colleague of Alex Brims, who started the original thread.

The thread in question related to a wav file with an extra two bytes at the end, which caused a partial sample error in the reference flac encoder. There seemed to be confusion over what was actually wrong with the file but, as Brian Willoughby correctly deduced, the file had an odd number of samples in the data chunk, and as it was a stereo file this was incorrect.

What I am asking is: does this make the file invalid as far as the RIFF/WAVE specification goes? The file had valid "RIFF", "fmt " and "data" chunks as far as chunk IDs and lengths were concerned, and the overall file length was correct. The only issue is that the "data" chunk started and ended with a sample from the same channel.

I have read through some of the documentation provided at http://www.ambisonia.com/Members/mleese/file-formats/, especially the McGill University WAVE specification and the Microsoft/IBM documentation, and can't find anywhere that says there needs to be an even number of samples, just that they need to be interleaved. I may have missed something along the way, in which case the simple answer to the original question is "yes, the file is invalid." That's fine with me.

The main reason to bring this up is to point out that these files exist and, in our experience, are quite common: of the hundreds of new wav files we receive each day, around 1% seem to have this problem. We receive them from many different suppliers and they seem to be becoming more common, which is a bit of a pain and is probably down to an update to a particular piece of software. Fixing them is not particularly complicated: either add a sample or remove the last one, and update the relevant chunk information. On the other hand, it seems a little punitive to error on a file in this situation.

Anyway, your thoughts or opinions would be appreciated. If I have totally misunderstood something in putting this mail together then I apologize in advance.

Ben
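For readers who want to see what "remove the last one and update the relevant chunk information" could look like, here is a minimal Python sketch. It is not taken from flac or from any tool mentioned in this thread. The function names (`make_wav`, `find_chunks`, `trailing_partial_bytes`, `drop_partial_frame`) are hypothetical, and it assumes a plain PCM file with a 16-byte "fmt " chunk and "data" as the last chunk.

```python
import struct

def make_wav(channels, bits, data):
    """Build a minimal PCM WAVE file in memory (demo helper only)."""
    block_align = channels * bits // 8
    rate = 44100
    fmt = struct.pack("<HHIIHH", 1, channels, rate,
                      rate * block_align, block_align, bits)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += b"data" + struct.pack("<I", len(data)) + data
    body += b"\x00" * (len(data) & 1)      # pad byte for odd-sized data
    return b"RIFF" + struct.pack("<I", len(body)) + body

def find_chunks(wav):
    """Map chunk id -> (offset of size field, offset of data, size)."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks, pos = {}, 12
    while pos + 8 <= len(wav):
        cid, size = struct.unpack_from("<4sI", wav, pos)
        chunks[cid] = (pos + 4, pos + 8, size)
        pos += 8 + size + (size & 1)       # skip the pad byte, if any
    return chunks

def trailing_partial_bytes(wav):
    """Bytes left over after the last complete frame in the data chunk."""
    chunks = find_chunks(wav)
    fmt_off = chunks[b"fmt "][1]
    # nBlockAlign (bytes per frame) sits 12 bytes into the fmt chunk.
    block_align = struct.unpack_from("<H", wav, fmt_off + 12)[0]
    return chunks[b"data"][2] % block_align

def drop_partial_frame(wav):
    """Return a copy without the incomplete final frame, sizes updated."""
    extra = trailing_partial_bytes(wav)
    if extra == 0:
        return wav
    size_off, data_off, data_size = find_chunks(wav)[b"data"]
    if data_off + data_size + (data_size & 1) < len(wav):
        raise ValueError("chunks after 'data' are not handled in this sketch")
    new_size = data_size - extra
    out = bytearray(wav[:data_off + new_size])
    if new_size & 1:
        out.append(0)                      # keep the chunk even-sized
    struct.pack_into("<I", out, size_off, new_size)   # data chunk size
    struct.pack_into("<I", out, 4, len(out) - 8)      # RIFF size
    return bytes(out)

# Stereo 16-bit file holding three samples: one and a half frames.
wav = make_wav(2, 16, b"\x01\x00\x02\x00\x03\x00")
print(trailing_partial_bytes(wav))                       # 2
print(trailing_partial_bytes(drop_partial_frame(wav)))   # 0
```

The sketch works on a copy in memory, so the original bytes are left untouched. Padding the last frame with a silent sample instead of dropping it would follow the same pattern.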
Brian Willoughby
2007-Nov-16 07:12 UTC
[Flac] Re: Odd number of samples in a stereo wave file
This topic is possibly more appropriate on the flac-dev list, since most flac users are not going to be interested in the details of WAVE format errors. Assuming it's ok to keep talking about it here...

Ben, you've stumbled upon one of the common shortcomings of specifications. There are often assumptions which are not spelled out, or there are pieces which are vague. In some cases, the various resulting symptoms may appear the same to the end user, but they actually have quite different causes when examined in detail.

There are a couple of issues which fall under the generic category of "odd" sizes in RIFF/WAVE files. The first is one that I tried to describe in detail, but turns out not to be an issue with the file in question, although it is a common error. The second issue could probably be described more clearly.

1) At the most basic level of the specification, all RIFF/WAVE chunks must be an even number of bytes. At the time RIFF was designed, there was some efficiency to accessing files on word boundaries instead of byte boundaries - or at least they copied this requirement from AIFF. This has nothing to do with the size of the sample, but there is an interaction. With 16-bit files (e.g. CD) and any number of channels, there will never be a chance of an odd number of bytes. Also, with stereo files, no matter how many bytes per sample, there's no chance of an odd number of bytes. This leaves mono 8-bit and mono 24-bit files as potential problems, if the programmer doesn't understand the specification (which clearly states the requirement that odd-byte-sized chunks must be padded, even if they're the last chunk in the file).

2) Another assumption common to all sampled files is the "frame." For any multichannel file, there must be one sample per channel to form a complete frame. It really doesn't make sense to create a file, call it stereo, and then leave one channel missing on the last frame.
I'm not sure whether the specifications spell out what should happen when a frame is incomplete, but there is a strong implication that frames should always be complete. I can't even imagine what kind of programmer would write code which creates partial frames.

It would actually be punitive to expect the flac code to expect and adapt to nonsensical WAVE files. If the problem is common, then report the bug to the makers of the programs which produce the bad WAVE files, tell your customers to switch to compliant programs, and/or write your own software which processes WAVE files looking for these kinds of errors and repairs them. If you receive the files via uploads, it would probably be possible to have this step run automatically.

Brian Willoughby
Sound Consulting

On Nov 15, 2007, at 10:37 AM, <ben@yarwood.com> wrote:
> [...]
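The arithmetic behind Brian's two points (even-sized chunks, complete frames) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and it assumes uncompressed PCM where each sample occupies a whole number of bytes. The same arithmetic suggests that files with an odd channel count (for example three channels of 24-bit audio) can also end up with an odd byte count, not just mono files.

```python
def frame_bytes(channels, bits_per_sample):
    """Bytes in one complete frame: one sample for every channel."""
    bytes_per_sample = (bits_per_sample + 7) // 8   # whole bytes per sample
    return channels * bytes_per_sample

def data_bytes(channels, bits_per_sample, frames):
    """Size of a data chunk that holds only complete frames."""
    return frames * frame_bytes(channels, bits_per_sample)

def pad_bytes(chunk_size):
    """RIFF pad byte count: 1 after odd-sized chunk data, else 0."""
    return chunk_size & 1

def can_need_padding(channels, bits_per_sample):
    """True if some whole number of frames gives an odd byte count."""
    return frame_bytes(channels, bits_per_sample) % 2 == 1

# Which mono/stereo layouts can produce an odd-sized data chunk?
for channels in (1, 2):
    for bits in (8, 16, 24):
        print(channels, bits, can_need_padding(channels, bits))
```

Only the mono 8-bit and mono 24-bit rows print True, which matches point 1. A stereo 16-bit data chunk whose size is not a multiple of `frame_bytes(2, 16)` is the partial-frame case from point 2, which padding does not address.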
On 16/11/2007, Brian Willoughby <brianw@sounds.wa.com> wrote:
> It would actually be punitive to expect the flac code to expect and
> adapt to nonsensical WAVE files.

You say "punitive". I say it would be "reliable". One missing byte is a huge burden and nonsensical?

People post on this list looking for solutions. They don't want to become experts in the WAV format (including the undocumented parts). They just want to compress their audio without losing any of the original (you know, LOSSLESS). And if some of their original isn't included in the archive, they want an exit code to indicate a problem and not that everything is okay.

Imagine if Sony, Pioneer and other hardware player manufacturers were so quick to reject "nonsensical" audio. Can you imagine if your car CD player was so finicky? How do you think real world customers would react? They don't want to hear excuses about format, they just want a product to work. Sorry if that "just work" expectation is too "punitive" for you.

> If the problem is common, then report the bug to the makers of the
> programs which produce the bad WAVE files, tell your customers to
> switch to compliant programs,

As I have stated before on this list, that is a completely unrealistic fantasy. I have yet to find a customer who favorably responds to being told to change their tools. Dumping the uncooperative vendor is generally much easier. Customers don't want excuses, they want solutions.

And as pointed out before, sometimes recordings get interrupted. If a recording device loses power it may not be able to write an even number of bytes or update the header size. Sending bug reports to the manufacturer won't help. Nor will they help in cases where the hardware is no longer being developed. In the real world, not all WAVs are perfect. That will never change.
I have spent many, many hours scripting around the various gotchas that occur when you use flac to archive WAV and need some certainty that the original WAVs will actually be recoverable. The recently added --keep-foreign-metadata option was a big step forward and a great improvement. In the past, flac failed to generate error codes to indicate problems with WAV files that might prevent correct archiving. As a result, much scripting and testing of the archive was required. Recent flac versions are much better (thanks to Josh for the improvements!), though I doubt I'll drop the exhaustive attempts at verifying correctness.

> and/or write your own software which
> processes WAVE files looking for these kinds of errors and repairs
> them. If you receive the files via uploads, it would probably be
> possible to have this step run automatically.

For many of us who produce terabytes of audio masters, modifying them is not an option. The potential for introducing flaws is too great. Simply stating "reject them" is not a solution.

FL
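One possible middle ground between repairing masters and silently accepting them is a read-only check that reports problems through an exit status. The Python sketch below is an assumption-laden illustration, not part of flac and not FL's actual scripts. `audit_wav` and `check_file` are hypothetical names, and it looks only for the symptoms mentioned in this thread: a header size that was never updated, a chunk that runs past the end of the file, and a partial final frame.

```python
import struct

def audit_wav(wav):
    """Return a list of problems found; the input bytes are never modified."""
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        return ["not a RIFF/WAVE file"]
    problems = []
    riff_size = struct.unpack_from("<I", wav, 4)[0]
    if riff_size + 8 != len(wav):
        problems.append("RIFF header implies %d bytes, file has %d"
                        % (riff_size + 8, len(wav)))
    block_align, pos = None, 12
    while pos + 8 <= len(wav):
        cid, size = struct.unpack_from("<4sI", wav, pos)
        start = pos + 8
        if start + size > len(wav):
            problems.append("%r chunk runs past the end of the file" % cid)
            size = len(wav) - start        # inspect only what is present
        if cid == b"fmt " and size >= 16:
            # nBlockAlign (bytes per frame) sits 12 bytes into the chunk.
            block_align = struct.unpack_from("<H", wav, start + 12)[0]
        elif cid == b"data" and block_align and size % block_align:
            problems.append("data chunk ends with a partial frame (%d extra bytes)"
                            % (size % block_align))
        pos = start + size + (size & 1)    # skip the pad byte, if any
    return problems

def check_file(path):
    """Print any problems for one file; return 0 if clean, 1 otherwise."""
    with open(path, "rb") as handle:       # opened read-only
        problems = audit_wav(handle.read())
    for problem in problems:
        print("%s: %s" % (path, problem))
    return 1 if problems else 0
```

A wrapper script could pass the result of `check_file` to `sys.exit` so that an archiving job stops, or flags the file for a human, before the encoder ever sees it. Reading the whole file into memory keeps the sketch short; very large masters would call for a streaming version.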