Cool, thanks for all the great comments. I think we agree now that the "find mp3 before encoding" feature would not be a good idea to implement in the flac core. As Brian pointed out, it might be a better idea to create a program that automatically checks whether a flac might have come from an mp3 source.

My first suggestion was to use an FFT, because I know that 128kbps mp3s have a low-pass filter at 16kHz (Fraunhofer IIS encoder). The program should not decide whether or not the file is an mp3 based on that alone, but it could give an indication. Perhaps people who are not very familiar with advanced audio tools don't know how to spot an mp3; they just want to know whether the flac they just bought or downloaded is good or not.

Another test, if the FFT test is unclear, is to check the correlation between the left and right channels below 3kHz; the difference between them should be just about nothing if the source is an mp3 (since mp3 encoders sum L and R below 3kHz at low bitrates). With further testing, and knowing how the mp3 encoders work, it should be fairly easy to determine whether a source file might have been an mp3. At the very least, the program would be able to tell if the input file is poor quality. :)

J.

On Jan 8, 2011, at 12:28 AM, Declan Kelly wrote:
> On Fri, Jan 07, 2011 at 02:22:51PM -0800, brianw at sounds.wa.com wrote:
>>
>> First of all, I am not aware of any official source of FLAC files
>> that provide MP3 sourced data.
>
> Unofficial sources (such as Usenet and that torrent site with the old
> fashioned sailing ship as its logo) are much more likely to have FLAC
> files that were made from lossy audio.
>
> And I vaguely remember reading about an illegal download site that
> stored all audio in MP3 (at less than 320k) and transcoded on the fly
> for all other bitrates and formats, including FLAC and 320k MP3. They
> did it to save storage space.
>
>> However, you should be aware that many modern producers use software
>> to create their music, and when the software stores sound clips in
>> MP3 format, what you end up with is music that sometimes looks like
>> MP3.
>
> I recently bought the double-CD "Influence" remaster by The Art Of Noise
> and some rarer tracks were sourced from MP3 because that was all their
> archivist could find. Most of the reissue was direct from analogue tapes
> so this wasn't a quick "shovelware" reissue job.
>
>> it just has to do with the software that was used to create the music
>> originally.
>
> A friend of mine recorded his band's last album on DCC in the mid 1990s
> and released it on CD. It sounds horrible; the lossy compression of DCC
> is even worse than MiniDisc's ATRAC. I'm sure this CD would fail most
> FFT quality tests, as literally everyone who heard it (not just people
> with "golden ears" or good sound systems) complained about the quality.
>
>> In other words, if you try to shut down the FLAC encoder based on an
>> FFT, you might have a lot of false triggers!
>
> I think it's a bad idea for a lot of reasons: checking the source audio
> quality should be a job for another tool. Most FLAC users won't need to
> check (most of my FLAC files are ripped from original CDs that I own),
> and anyone who was trying to fool listeners (or fellow piracy groups)
> would either work out how to bypass the check, or (more likely) use an
> older version of FLAC.
>
> And it's not in keeping with the philosophy behind FLAC: one thing that
> I regularly say to people who aren't sure about using FLAC is that Josh
> designed it with no copy protection support: if it was there, someone
> would only crack it, so it is effectively useless. And that's probably
> why Apple's ALAC is usually bigger than FLAC for the same uncompressed
> audio (and why Apple still don't support FLAC in their products).
>
> Stopping a pirate from encoding FLAC is similar to stopping a pirate
> from ripping a copy-protected CD: it's a challenge to be overcome, and
> it will probably take "them" less time to work it out than it took "us"
> to build it. And "they" only need to work it out once. Which is why all
> copy protection and DRM sucks, for everyone.
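The 16 kHz low-pass test proposed above can be sketched in a few lines. This is purely illustrative: the function name, the synthetic test signals, and the thresholds are my own choices, and a real tool would use a proper FFT library with windowing rather than the slow direct DFT shown here.

```python
import cmath
import math

SAMPLE_RATE = 44100

def hf_energy_ratio(samples, cutoff_hz=16000.0):
    """Fraction of spectral energy above cutoff_hz (direct DFT sketch)."""
    n = len(samples)
    total = 0.0
    above = 0.0
    for k in range(1, n // 2):          # positive-frequency bins only
        freq = k * SAMPLE_RATE / n
        bin_val = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        power = abs(bin_val) ** 2
        total += power
        if freq > cutoff_hz:
            above += power
    return above / total if total else 0.0

# Synthetic demo: a 1 kHz tone has almost no energy above 16 kHz,
# while adding an 18 kHz tone raises the ratio sharply.
n = 512
tone_1k = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(n)]
with_18k = [tone_1k[t] + math.sin(2 * math.pi * 18000 * t / SAMPLE_RATE)
            for t in range(n)]

print(hf_energy_ratio(tone_1k))    # tiny: consistent with a 16 kHz low-pass
print(hf_energy_ratio(with_18k))   # large: clear content above the cutoff
```

As Brian notes later in the thread, quantization noise means real lossy files still show some energy above the cutoff, so any threshold would need to be tuned against real encoder output rather than clean sinusoids.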
On Sat, Jan 08, 2011 at 12:54:01AM +0100, jorgen at anion.no wrote:
> I think we agree now on that the "find mp3 before encoding" feature
> would not be a good idea to implement in the flac core. As Brian
> pointed out, it might be a better idea to create a program that
> automatically checks if a flac might have been an mp3 source.

It would be more versatile to check whether the uncompressed audio was taken from a lossy source; you'll be decoding it anyway, just using libFLAC to do so first. With all lossless encoding, we can be 100% sure that the uncompressed audio we get out is the same as what went in, so checking only FLAC files limits the tool to FLAC alone.

There's one such tool for Windows called EncSpot, and I just don't remember the names of any of the others (but there are others). I'd suggest grabbing as many of these as possible, especially those with source code available, before reinventing the wheel.

Just as home taping (and vinyl bootlegging) resulted in many generations of copies, I'm sure there are MP3 files floating around that were ripped from CD, encoded at a low bitrate, burnt to CD, copied CD-to-CD on audio CD recorders, and ripped again. It might be easy to fingerprint which MP3 encoder was used (and at what settings) given the uncompressed source audio, but I'd be really impressed if anyone could analyse an MP3 or other lossy file that had been transcoded more than once.

> Another test, if the FFT test is unclear, is to check the correlation
> between left and right channel below 3kHz

You need to be careful about stereo separation at lower frequencies. Every vinyl lab checks this, and almost every mastering house has warnings about it on their website. Due to the frequency response of vinyl as a medium, bass has to be cut when recording and boosted at playback. The RIAA standardised the vinyl frequency response curves back in the 1950s or so; before that, there were competing systems using variations on the same frequency curve.
As stereo vinyl is cut at 45 degrees for mono compatibility (a form of mid-side encoding), the difference signal translates into vertical stylus movement, so the needle will jump out of the groove if there is too much separation at lower frequencies. As human hearing can't really tell direction at lower frequencies, that separation is not as essential. This same shortcut is why most movie "surround sound" systems have only one sub-bass channel.

--
-Dec.
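The below-3kHz channel test discussed in this thread amounts to comparing side (L-R) energy against mid (L+R) energy in the low band. A minimal sketch, with illustrative names and a synthetic demo signal of my own (a real tool would decode via libFLAC and use a real FFT):

```python
import cmath
import math

SAMPLE_RATE = 44100

def band_energy(samples, hi_hz):
    """Spectral energy below hi_hz, via a direct DFT (sketch only)."""
    n = len(samples)
    energy = 0.0
    for k in range(1, n // 2):
        if k * SAMPLE_RATE / n >= hi_hz:
            break
        bin_val = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        energy += abs(bin_val) ** 2
    return energy

def low_band_separation(left, right, hi_hz=3000.0):
    """Side energy as a fraction of mid+side energy below hi_hz.
    Near zero suggests the channels were summed at low frequencies."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    m = band_energy(mid, hi_hz)
    s = band_energy(side, hi_hz)
    return s / (s + m) if (s + m) else 0.0

# Demo: identical bass in both channels gives a ratio of 0;
# a 200 Hz tone present only in the left channel gives 0.5.
n = 1024
bass = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(n)]
mono_l, mono_r = bass, list(bass)
wide_l, wide_r = bass, [0.0] * n

print(low_band_separation(mono_l, mono_r))   # 0.0: "summed" low end
print(low_band_separation(wide_l, wide_r))   # 0.5: fully one-sided bass
```

Note Declan's caveat applies directly here: vinyl-mastered material is deliberately near-mono in the bass, so a low ratio alone does not prove a lossy source.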
On Jan 7, 2011, at 15:54, Jørgen Vigdal wrote:
> My first suggestion was to use FFT, because I know that 128kbps mp3
> have a low-pass filter at 16kHz (Fraunhofer IIS Encoder). The
> program should not decide whether or not the file is a mp3 based on
> only that, but it could give an indication on that. Perhaps with
> people not very familiar with advanced audio tools know how to spot
> out a mp3, they just want to know if the flac they just bought or
> downloaded is good or not?

What I have found is that audio above 15.8 kHz seems to come and go, probably based upon available bits. While some MP3 files may show a fairly constant absence of those high frequencies, other MP3 bit rates will have intermittent content. What does seem to be true is that no version of MP3 or MP4/AAC will encode anything above 20 kHz. Even with the CD 44.1 kHz sample rate, there are still some possibilities for frequencies between 20 kHz and 22.05 kHz. Lossy encoding will never encode those frequencies.

With either frequency range, though, you're going to have a slight problem detecting the absence of frequency information. That's because lossy coding allows noise at all frequencies. So your FFT is still going to show something above 15.8 kHz, and even something above 20 kHz, especially if the source was 16-bit instead of 24-bit. Thus, you'll need an intelligent threshold, and perhaps adaptive algorithms, to detect lossy coding. It's quite normal for some acoustic recordings to have low levels of audio content at higher frequencies, especially if the microphone was located a long distance from the sound source, because of the high-frequency attenuation properties of air over sufficient distance.

> Another test, if the FFT test is unclear, is to check the
> correlation between left and right channel below 3kHz, which should
> be just about nothing if the source is a mp3 (since mp3 encoders
> sum L and R below 3kHz at low bitrates).
> If also doing further testing, and knowing how the mp3 encoders work,
> it should be fairly easy to determine if a source file might have
> been an mp3. At least the program would be able to tell if the input
> file is poor quality? :)

Some encoders drop information below 10 Hz, some do not. I am not aware of mono coding for 3 kHz and below. Perhaps this depends upon whether the Joint Stereo option is being used or not. In any case, I would suggest that very common encoders might not have the properties you're describing.

I don't want to discourage your efforts; I just think it might be harder than you think. For one thing, lossy coding doesn't really drop frequencies as described in most introductory texts; rather, what it really does is allow significant quantization noise to creep in at different frequencies. Thus, your 16-bit CD might be coded at 8 bits or less at some frequencies, where it might actually have the equivalent of 24 bits at other frequencies. Since you generally cannot recode lossy files without going back to the original, I expect this also means that you can't detect everything with an FFT, even if you're running the exact same FFT used by the lossy encoder.

It should be a fun learning experience, though! Keep us posted with your findings!

Brian Willoughby
Sound Consulting
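Brian's point about frequency-dependent re-quantization can be put in rough numbers using the textbook rule of thumb that each bit of uniform quantization is worth about 6 dB of signal-to-noise. This is a back-of-the-envelope illustration, not anything from an actual MP3 encoder:

```python
# Classic uniform-quantization SNR estimate: ~6.02*b + 1.76 dB.
# Dropping a frequency band from 16 to 8 bits of effective resolution
# raises its noise floor by roughly 48 dB, which is why "missing"
# frequencies in lossy audio are really frequencies buried in noise.
def snr_db(bits):
    return 6.02 * bits + 1.76

for bits in (24, 16, 8):
    print(f"{bits:2d} bits -> ~{snr_db(bits):.1f} dB SNR")
```

This is also why a simple energy threshold above 16 kHz is unreliable: the band is rarely truly silent, just noisy.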
On Jan 7, 2011, at 16:27, Declan Kelly wrote:
> It might be easy to fingerprint which MP3 encoder was used (and at
> what settings) for uncompressed source audio, but I'd be really
> impressed if anyone could analyse an MP3 or other lossy file that
> had been transcoded more than once.

You may be right, but I actually suspect that it will be just as difficult to detect even a single generation of MP3 encoding. Because lossy coding involves variable quantization in the frequency domain, it is rather difficult to predict a precise criterion for detection.

> Due to the frequency response of vinyl as a medium, bass has to be cut
> when recording, and boosted at playback. The RIAA standardised the
> vinyl frequency response curves back in the 1950s or so - before that
> there were competing systems using variations on the same frequency
> curve.
> As stereo vinyl is cut at 45 degrees for mono compatibility (a form of
> mid side encoding) the difference signal translates into vertical
> stylus movement. So the needle will jump out of the groove if there is
> too much separation at lower frequencies.

If you're on OSX, then you can grab my free AudioUnit that implements RIAA decoding. It's called AURIAA, and is available at http://www.sounds.wa.com/audiounits.html

> As the human hearing can't really tell direction with lower
> frequencies, it's not as essential. This same shortcut is why most
> movie "surround sound" systems have only one sub bass channel.

In this case, you have been misled by a common misconception in the consumer audio industry. In actuality, the human hearing system is quite capable of telling direction at lower frequencies; the limitation is merely an artifact of studio mixing. At low frequencies, the human ear+brain system uses time delays to detect direction. This is because low frequencies travel around obstructions like the head without significant volume losses, so it would be impossible to use volume to find the direction of the source.
The brain detects the leading edge of sounds and compares the phase or time delay between left and right. When the direction is straight ahead, the time delay is zero. As sound sources move away from directly ahead, the time delay increases up to a maximum determined by the speed of sound and the distance between your ears (plus a little extra, because the path from one ear to the other is not through your head but around it, which is a slightly longer path).

However, as the frequency gets higher, it becomes too difficult for the brain to analyze the phase differences, because a high-frequency waveform will repeat several times within the time delay, and it becomes impossible to compare phase when you don't know which cycle matches. Fortunately, high frequencies are very directional (more directional than low frequencies), and thus the volume is attenuated when high-frequency sounds bend around an obstruction (like the head or anything else). So the brain uses volume differences, not time/phase differences, to determine directionality for high frequencies.

What's true is that we humans are actually directionally deaf in the midrange, not at lower frequencies. The time/phase technique is most effective at the lowest frequencies and becomes less effective as the frequency gets higher. The volume/amplitude technique is most effective at the highest frequencies and becomes less effective as the frequency gets lower and less directional. In the middle, neither technique is effective. I seem to recall that this crossover is around 500 Hz, well below the 2 kHz to 5 kHz range where our ears are most sensitive.

The reason most consumer electronics experts get this wrong is the standard techniques used in studio recording. Most music is recorded as multiple channels, e.g. 16, that are each monophonic. These channels are played back through a mixing console, and a simple pan pot is used to artificially place each one in a location.
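The numbers behind this description are easy to check with textbook round figures (my values, not from the thread): ears roughly 0.21 m apart and sound at about 343 m/s give a maximum interaural time delay well under a millisecond, and phase comparison becomes ambiguous once half a waveform period fits inside that delay.

```python
SPEED_OF_SOUND = 343.0   # m/s at room temperature
EAR_SPACING = 0.21       # m, straight-line approximation (the real
                         # around-the-head path is slightly longer)

max_itd = EAR_SPACING / SPEED_OF_SOUND       # maximum interaural delay, s
ambiguity_freq = 1.0 / (2.0 * max_itd)       # half-period equals max ITD

print(f"max ITD ~= {max_itd * 1e6:.0f} us")               # ~612 us
print(f"phase ambiguous above ~{ambiguity_freq:.0f} Hz")  # ~817 Hz
```

That puts the onset of phase ambiguity in the high hundreds of hertz, the same order of magnitude as the ~500 Hz crossover recalled above.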
Because the pan pot only affects the volume, not the phase difference or time delay, a studio recording is going to have no directionality at low frequencies. But not all recordings are made in a soundproof studio. A simple binaural recording will have plenty of time delay and phase information, just like the real world, and the human hearing system will easily be able to detect the direction of low-frequency sounds. That is, unless your audiophile salesman has convinced you that you only need one subwoofer, and thus your playback system is compromised.

Another factor that is showing up in digital production is the ability to create a 3D mixer, instead of a simple pan pot. CoreAudio and other digital systems allow a monophonic sound to be placed at any position in a virtual 3D sound world, and the DSP will calculate the appropriate time delay and amplitude loss based upon the relative positions of the sound source and the virtual listener. OSX and CoreAudio can even extend this virtual system to include knowledge of the actual placement of the speakers attached to your computer, with the DSP automatically calculating the correct time delay and volume for each speaker in your system (whether 5.1, 7.1, 10.2, or more). With such a system for creating sounds, you certainly are not limited by the studio production limitations of previous decades, and, more importantly, there will be plenty of directional cues in the low frequencies.

By the way, the reason surround sound has only one sub bass channel is that it takes very little bandwidth to add one more channel. In actuality, all five main channels have full sub bass included in their discrete channels. The .1 channel is just a way to add more oomph without taking the full bandwidth of a sixth channel.
Many surround mixers will place directional sub bass in the five main channels, but this will only be heard on the best surround systems with more than one subwoofer, or at least with large speakers that can reproduce enough sub bass to be heard. The .1 channel has nothing to do with human directional perception, and everything to do with taking advantage of something that is available at a low bandwidth cost.

Brian Willoughby
Sound Consulting