Cool, thanks for all the great comments. I think we agree now that the "find mp3 before encoding" feature would not be a good idea to implement in the flac core. As Brian pointed out, it might be a better idea to create a program that automatically checks whether a flac might have come from an mp3 source.

My first suggestion was to use an FFT, because I know that 128kbps mp3s have a low-pass filter at 16kHz (Fraunhofer IIS encoder). The program should not decide whether or not the file is an mp3 based on that alone, but it could give an indication. Perhaps people who are not very familiar with advanced audio tools don't know how to spot an mp3; they just want to know whether the flac they just bought or downloaded is good or not.

Another test, if the FFT test is unclear, is to check the correlation between the left and right channels below 3kHz; the difference between them should be just about nothing if the source is an mp3 (since mp3 encoders sum L and R below 3kHz at low bitrates). With further testing, and knowing how the mp3 encoders work, it should be fairly easy to determine whether a source file might have been an mp3. At the very least, the program would be able to tell if the input file is poor quality. :)

J.

On Jan 8, 2011, at 12:28 AM, Declan Kelly wrote:
> On Fri, Jan 07, 2011 at 02:22:51PM -0800, brianw at sounds.wa.com wrote:
>>
>> First of all, I am not aware of any official source of FLAC files
>> that provide MP3 sourced data.
>
> Unofficial sources (such as Usenet and that torrent site with the old
> fashioned sailing ship as its logo) are much more likely to have FLAC
> files that were made from lossy audio.
>
> And I vaguely remember reading about an illegal download site that
> stored all audio in MP3 (at less than 320k) and transcoded on the fly
> for all other bitrates and formats, including FLAC and 320k MP3. They
> did it to save storage space.
>
>> However, you should be aware that many modern producers use software
>> to create their music, and when the software stores sound clips in
>> MP3 format, what you end up with is music that sometimes looks like
>> MP3.
>
> I recently bought the double-CD "Influence" remaster by The Art Of Noise
> and some rarer tracks were sourced from MP3 because that was all their
> archivist could find. Most of the reissue was direct from analogue tapes
> so this wasn't a quick "shovelware" reissue job.
>
>> it just has to do with the software that was used to create the music
>> originally.
>
> A friend of mine recorded his band's last album on DCC in the mid 1990s
> and released it on CD. It sounds horrible; the lossy compression of DCC
> is even worse than MiniDisc's ATRAC. I'm sure this CD would fail most
> FFT quality tests, as literally everyone who heard it (not just people
> with "golden ears" or good sound systems) complained about the quality.
>
>> In other words, if you try to shut down the FLAC encoder based on an
>> FFT, you might have a lot of false triggers!
>
> I think it's a bad idea for a lot of reasons: checking the source audio
> quality should be a job for another tool. Most FLAC users won't need to
> check (most of my FLAC files are ripped from original CDs that I own),
> and anyone who was trying to fool listeners (or fellow piracy groups)
> would either work out how to bypass the check, or (more likely) use an
> older version of FLAC.
>
> And it's not in keeping with the philosophy behind FLAC: one thing that
> I regularly say to people who aren't sure about using FLAC is that Josh
> designed it with no copy protection support: if it was there, someone
> would only crack it, so it is effectively useless. And that's probably
> why Apple's ALAC is usually bigger than FLAC for the same uncompressed
> audio (and why Apple still don't support FLAC in their products).
>
> Stopping a pirate from encoding FLAC is similar to stopping a pirate
> from ripping a copy-protected CD: it's a challenge to be overcome, and
> it will probably take "them" less time to work it out than it took "us"
> to build it. And "they" only need to work it out once. Which is why all
> copy protection and DRM sucks, for everyone.
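The 16 kHz low-pass test proposed above can be sketched in a few lines. This is purely illustrative: the function name, the synthetic test signals, and the thresholds are my own choices, and a real tool would use a proper FFT library with windowing rather than the slow direct DFT shown here.

```python
import cmath
import math

SAMPLE_RATE = 44100

def hf_energy_ratio(samples, cutoff_hz=16000.0):
    """Fraction of spectral energy above cutoff_hz (direct DFT sketch)."""
    n = len(samples)
    total = 0.0
    above = 0.0
    for k in range(1, n // 2):          # positive-frequency bins only
        freq = k * SAMPLE_RATE / n
        bin_val = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        power = abs(bin_val) ** 2
        total += power
        if freq > cutoff_hz:
            above += power
    return above / total if total else 0.0

# Synthetic demo: a 1 kHz tone has almost no energy above 16 kHz,
# while adding an 18 kHz tone raises the ratio sharply.
n = 512
tone_1k = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(n)]
with_18k = [tone_1k[t] + math.sin(2 * math.pi * 18000 * t / SAMPLE_RATE)
            for t in range(n)]

print(hf_energy_ratio(tone_1k))    # tiny: consistent with a 16 kHz low-pass
print(hf_energy_ratio(with_18k))   # large: clear content above the cutoff
```

As Brian notes later in the thread, quantization noise means real lossy files still show some energy above the cutoff, so any threshold would need to be tuned against real encoder output rather than clean sinusoids.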
On Sat, Jan 08, 2011 at 12:54:01AM +0100, jorgen at anion.no wrote:
> I think we agree now on that the "find mp3 before encoding" feature
> would not be a good idea to implement in the flac core. As Brian
> pointed out, it might be a better idea to create a program that
> automatically checks if a flac might have been an mp3 source.

It would be more versatile to check whether the uncompressed audio was taken from a lossy source; you'll be decoding it anyway, just using libFLAC to do so first. With all lossless encoding, we can be 100% sure that the uncompressed audio we get out is the same as what went in, so checking only FLAC files limits the tool to FLAC alone.

There's one such tool for Windows called EncSpot, and I just don't remember the names of any of the others (but there are others). I'd suggest grabbing as many of these as possible, especially those with source code available, before reinventing the wheel.

Just as home taping (and vinyl bootlegging) resulted in many generations of copies, I'm sure there are MP3 files floating around that were ripped from CD, encoded at a low bitrate, burnt to CD, copied CD-to-CD on audio CD recorders, and ripped again. It might be easy to fingerprint which MP3 encoder was used (and at what settings) given the uncompressed source audio, but I'd be really impressed if anyone could analyse an MP3 or other lossy file that had been transcoded more than once.

> Another test, if the FFT test is unclear, is to check the correlation
> between left and right channel below 3kHz

You need to be careful about stereo separation at lower frequencies. Every vinyl lab checks this, and almost every mastering house has warnings about it on their website. Due to the frequency response of vinyl as a medium, bass has to be cut when recording and boosted at playback. The RIAA standardised the vinyl frequency response curves back in the 1950s or so; before that, there were competing systems using variations on the same frequency curve.
As stereo vinyl is cut at 45 degrees for mono compatibility (a form of mid-side encoding), the difference signal translates into vertical stylus movement, so the needle will jump out of the groove if there is too much separation at lower frequencies. As human hearing can't really tell direction at lower frequencies, that separation is not as essential. This same shortcut is why most movie "surround sound" systems have only one sub-bass channel.

--
-Dec.
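The below-3kHz channel test discussed in this thread amounts to comparing side (L-R) energy against mid (L+R) energy in the low band. A minimal sketch, with illustrative names and a synthetic demo signal of my own (a real tool would decode via libFLAC and use a real FFT):

```python
import cmath
import math

SAMPLE_RATE = 44100

def band_energy(samples, hi_hz):
    """Spectral energy below hi_hz, via a direct DFT (sketch only)."""
    n = len(samples)
    energy = 0.0
    for k in range(1, n // 2):
        if k * SAMPLE_RATE / n >= hi_hz:
            break
        bin_val = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        energy += abs(bin_val) ** 2
    return energy

def low_band_separation(left, right, hi_hz=3000.0):
    """Side energy as a fraction of mid+side energy below hi_hz.
    Near zero suggests the channels were summed at low frequencies."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    m = band_energy(mid, hi_hz)
    s = band_energy(side, hi_hz)
    return s / (s + m) if (s + m) else 0.0

# Demo: identical bass in both channels gives a ratio of 0;
# a 200 Hz tone present only in the left channel gives 0.5.
n = 1024
bass = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(n)]
mono_l, mono_r = bass, list(bass)
wide_l, wide_r = bass, [0.0] * n

print(low_band_separation(mono_l, mono_r))   # 0.0: "summed" low end
print(low_band_separation(wide_l, wide_r))   # 0.5: fully one-sided bass
```

Note Declan's caveat applies directly here: vinyl-mastered material is deliberately near-mono in the bass, so a low ratio alone does not prove a lossy source.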
On Jan 7, 2011, at 15:54, Jørgen Vigdal wrote:
> My first suggestion was to use FFT, because I know that 128kbps mp3
> have a low-pass filter at 16kHz (Fraunhofer IIS Encoder). The
> program should not decide whether or not the file is a mp3 based on
> only that, but it could give an indication on that. Perhaps with
> people not very familiar with advanced audio tools know how to spot
> out a mp3, they just want to know if the flac they just bought or
> downloaded is good or not?

What I have found is that audio above 15.8 kHz seems to come and go, probably based upon available bits. While some MP3 files may show a fairly constant absence of those high frequencies, other MP3 bit rates will have intermittent content. What does seem to be true is that no version of MP3 or MP4/AAC will encode anything above 20 kHz. Even with the CD 44.1 kHz sample rate, there are still some possibilities for frequencies between 20 kHz and 22.05 kHz. Lossy encoding will never encode those frequencies.

With either frequency range, though, you're going to have a slight problem detecting the absence of frequency information. That's because lossy coding allows noise at all frequencies. So your FFT is still going to show something above 15.8 kHz, and even something above 20 kHz, especially if the source was 16-bit instead of 24-bit. Thus, you'll need an intelligent threshold, and perhaps adaptive algorithms, to detect lossy coding. It's quite normal for some acoustic recordings to have low levels of audio content at higher frequencies, especially if the microphone was located a long distance from the sound source, because of the high-frequency attenuation properties of air over sufficient distance.

> Another test, if the FFT test is unclear, is to check the
> correlation between left and right channel below 3kHz, which should
> be just about nothing if the source is a mp3 (since mp3 encoders
> sum L and R below 3kHz at low bitrates).
> If also doing further testing, and knowing how the mp3 encoders work,
> it should be fairly easy to determine if a source file might have
> been an mp3. At least the program would be able to tell if the input
> file is poor quality? :)

Some encoders drop information below 10 Hz, some do not. I am not aware of mono coding for 3 kHz and below. Perhaps this depends upon whether the Joint Stereo option is being used or not. In any case, I would suggest that very common encoders might not have the properties you're describing.

I don't want to discourage your efforts; I just think it might be harder than you think. For one thing, lossy coding doesn't really drop frequencies as described in most introductory texts; rather, what it really does is allow significant quantization noise to creep in at different frequencies. Thus, your 16-bit CD might be coded at 8 bits or less at some frequencies, where it might actually have the equivalent of 24 bits at other frequencies. Since you generally cannot recode lossy files without going back to the original, I expect this also means that you can't detect everything with an FFT, even if you're running the exact same FFT used by the lossy encoder.

It should be a fun learning experience, though! Keep us posted with your findings!

Brian Willoughby
Sound Consulting
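Brian's point about frequency-dependent re-quantization can be put in rough numbers using the textbook rule of thumb that each bit of uniform quantization is worth about 6 dB of signal-to-noise. This is a back-of-the-envelope illustration, not anything from an actual MP3 encoder:

```python
# Classic uniform-quantization SNR estimate: ~6.02*b + 1.76 dB.
# Dropping a frequency band from 16 to 8 bits of effective resolution
# raises its noise floor by roughly 48 dB, which is why "missing"
# frequencies in lossy audio are really frequencies buried in noise.
def snr_db(bits):
    return 6.02 * bits + 1.76

for bits in (24, 16, 8):
    print(f"{bits:2d} bits -> ~{snr_db(bits):.1f} dB SNR")
```

This is also why a simple energy threshold above 16 kHz is unreliable: the band is rarely truly silent, just noisy.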
On Jan 7, 2011, at 16:27, Declan Kelly wrote:
> It might be easy to fingerprint which MP3 encoder was used (and at
> what settings) for uncompressed source audio, but I'd be really
> impressed if anyone could analyse an MP3 or other lossy file that
> had been transcoded more than once.

You may be right, but I actually suspect that it will be just as difficult to detect even a single generation of MP3 encoding. Because lossy coding involves variable quantization in the frequency domain, it is rather difficult to predict a precise criterion for detection.

> Due to the frequency response of vinyl as a medium, bass has to be cut
> when recording, and boosted at playback. The RIAA standardised the
> vinyl frequency response curves back in the 1950s or so - before that
> there were competing systems using variations on the same frequency
> curve.
> As stereo vinyl is cut at 45 degrees for mono compatibility (a form of
> mid side encoding) the difference signal translates into vertical
> stylus movement. So the needle will jump out of the groove if there is
> too much separation at lower frequencies.

If you're on OSX, then you can grab my free AudioUnit that implements RIAA decoding. It's called AURIAA, and is available at http://www.sounds.wa.com/audiounits.html

> As the human hearing can't really tell direction with lower
> frequencies, it's not as essential. This same shortcut is why most
> movie "surround sound" systems have only one sub bass channel.

In this case, you have been misled by a common misconception in the consumer audio industry. In actuality, the human hearing system is quite capable of telling direction at lower frequencies; the limitation is merely an artifact of studio mixing. At low frequencies, the human ear+brain system uses time delays to detect direction. This is because low frequencies travel around obstructions like the head without significant volume losses, so it would be impossible to use volume to find the direction of the source.
The brain detects the leading edge of sounds and compares the phase or time delay between left and right. When the direction is straight ahead, the time delay is zero. As sound sources move away from directly ahead, the time delay increases up to a maximum determined by the speed of sound and the distance between your ears (plus a little extra, because the path from one ear to the other is not through your head but around it, which is a slightly longer path).

However, as the frequency gets higher, it becomes too difficult for the brain to analyze the phase differences, because a high-frequency waveform will repeat several times within the time delay, and it becomes impossible to compare phase when you don't know which cycle matches. Fortunately, high frequencies are very directional (more directional than low frequencies), and thus the volume is attenuated when high-frequency sounds bend around an obstruction (like the head or anything else). So the brain uses volume differences, not time/phase differences, to determine directionality for high frequencies.

What's true is that we humans are actually directionally deaf in the midrange, not at lower frequencies. The time/phase technique is most effective at the lowest frequencies and becomes less effective as the frequency gets higher. The volume/amplitude technique is most effective at the highest frequencies and becomes less effective as the frequency gets lower and less directional. In the middle, neither technique is effective. I seem to recall that this crossover is around 500 Hz, well below the 2 kHz to 5 kHz range where our ears are most sensitive.

The reason most consumer electronics experts get this wrong is the standard techniques used in studio recording. Most music is recorded as multiple channels, e.g. 16, that are each monophonic. These channels are played back through a mixing console, and a simple pan pot is used to artificially place each one in a location.
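The numbers behind this description are easy to check with textbook round figures (my values, not from the thread): ears roughly 0.21 m apart and sound at about 343 m/s give a maximum interaural time delay well under a millisecond, and phase comparison becomes ambiguous once half a waveform period fits inside that delay.

```python
SPEED_OF_SOUND = 343.0   # m/s at room temperature
EAR_SPACING = 0.21       # m, straight-line approximation (the real
                         # around-the-head path is slightly longer)

max_itd = EAR_SPACING / SPEED_OF_SOUND       # maximum interaural delay, s
ambiguity_freq = 1.0 / (2.0 * max_itd)       # half-period equals max ITD

print(f"max ITD ~= {max_itd * 1e6:.0f} us")               # ~612 us
print(f"phase ambiguous above ~{ambiguity_freq:.0f} Hz")  # ~817 Hz
```

That puts the onset of phase ambiguity in the high hundreds of hertz, the same order of magnitude as the ~500 Hz crossover recalled above.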
Because the pan pot only affects the volume, not the phase difference or time delay, a studio recording is going to have no directionality at low frequencies. But not all recordings are made in a soundproof studio. A simple binaural recording will have plenty of time delay and phase information, just like the real world, and the human hearing system will easily be able to detect the direction of low-frequency sounds. That is, unless your audiophile salesman has convinced you that you only need one subwoofer, and thus your playback system is compromised.

Another factor that is showing up in digital production is the ability to create a 3D mixer, instead of a simple pan pot. CoreAudio and other digital systems allow a monophonic sound to be placed at any position in a virtual 3D sound world, and the DSP will calculate the appropriate time delay and amplitude loss based upon the relative positions of the sound source and the virtual listener. OSX and CoreAudio can even extend this virtual system to include knowledge of the actual placement of the speakers attached to your computer, with the DSP automatically calculating the correct time delay and volume for each speaker in your system (whether 5.1, 7.1, 10.2, or more). With such a system for creating sounds, you certainly are not limited by the studio production limitations of previous decades, and, more importantly, there will be plenty of directional cues in the low frequencies.

By the way, the reason surround sound has only one sub bass channel is that it takes very little bandwidth to add one more channel. In actuality, all five main channels have full sub bass included in their discrete channels. The .1 channel is just a way to add more oomph without taking the full bandwidth of a sixth channel.
Many surround mixers will place directional sub bass in the five main channels, but this will only be heard on the best surround systems with more than one subwoofer, or at least with large speakers that can reproduce enough sub bass to be heard. The .1 channel has nothing to do with human directional perception, and everything to do with taking advantage of something that is available at a low bandwidth cost.

Brian Willoughby
Sound Consulting