So now we've come to the last step, and it's a big one. Just
how good are these Mp3s? Have we set up the encoder correctly?
How do they sound compared to the CD they were made from? The
obvious way to find out is with a listening test. But it's not
as simple as it sounds.
There are a number of audio listening tests which are in common
use. People have created computer programs to automate them, or
gathered together with stereo equipment to run them manually.
They've been done on big and small scales. The vast majority of
them work on the same principle, though: the idea that if any audible
difference exists between two pieces of audio gear (speakers,
amplifiers, file types, etc.), it should be a simple matter
of playing the same song through one and then the other and asking
people which sounded better. This is the classic A/B test. There
also exists an ABX test which goes one step further by playing
one, then the other, and then randomly one of the two again and
asking which one it was. The goal in either case is to determine
if the equipment sounds the same or not, and hopefully pick the
superior equipment. These tests are best performed blind, meaning
that the people doing the listening aren't told anything about
which gear is in use at a given time, so that they don't do anything
like subconsciously favour the more expensive brands. And even
that can be made better still by making sure the person recording
the results doesn't know which is which either, lest they ask
leading questions or unintentionally adjust what people said to
fit what they think is the right answer. So it sounds great: A/B
and ABX testing seem like a foolproof way to determine once and
for all whether there are any quality gains to be had between various
pieces of gear or audio formats.
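For the curious, the bookkeeping behind an ABX session is simple enough to sketch in a few lines of Python. This is a toy illustration of the protocol only; the "listener" functions are hypothetical stand-ins, not real test subjects:

```python
import random

def run_abx_trials(listener, n_trials=16, seed=0):
    """Run a toy ABX session.  Each trial secretly picks X to be a
    replay of either A or B; the listener must say which one it was.
    Returns how many trials the listener identified correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        x_is_a = rng.random() < 0.5      # the hidden coin flip
        if listener(x_is_a) == x_is_a:   # listener's verdict vs. truth
            correct += 1
    return correct

# A listener who truly hears no difference can only guess...
_coin = random.Random(1)
guesser = lambda x_is_a: _coin.random() < 0.5
# ...while one who genuinely hears a difference should score near-perfectly.
golden_ears = lambda x_is_a: x_is_a

score_guess = run_abx_trials(guesser)
score_golden = run_abx_trials(golden_ears)
```

Over 16 trials, a pure guesser hovers around 8 correct; a score far above that is evidence of a real audible difference.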
But I've recently started to have my doubts.
Let me be clear about something. It is not the blind aspect which
I object to, not at all. This blinding is very important. The
unfortunate fact is that we humans tend to see what we're expecting
to. This is called confirmation
bias. We look for a few key facts that seem to support what
we expect to find, and we don't look for others that might disprove
it. Time and time again it has been shown that if you wrap one
bottle of wine with a cheap label and another identical bottle
with a more expensive brand's label, that both the ordinary person
and the enthusiast alike will almost always say the more expensive
wine tastes better. The enthusiast will even use the full breadth
of their vocabulary to tell you all the ways in which the two
wines differ. Likewise, even the most obviously bogus audiophile
scam products will all get at least some good reviews from happy
customers who truly believe their money was well spent. $1500
power cords infused with genuine good vibes and fairy dust are
described as causing records to sound much better than standard
$5 cables, in precisely the same way as the relabeled bottles
of wine. If it costs that much, it MUST be good, right?
No, it's not the blinding which worries me. It's the fact that
all traditional audio tests, be they A/B, ABX, sighted or blind,
hinge on two assumptions. They assume that the average
untrained person's perception is good enough to notice things
like exactly how crisp a cymbal crash in a song is, or how muted
a violin, and that human memory is good enough to store all
that information away so that it can be recalled perfectly when
it comes time to listen the second (or third) time. Both strike
me as very shaky assumptions.
On the first point, we tend to be very bad at thinking about
things which we lack terms for. And most of us don't have a very
thorough vocabulary for audio characteristics. We simply don't
work with audio on that level very often in day to day life. So
most of us don't know what to look for when it comes to quantifying
recording quality, beyond the broad strokes. The obvious solution
then is to devote years of one's life to learning about acoustics,
about the way the ears pick up sound, and how audio compression
engines work, and to develop a vocabulary of descriptive words
to rival that of the snootiest of wine tasters. All this would
certainly make changes in audio quality a lot more quantifiable.
But it would clearly also take a hell of a lot of work. Far more,
I think, than the average person wants to invest in something as
simple as putting good music on a music player.
Secondly, the human mind does not seem to be very good at accurately
comparing a current experience to the memory of a previous one.
How many times have you heard someone complain about how much
better life was when they were a kid? When men were real men,
when kids respected the elders, and when crime wasn't an issue?
Or listened to 3 different people give 3 totally different reports
to a police officer about what a thief looked like? All our memories
are distorted and coloured by the wet meat machine between our
ears, especially when we try to recall something we weren't paying
much attention to when we experienced it. And I'm not sure that
there's much which we can do to improve our memories in this regard.
It seems then that it might not be safe to trust ourselves with
traditional listening tests. So I sought to devise a test which
didn't rely on memory. It would instead be a purely perceptive
test, to see if the listener noticed a change in audio quality
at the moment one occurred. It might not tell us exactly what
the difference is, but it will at least let us know if the change
is significant enough for the human ear to notice, rather than
just the human memory.
The idea came to me when a friend of mine insisted she could
hear the difference between 320kbps Mp3s and ones of even slightly
lower bitrate. Having researched the format and experimented with
it myself, I strongly suspected that she couldn't, but my logic
alone wasn't going to change her mind. She knew she could and
wouldn't be told otherwise. Frustrated with the mind's ability
to find a difference whether it was there or not, I thought as
I often did about how nice it would be to conduct a more immediate
listening test. One where I could switch between two audio sources
of different quality at any point in a song, with no delays or
changes in things like timing or volume. This last part would
be critical, since our brains recognize a slight increase in volume
as an increase in clarity. I imagined a pair of carefully cued
audio files and a slider to electrically pan between them. If
there was a significant difference in quality it should be heard
as a change of some sort as the slider was moved. But I realized
that I'd never be able to do something like that in hardware since
the two audio sources would need to be aligned not merely to the
second but to the very sample, else their waveforms would be offset,
which would cause them to alternately reinforce and cancel each
other out instead of neatly overlapping as they were panned between.
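That alignment worry is easy to demonstrate with a few lines of Python and numpy (my own illustration, using a pure tone rather than music): mix a signal with a copy of itself offset by half a wavelength and the two cancel almost completely.

```python
import numpy as np

fs = 44100                         # CD sample rate
f = fs / 4                         # an 11025 Hz tone: exactly 4 samples per cycle
n = np.arange(fs)                  # one second of audio
tone = np.sin(2 * np.pi * f * n / fs)

# Perfectly aligned copies reinforce: the mix is simply twice as loud.
aligned_mix = tone + tone

# Offset one copy by half a cycle (2 samples) and the waveforms oppose
# each other sample for sample: the mix collapses to near-silence.
offset_mix = tone + np.roll(tone, 2)

peak_aligned = np.max(np.abs(aligned_mix))   # roughly 2.0
peak_offset = np.max(np.abs(offset_mix))     # roughly 0.0
```

Real music contains many frequencies at once, so a misalignment produces reinforcement at some pitches and cancellation at others (comb filtering) rather than clean silence, which is exactly the "alternately reinforce and cancel" problem.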
Then it hit me. I could use that very property to do something
even better, with commonly available software instead of mixers.
I would draw on a technique which was almost as old as recorded
music itself. Let me see if I can explain.
Sound is a wave. It flows down wires and through the air as an
oscillation. When a speaker is producing sound, it's pushing and
pulling its little membrane back and forth, corresponding to the
rising and falling of the recorded wave. This creates little bursts
of higher and lower pressure air which in turn move our ear drums
back and forth, and we hear sound. Simple stuff, right?
A common mistake when hooking up home or car stereos is to reverse
the polarity of one of the speakers relative to the other. Then
when one speaker is pushing, the other is pulling, and the two
fight each other. This means that any sound which is present on
both stereo channels will be much quieter than it should be as
the two speakers cancel each other out. The effect is sometimes
called Out Of Phase Stereo, or OOPS. Not good for music, but some
people discovered they could deliberately set up their stereos
that way to help them analyze music. Vocals, you see, are usually
present equally in both channels of pop music, with the instruments
favouring one side or the other. By removing only the stuff present
in both channels using Out Of Phase Stereo, you can hear the
instruments alone, without the vocals.
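The trick is easy to sketch in Python with numpy. Here sine waves stand in for real vocals and instruments, and the mix is entirely hypothetical:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs

# Hypothetical three-part mix: the "vocal" sits identically in both
# channels, while each "instrument" lives in only one channel.
vocal  = 0.5 * np.sin(2 * np.pi * 440 * t)
guitar = 0.3 * np.sin(2 * np.pi * 330 * t)
bass   = 0.3 * np.sin(2 * np.pi * 110 * t)

left  = vocal + guitar
right = vocal + bass

# Out Of Phase Stereo: with one speaker's polarity flipped, the ear
# hears roughly left minus right, so anything common to both channels
# cancels itself out.
oops = left - right            # the vocal drops out entirely
```

Subtracting the channels wipes out the centered vocal but leaves the side-panned instruments intact.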
Okay, that's pretty cool, but how does it help us to compare
the quality of Mp3s? Well, what if instead of removing the sounds
common to both stereo channels of a single audio file, we removed
the sounds common to a pair of audio files? Say, an uncompressed
file and the Mp3 version? Then we would get to hear only the bits
which were different. And then we can do something REALLY clever.
I took a CD off my shelf and ripped the same track from it three
times: first as an uncompressed wave file, then as a 320kbps
Mp3, and finally with my preferred VBR profile.
I loaded the raw wave file and the 320kbps version into Audacity,
one below the other.
Then I went to the start of the files and zoomed in. The Mp3
compression process had padded the start and finish of the file
subtly, so that it no longer lined up with the original. But unlike
analog mixing boards, Audacity's natural habitat is the sample.
I measured exactly how many samples late the Mp3 was (2,256 in
this particular case) and removed that much silence from the start
of the file.
Ta dah! The raw wave and the 320kbps Mp3 were now in perfect
waveform alignment. The holy grail had been found.
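In code terms, the alignment step amounts to nothing more than slicing off the encoder's padding. A numpy sketch, with a fabricated "decoded" signal standing in for the real Mp3 and the 2,256-sample delay hard-coded from my measurement:

```python
import numpy as np

def align_by_trimming(original, decoded, delay_samples):
    """Drop the encoder's padding from the front of the decoded audio
    and cut both signals to a common length, so that sample i of one
    corresponds to sample i of the other."""
    trimmed = decoded[delay_samples:]
    n = min(len(original), len(trimmed))
    return original[:n], trimmed[:n]

# Fake a decoded file: the original with 2,256 samples of leading silence.
original = np.sin(np.linspace(0, 200 * np.pi, 44100))
decoded = np.concatenate([np.zeros(2256), original])

aligned_orig, aligned_dec = align_by_trimming(original, decoded, 2256)
```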
Now came the fun part. I selected the whole of the 320kbps Mp3
version of the song and inverted it. This made it so that where
ever the waveform had previously gone up (corresponding to the
speaker pushing out) it now went down (making the speaker pull
in) and vice versa.
Then I selected the inverted Mp3 and the original wave and told
Audacity to mix and render. This caused it to combine them together
into a single file. But because I had inverted the Mp3 version
first, any aspect of it which was 100% identical to the original
wave version would now be its exact opposite, thus canceling it
out and producing silence. The resulting file contained only the
difference between the source files.
Let me see if I can explain all this a little better. If one
audio file is telling the speaker to move outward by precisely,
say, 1.5 millimeters, and the other is at that same moment telling
the speaker to move inward by precisely 1.5 millimeters, then
the net result is that the speaker sits still and makes no sound.
What I've done here is mix together two files which were almost,
but not quite, identical. The difference between them was the Mp3
compression. That means wherever the files were still identical,
they canceled each other out and made silence. But wherever
one file had a sound that was missing from the other (because
the Mp3 encoder had taken it out to save space), that sound would
remain in the resulting output. In other words, I had created
a file which contained only the sounds normally removed by turning
a wave into an Mp3.
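The whole invert-and-mix procedure boils down to a subtraction. Here's a numpy sketch with random noise standing in for the song, and a small fabricated "artifacts" signal standing in for whatever the encoder changed:

```python
import numpy as np

# Stand-ins for the two aligned tracks: `source` is the uncompressed
# audio, `encoded` is what a hypothetical lossy encoder produced.
rng = np.random.default_rng(42)
source = rng.uniform(-1, 1, 44100)
artifacts = 0.01 * rng.uniform(-1, 1, 44100)   # what encoding changed
encoded = source - artifacts

# Audacity's Invert effect flips the sign of every sample...
inverted = -encoded

# ...and Mix and Render sums the tracks. Everything identical between
# the two cancels out; only the encoder's changes survive.
difference = source + inverted
```

The difference track is exactly the encoder's changes and nothing else; if the two tracks had been identical, it would be pure digital silence.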
And what a beautiful file it was. For the first time I could
see exactly what was being altered during compression. I could
hear the sorts of things which were being lost. And you can too,
by clicking here.
But it was about to get even cooler.
I left this difference file open in Audacity and loaded the 320kbps
Mp3 in beneath it a second time. I did the sample measuring trick
again to make sure they had perfect waveform alignment. Then I
hit play. Both tracks played together, sounding just like the
source CD. And mathematically, it WAS the source CD. The compressed
file plus the difference file equaled the source file. And by
muting the difference file at any point in the song, I got to
hear, instantly and without distortion or changes in volume, what
the quality lost by compressing a wave file to a 320kbps Mp3 was.
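This reconstruction property is just arithmetic, and it's worth verifying: encoded plus difference gives back the source, sample for sample. A quick numpy check, with crude quantization standing in for the Mp3 encoder:

```python
import numpy as np

rng = np.random.default_rng(7)
source = rng.uniform(-1, 1, 44100)       # stand-in for the raw CD track
encoded = np.round(source * 128) / 128   # crude quantizer, NOT a real Mp3 codec
difference = source - encoded            # the null-test difference track

# Play both tracks together and you have, mathematically, the source
# again; mute the difference track and only the encoded version remains.
reconstructed = encoded + difference
```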
And what loss did I hear?
The Mp3 file sounded exactly like the original to my ears. I
was performing the opposite of a blinded test, a sighted one,
to give myself every possible chance to notice if there was a
difference, and I couldn't find one. I even listened to the difference
file on its own so I knew what sort of sounds had been most affected,
but still I couldn't pick out a change in quality as I toggled
the difference file on and off.
(Had I found a difference with a sighted test but not a blinded
one, I would have been highly suspicious of the result, and gone
on to do further blinded tests to see if I could reproduce
it. My goal in performing it initially sighted was to make
sure I knew when to expect a change, and what the change might
sound like, to make sure I was looking in the right places.)
Want to try it for yourself? Listen to this
and see if you can spot the points where the quality changes.
No? Try again, listening for an increase in quality at 5 seconds
which disappears at 10. If you can hear a difference, let
me know. So far no one has.
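Assembling such a comparison clip is straightforward once the tracks are aligned: keep the encoded audio throughout and mix the difference track back in only between the 5 and 10 second marks. A numpy sketch with stand-in signals (random noise plus a crude quantizer, not real Mp3 data):

```python
import numpy as np

fs = 44100

def make_comparison_clip(encoded, difference, fs=44100):
    """Encoded audio throughout, with the difference track mixed back
    in from 5s to 10s, so full quality appears and disappears with no
    click, timing shift, or volume change at the seams."""
    clip = encoded.copy()
    clip[5 * fs:10 * fs] += difference[5 * fs:10 * fs]
    return clip

rng = np.random.default_rng(3)
source = rng.uniform(-1, 1, 15 * fs)     # 15 seconds of stand-in audio
encoded = np.round(source * 64) / 64     # crude lossy stand-in
difference = source - encoded

clip = make_comparison_clip(encoded, difference, fs)
```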
Next I repeated the test using my more compressed VBR Mp3 instead
of the 320. As one would expect given the smaller file size, the
difference file did show more removed audio than the 320 one
had. If my friend was correct, this would be enough to cause a
small but detectable change in playback quality when I toggled
the difference file. You can try it for yourself by listening here.
If a VBR Mp3 averaging 233kbps isn't enough to recreate CD audio,
the quality should improve slightly at the 5 second mark, then
drop off at 10. But I could not hear a change. No slight drop
in volume, no increase in noise, no dulling of delicate notes.
Once more I listened to the difference file to give myself an
unfair advantage, and even when I knew what to expect I couldn't
find a change. Did that mean there really wasn't a significant
difference? Did that suggest my VBR scheme sounded as good as
an uncompressed CD?
I wanted to make sure I wasn't falling victim to confirmation
bias before I made a claim like that. To make sure that I hadn't
bungled a step somewhere I went back to the CD and ripped the
test track once more at only 56kbps. This time there would be
no way to miss the transitions between the source and the Mp3
if the method worked. I loaded it into Audacity, aligned it, inverted
it, mixed it, loaded it again. It worked! This time the difference
file was huge! There was a ton of audio which had been removed.
When I started playing the 56kbps Mp3 the song sounded dreadful.
But when I unmuted the difference track, it suddenly sounded just
as good as the CD once more, as I had predicted. So the method
DID work! You can hear it for yourself here.
Okay, so what's the bottom line?
Well, to start with, this method is only useful in very specific
circumstances. It would be useless for comparing vinyl to CDs,
for instance. And an encoder that did nothing more than take the
input and invert it would show up as deleting the entire track
using this method. Just because a sound is different between the
input and the output doesn't necessarily mean it was removed,
the encoder may have changed it in some other way. So the difference
track might actually exaggerate the trimming to the Mp3 files
for all I know. Take it with a grain of salt. Despite this, it
remains true that the difference track plus the encoded track
exactly equals the source track, so that part we can trust.
And secondly, the use of Audacity and this slightly convoluted
method is far from the only way, or even the best way to perform
this test. I simply chose it because it was a free tool which
I already had and knew how to use. In theory one could use a simpler
program which just switched audio sources at the click of a button.
But it would have to be smart enough to align the files to the
sample, and it would have to be engineered very carefully to avoid
making popping sounds or anything like that at the transition.
I found that Audacity was surprisingly good at remaining transparent
through these tests, so that's what I use.
Those limits stated, I feel the test as it stands has produced
some interesting and plausible results. It suggests that my
22-year-old ears, under a pair of Sennheiser HD-280s plugged into
a Soundblaster Audigy 2 ZS, are unable to hear the difference
between a particular Sarah McLachlan song compressed with my
~230kbps VBR profile and one left as an uncompressed wave.
This in itself isn't terribly interesting, but I've
been repeating the experiment with other songs since different
types of music will respond differently to compression. And I've
been sending comparison files to other people in an effort to
see if any of them can pick up a difference I missed. So far though
it's looking good for Mp3. I have yet to find a song which sounded
different in the VBR Mp3 profile vs the source CD.
This suggests to me that there's likely no advantage to using
CBR 320kbps Mp3s for daily listening, especially since they take
up 40% more space on average than high quality VBR Mp3s. More
generally, it also means that people who insist Mp3s sound worse
than CDs are probably fooling themselves, except in extremely
rare cases. Especially if they're in their 40s with worn out ears.
I also discovered that I could go surprisingly low on the bitrate
and still retain tolerable sound quality. It seems that the Mp3
encoders have improved dramatically in the past decade. No longer
do we have to suffer a sea of audio errors if an Mp3 is "only"
192kbps. In fact I dare say that dropping to 128kbps on a modern
encoder would be fine if the files were being played on an iPod
or similar, where listening conditions are poor.
Another interesting fact. When I tried comparing a very heavily
compressed Mp3 of 56kbps to the CD, I found that my ears tended
to notice when the quality increased, but often missed it when
the song got worse. I'm unsure what the implications of this are,
but it might be useful to know when pinpointing a desired quality level.
All in all, I feel it's time to give up the notion that Mp3s
are only good enough for puddingheads with $2 speakers. My tests
suggest they can be good enough for serious enjoyment on serious
equipment.
Last modified September 30th 2013
Can you hear me, Major Tom?