What Happens When You Create an MP3?
When a WAV or M4A file is converted to MP3, the encoder performs several steps in sequence. The encoder's input is raw PCM audio: uncompressed samples representing air pressure over time (a compressed source such as M4A is decoded to PCM first). The output is a stream of compressed frames, each covering about 26 milliseconds of audio at 44.1 kHz.
The pipeline works like this:
- Framing: the audio is split into frames of 1,152 samples (about 26 ms at 44.1 kHz), each made up of two 576-sample granules
- Frequency analysis: each granule is transformed from the time domain to the frequency domain by a 32-band polyphase filterbank followed by the Modified Discrete Cosine Transform (MDCT), whose windows overlap by 50%
- Psychoacoustic analysis: the encoder calculates which frequencies are masked (inaudible) in this frame
- Quantization: masked frequencies are removed or given fewer bits; audible frequencies get more bits
- Huffman coding: the quantized data is losslessly compressed using entropy coding
- Bitstream assembly: frame header, side information, and coded audio data are packed into the output
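The frequency-analysis step in the list above can be illustrated with a small sketch. This is a plain, naive MDCT with a sine window, written for readability; it is not MP3's actual filterbank, and the function names and the block size of 4 are assumptions chosen for the example. It shows the property that makes the MDCT attractive for audio coding: blocks overlap by 50%, yet overlap-adding the inverse transforms recovers the interior of the signal.

```python
import math


def sine_window(block_len):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for block_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block):
    """Naive O(N^2) MDCT: 2N windowed time samples -> N coefficients."""
    n_out = len(block) // 2
    window = sine_window(len(block))
    return [
        sum(
            window[n] * block[n]
            * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n in range(len(block))
        )
        for k in range(n_out)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples."""
    n_out = len(coeffs)
    window = sine_window(2 * n_out)
    return [
        2.0 * window[n] / n_out * sum(
            coeffs[k]
            * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for k in range(n_out)
        )
        for n in range(2 * n_out)
    ]


def roundtrip(signal, n_out):
    """Transform 50%-overlapping blocks, then overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_out + 1, n_out):
        block = imdct(mdct(signal[start:start + 2 * n_out]))
        for i, value in enumerate(block):
            out[start + i] += value
    return out


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
rebuilt = roundtrip(signal, n_out=4)
# The first and last n_out samples are covered by only one block, so skip them.
max_error = max(abs(a - b) for a, b in zip(signal[4:-4], rebuilt[4:-4]))
print(f"max interior reconstruction error: {max_error:.2e}")
```

In a real encoder the coefficients would be quantized between `mdct` and `imdct`, which is where the loss is introduced; the transform itself is reversible.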
The result: a 44.1 kHz, 16-bit stereo WAV at 1,411 kbps becomes a 320 kbps MP3, about 77% smaller, while sounding virtually identical.
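The figures above follow from simple arithmetic. The sketch below reproduces them from the sample rate, bit depth, channel count, and frame size quoted in this section; the variable names are only illustrative.

```python
SAMPLE_RATE_HZ = 44_100
BIT_DEPTH = 16
CHANNELS = 2
SAMPLES_PER_FRAME = 1_152
MP3_BITRATE_BPS = 320_000

# Uncompressed PCM bitrate: samples per second x bits per sample x channels.
pcm_bitrate_bps = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS

# How much audio one MP3 frame covers.
frame_duration_ms = SAMPLES_PER_FRAME / SAMPLE_RATE_HZ * 1000

# Fraction of the original size saved by encoding at 320 kbps.
size_reduction = 1 - MP3_BITRATE_BPS / pcm_bitrate_bps

print(f"PCM bitrate:    {pcm_bitrate_bps / 1000:.1f} kbps")  # 1411.2 kbps
print(f"Frame duration: {frame_duration_ms:.2f} ms")         # 26.12 ms
print(f"Size reduction: {size_reduction:.1%}")               # 77.3%
```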
The Psychoacoustic Model
The psychoacoustic model is the core of MP3 compression. It is a mathematical model of how human hearing works, and it determines what the encoder can safely remove. The model exploits three properties of hearing: two kinds of masking and the absolute threshold of hearing.
Simultaneous (Frequency) Masking
A loud sound at one frequency makes nearby quieter sounds inaudible. For example, a loud cymbal crash at 8 kHz masks a quiet guitar harmonic at 9 kHz. The encoder detects these masked frequencies and allocates them fewer bits (or zero bits). You would not hear them anyway.
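A toy model can make the cymbal example concrete. The sketch below converts frequencies to the Bark critical-band scale (using the Zwicker-Terhardt approximation) and applies a simple triangular spreading function around the masker. The offset and slope constants are illustrative assumptions, not values from the MP3 standard or from LAME, and the sound levels are made up for the example.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


# Illustrative assumptions only; real encoders use far more detailed models.
MASKING_OFFSET_DB = 10.0         # threshold at the masker's own frequency
SLOPE_BELOW_DB_PER_BARK = 27.0   # steep fall-off toward lower frequencies
SLOPE_ABOVE_DB_PER_BARK = 10.0   # gentler fall-off toward higher frequencies


def masked_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a probe tone is hidden by the masker (toy model)."""
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = SLOPE_ABOVE_DB_PER_BARK if distance >= 0 else SLOPE_BELOW_DB_PER_BARK
    return masker_db - MASKING_OFFSET_DB - slope * abs(distance)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)


# An 80 dB cymbal at 8 kHz against a 40 dB harmonic at 9 kHz and at 500 Hz.
print(is_masked(8000, 80, 9000, 40))
print(is_masked(8000, 80, 500, 40))
```

Under these assumptions the nearby 9 kHz harmonic falls under the masking threshold, while the same level at 500 Hz is too far away in frequency to be hidden.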
Temporal Masking
Masking also works across time. A loud sound masks quieter sounds that occur just before it (pre-masking, about 5 ms) and just after it (post-masking, about 50–100 ms). The encoder uses this to reduce data during transitions between loud and quiet passages.
Absolute Hearing Threshold
Human ears are not equally sensitive to all frequencies. We hear 1–5 kHz best and are much less sensitive below 100 Hz and above 16 kHz. The encoder removes any audio below the absolute threshold of hearing — sounds so quiet that no human can hear them regardless of other sounds.
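The shape of this threshold is often approximated with Terhardt's formula for the threshold in quiet. The sketch below evaluates that approximation at a few frequencies; it is a commonly used textbook curve, not necessarily the exact table a given encoder ships with, and the function name is an assumption.

```python
import math


def absolute_threshold_db_spl(freq_hz):
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


for freq in (100, 1000, 3300, 10000, 16000):
    print(f"{freq:>6} Hz: {absolute_threshold_db_spl(freq):6.1f} dB SPL")
```

The curve dips lowest around 3 to 4 kHz and rises sharply at both ends of the spectrum, which matches the sensitivity pattern described above.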
Key insight: MP3 does not simply "throw away data." It uses a model of human hearing to identify and remove the audio you are least likely to perceive. This is why most listeners cannot reliably tell a 320 kbps MP3 from the original in blind tests.
How Bitrate Relates to Quality
Bitrate is the number of kilobits the encoder can use per second. More bits mean fewer compromises:
| Bitrate | What Gets Removed | Audible Result |
|---|---|---|
| 320 kbps | Only truly inaudible content | Transparent — indistinguishable from original |
| 256 kbps | Inaudible + borderline content | Transparent for nearly all listeners |
| 192 kbps | Some partially audible content | Good quality; artifacts rare on consumer equipment |
| 128 kbps | Noticeable compromises | Acceptable for casual listening; trained ears notice loss |
| 64 kbps | Aggressive cuts across all frequencies | Obvious artifacts; suitable only for speech |
The relationship is not linear. Going from 128 to 192 kbps is a large quality jump. Going from 256 to 320 kbps is barely perceptible. This is because the psychoacoustic model prioritizes the most audible content first, so the last bits added at high bitrates carry the least noticeable detail.
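To make the bit budget concrete, the sketch below computes how many bytes a single frame gets at a given constant bitrate, using the standard MPEG-1 Layer III frame-length formula, and the resulting file size for a track. The function names and the 4-minute duration are assumptions for illustration; tags and VBR headers are ignored.

```python
def frame_bytes(bitrate_kbps, sample_rate_hz=44_100, padding=0):
    """Bytes in one MPEG-1 Layer III frame, header and side info included."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding


def file_size_mb(bitrate_kbps, duration_s):
    """Approximate CBR file size in megabytes (10^6 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


for kbps in (64, 128, 192, 256, 320):
    print(f"{kbps:>3} kbps: {frame_bytes(kbps):>4} bytes/frame, "
          f"{file_size_mb(kbps, 240):.2f} MB for 4 minutes")
```

Every frame covers the same 1,152 samples regardless of bitrate, so a higher bitrate simply gives the quantizer more bytes to spend on the same slice of audio.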
A Brief History of MP3
MP3, officially MPEG-1 Audio Layer III, was developed largely at the Fraunhofer Institute for Integrated Circuits (IIS) in Germany, with Karlheinz Brandenburg as its best-known contributor. The standard was published as ISO/IEC 11172-3 in 1993.
The format went through several milestones:
- 1993: ISO/IEC 11172-3 published. MP3 exists as a standard but has no good encoders yet
- 1994: Fraunhofer releases l3enc, the first software MP3 encoder. The .mp3 file extension follows in 1995, and file sharing begins on university networks
- 1998: LAME project begins as "LAME Ain't an MP3 Encoder" — a patch to improve the reference encoder
- 1999: Napster launches. MP3 becomes the dominant music format worldwide
- 2003: iTunes Store launches, selling AAC files (MP3's intended successor)
- 2017: The last MP3 patents expire and Fraunhofer ends its licensing program. The format is free to use without licensing
Despite AAC and Opus being technically superior, MP3 remains the most widely supported audio format in existence. Virtually every device, player, and operating system can play it.
Why LAME Is the Best MP3 Encoder
LAME (LAME Ain't an MP3 Encoder) is an open-source MP3 encoder that has been continuously refined since 1998. It is the encoder used inside FFmpeg as libmp3lame, and it is what Convertio uses for every MP3 conversion.
What makes LAME special:
- 25+ years of optimization. The psychoacoustic model, quantization, and VBR tuning have been refined through thousands of listening tests and code improvements.
- VBR quality levels. LAME's VBR presets V0 through V9 allocate bitrate dynamically per frame, ranging from V0 (highest quality, ~245 kbps average) to V9 (lowest quality, ~65 kbps average).
- Auto joint stereo. LAME analyzes each frame and automatically switches between mid/side stereo and left/right stereo encoding, choosing whichever is more efficient for that frame. This is why joint stereo is the default and rarely needs to be changed.
- Gapless playback info. LAME writes encoder delay and padding information into the MP3, enabling seamless track transitions on supporting players.
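The mid/side idea behind joint stereo, mentioned in the list above, can be sketched in a few lines. When the left and right channels are similar, the side channel carries very little energy and is cheap to encode. This shows only the transform and its inverse; how LAME actually decides between the two modes per frame is more involved and is not reproduced here. The function names and test signal are assumptions.

```python
import math

SQRT2 = math.sqrt(2)


def to_mid_side(left, right):
    """Convert left/right samples to mid (sum) and side (difference)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Exact inverse of to_mid_side."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


def energy(channel):
    return sum(x * x for x in channel)


# A nearly mono signal: the right channel is a slightly quieter copy.
left = [math.sin(0.1 * i) for i in range(200)]
right = [0.9 * x for x in left]

mid, side = to_mid_side(left, right)
print(f"left/right energy: {energy(left):.2f} / {energy(right):.2f}")
print(f"mid/side energy:   {energy(mid):.2f} / {energy(side):.2f}")
```

The total energy is the same in both representations, but in mid/side form almost all of it sits in one channel, which leaves more bits for the content that matters.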
Our backend: Convertio uses FFmpeg with libmp3lame. When you select VBR, the command uses -q:a (quality level 0–9, where 0 is the highest quality). When you select CBR, it uses -b:a 320k (constant bitrate). Both go through the full LAME psychoacoustic pipeline.
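As a rough illustration of the two modes, the sketch below builds an FFmpeg argument list for each. The options `-c:a libmp3lame`, `-q:a`, and `-b:a` are real FFmpeg options; the function name, its parameters, and the file names are hypothetical, and this is not Convertio's actual backend code.

```python
def build_ffmpeg_args(src, dst, mode, vbr_quality=0, cbr_bitrate_kbps=320):
    """Return an ffmpeg argument list for a VBR or CBR MP3 encode (sketch)."""
    args = ["ffmpeg", "-i", src, "-c:a", "libmp3lame"]
    if mode == "vbr":
        if not 0 <= vbr_quality <= 9:
            raise ValueError("VBR quality must be between 0 and 9")
        args += ["-q:a", str(vbr_quality)]          # 0 = best, 9 = smallest
    elif mode == "cbr":
        args += ["-b:a", f"{cbr_bitrate_kbps}k"]    # fixed bitrate per second
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return args + [dst]


print(" ".join(build_ffmpeg_args("input.wav", "output.mp3", "vbr", vbr_quality=0)))
print(" ".join(build_ffmpeg_args("input.wav", "output.mp3", "cbr")))
```

The list can be passed to `subprocess.run` on a machine where FFmpeg is installed with libmp3lame support; the sketch only constructs the command and does not execute it.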
Generation Loss: Why Re-Encoding Is Bad
Every time you encode audio to a lossy format, the encoder makes decisions about what to discard. If you take an MP3 and encode it to MP3 again, the second encoder discards additional data — including data that the first encoder considered important enough to keep.
This is called generation loss, and it is cumulative:
- 1st encode: near-original quality at high bitrates (mostly inaudible content removed)
- 2nd encode: slight degradation (borderline content removed that was kept in pass 1)
- 5th encode: noticeable artifacts in complex passages
- 10th encode: clearly audible warbling, frequency loss, stereo collapse
The practical rule: always encode from the original lossless source (WAV, FLAC, or ALAC). If you need a different bitrate, go back to the original and encode again — never re-encode an existing MP3. This applies to M4A (AAC) sources too: convert once to MP3, do not convert the result again.
Common mistake: Converting a 128 kbps MP3 to 320 kbps does not improve quality. The missing data from the 128 kbps encode is gone permanently. You only get a larger file with the same (or slightly worse) quality due to a second encoding pass.