What is Audio Resampling?
When you convert audio from one sample rate to another (e.g. a 44.1 kHz MP3 → 48 kHz WAV for video editing), every single sample must be recalculated on a new time grid. This process is called resampling.
A naïve approach — simply dropping or duplicating samples — creates audible clicks and aliasing. Professional resamplers use mathematical interpolation (typically polyphase FIR filters) to reconstruct a continuous signal from discrete samples, then re-sample it at the new rate. The quality of this interpolation determines whether your audio stays transparent or picks up artifacts.
Key concept: According to the Nyquist-Shannon theorem, any band-limited signal sampled above twice its highest frequency can be perfectly reconstructed. Resampling leverages this theorem — a high-quality resampler can change rates with zero audible degradation.
What is SoXr?
The SoXr (SoX Resampler Library) is an open-source, audiophile-grade resampling engine originally developed for the SoX (Sound eXchange) command-line audio tool. It uses an FFT-based polyphase algorithm that produces results virtually indistinguishable from the original signal.
SoXr is used by professional audio software including foobar2000, JRiver Media Center, MPV, and VLC. Convertio.com integrates SoXr through FFmpeg's aresample filter, applying it to every WAV conversion automatically.
| Parameter | Value | What It Does |
|---|---|---|
| Engine | SoXr (CR64) | 64-bit double-precision floating-point computation |
| Precision | 28-bit | ~168 dB signal-to-noise ratio — far beyond audible noise floor |
| Dithering | Shibata | Psychoacoustically-shaped noise that pushes quantization artifacts away from the 1–5 kHz hearing sensitivity peak |
| Anti-aliasing | Automatic | Steep low-pass filter prevents aliasing when downsampling |
SoXr vs FFmpeg's Default Resampler
FFmpeg includes two resampling backends: the default swresample (SWR) and the optional soxr. Here's how they compare:
| Aspect | swresample (default) | SoXr |
|---|---|---|
| Algorithm | Kaiser-windowed sinc (linear phase) | FFT-based oversampled polyphase |
| Internal precision | 16-bit (default) or 32-bit float | 64-bit double (CR64 engine) |
| Aliasing rejection | Good (−100 dB typical) | Excellent (−168 dB with precision=28) |
| Dithering | Triangular (flat spectrum) | Shibata (noise-shaped, less audible) |
| Speed | Faster | Slightly slower (~10–15% more CPU) |
| Passband ripple | Measurable near Nyquist | Negligible |
| Best for | Real-time streaming, video playback | Mastering, archival, distribution |
Bottom line: swresample is optimized for speed and is perfectly fine for real-time playback. SoXr is optimized for quality and is the right choice when you're producing a file that will be kept, distributed, or edited further — exactly what a converter does.
Shibata Dithering Explained
When audio is converted between bit depths (e.g. 32-bit float internal processing → 16-bit WAV output), rounding errors create quantization noise. Dithering adds a tiny amount of noise before rounding to eliminate the more unpleasant distortion patterns.
Not all dithering is equal. Standard triangular dithering (TPDF) distributes noise evenly across the frequency spectrum. Shibata dithering uses psychoacoustic noise shaping to push that noise into frequency ranges where human hearing is least sensitive:
| Dither Type | Noise Distribution | Audibility |
|---|---|---|
| None (truncation) | No noise added | Worst — audible harmonic distortion |
| Rectangular (RPDF) | Flat, random | Removes distortion, flat noise floor |
| Triangular (TPDF) | Flat, uncorrelated | Better — no modulation noise |
| Shibata (noise-shaped) | Shifted away from 1–5 kHz | Least audible — exploits hearing curve |
Why it matters: Human hearing is most sensitive between 1–5 kHz (the Fletcher-Munson curve). Shibata dithering pushes quantization noise into the less sensitive high-frequency region above 10 kHz, making it effectively inaudible even on high-end monitoring equipment.
When Does Resampling Happen?
SoXr is applied automatically to every WAV conversion on Convertio.com, but its impact is most significant in these scenarios:
| Scenario | Example | SoXr Impact |
|---|---|---|
| Downsampling hi-res | 96 kHz FLAC → 44.1 kHz WAV | Critical — prevents aliasing artifacts |
| Music → video rate | 44.1 kHz MP3 → 48 kHz WAV | Important — clean rate conversion |
| Voice downsampling | 48 kHz podcast → 22.05 kHz WAV | Important — preserves speech clarity |
| Same rate conversion | 44.1 kHz MP3 → 44.1 kHz WAV | Minimal — dithering still applies for bit-depth changes |
The biggest quality difference is during downsampling — when the target rate is lower than the source. Without proper anti-aliasing (which SoXr handles automatically), frequencies above the new Nyquist limit fold back into the audible range as distortion.
28-bit Precision: What It Means
SoXr's precision=28 parameter sets the internal computation to 28 effective bits using the CR64 (constant-rate, 64-bit) engine. This translates to approximately 168 dB of signal-to-noise ratio.
For context:
- 16-bit audio has ~96 dB dynamic range
- 24-bit audio has ~144 dB dynamic range
- SoXr at precision=28 computes at ~168 dB — 24 dB below the noise floor of even 24-bit audio
This means the resampling process itself introduces zero audible noise, even for 24-bit masters. The resampler's internal computation is quieter than the quietest sound any real-world recording can capture.
Why not precision=32? Higher precision values increase CPU time with diminishing returns. At precision=28, SoXr already operates 24 dB below 24-bit audio's noise floor — increasing further would be inaudible and impractical. This is the sweet spot used by most professional audio tools.
How Convertio Uses SoXr
Every WAV conversion on Convertio.com runs through this pipeline:
- Upload — your audio file is received over HTTPS
- Decode — FFmpeg reads the source format (MP3, FLAC, M4A, OGG, etc.)
- Resample — SoXr converts to your chosen sample rate and bit depth
- Dither — Shibata noise shaping is applied during bit-depth conversion
- Encode — clean PCM samples are written to the WAV container
- Download — your WAV file is ready
The entire process is automatic. You just pick your target settings (sample rate, bit depth, channels) and Convertio handles the rest using SoXr under the hood. No configuration required, no "quality mode" toggle — every conversion gets the same studio-grade resampling.