The Minimum Phase Riddle

Minimum phase (or intermediate phase) interpolators (also known as filters or resamplers) are attractive because they eliminate (or reduce) pre-ringing artifacts in the time domain. Some DACs offer this option, and it is often regarded as superior to linear phase. Yet deviating from linear phase grossly compromises interpolation accuracy. Hence our riddle: why does introducing interpolation error sound better in some cases?

Interpolation Error

Resampling a transient produces pre- and post-ringing noise, which can be seen in the impulse response of the interpolator. An interpolator sums the sinc function on either side of the centre point, so we can see intuitively that not processing the right wing of the sinc function removes pre-ringing: without the right wing, the approaching transient data in the time domain is not processed and so cannot produce any pre-ringing.

Figure 1. SoX VHQ measurements from src.infinitewave.ca showing impulse response for linear phase (left) and minimum phase (right).
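
The symmetry behind linear phase pre-ringing can be illustrated with a small sketch. It is not the SoX or SRC filter: the tap count, cutoff and Hann window are assumptions chosen only for illustration. It builds a windowed sinc and shows that the energy ahead of the main peak (pre-ringing) equals the energy after it (post-ringing).

```python
import math

def windowed_sinc(num_taps=63, cutoff=0.25):
    """Hann-windowed sinc low-pass; cutoff is in cycles per sample (assumed values)."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - centre
        # Ideal low-pass core, with the x == 0 limit handled explicitly.
        core = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(core * window)
    return taps

def energy_split(taps):
    """Return (energy before the main peak, energy after the main peak)."""
    peak = max(range(len(taps)), key=lambda i: abs(taps[i]))
    before = sum(t * t for t in taps[:peak])
    after = sum(t * t for t in taps[peak + 1:])
    return before, after

taps = windowed_sinc()
pre, post = energy_split(taps)
print("energy before peak:", pre)
print("energy after peak: ", post)
```

A minimum phase design moves that energy to the post-peak side rather than simply discarding one wing, which is why the two impulse responses in Figure 1 look so different.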

Ringing occurs outside the audible range: for 44.1kHz input, the ringing frequency is above 22kHz. An example of resampled output (44.1k upsampled to 192k using SRC) of a transient (a sudden volume increase from -90 to -40dBFS for 5ms) is shown below:

Figure 2. SRC 192k upsampled output of 44.1k. The transient is a 2kHz input tone with volume increasing from -90 to -40dBFS (for 5ms). Both pre- and post-ringing (i.e. linear phase) are now present (see insets showing ringing "riding" on the low level -90dBFS 2kHz waveform), with minor overshoot.
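
A test transient of this kind can be sketched as follows. The tone frequency, levels and 5ms burst length come from the description above; the total duration, burst start time and the function names are assumptions made for illustration, and the original test file may have been generated differently.

```python
import math

def db_to_amplitude(dbfs):
    """Convert a level in dBFS to a linear amplitude (1.0 = full scale)."""
    return 10 ** (dbfs / 20)

def transient_tone(rate=44100, tone_hz=2000.0, duration_s=0.05,
                   burst_start_s=0.02, burst_len_s=0.005,
                   low_dbfs=-90.0, high_dbfs=-40.0):
    """2kHz tone at low_dbfs that jumps to high_dbfs for burst_len_s seconds."""
    low = db_to_amplitude(low_dbfs)
    high = db_to_amplitude(high_dbfs)
    samples = []
    for n in range(round(rate * duration_s)):
        t = n / rate
        in_burst = burst_start_s <= t < burst_start_s + burst_len_s
        gain = high if in_burst else low
        samples.append(gain * math.sin(2 * math.pi * tone_hz * t))
    return samples

signal = transient_tone()
print("samples:", len(signal))
print("peak amplitude:", max(abs(s) for s in signal))
```

Feeding such a signal through a resampler and inspecting the output just before the burst is one way to look for the pre-ringing shown in the insets.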

It is argued that pre-ringing (typically less than 2.5ms) preceding the transient is audible. Post-ringing is not audible as such artifacts are overwhelmed by the transient itself, i.e. the loud sound masks lower level post-ringing.

The math used to eliminate such pre-ringing is not simply a crude chop of the right wing; phase error is introduced. Terms like phase shift, phase distortion or phase noise are used to describe non-linear phase interpolators. In actual fact, what we have is gross interpolation error: signal components in the time domain undergo a phase change that depends on input frequency. This is seen in the phase response graph:

Figure 3. SoX VHQ measurements from src.infinitewave.ca showing phase for linear (left) and minimum (right).

Whilst the resulting output may have the same frequency spectrum, i.e. frequency content in the pass band remains unaltered, phase is grossly compromised. The case above shows that input frequencies from as low as 4kHz are changed (deviation from a straight line), affecting critical music information (harmonics) such as tonal decay. A uniform phase shift of 180 degrees (polarity inversion) is audible and is sometimes desired to correct polarity errors (in the recording and/or downstream components). Non-linear phase interpolation, by contrast, causes a non-uniform phase shift (error increases with input frequency). That means the entire audio stream suffers interpolation error! With linear phase interpolation, only pre-ringing is added, and only at transients in the audio stream. Post-ringing occurs at transients in both cases.
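
The "same spectrum, different phase" point can be checked on a toy example. The two three-tap filters below are assumptions chosen for illustration, not the SoX filters measured in Figure 3: they have identical magnitude responses, but one is symmetric (linear phase) and the other has both zeros inside the unit circle (minimum phase). Group delay is computed with the standard identity Re{FT(n*h[n]) / FT(h[n])}.

```python
import cmath
import math

def frequency_response(taps, freq):
    """Complex response of an FIR filter at freq (cycles per sample)."""
    w = 2 * math.pi * freq
    return sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps))

def group_delay(taps, freq):
    """Group delay in samples at freq (cycles per sample)."""
    w = 2 * math.pi * freq
    weighted = sum(n * h * cmath.exp(-1j * w * n) for n, h in enumerate(taps))
    return (weighted / frequency_response(taps, freq)).real

# Toy filters with the same magnitude response (illustrative assumption).
linear_phase = [0.5, 1.25, 0.5]    # symmetric: zeros at z = -0.5 and z = -2
minimum_phase = [1.0, 1.0, 0.25]   # both zeros at z = -0.5

for f in (0.0, 0.1, 0.2, 0.3, 0.4):
    mag_lin = abs(frequency_response(linear_phase, f))
    mag_min = abs(frequency_response(minimum_phase, f))
    print(f, round(mag_lin, 6), round(mag_min, 6),
          round(group_delay(linear_phase, f), 6),
          round(group_delay(minimum_phase, f), 6))
```

The linear phase filter delays every frequency by the same amount, whereas the minimum phase filter's delay changes with frequency even though both pass the same spectrum.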

It sounds better (sometimes). A false paradise?

Such mathematical manipulation is sometimes experienced as an improvement, even though the entire audio stream is polluted with non-uniform phase shift. The reason it can sound better suggests a false paradise.

A closer look at Figure 2 shows that pre-ringing is above 22kHz and occurs at relatively high levels (36dB lower than the transient peak). This is extremely important given that transients by nature are often at maximum (0dBFS) or near maximum levels. Such high level, high frequency noise creates a "perfect storm" for jitter distortion. Its implications are best understood using the following Periodic Jitter analysis:

Figure 4. Periodic Jitter modelled for a wide range of J_pp using a frequency sweep of the audible range and beyond.

Notice that for input frequencies above 22kHz, sideband distortion levels are unacceptably high even with DACs sporting jitter as low as 150ps J_pp. Periodic jitter acting on such input will yield audible distortion where the jitter frequency J_f is higher than 2kHz: jitter sidebands occur at an offset (given by J_f) from the input frequency, so the lower sideband falls within the audible range. The presence of ringing therefore results in a noise floor polluted with sideband distortion.
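
The sideband arithmetic can be sketched with the common narrow-band approximation for sinusoidal jitter, in which each sideband sits at the input frequency plus or minus J_f, at a level of 20*log10(pi * f_in * J_pp / 2) relative to the input tone. This is an assumed model and may not be the exact one behind Figure 4; the function names and the 24kHz / 5kHz example values are hypothetical.

```python
import math

def sideband_level_dbc(input_hz, jitter_pp_s):
    """Approximate level of each jitter sideband relative to the input tone (dB).

    Assumes sinusoidal jitter with peak-to-peak amplitude jitter_pp_s (seconds)
    and a small modulation index (narrow-band approximation).
    """
    return 20 * math.log10(math.pi * input_hz * jitter_pp_s / 2)

def sideband_frequencies(input_hz, jitter_hz):
    """Return (lower, upper) sideband frequencies in Hz."""
    return abs(input_hz - jitter_hz), input_hz + jitter_hz

# Hypothetical example: ringing at 24kHz, periodic jitter at 5kHz, 150ps J_pp.
lower, upper = sideband_frequencies(24000.0, 5000.0)
level = sideband_level_dbc(24000.0, 150e-12)
print("sidebands at", lower, "Hz and", upper, "Hz")
print("level relative to the ringing component:", level, "dB")
```

In this model the level rises by about 6dB for every doubling of either input frequency or J_pp, and the lower sideband lands in the audible band whenever J_f exceeds the gap between the ringing frequency and roughly 20kHz.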

In the time domain, removing pre-ringing means such audible jitter distortion preceding the transient will not occur. Hence, by introducing interpolation error we remove the jitter distortion associated with pre-ringing. Jitter distortion from post-ringing remains, but its undesirable effects are masked by the transient itself.

Conclusion

DACs susceptible to such jitter distortion could sound better when minimum (or intermediate) phase interpolation error is introduced. That is, one evil is substituted for another. Consider a DAC with a poorly designed PLL or, worse, no PLL at all (or a bad USB/SPDIF receiver chip): jitter frequencies above 2kHz are passed through and interfere with the master clock, causing high level audible jitter sideband distortion.

Interpolation error is unacceptable. Should its introduction improve sound, it suggests the DAC in use is ineffective in reducing periodic jitter distortion. A poor compromise indeed.

Such DAC deficiencies can be readily tested using a resampler offering phase adjustments such as SoX. Both cPlay and Foobar (with SoX DSP component) provide this.
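
For a test outside a player, the SoX command line can produce one upsampled file per phase setting for comparison. The sketch below only builds the commands; it relies on the phase options of the SoX `rate` effect (`-L` linear, `-I` intermediate, `-M` minimum, with `-v` for very high quality). The file names and the 192k target are assumptions for illustration.

```python
# Phase options of the SoX `rate` effect.
PHASE_FLAGS = {"linear": "-L", "intermediate": "-I", "minimum": "-M"}

def sox_rate_command(src, dst, phase="linear", rate="192k"):
    """Build an argument list for a very high quality SoX resample."""
    return ["sox", src, dst, "rate", "-v", PHASE_FLAGS[phase], rate]

# Hypothetical file names; pass each list to subprocess.run() to execute it.
commands = [
    sox_rate_command("transient_44k.wav", "transient_192k_" + phase + ".wav", phase)
    for phase in ("linear", "intermediate", "minimum")
]
for cmd in commands:
    print(" ".join(cmd))
```

Playing the resulting files through the DAC under test, and comparing them by ear or by measurement, is one way to carry out the check described above.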