cMP² | CPlay / SoftwareInducedJitter

Background

Jitter is a complex subject but fortunately is well defined in mathematical terms allowing us to analyse, measure and reduce its nasty effects. Both DACs and ADCs suffer from jitter distortion. This is unique to digital audio, i.e. analogue audio is jitter free (but not free from having good interconnects, EMI etc.).

For starters we know there's a well defined end-point where jitter distortion arises. This is located deep inside the DAC chip where the master clock marshals each sample point (both left & right channel) to its "rightful position" in the output analogue voltage. Unfortunately, this "right" position is constantly fluctuating in time causing analogue voltage signal changes out of time (either earlier or later). This is jitter distortion. Its correct measurement can only be determined at the analogue outputs. This is discussed in cMP (Chapter 03).

Jitter measurement made at the DAC's clock input, using for example the Wavecrest analyser, is misleading. Whilst this measure is correct and important in providing clock jitter data, it is not jitter distortion as defined. What is being measured here is clock jitter as it enters the DAC chip. But this is not where the signal coupling that affects actual analogue voltage changes takes place; instead we find this happening deep within the DAC chip. This aspect has a large bearing on the jitter distortion actually experienced by the listener.

DAC chips are complex integrated circuits where the seemingly pure clock signal entering the chip suffers damage (for example from substrate noise). The remaining DAC input signals (other clocks, control and data) are noise pollution sources whose quality significantly affects jitter. The chip itself consumes power, performs computations and has other complications such as buffers which also contribute towards jitter. Hence measuring jitter distortion can only be done at the DAC's analogue outputs.

It's also important to ask the question: how much jitter distortion is audible? Playback jitter levels need to be below 8.5ps (J_pp RSS, assuming no recorded jitter!) to be inaudible (if human hearing is capable of 22 bits resolution). A very complex task indeed.

Some vendors design computer audio with the assumption that the incoming computer audio bitstream is highly jittered and noisy. The solution applied here involves reclocking the data. Such designs attempt to create a "firewall" against incoming jittered data but have a drawback: jitter levels are those of the device in use (i.e. its intrinsic jitter) and any superior audio bitstream sent is largely wasted.

The Audio Chain

Jitter’s domination ends at the analogue output within DAC chips. Likewise, jitter distortion during recording ends within the ADC chip. That is, during recording no further jitter distortion is added after an analogue signal is digitized. Audio processing applied to this digital data (e.g. dithering or gain) will affect sound quality but such (audio data) processing does not add jitter (as long as everything is kept in the digital domain).

Treatments that act to reduce playback jitter fully apply to recording and vice versa. Clean power supply, no vibrations, maintaining constant temperature, etc. all matter. Software (Operating System together with player/recorder) induces jitter. What is not part of this discussion is audio data changed by software (intentionally or not) in ways that affect sound quality. The issue covered here is why the exact same audio data presented to the DAC (i.e. Bit Perfect) and sourced from the same location, but streamed through different software and Operating Systems (e.g. Windows), can result in sound differences.

Before delving into jitter effects of software, it’s important to explore the audio value chain. The Background section above covers devices acting to shield the DAC from incoming jitter distortion. Transformer coupling, buffering and reclocking are done with audio data commonly sourced from USB. Such designs do yield good results but suffer from a 96k input limitation and a performance ceiling, namely, their own intrinsic jitter. Any superior signal (with lower jitter) sent to such a device will have only minor benefit. Hence “I don’t hear any differences when making changes to my PC” comments from owners of such devices are common.

A common alternative is audio playback from a HDD using a regular soundcard. Streaming to the DAC (via soundcard connected by PCI, PCIe, Firewire or USB) takes the following path:

HDD (SATA/IDE/RAID) > Chipset > RAM > CPU (software) > RAM > Chipset > Soundcard (XO) > DAC

Network playback offers:

Ethernet > Chipset > RAM > CPU (software+netware) > RAM > Chipset > Soundcard (XO) > DAC

Both HDD and Network playback methods act to impede streaming of audio data: whilst the soundcard retrieves audio data, playback software is concurrently reading the next audio data using the same resources (RAM and Chipset). Network playback incurs additional OS software (netware) and device/component overheads.

A memory player’s advantage is avoiding this resource conflict and these overheads:

RAM > CPU (software) > RAM > Chipset > Soundcard (XO) > DAC

It also offers a shorter playback chain with fewer devices and less associated software. This design is optimal and lends itself to a direct 192k I2S DAC chip interface, avoiding Firewire/USB, SPDIF or AES. It’s also the platform that is most revealing of software changes.
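The memory-player idea can be sketched in a few lines: read the whole track into RAM first, then hand out fixed-size chunks from that buffer so that no disk or network reads happen during playback. This is an illustrative sketch only and not a description of how cPlay is implemented; the names `preload` and `iter_chunks` and the chunk size are hypothetical.

```python
import io
from typing import Iterator


def preload(stream: io.BufferedIOBase) -> bytes:
    """Read the entire audio stream into RAM before playback starts."""
    return stream.read()


def iter_chunks(buffer: bytes, chunk_size: int) -> Iterator[memoryview]:
    """Yield successive chunks of the in-memory buffer without copying."""
    view = memoryview(buffer)
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]


if __name__ == "__main__":
    # Stand-in for an audio file: 10 bytes of fake sample data.
    fake_file = io.BytesIO(bytes(range(10)))
    audio = preload(fake_file)
    for chunk in iter_chunks(audio, 4):
        print(bytes(chunk))
```

Once `preload` has returned, the only resources touched while streaming are RAM and the path to the soundcard, which is the point of the shorter chain shown above.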

A “headless” PC is applicable to all options: the aim is to remove the video/display, keyboard and mouse of the streaming PC.

Understanding Software

The CPU (Central Processing Unit) is the most complex part of the equation and its optimization has profound bearing on sound. “CPU (software)” in the above context includes the FSB & Memory Controller allowing for RAM I/O. Highly optimal systems have CPUs consuming ~50% or less of total power. Less optimal (and most common) setups will have CPUs (together with GPUs) consuming in excess of 80%! Heat generation is yet another indicator wherein poor setups require fan based cooling. Over-clockers resort to expensive water based cooling solutions. Super-computers used liquid Helium or Nitrogen cooling. The holy grail of computing is high temperature superconductivity.

Software (through instructions) controls what the CPU does, thus dictating how much power, resource arbitration, error handling and signaling (including signal reflection complexities) is needed. Modern CPUs such as Intel’s Core Duo contain in excess of 100 discrete components, e.g. L1 (Data & Instruction) Cache and L2 Shared Cache. Lesser known components are things like GPRs (general purpose registers, up to 16 per core), Control Registers, Execution Units, Loop Cache, other front-end components (instruction prefetcher, decoder, etc.) and TLBs (translation lookaside buffers).

At the physical level each software instruction is decoded into one or more µops (micro-ops) that are understood and processed by specialized Execution Units. Each clock cycle is able to complete 3-4 µops concurrently (i.e. a single core acts like 3-4 cores). Aggressive instruction pipelining is done together with “out of order execution” and macro-fusion (combining instructions into a single µop) to achieve amazing levels of performance and concurrency. Software in execution should be seen as a dynamic ever-changing electrical circuit at work.

The act of launching a program (for example double-clicking a desktop icon) results in the OS (Linux, Windows, Mac, etc.) loading the program into RAM, creating a new process with its task (thread) and adding it to the CPU Dispatcher’s list. The CPU Dispatcher allocates a fixed amount of CPU time in a round robin fashion to each and every active thread in the system. Interrupt processing (called DPCs) has highest priority, followed by threads, which are grouped into priority classes. The requested program eventually gets CPU “airtime” and is able to perform its intended task, e.g. prompt the user for a media file to be played.
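A toy model makes the cost of extra threads visible. The sketch below simulates a plain round-robin dispatcher with a fixed time slice and reports when a given thread finishes. It is a deliberate simplification (no priorities, no interrupts, no multiple cores), and the thread names and time units are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def round_robin(work: List[Tuple[str, int]], quantum: int) -> List[Tuple[str, int]]:
    """Simulate a round-robin dispatcher.

    work: (thread_name, time_units_needed) pairs, in queue order.
    quantum: fixed time slice given to a thread on each turn.
    Returns the timeline as (thread_name, units_run) slices.
    """
    queue: Deque[Tuple[str, int]] = deque(work)
    timeline: List[Tuple[str, int]] = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, run))
        if remaining > run:
            queue.append((name, remaining - run))  # back of the queue
    return timeline


def finish_time(timeline: List[Tuple[str, int]], name: str) -> int:
    """Elapsed time units at which the named thread completes its last slice."""
    elapsed, finished = 0, 0
    for thread, run in timeline:
        elapsed += run
        if thread == name:
            finished = elapsed
    return finished


if __name__ == "__main__":
    alone = round_robin([("player", 6)], quantum=2)
    busy = round_robin([("player", 6), ("a", 4), ("b", 4)], quantum=2)
    print(finish_time(alone, "player"), finish_time(busy, "player"))
```

In this model the same six units of player work complete at time 6 when the player runs alone, but only at time 14 when two other threads share the core, with the player's work split up by the other threads' slices.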

Given such an elaborate architecture, it makes sense to ensure an optimal playback solution has:

Least amount of active threads (resulting in reduced Dispatcher overheads and ensuring less competition for CPU and RAM)
Maximum available RAM (avoids paging to HDD, fragmentation and maximizes amount of audio data that can be RAM loaded)
Least amount of device intrusion by way of reduced interrupt processing and associated software handling. Besides, devices themselves are polluters and must be removed if unneeded

This type of optimization is referred to as “reducing the OS runtime footprint”. Left unchecked this creates an unfriendly environment that has a direct bearing on jitter. One can see this by considering an audio playback program in flight being randomly interrupted due to other active threads and/or device interrupts. Even worse, the CPU dispatcher schedules the program across different CPU cores causing L1 cache misses and unwanted “snoops”. Such unwanted activity causes additional power supply noise, ground noise and signaling overheads which directly impact audio data being streamed to the soundcard. This affects XO stability, hence jitter. Overall output signal quality to the DAC deteriorates. Conversely, a friendly environment seeks to create an ultra low-stress circuit wherein the XO delivers its free-running jitter performance within the DAC chip.
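One common way to stop a process from being moved between cores is to set its CPU affinity. The sketch below is an assumption-laden illustration rather than a description of cPlay: it uses `os.sched_setaffinity` from Python's standard library, which is only available on Linux, and the helper name `pin_to_core` is hypothetical. On Windows a different mechanism would be needed.

```python
import os


def pin_to_core(core: int) -> set:
    """Restrict the current process to a single CPU core, where supported.

    Returns the resulting set of allowed cores, or an empty set if the
    platform does not expose os.sched_setaffinity (e.g. Windows, macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()
    allowed = os.sched_getaffinity(0)   # cores this process may use now
    if core not in allowed:
        core = min(allowed)             # fall back to an available core
    os.sched_setaffinity(0, {core})     # 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(pin_to_core(0))
```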

Audio Player

Having an optimal environment creates an ideal platform for revealing an audio player’s impact on sound. Yes, sound differences are significant and are readily observed. Every audio player is slaved to the soundcard’s XO. That is, data streaming is a regular event and the XO determines exactly when audio buffer refills are needed. Hence we have a critical timing dependency. A periodic jitter relationship is established (see sensitivity analysis below).
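The regularity of buffer refills follows directly from the sample rate and the buffer size. The following is a minimal arithmetic sketch; the 4096-frame buffer is a hypothetical figure chosen for illustration, not a cPlay or soundcard setting.

```python
def refill_period_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Time in milliseconds for the soundcard to consume one buffer."""
    return 1000.0 * buffer_frames / sample_rate_hz


def refill_rate_hz(buffer_frames: int, sample_rate_hz: int) -> float:
    """How many buffer refills per second the player must deliver."""
    return sample_rate_hz / buffer_frames


if __name__ == "__main__":
    # Hypothetical 4096-frame buffer at a 192k sample rate.
    print(refill_period_ms(4096, 192_000), refill_rate_hz(4096, 192_000))
```

With these assumed numbers the player is woken roughly every 21.3ms, i.e. about 47 times per second, so its activity repeats at a fixed rate set by the XO.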

Audio playback at the CPU level is a sequence of software instructions in flight. At the physical level, these instructions are executed at a furious pace that translates into a dynamic electrical circuit. Whilst it’s easy to see a poorly designed circuit and its consequences, the same is not so with software. Poorly designed software causes excessive RAM I/O, inter-core L1 cache snooping, excessive (& expensive) pipeline stalls and cache thrashing, and is generally inefficient (often a forced outcome of extreme flexibility). Such poor designs add to jitter distortion through electrical interference that destabilises the XO and reduces signal quality. Added power & ground noise, nasty current spikes, excessive signaling stress and timing variations are responsible for this. For example, consider an audio player that causes excessive pipeline stalls. This means a CPU's pipeline is drained and a new one created. The result is a power spike that would impact the XO.

Given that the CPU is such a crucial part of the audio equation and is directly controlled by the OS and player, its optimal usage is essential for best performance. Software when viewed as a dynamic electrical circuit has large implications for resource utilization and its consequences. Reducing CPU stress through software is an art. There are many paths to achieving this, and it requires a deep understanding of the CPU. Every player unleashes a different circuit causing jitter induced sound differences. Hence sound differences do occur even when the exact same bits are presented to the DAC chip. This is not a new scientific phenomenon; it's well understood physics at work.

Periodic Jitter - sensitivity analysis

Changes are readily observed with each new cPlay release. Why such seemingly subtle changes impact sound quality is explained. Modeling (Periodic) Jitter Theory for a wide range of J_pp using a frequency sweep of the audible range and beyond yields the following:

Practical implications are profound as such Jitter Distortion damages sound quality:

Sideband distortion rapidly increases with input frequency, i.e. high frequency sounds are most affected. This is seen in the rapid rate of distortion increase as frequency rises. Critical harmonic sound information in this HF band (delivered by tweeters) is distorted. This has detrimental effects such as veiling, poor transients and poor tonal decay.
The most important insight is the extent to which J_pp must be reduced.
- We see that for high J_pp (150..350ps and beyond) an audio designer is not well rewarded for Jitter reduction efforts, i.e. despite a large drop in J_pp, distortion levels remain stubbornly high. Modern DACs (including some soundcards) have an excellent noise floor of -118dB or better. This implies that jitter distortion arising from HF input (above 5kHz) will be well above such a noise floor. For example, an input tone of 10kHz and J_pp at 150ps would yield sideband distortion of -112.6dBFS. Ouch!
  
  Interestingly, this explains why some reviewers unexpectedly express disappointment with front-ends offering ~200ps J_pp performance. That is, the noise floor is polluted with excessive sideband distortion.
  
  This may also explain why some vendors have become disillusioned with Jitter and have dismissed its role as a major source of audible distortion. Sadly, others have yet to understand how to measure such distortion (which can only be done correctly at the analogue outputs).
- Implications for poor setups with J_pp above 150ps suggest sound quality improvements will not be easily experienced. Perhaps this explains why some dispute that, for example, RAM size, quality & setup have any impact on sound quality.
- Much to our relief, distortion drops rapidly below 150ps J_pp. As can be seen, for each 50ps drop, distortion reduces in ever larger steps. That is, the distortion curve steepens as J_pp falls. Consider for example 50ps J_pp: any improvement here will have far more benefit than the same improvement at higher J_pp. This suggests high quality setups like a fully specified cMP (whose measured jitter performance is 51ps J_pp RSS using foobar) would be significantly better at revealing improvements.
- For the ideal case, jitter performance results in sideband distortion below the DAC's noise floor for all input frequencies. A noise floor of -130dBFS requires jitter performance below 10ps J_pp.
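The two worked figures in this list (about -112.6dB for a 10kHz tone at 150ps, and roughly 10ps for a -130dB floor) are consistent with the commonly cited sideband relation for sinusoidal jitter, R = 20·log10(J_pp·2πf/4). The sketch below assumes that relation, which is not stated explicitly in the text, and the function names are hypothetical. It is meant as a way to reproduce the numbers, not as the model actually used for the analysis.

```python
import math


def sideband_dbfs(freq_hz: float, jpp_seconds: float) -> float:
    """Level of each jitter sideband relative to a full-scale tone, in dB.

    Assumes sinusoidal (periodic) jitter and the small-angle
    approximation: R = 20 * log10(J_pp * 2*pi*f / 4).
    """
    return 20.0 * math.log10(jpp_seconds * 2.0 * math.pi * freq_hz / 4.0)


def max_jpp_seconds(freq_hz: float, floor_dbfs: float) -> float:
    """Largest J_pp that keeps the sideband at or below a given floor."""
    return 4.0 * 10.0 ** (floor_dbfs / 20.0) / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    ps = 1e-12
    freqs = (1e3, 5e3, 10e3, 20e3)
    for jpp in (350, 250, 150, 100, 50, 10):
        row = [round(sideband_dbfs(f, jpp * ps), 1) for f in freqs]
        print(f"{jpp:>4} ps:", row)
    print(max_jpp_seconds(20e3, -130.0) / ps, "ps for -130 dB at 20 kHz")
```

Under this relation, going from 350ps to 300ps buys only about 1.3dB, whereas going from 100ps to 50ps buys about 6dB, which matches the observation that improvements pay off far more at low J_pp.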

This sensitivity analysis has important implications for Minimum or Non-linear Phase filters (aka apodizing filters). Their use reflects poor jitter performance in the DAC - see The Minimum Phase Riddle.

Rarely do theory and practice meet so beautifully. The efforts of Julian Dunn who laid the foundations for understanding Jitter and its challenges were truly remarkable.

Conclusion

Whilst every other component in the chain receives optimal treatment, why shouldn’t the CPU? Not optimizing the CPU is downright stupid. One achieves good improvement through moderating the CPU’s excessive power consumption (reducing EMI) by under-volting and under-clocking (which also reduces RFI). Its optimization however doesn’t end here. Software is not without consequences. Ignore it at your peril.
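Why under-volting and under-clocking are effective can be seen from the usual first-order rule that CMOS dynamic power scales with voltage squared times clock frequency. The sketch below assumes only that rule and ignores leakage and everything else; the 0.8 and 0.5 ratios are hypothetical examples, not recommended settings.

```python
def relative_dynamic_power(voltage_ratio: float, clock_ratio: float) -> float:
    """Dynamic power relative to stock, assuming P is proportional to V^2 * f."""
    return voltage_ratio ** 2 * clock_ratio


if __name__ == "__main__":
    # Hypothetical: core voltage at 80% of stock, clock at 50% of stock.
    print(relative_dynamic_power(0.8, 0.5))
```

With these assumed ratios, dynamic power falls to roughly a third of its stock value.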

Highest performance is achieved when classical notions of DAC as “Master” or “Slave” are superseded by the design principle of creating a partnership wherein both DAC chip and Computer Transport are slaved to the XO. Jitter is tamed when our “string” is shortest and audio is streamed under minimal stress.