At the February 2000 NAMM show, Cakewalk invited representatives from Microsoft and over 30 major hardware and software vendors to the first annual "Windows Professional Audio Roundtable." The purpose of the roundtable was to work together towards solutions that would make Windows the ideal platform for professional audio. This paper presents the results of the roundtable discussions.
Latency: What's Required vs. What's Possible
The most important performance criterion of a DAW is latency, i.e., the delay between when the software changes a sound and when that change is actually heard. Latency affects the overall responsiveness of a DAW's user interface to input gestures, as well as the applicability of a DAW for live input monitoring. The present trend towards software synthesis also highlights the influence of latency on the playability of a software-based instrument. Unfortunately, latency happens to be the exact place where external factors influence performance the most.
How low must latency be? A skilled audio engineer can hear subtle differences in the "feel" of a drum recording simply by moving a microphone 1 foot, a distance that adds roughly 1 msec of delay (sound travels about 1 foot per millisecond). Studies have shown that humans can perceive interaural (stereo) differences as low as 10 usec (0.01 msec). Obviously, lower is better.
What's the best we can deliver? Despite claims by hardware and software vendors, no one has ever scientifically measured audio latency in a DAW. However, we do know for certain that there are 3 hard limitations that put a fixed lower bound on the latency that a host application can deliver.
- The DACs and ADCs in a sound card have some delay inherent to them. Typical converter latency is in the range of 30-50 samples, which represents roughly 0.7-1.1 msec of delay at 44.1 kHz.
- The host operating system (Win9x, NT or 2k) will introduce interrupt latency, a delay between when a hardware interrupt occurs and when the lowest levels of the driver receive control. Interrupt latency is a fundamental measure of an operating system's performance and is not something an application can optimize away.
An analysis of interrupt latency in Windows was presented at OSDI'99 by Erik Cota-Robles and James P. Held. Their results show that the best case latency on Win9x or WinNT is about 1 msec, and that the worst case (on Win9x) can be as long as 100+ msec.
- The scheduler in the host operating system leads to unpredictable timing when an application (user mode) thread needs to be woken up for audio streaming tasks. With clever design this can be made more predictable, so for argument's sake we'll neglect this limitation.
When you consider the effects of converter latency and interrupt latency, it becomes clear that the lowest latency you can ever hope to achieve under Windows is about 2 msec. In reality, the influence of system load on interrupt latency and the scheduler will lead to inconsistent performance (manifested by random audio drop-outs), so in most practical cases the audio latency will be much higher.
For real-world usage scenarios, minimizing the uncertainty that arises under heavy system loads is just as important as reducing audio latency. Since WinNT (and Win2k) have tightly bounded interrupt latencies, these platforms should be better suited to the task of audio streaming. We believe an obtainable target for audio latency under Win2k is 5 msec, even under heavy system loads.
Software and Hardware Development
Observations and Conclusion
Software vendors face a daunting set of challenges. Customers demand the lowest latency possible, but delivering this requires knowledge of O/S issues that are neither well documented nor well understood. As demonstrated by the WavePipe™ technology introduced in Cakewalk Pro Audio 9, it is possible to get low latency out of standard drivers, but this is still very much dependent on the quality of the driver.
Hardware vendors are challenged even further. On the Windows platform, there are a variety of driver models to consider: VxD, NT drivers and WDM. On top of these drivers live a multitude of user-mode APIs: MME, DirectX, ASIO and EASI.
Audio hardware vendors are writing too much code to support too many driver models and too many APIs. As a result, driver performance is suffering overall.
Consider the steps a hardware vendor takes when planning which drivers to build:
- Choose a user-mode API: MME, DirectX, ASIO or EASI.
- Choose a target operating system: Win9x or WinNT.
- Develop the kernel mode component (.VxD or .SYS), utilizing the Microsoft DDK for the chosen operating system.
- Develop the user-mode component (.DRV or .DLL) to support the API.
Observation 1: Too many drivers
Supporting both Win9x and WinNT requires writing 2 different kernel mode drivers (a .VxD and a .SYS driver). On top of that, supporting MME, ASIO and EASI requires writing 3 different user-mode drivers.
- In order to support all popular platforms and API's, hardware vendors must implement, test and support 5 different audio driver components.
Observation 2: Not enough kernel mode support
Some audio processing never leaves kernel mode at all. Obvious examples are the WDM KMixer and the DirectMusic software synthesizers. Furthermore, DAW vendors need the option of moving more of their own mixing and DSP into kernel mode.
- User-mode APIs such as DirectX, ASIO or EASI do not provide adequate support for kernel mode processing.
Observation 3: The term "driver" is misunderstood
Referring back to the 4 steps of driver development, we see that all paths of driver development lead through the DDK. Only the DDK provides the tools for interfacing to hardware in a standard way. The majority of interfacing to hardware must be done in kernel mode, within a VxD or SYS file.
- A true "driver" runs in the kernel and is packaged as a VxD or SYS file. Technologies like MME, ASIO and EASI are merely user-mode APIs, not drivers.
The best way to manage driver complexity while providing adequate support for future technologies is to provide a single kernel-mode audio driver. A single kernel-mode driver is in fact the hallmark of the Win32 Driver Model.
The Win32 Driver Model (WDM)
WDM is Microsoft's vision of simplifying driver development, providing a unified driver model for both consumer and commercial O/S's, and providing a migration path towards future O/S offerings. In this section we shall examine how close WDM comes to achieving this ideal, and the relevance of WDM to audio streaming.
WDM works across the Win9x and Win2k platforms. A driver written to the WDM specifications will be source-code compatible on all Win9x platforms (starting with Win98SE) and Win2k. Most drivers are even binary compatible across these platforms. This implies that hardware companies can develop a single kernel-mode driver, period.
Copyright 2000 Microsoft Corporation.
WDM provides considerable leverage to audio applications. It provides an audio mixing and resampling component that runs in kernel mode, known as "KMixer." KMixer facilitates multiclient access to the same hardware and provides the illusion of limitless audio streams that are mixed in realtime.
Due to its layered architecture, WDM also provides automatic support for the MME and DirectX APIs. A vendor simply needs to implement a WDM mini-port driver, and other layers in the system's driver stack provide MME and DirectX support.
Unfortunately, this power comes at a price. Due to internal buffering, KMixer nominally adds 30 msec of latency to audio playback streams. (At present, Microsoft does not provide a method to allow host applications to bypass KMixer.)
WDM is relatively new to most audio hardware vendors. It is much more like the NT driver model than the Win9x VxD model. This means that the many hardware vendors who haven't yet developed NT drivers will have to learn new technology in order to support WDM.
(Note that the difference between the NT driver model and the VxD driver model is the primary reason for the current dearth of NT drivers, including NT ASIO drivers. Building a driver for a new platform requires learning the DDK for that platform, regardless of which API the driver is meant to support or how crisply defined the API happens to be.)
We see the newness of WDM as an opportunity for the Windows audio community at large. We are at a point in time where many hardware and software vendors are simultaneously taking their first serious look at WDM. At times when everybody is "on the same page," it is easiest to focus on working together towards building the best solutions for everybody.
Another opportunity presented by WDM is its synergy with Win2k. We have already suggested that Win2k is a more desirable operating system for audio because of its well-bounded interrupt latency. Win2k also has the benefit of advanced I/O features such as asynchronous disk I/O. Since Win2k has great potential for professional audio, and WDM is the driver model for Win2k, a large trend towards the adoption of WDM seems inevitable.
Is WDM the Answer?
The following list pairs each desirable trait of an idealized audio streaming solution with the corresponding capability of WDM.

- Requires only a single, well-defined, kernel mode component? Yes. One kernel mode component provides MME and DirectX support automatically.
- Works equally well on Win9x and Win2k? Yes. WDM drivers are source (and often binary) compatible across these 2 operating systems.
- Easy to implement? Yes. WDM drivers are designed in a simple mini-port model, where the vendor needs only to provide details that are specific to their hardware. This model removes much of the excess "glue code" in building a driver.
- Free of political or legal baggage? Yes. WDM has no baggage. It is a standard technology built into Windows.
- Provides < 5 msec latency for all hardware and software? No. KMixer adds 30 msec of latency.
Resolving the Limitations of WDM
Fortunately, WDM has a provision for driver extensions. The WDM DDK provides a function named IoRegisterDeviceInterface through which any kernel mode driver can "advertise" that it has custom behavior. A user-mode application can then query the system for registered drivers, and communicate directly with them via the user-mode DeviceIoControl function by sending I/O controls (IOCTLs).
This mechanism suggests a way to work around the KMixer issue in WDM. If all hardware vendors can agree on a common set of IOCTLs, and expose these IOCTLs as standard WDM registered device interfaces, then we have an ideal solution that all software vendors can use.
A further benefit of IOCTLs is that drivers use them as the underpinnings of their support for ASIO or EASI user-mode APIs. So a kernel-mode solution based on IOCTLs will be compatible with existing applications that support ASIO or EASI. In fact, any hardware vendor who is considering ASIO or EASI support under Win2k is going to have to build a mechanism like this anyway.
To summarize, an IOCTL solution provides low-latency audio across the board for all Windows audio applications. A solution based on WDM plus IOCTLs has all of the desirable properties mentioned elsewhere in the paper:
- Single kernel-mode driver component
- Usable within kernel mode as well as user mode
- Cross platform on Win9x and Win2k
- Easy to implement
- Very low latency
- Free of political or legal baggage
The Windows Professional Audio Roundtable
At the February 2000 NAMM show, Cakewalk sponsored the first annual Windows Professional Audio Roundtable. Among the attendees were representatives from NemeSys, Microsoft, Bitheadz, Emagic, IBM, IQS, Propellorheads, MIDI Manufacturers Association, Sonic Foundry, Sound Quest, Steinberg and Syntrillium. AMD, Creative/Emu, Crystal, Digigram, DAL, Echo, Gadget Labs, Guillemot, Lynx, Roland, Terratec and Yamaha represented the hardware community.
At this meeting, Cakewalk proposed using IOCTL extensions to WDM and enlisted the aid of the audio hardware community in creating an actual deliverable design. The initial response to our proposal was overwhelmingly positive. Cakewalk fully intends to see this project to fruition, and openly invites any and all companies to participate in the design.
Figure 1. Driver Componentry on Win2k/WDM
This figure shows the componentry given today's status quo of API and driver proliferation. Each component provided by the hardware vendor is drawn with a double outline. A total of 5 components is required.
Working from the top down, we see that a host application has a choice of 4 application program interfaces (APIs) by which it can communicate with the audio hardware. Each API is implemented within its own user-mode component, typically a 32-bit DLL.
To communicate with lower level drivers, each user-mode DLL uses an I/O control (IOCTL) interface. In the case of MME and DirectSound, these IOCTLs are defined by the WDM kernel-streaming interface. In the case of ASIO and EASI, these IOCTLs are left open-ended, which means each vendor implements their own "private" version.
The IOCTL interface talks down to a kernel mode driver. If MME, DirectSound, ASIO and EASI are all desired, then 3 conceptually different kernel-mode elements are required. Finally, the hardware abstraction layer, a.k.a. HAL, strictly controls all hardware access.
Figure 2. Simplified Driver Componentry
This figure illustrates the reduced componentry that is possible with a shared IOCTL interface to the WDM mini-port. The hardware vendor needs only supply a single component with an extended IOCTL interface.
Because the single driver component is still a WDM mini-port driver, host applications will still enjoy access to the Windows APIs such as MME/wave and DirectSound, enabling support for wave editors, games and legacy applications.
For high performance low-latency streaming, the host application communicates directly with the adapter driver via the proposed open IOCTL extensions to WDM. Applications which need to talk to hardware through the ASIO or EASI APIs can continue to do so by implementing a thin "wrapper layer" on top of the IOCTL interface.