Benefits of x64 for audio workstations
The x64 architecture is an extension to the current x86 architecture used by Intel and AMD processors. Intel in particular has had other 64-bit platforms in the past (notably the Itanium), but these have been targeted for server and enterprise applications. x64 brings the benefits of 64-bit computing to mainstream desktop and notebook systems.
Before entering the brave new world of x64, users must of course have an x64-compatible processor. Both Intel and AMD are shipping x64 processors today. Later this year Intel will be shipping 64-bit processors in mass quantity. AMD, the originator of x64 technology, has been shipping 64-bit processors since April 2003. What this means is that people will have the chance to move to an x64 processor the next time they upgrade.
Windows XP x64 Edition will be the required operating system to take full advantage of 64-bit computing. Microsoft will release this operating system to OEMs (for preloading on new systems) sometime around mid-year. At present our understanding is that Microsoft will not be making this version available to end users in a boxed edition or download. In the meantime, however, Microsoft has made a public beta of the operating system available as a free download on its web site. In our testing this beta has been exceptionally stable.
The x64 operating system requires 64-bit drivers. After installing the x64 operating system on an x64 workstation, all audio and MIDI drivers must be upgraded to 64-bit. Fortunately many vendors have already made beta 64-bit drivers available. At the time of writing, 64-bit drivers from Edirol, Creative and M-Audio are available. More are sure to come.
The last piece required to start working in the 64-bit realm is a native 64-bit host DAW application. At the time of this writing, the only available host application is SONAR x64 Technology Preview, a free download from the Cakewalk website at www.cakewalk.com/x64. This free technology preview includes a fully functioning 64-bit version of SONAR 4 Producer Edition and includes 64-bit versions of the Sonitus:fx suite, as well as the TTS-1, PSYN, sfz+, nPulse, and Velocity instruments. This provides a rich production environment for evaluating the benefits of the x64 platform. The SONAR x64 Technology Preview will time out on August 31, 2005.
The x64 architecture includes 2 enhancements that translate into direct benefits for music and audio software users: the ability to access more physical memory (RAM), and more internal CPU registers.
More Physical Memory
In current 32-bit processors an application can theoretically utilize a maximum of 2^32 bytes of RAM, or 4 gigabytes (GB). By default, however, the Windows operating system will give a 32-bit application only half as much, or 2 GB. This can be increased to 3 GB by setting a "large address aware" bit in an application, a change which actually requires no other coding changes or special work. So the practical physical memory limit for a 32-bit application is 3 GB.
The x64 architecture extends this limit to 2^40 bytes, which is 1024 GB or 1 terabyte (TB). One might ask why this limit isn't actually 2^64 bytes, since the processor is now 64-bit. In fact a 64-bit processor increases the virtual memory limit to 2^64, but not the physical memory limit. Suffice it to say it will be a few more years before we run into the practical barrier of 1 TB of RAM.
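To make these limits concrete, the short Python sketch below simply works out the powers of two quoted above. The dictionary keys and the helper name are our own labels for illustration, not anything defined by Windows or the processor vendors.

```python
# Address-space limits quoted in the text, expressed in bytes.
GB = 2**30  # one gigabyte as used in the text (2^30 bytes)
TB = 2**40  # one terabyte (2^40 bytes)

# Labels are illustrative only.
limits = {
    "32-bit theoretical": 2**32,           # 4 GB
    "32-bit Windows default": 2**31,       # 2 GB
    "32-bit large address aware": 3 * GB,  # 3 GB
    "x64 physical": 2**40,                 # 1024 GB = 1 TB
}

def in_gb(n_bytes):
    """Convert a byte count to whole gigabytes."""
    return n_bytes // GB

for name, n_bytes in limits.items():
    print(f"{name}: {in_gb(n_bytes)} GB")

# How many times larger the x64 physical limit is than the
# practical 3 GB limit of a 32-bit application.
ratio = limits["x64 physical"] / limits["32-bit large address aware"]
print(f"x64 physical limit is about {ratio:.0f}x the 3 GB limit")
```

The last line shows the scale of the change: the x64 physical limit is several hundred times the practical 32-bit limit.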
So what does it mean to have this much more RAM? It means that with a true 64-bit native host application like SONAR x64 Technology Preview, users can store more of their song's data in RAM instead of on hard disk, allowing the software to access this data more quickly. A practical example is a project that incorporates loops. SONAR stores audio loops in RAM, allowing for real-time pitch shifting and time compression/expansion. With the advantages provided by the x64 platform, loop-based projects will benefit from the ability to keep a significantly larger pool of loops resident in RAM. Another example is a project that uses large sample sets: users will be able to load more sample sets simultaneously.
With current technological limitations, samplers introduce an interesting wrinkle. Many samplers employ some form of disk streaming technology to work around today's RAM limitations. However, it is difficult if not impossible to fetch data from the disk "just in time" when a user starts playing a note. It's a solvable problem, but the technology that is required to combine instant note-on latency with disk streaming has patent protection (by Tascam, for Gigasampler). Whether to avoid accidentally infringing on the patent or to avoid paying licensing fees, few soft-sampler vendors actually use disk streaming in its most efficient form.
Having the ability to access up to 1 terabyte of RAM makes this problem virtually go away. With sufficient RAM a sampler wouldn't need to stream off the hard disk; it could simply load all the data into RAM and always read it from there. This means huge sample libraries will be available to more samplers, which should help drive innovation among software sampler products.
More Registers + Better Design = Better Performance
The second major benefit of the x64 architecture is better performance. This is due to the increased number of registers on x64 processors, as well as the improved design of the floating point unit (FPU) on these processors.
It's helpful to understand the different ways that data is stored and retrieved on a CPU. Different kinds of memory are conceptually "closer" or "farther" from the CPU. This distance determines how fast data can be retrieved from that kind of memory.
Registers are the kind of memory that is closest to the CPU. Data in a register can be accessed the instant the CPU needs it without any delay or penalty. Therefore one key to efficient coding is keeping as much data as possible within registers.
When data isn't in a register, a program must go fetch it from RAM. At this point performance penalties start to apply, but CPUs have design elements to mitigate the penalties. For example, CPUs employ a "cache," which is special memory dedicated to remembering recently accessed data. When a program reads the contents of a location in RAM, the cache will detect whether the location was recently read, and if so will quickly give the value back to the program. If the location was not recently read -- a so-called "cache miss" -- the CPU must really go out to RAM to get the value, introducing a huge performance penalty.
How big are these penalties? A carefully optimized program can perform an addition or multiplication instruction about once every CPU clock cycle. The penalty for a cache miss can be dozens of clock cycles. During a cache miss the program will pay a performance penalty, doing nothing but waiting for the data to arrive from RAM. During that time dozens of cycles of DSP processing will not occur, degrading performance.
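A back-of-the-envelope model makes the cost visible. The Python sketch below assumes one arithmetic operation per clock cycle, as described above, plus a fixed stall for every cache miss. The function name and the 50-cycle penalty are hypothetical values chosen only to stand in for "dozens of clock cycles"; they are not measurements.

```python
def total_cycles(n_ops, n_misses, miss_penalty):
    """Estimate cycles for a DSP loop under a deliberately simple model.

    n_ops        -- arithmetic operations, assumed to take one cycle each
    n_misses     -- number of loads that miss the cache
    miss_penalty -- stall cycles per miss (an assumed figure)
    """
    return n_ops + n_misses * miss_penalty

# Hypothetical workload: 1000 operations, of which 20 loads miss the cache.
ideal = total_cycles(1000, 0, 50)
with_misses = total_cycles(1000, 20, 50)

print("cycles with no misses:", ideal)
print("cycles with 20 misses:", with_misses)
print("slowdown factor:", with_misses / ideal)
```

Under these assumed numbers, a miss on only 2% of operations is enough to double the running time, which is why keeping data in registers and cache matters so much for DSP code.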
Clearly a processor with more registers (and more cache) will be able to run programs more efficiently. In the case of x64, the CPU has twice as many general purpose registers and twice as many floating point unit (FPU) registers as x86 processors. Each general purpose register is also twice as wide: 64 bits instead of 32 bits.
For a specific example of why this matters, consider a digital equalizer. The math involved in computing equalization usually involves a processing element known as a "biquad". A biquad is a small code fragment that manipulates 8 numeric values in a series of additions and multiplications. The x86 floating point unit has a total of 8 registers, so even something as simple as 1 biquad will use up all of its registers. Most practical equalizers have multiple biquads, either because they are processing in stereo or they do more interesting filtering. In these cases the DSP code would need to repeatedly load and store values from RAM, slowing it down.
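For readers who have not seen one, here is a minimal Python sketch of a single biquad section. It uses the transposed direct form II structure, which is one common way to write a biquad and is our assumption, since the text does not say which form is meant. In this form the inner loop touches five coefficients, two state values, and the current sample, which is one plausible reading of the "8 numeric values" mentioned above. The class and attribute names are illustrative.

```python
class Biquad:
    """One second-order filter section (transposed direct form II).

    Coefficients follow the usual convention
        y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    with a0 normalized to 1.
    """

    def __init__(self, b0, b1, b2, a1, a2):
        # Five coefficients...
        self.b0, self.b1, self.b2 = b0, b1, b2
        self.a1, self.a2 = a1, a2
        # ...and two state values carried between samples.
        self.s1 = 0.0
        self.s2 = 0.0

    def process(self, x):
        """Filter one input sample and return one output sample."""
        y = self.b0 * x + self.s1
        self.s1 = self.b1 * x - self.a1 * y + self.s2
        self.s2 = self.b2 * x - self.a2 * y
        return y


# Usage: a stereo equalizer band needs one section per channel,
# so the number of live values doubles immediately.
left = Biquad(1.0, 0.0, 0.0, 0.0, 0.0)   # pass-through coefficients
right = Biquad(1.0, 0.0, 0.0, 0.0, 0.0)
out = [(left.process(l), right.process(r))
       for l, r in [(0.5, -0.5), (0.25, -0.25)]]
```

Every value in `process` is ideally held in a register for the duration of the loop, so two channels or two cascaded sections already exceed what 8 FPU registers can hold.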
Since x64 processors have double the FPU registers, DSP code for a digital filter can go out to RAM less frequently, thereby boosting performance. Furthermore, on x64 processors all floating point math is computed using Streaming SIMD Extensions technology (SSE/SSE2), which means the DSP code can exploit parallelism in the data, for example by processing stereo streams in parallel.
In our benchmarking of SONAR x64 Technology Preview, nearly all of the test files showed a performance gain when running at 64-bit. Our testing methodology was to start with a system using the same audio hardware and the same test files. The system was set up to dual boot under 32-bit Windows XP or 64-bit Windows XP. We would boot the system in 32-bit mode, load the 32-bit version of SONAR and plug-ins, and measure the performance of the test file. Then we would boot the system in 64-bit mode, load the 64-bit version of SONAR and plug-ins, and measure the performance again.
In this kind of testing, when there was a performance gain, it was in the 20%-30% range. Full disclosure: in a very small number of test cases 64-bit performed the same as 32-bit, but it was never worse. This kind of performance gain is huge. If you have a 3 GHz processor, a 30% performance boost makes it feel more like a 4 GHz processor. When was the last time you got a whole GHz for free?
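The "whole GHz" figure is simple arithmetic, sketched below in Python. The function name is ours, and treating a throughput gain as an equivalent clock speed is only a rule of thumb, not a benchmark result.

```python
def effective_clock_ghz(clock_ghz, gain):
    """Clock speed a 32-bit system would need to match the 64-bit throughput.

    clock_ghz -- actual processor clock in GHz
    gain      -- fractional performance gain, e.g. 0.30 for 30%
    """
    return clock_ghz * (1.0 + gain)

# The 20%-30% range reported above, applied to a 3 GHz processor.
low = effective_clock_ghz(3.0, 0.20)
high = effective_clock_ghz(3.0, 0.30)
print(f"equivalent clock: {low:.1f} to {high:.1f} GHz")
```

At the top of the measured range a 3 GHz processor behaves roughly like a 3.9 GHz one, which is where the "more like 4 GHz" comparison comes from.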
The issue of plug-in and soft-synth compatibility bears some discussion. Existing 32-bit plug-ins cannot be run natively within a 64-bit application. For these plug-ins to get the full benefit of 64-bit processing they must be ported to 64-bit.
The reason for this is that when the processor runs in 64-bit mode it uses an entirely different binary instruction set. The binary codes that would make a 32-bit program run in 32-bit mode would be nonsensical in 64-bit mode. So if your application is 64-bit, everything in its environment must be 64-bit -- plug-ins, soft-synths, drivers, etc.
Microsoft has developed an emulation mode that allows 32-bit applications to run on the Windows x64 operating system. This development is a kind of "sandbox" that tricks 32-bit applications into seeing a 32-bit operating system. The amount of overhead in this layer is minimal, but applications that run this way will definitely not see the performance gains of a 64-bit application.
The Best Is Yet To Come
The future holds even more promise for 64-bit platforms. We see the potential for building efficient 64-bit data paths within our mixing engine, in other words, processing using 64-bit floats instead of 32-bit floats. This would put the word length of host based systems beyond that of DSP based systems. Also, both Intel and AMD have announced plans for dual core, 64-bit processors. These would effectively be 2 fully-functional 64-bit processors on the same die. This could come close to fully doubling the performance of SONAR.
In conclusion, x64 processing is a very exciting new development for audio workstations. For about the same cost as a typical processor upgrade you will get 20-30% more horsepower, plus the ability to play more loops and samples from RAM. We haven't seen a quantum jump in performance like this in a very long time. At Cakewalk we are very proud to be at the forefront of this new technology.