NVIDIA’s GT200: Inside a Parallel Processor
Over the last 10 years, an interesting trend in computing has emerged. General purpose CPUs, such as those provided by Intel, IBM, Sun, AMD and Fujitsu, have increased performance substantially, but nowhere near the increases seen in the late 1980s and early 1990s. To a large extent, single-threaded performance increases have tapered off due to the low IPC in general purpose workloads and the “power wall”: the physical limits of power dissipation for integrated circuits (ignoring for the moment exotic techniques such as IBM’s Multi-Chip Modules). The additional millions and billions of transistors afforded by Moore’s Law are simply not very productive for single-threaded code, and a great many are used for caches, which at least keeps power consumption at reasonable levels.
At the same time, the GPU, which was once a special purpose parallel processor, has been able to use ever increasing transistor budgets effectively, geometrically increasing rendering performance over time, since rendering is an inherently parallel application. As the GPU has grown more and more computationally capable, it has also matured from an assortment of fixed function units to a much more powerful and expressive collection of general purpose computational resources, with some fixed function units on the side. Some of the first signs came when DirectX 9 (DX9) GPUs such as ATI’s R300 and the NVIDIA NV30 added support for limited floating point arithmetic, or earlier still, when the DX8 generation introduced programmable pixel and vertex shaders. The obvious watershed moment was the first generation of DirectX 10 GPUs, which required a unified computational architecture instead of special purpose shader processors that operated on different data types (primarily pixels and vertices). A more subtle turning point (or perhaps a moment of foreshadowing) was when AMD acquired ATI; many people did not quite realize the motivation was more complicated than simply competing with Intel on a platform level, but in any case, DX10 made everything quite clear.