Unbeknownst to many, designing a modern GPU architecture is a long, drawn-out process that involves hundreds of engineers, thousands of software architects and a healthy dose of assumption. NVIDIA’s newest Kepler architecture is a prime example of this; the core that lies within the GeForce GTX 680 began its life as a rough schematic about five years ago. As they say, Rome wasn’t built in a day, but in the GPU world “guesstimates” are a way of life because no one really knows exactly where the market (or the competition) will be in half a decade’s time. Most of the time, the architectural teams have a good idea of the general direction, but there’s always a significant amount of risk when it comes to releasing a new GPU core.
At its heart, Kepler was conceived as a way to further refine the DirectX 11 and HPC-centric approach that began with Fermi. You see, unlike AMD, NVIDIA already had the solid foundation of an existing DX11 architecture to build upon and was able to focus on rendering efficiency and performance per watt this time around. In many ways Kepler can be considered a kind of “Fermi 2.0” since it still uses many of the same building blocks as its predecessor, but as we will see on the upcoming pages, nearly every one of the rendering pipeline’s features has been augmented in some way. More importantly, NVIDIA’s initial offering, the GK104-based GTX 680, is smaller and more efficient than AMD’s own Tahiti XT.
For the time being the GTX 680 will occupy the flagship spot in NVIDIA’s lineup, and with good reason. It boasts 1536 CUDA cores (a threefold increase over the GTX 580) while the texture units have been doubled to 128, matching the HD 7970’s layout. On the other hand, the number of ROPs has been dropped to 32, but as with many things in the Kepler architecture, the interaction between certain processing stages and these units has been refined, resulting in better throughput. We can also see that NVIDIA has halved the PolyMorph Engine count. On paper this should lead to a 50% reduction in tessellation performance, but the fixed function stages of Kepler have received a thorough facelift, making them substantially more powerful than those in previous generations.
Some of the most noticeable changes here are found in the GTX 680’s clock speeds. The asynchronous graphics and processor clocks have now become a thing of the past, with both engines running in lockstep at a 1:1 ratio. So while the separate clock speeds haven’t necessarily been eliminated, the change has led to a much faster graphics clock of just over 1GHz, though the shaders now operate at a lower speed than they did on many Fermi-based cards.
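To put the core count and clock changes into perspective, here is a rough back-of-the-envelope sketch of peak single precision shader throughput. The 1536 and 512 core counts follow from the figures above, but the exact clocks used here (a 1544MHz shader clock for the GTX 580 and a 1006MHz base clock for the GTX 680) are assumptions taken from the cards' public spec sheets rather than from this article, and the function name is purely illustrative. Peak numbers like these say nothing about real-world game performance.

```python
def peak_gflops(cuda_cores: int, shader_clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS.

    Assumes each CUDA core retires one fused multiply-add
    (counted as 2 floating point operations) per shader clock.
    """
    return cuda_cores * 2 * shader_clock_mhz / 1000


# Assumed clocks (not stated in the article): GTX 580 shaders at 1544MHz,
# GTX 680 shaders at the 1006MHz base clock (1:1 with the graphics clock).
gtx_580_gflops = peak_gflops(512, 1544)
gtx_680_gflops = peak_gflops(1536, 1006)

# Three times the cores at a lower shader clock works out to roughly
# double the theoretical throughput, not triple.
throughput_ratio = gtx_680_gflops / gtx_580_gflops

print(f"GTX 580: {gtx_580_gflops:.0f} GFLOPS")
print(f"GTX 680: {gtx_680_gflops:.0f} GFLOPS")
print(f"Ratio:   {throughput_ratio:.2f}x")
```

Under those assumptions, tripling the core count while dropping the shader clock yields a little under twice the theoretical shader throughput, which is why the lower shader speed isn't the step backwards it may first appear to be.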
With the introduction of the GTX 680, NVIDIA is also premiering a new technology which they affectionately call GPU Boost. Learn to love this term because you’ll likely be seeing a lot of it in the coming months. GPU Boost acts like an overdrive gear for the GPU core, allowing it to dynamically increase clock speeds in situations where the architecture isn’t fully utilized. We go into further detail about it in a dedicated section, but for now, 1058MHz should be the minimum Boost speed, with higher clocks possible depending upon the application.
Along with a core clock speed that makes AMD’s “GHz Edition” marketing seem like nothing more than a gimmick, the GTX 680 boasts 2GB of some of the fastest GDDR5 memory around, running at 6Gbps. It uses a 256-bit interface, which does come as a surprise for a flagship level product, but when combined with that blistering 6Gbps data rate, the GTX 680 offers the same memory bandwidth as the outgoing 384-bit GTX 580. Hopefully the additional 512MB of memory allows this card to overcome the high resolution performance limitations of its predecessor. We just can’t forget that AMD’s card still sits atop the market with a staggering 264GB/s of bandwidth on tap.
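The bandwidth comparison is simple arithmetic: bus width multiplied by the per-pin data rate, divided by eight to convert bits to bytes. The short sketch below works through it. The 256-bit, 384-bit and 6Gbps figures come from the text above, while the roughly 4Gbps data rate for the GTX 580 and the 5.5Gbps rate for the HD 7970 are assumptions (the latter is simply the rate consistent with the 264GB/s quoted above), and the function name is our own.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each pin of the memory bus transfers `data_rate_gbps` gigabits per
    second; dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


gtx_680_bw = memory_bandwidth_gb_s(256, 6.0)
# Assumption: the GTX 580's GDDR5 runs at roughly 4Gbps (rounded).
gtx_580_bw = memory_bandwidth_gb_s(384, 4.0)
# Assumption: 5.5Gbps, the rate that matches the 264GB/s quoted above.
hd_7970_bw = memory_bandwidth_gb_s(384, 5.5)

print(f"GTX 680: {gtx_680_bw:.0f} GB/s")
print(f"GTX 580: {gtx_580_bw:.0f} GB/s")
print(f"HD 7970: {hd_7970_bw:.0f} GB/s")
```

With those inputs the narrower but faster 256-bit bus lands at 192GB/s, level with the GTX 580, while the HD 7970’s wider bus keeps it comfortably ahead.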
Having learned early on that adding a massive amount of geometry processing and compute horsepower to a GPU architecture invariably increases die size and decreases overall efficiency, NVIDIA has been able to optimize several aspects of the GK104 core to better fit within the market’s new realities. The result is a TDP of just 195W, which undercuts the HD 7970’s supposed 210W power draw and bucks a longstanding trend of NVIDIA releasing less efficient cards than AMD.
With a die size of just 294mm², the GK104 should also be quite inexpensive to manufacture (when compared against Fermi and Tahiti), and NVIDIA’s pricing structure reflects this. Ready for a shock? Instead of carrying on a trend that led to a gradual increase in high end GPU prices, the GTX 680 actually undercuts AMD’s HD 7970 by $50. Not only should this lead to lower costs for the entire graphics card market once NVIDIA cascades the Kepler architecture down into more accessible price points, but high level GPU performance just became that much more affordable. That isn’t to say that the GTX 680 will underperform.
Interestingly enough, NVIDIA isn’t going for a complete knockout punch against AMD’s HD 7970 on the performance front. While the GTX 680 is indeed meant to beat its competitor’s flagship, it isn’t supposed to do so by a significant amount in every game. This may sound completely at odds with NVIDIA’s old mantra of performance at any cost, but they believe a focus on efficiency and cost meshes seamlessly with the realities of a market still recovering from the financial meltdown. Make no mistake about it; the GTX 680 will be the fastest GPU on the planet, but its foremost goal is to run against many people’s preconceptions about NVIDIA’s graphics cards and chart a new course for the GeForce lineup.