AMD has been talking about Heterogeneous System Architecture, or HSA, for what seems like ages now but, with the launch of their Kaveri APUs, those plans are finally coming to fruition. Kaveri doesn’t represent a dramatic departure from previous generations, however. It is simply another stepping stone, albeit a significant one, towards what AMD hopes will be a user and developer environment which embraces their approach.
In order to understand what makes Kaveri special, learning the basics of HSA is essential. In a nutshell, HSA is an effort to leverage the potential CPU and graphics horsepower within AMD’s APUs by routing parallel and serial workloads towards the resources best able to process them. The x86 cores excel in serial and task-parallel scenarios while data-parallel workloads can be handled much more efficiently by the GPU’s multiple compute cores. Since an Accelerated Processing Unit combines both x86 cores and a dedicated graphics subsystem, it’s well suited to both situations. The challenge has always resided in developing a synergy between these two seemingly disparate elements. That’s where Kaveri comes into the equation.
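To make the routing idea concrete, here is a minimal sketch of how work might be sorted between the two halves of an APU. It is purely illustrative: the `Workload` class, the `route` function and the example jobs are our own hypothetical names, not part of any AMD or HSA API, and a real runtime weighs far more than a single flag.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    # True when the same operation is applied independently to many data elements
    data_parallel: bool


def route(workload: Workload) -> str:
    """Pick a hypothetical target: 'gpu' for data-parallel work, 'cpu' for everything else."""
    return "gpu" if workload.data_parallel else "cpu"


# Example jobs (assumed for illustration only)
jobs = [
    Workload("parse config file", data_parallel=False),
    Workload("apply blur filter to every pixel", data_parallel=True),
    Workload("run game logic", data_parallel=False),
]

placement = {job.name: route(job) for job in jobs}

for name, target in placement.items():
    print(f"{name} -> {target}")
```

The point of the sketch is simply that serial and task-parallel jobs stay on the x86 cores while per-element work is handed to the GPU's compute cores.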
Kaveri is built on GlobalFoundries’ new 28nm Super High Performance (SHP) process node, which represents a dramatic shift away from the 32nm APUs of Trinity and Richland. The move to what AMD calls an “APU optimized” 28nm process has allowed them to retain the dual-module, quad-core x86 computing layout while significantly expanding Kaveri’s feature set and incorporating a more capable GPU through higher transistor density. All told, one of these new APUs will weigh in at 2.41 billion transistors spread across a die area of 245mm², delivering up to 856 GFLOPS of combined performance.
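For readers who like to check the arithmetic, the short sketch below works out the transistor density implied by the figures above and shows one way the 856 GFLOPS number could be reached. The transistor count and die area come from the text; the shader count, clock speeds and per-clock throughput are our assumptions (roughly matching a flagship part with 512 stream processors at 720MHz and four cores at 3.7GHz) and should not be read as confirmed specifications.

```python
# Figures quoted in the article
transistors = 2.41e9
die_area_mm2 = 245

# Implied density in millions of transistors per square millimetre
density_m_per_mm2 = transistors / die_area_mm2 / 1e6

# ASSUMED configuration (not stated in the article)
gpu_shaders = 512                 # assumed stream processor count
gpu_clock_ghz = 0.720             # assumed GPU clock
gpu_flops_per_shader_clock = 2    # one fused multiply-add counted as 2 FLOPs
cpu_cores = 4                     # quad-core layout, per the article
cpu_clock_ghz = 3.7               # assumed CPU base clock
cpu_flops_per_core_clock = 8      # assumed single-precision FLOPs per core per clock

gpu_gflops = gpu_shaders * gpu_flops_per_shader_clock * gpu_clock_ghz
cpu_gflops = cpu_cores * cpu_flops_per_core_clock * cpu_clock_ghz
total_gflops = gpu_gflops + cpu_gflops

print(f"Density: {density_m_per_mm2:.2f} million transistors per mm^2")
print(f"GPU: {gpu_gflops:.1f} GFLOPS, CPU: {cpu_gflops:.1f} GFLOPS")
print(f"Combined: {total_gflops:.1f} GFLOPS")
```

Under these assumptions the GPU supplies the vast majority of the combined throughput, which fits the emphasis AMD places on GPU compute.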
Unfortunately, 28nm SHP has some shortcomings. It is a relatively immature process node so leakage is slightly above where AMD wanted it to be. This meant sacrificing clock speeds at higher TDP levels, but its effect on mid-range to entry-level parts will be minimal at most.
One benefit of AMD’s 28nm approach is that it avoids the stacked transistors of Intel’s 3D lithography technology. While this does tend to increase the die area of these APUs versus their Haswell competitors, heat dissipation will be much easier, which should lead to more efficient cooling with today’s heatsinks.
What you really need to know about Kaveri boils down to two words: Steamroller and GCN. Steamroller represents the latest iteration of AMD’s Bulldozer microarchitecture and includes several optimizations for better concurrent data throughput. Don’t expect titanic improvements over Piledriver, but it does bring some much-needed single-thread performance boosts to the table.
The addition of GCN, or Graphics Core Next, is a key element here, and not only because the GPU component takes up a whopping 47% of Kaveri’s available die space. It boasts significant benefits over the older VLIW4-based cores within Trinity and Richland and in this iteration even incorporates the “GCN 2.0” features we saw on the new R9 290-series Hawaii GPUs.
In keeping with current market trends, AMD hasn’t designed Kaveri for enthusiasts but rather targets segments that have a farther-reaching impact. This means their primary design goals revolved around three core principles: create a winning notebook solution, deliver optimal performance per watt and implement a solution that can scale as necessary into other segments. As we’ve already mentioned, this simply fits with the APU’s current evolutionary process.
While history has shown that designing a one-size-fits-all solution is challenging, several of the new desktop parts suggest that many of these targets have already been achieved. If anything, Kaveri will have serious implications as its design trickles down into the notebook and ultra-mobile spaces where its power efficiency can be used to the fullest effect.
With so much real estate being reserved for the graphics subsystem, it should go without saying that AMD is looking to maximize GPU computing on Kaveri. Not only is this one of the cornerstones of their HSA initiative but it actually meshes quite well with the current and upcoming market realities. It is also why AMD still believes that a quartet of x86 cores remains a “sweet spot” and doesn’t see the need for giving up valuable die space for an additional two-core CPU module.
As content consumption and creation are now being given equal weight by many consumers, the GPU’s resources are needed for applications that can benefit from its massively parallel nature. Media playback, multimedia editing and gaming alongside accelerated UI features have all been given priority in Kaveri’s design, but actually getting these to function correctly and efficiently on a balanced architecture is really the final frontier.
Coming back to our points about synergy, AMD has an architecture which is adaptable to both CPU and GPU workloads but they are also providing developers and programmers the software tools they need to leverage this hardware advantage. This all-in-one solution is the only way they’ll be able to achieve broad acceptance for HSA and the potential advantages it brings to the table.
On the hardware side, uniform memory access between components and workload dispatch equality bring a truly heterogeneous environment that much closer to reality, as the CPU and GPU can share on-die resources. Meanwhile, the key to actually unlocking the architecture’s potential horsepower lies with software which, in this case, takes the form of AMD’s unified software development kits and their new CodeXL developer suite. We’ll take a look at each of these individually a bit later.
As it currently stands, Kaveri isn’t meant to compete against Intel’s higher-end Haswell models, nor will it be priced over $200. AMD is firmly planted on the value end of the spectrum but has still thrown in some features that will appeal to enthusiasts. As you’ll see on the next page, though, the most interesting facet of this new architecture isn’t necessarily its flagship APUs. Rather, the highly efficient and still powerful mid-range SKUs will likely be the ones which hold the most exciting elements for anyone reading this article.

