AMD Radeon HD 6970 and HD 6950 Review


SKYMTL

HardwareCanuck Review Editor
Staff member
Joined
Feb 26, 2007
Messages
13,264
Location
Montreal
The mad, headlong rush towards Christmas is normally punctuated with plenty of new products and this year has been no different. Most of the news has come from NVIDIA’s camp as they have successfully introduced a pair of refreshed Fermi cards called the GTX 580 and GTX 570. Naturally, AMD wants in on the fun and is now finally releasing two retaliatory products: the HD 6970 and HD 6950.

To say this code named “Cayman” GPU has been the topic of some heated debates in the past week would be a massive understatement. Ever since the first rumors began circulating, people have thought of Cayman as AMD’s broadside in the DX11 war and, considering the broad scope of architectural changes, it seems more than equal to that task. As a replacement for the highly popular Cypress core, this new series of products has some understandably high expectations riding on its shoulders.

For the time being, AMD’s enthusiast lineup will consist of two SKUs: Cayman XT and Cayman Pro, otherwise known as the HD 6970 and HD 6950. The higher end HD 6970 doesn’t target the GTX 580 like many hoped it would but rather aims directly for NVIDIA’s GTX 570. As evidenced by a starting price of $369 USD, it is definitely priced to move.

Presently, there is a void at the $299 price point that was once occupied by the GTX 470, before cuts lowered its suggested retail price to under $270. Since AMD is anxious to tap this lucrative market, this is precisely where the HD 6950 will be sitting. Considering the budgets of many gamers, we’re sure AMD will win many over with their pricing structure on both of these cards.

The hype surrounding this launch has reached considerable proportions, and the delays have pushed Cayman’s release precariously close to Christmas, but today we’re about to find out what the HD 6970 and HD 6950 are all about.

 

An Architectural Deep Dive: Geometry Processing

An Architectural Deep Dive


One of the main design goals for the Cayman series of cards was to increase overall performance in a number of key areas. AMD’s original plan was to have this generation of products produced on TSMC’s 32nm manufacturing process but that didn’t turn out quite as expected. Upon tape-out the realization dawned that 32nm wouldn’t bring forth the expected performance or economic benefits so the decision was made to stick with the already-mature 40nm process. This meant porting the original designs over to an existing process, which did cause some delays, particularly at the upper end of the spectrum where power consumption and thermals became concerns. In short, the products we have all come to know as Cayman XT and Cayman Pro (the HD 6970 and HD 6950 respectively) are the 40nm “clones” of the originally planned Ibiza cards.

With all of this being said, do the Cayman series of cards feature an all-new architecture? Yes and no. The Barts products did borrow quite a bit of their design and core features from the Cypress series but with Cayman, AMD charted a different course. From a high-level architectural standpoint, very little has changed in terms of the overall core layout but nearly all of the “building blocks” have either seen a significant face lift or have had their functionality refined. In order to cover all of these changes, we will start with some of the individual items that make up this micro architecture.


Geometry Processing to the Next Level



In both the Cypress and Barts cores, there is a single unified graphics engine that is accessed through the main Command Processor. Cayman on the other hand uses a true “dual engine” architecture which breaks up the fixed function stages into a pair of identical engines. Not only does this setup allow dispatch calls to be issued more efficiently throughout the core but it also allows two primitives to be processed per clock and doubles the number of tessellators and geometry / vertex assemblers. The dual rasterizers also allow for up to 30 pixels per clock to be processed through the two fixed function stages.

The tessellators themselves have been upgraded once again to what AMD calls an “eighth generation” design. They support off-chip buffering, which lets geometry data from tessellation workloads be stored in DRAM if the on-chip cache becomes saturated. There have been other minor improvements made throughout the architecture in order to address the way tessellation is processed and this leads to a near threefold increase in high level geometry performance over the Cypress series.


The additional geometry processing horsepower which can be achieved through the new fixed function pipeline is significant when compared to the outgoing Cypress series. The dual tessellators and their ability to defer certain workloads allow for improved and much more consistent performance across all tessellation levels instead of just focusing upon lower levels as the Barts series did.
 

From VLIW5 to VLIW4 & A Slight ROP Change

From VLIW5 to VLIW4



AMD’s past architectures used VLIW5 (Very Long Instruction Word) thread processors which basically meant there were four ALUs (or Stream Processors) which were dedicated to processing standard functions while a fifth unit was dedicated towards processing transcendentals or special functions. Cayman’s design now centers upon a VLIW4 core architecture whereby the fifth special function ALU has been eliminated and its functions have been spread among the remaining four processors. So instead of four ALUs doing simple and one doing the complex calculations, we have four with the ability to handle all functions.

Even though this may not seem like a huge change to the overall design, it has a noteworthy impact upon thread issuance and the overall efficiency of the architecture. Instructions can now be spread equally within the SIMD arrays, which allows for better overall core utilization and higher thread efficiency through simplified scheduling. More importantly, the “saved” space from eliminating one ALU per thread processor was also redistributed for increased functionality in other areas such as the dual graphics engines we talked about earlier.

While the move to VLIW4 architecture netted an approximate 10% performance increase per square millimeter, it could also have a negative impact upon certain rendering scenarios that require high special function utilization.
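To make the utilization argument a little more concrete, here is a toy model of our own (not AMD's scheduler). It assumes a compiler that finds a given number of independent operations for each instruction bundle and simply counts how many ALU slots do useful work. The workload numbers are made up for illustration, and the extra cost of transcendentals on VLIW4 is deliberately not modelled.

```python
# Toy model only: how many ALU slots sit idle when independent operations
# are packed into fixed-width VLIW bundles. Not AMD's actual scheduler.

def slot_utilization(ops_per_bundle, width):
    """Return the fraction of ALU slots that do useful work.

    ops_per_bundle: number of independent operations found for each
                    bundle (hypothetical workload data).
    width:          ALUs per thread processor (5 for VLIW5, 4 for VLIW4).
    """
    issued = 0
    bundles = 0
    for ops in ops_per_bundle:
        # Operations that do not fit in one bundle spill into extra bundles.
        needed = max(1, -(-ops // width))  # ceiling division
        bundles += needed
        issued += ops
    return issued / (bundles * width)


# Made-up instruction mix: most bundles expose three or four operations.
workload = [4, 3, 4, 2, 4, 3]

vliw5 = slot_utilization(workload, 5)
vliw4 = slot_utilization(workload, 4)
print(f"VLIW5 utilization: {vliw5:.0%}, VLIW4 utilization: {vliw4:.0%}")
```

With a mix like this, the fifth slot is rarely filled, so the narrower design wastes fewer slots. A workload heavy in special functions would look different, which is the caveat raised above.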


ROPs: More Than What First Meets the Eye



The layout of Cayman’s L2 cache and ROP array hasn’t changed from the Cypress series at first glance but there have been some minor changes made under the hood. These render back ends feature ROPs with improved bandwidth which should help anti aliasing performance, an area in which AMD’s HD 5000-series was particularly weak. The change to these ROPs also increases 16-bit integer operation performance about twofold. 32-bit floating point calculations are also done at a much quicker pace.
 


The Cayman Core Top to Bottom



A bird’s-eye view of the Cayman core really doesn’t show that much of a departure from Cypress but there are some noteworthy changes other than the ones we mentioned in the last few sections. Since AMD has moved to a simplified VLIW4 architecture for the thread processors, the number of SIMD engines has been increased by four for a total of 24. Each of these engines features 16 thread processors with four ALUs each (for a total of 64 ALUs per SIMD), four texture units, 512KB of L2 texture cache and 64KB associated towards the local data share. This means a fully-enabled HD 6970 will have 1536 shaders and 96 TMUs while the ROP array layout hasn’t changed from Cypress with its 32 colour and 128 z-stencil ROPs.

All in all, Cayman may have fewer Shader Processors than Cypress but the processors themselves are slightly more efficient and the architecture has additional texture processing power granted by the additional 4 SIMD engines.
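The unit counts above are simple multiplication, and a quick sketch shows where the 1536 and 96 figures come from. The Cayman inputs are taken from the text. The Cypress line assumes 20 SIMD engines (24 minus the four added), five ALUs per thread processor (VLIW5), and the same 16 thread processors and four texture units per SIMD, so treat it as our assumption rather than a quoted spec.

```python
def totals(simd_engines, thread_processors_per_simd,
           alus_per_thread_processor, texture_units_per_simd):
    """Return (shader count, TMU count) for a given core layout."""
    shaders = (simd_engines * thread_processors_per_simd
               * alus_per_thread_processor)
    tmus = simd_engines * texture_units_per_simd
    return shaders, tmus


# Cayman XT figures as described in the text.
cayman = totals(24, 16, 4, 4)

# Cypress figures: assumed layout, see the note above.
cypress = totals(20, 16, 5, 4)

print("Cayman  (shaders, TMUs):", cayman)   # (1536, 96)
print("Cypress (shaders, TMUs):", cypress)  # (1600, 80)
```

Fewer shaders but more texture units is exactly the trade described above.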

Much like on the Barts series, we can also see that in an effort to increase rendering efficiency even more, AMD has broken up the Ultra Threaded Dispatch Processor into two with each section having its own instruction and constant cache. This dispatch processor basically acts like a traffic cop, directing draw calls to the SIMD arrays. With each directing its own “half” of the SIMD engine, rendering information can be processed at a much quicker rate.

AMD’s design choices are interesting to say the least but this new architecture will have its strengths and weaknesses when compared on a level playing field against Cypress. Since geometry performance has been the overriding focus here, we can naturally expect Cayman-based cards to run circles around the HD 5800-series in some games. However, not all of the first generation DX11 games incorporate higher level geometry or higher levels of tessellation. DX10 and to a greater extent DX9 applications also lack a real need for increased performance in this area as well, which may very well lead to a relatively minor gap between AMD’s current and past generations.

One thing to remember here is that Cayman represents a second generation DX11 architecture so the focus was put upon increasing DX11 performance rather than addressing any non-existent need for higher DX9 / DX10 rendering capacity. This means that for the time being many of the available games simply lack the resources that will allow AMD’s HD 6900-series to really shine.


A Revised GPU Compute Layout



Cayman XT & Cayman Pro Core Layouts

AMD has also thoroughly reworked the way in which their cores can handle GPU compute tasks. In the case of Cayman, the number of bidirectional direct memory access (DMA) engines has been increased to two and each can access the PCI-E interface independently. Basically, two calls to / from the interface or one call in each direction can be made which in turn can in theory maximize the overall bandwidth utilization of the architecture.

In addition to the dual DMA engines, several other items have been added to improve GPGPU efficiency. Direct access has been given to the Load / Store units, which goes hand in hand with improved information flow through the architecture. Double precision operations have also been improved to run at one quarter of the SP rate. Will this mean the massive improvement many have been hoping for in Folding@Home? There may be some minor improvements but at this time, Stanford’s support of AMD’s GPU and APP technology (formerly Stream) is cursory at best.
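As a rough illustration of what “one quarter of the SP rate” means, the sketch below derives theoretical peak throughput from the ALU count. The 1536 ALU figure comes from the core description earlier in this review. The clock speed is a placeholder parameter of ours that this section does not state, and two operations per ALU per clock assumes a fused multiply-add.

```python
def peak_gflops(alus, clock_mhz, flops_per_alu_per_clock=2):
    """Theoretical single precision peak in GFLOPS (an upper bound only)."""
    return alus * flops_per_alu_per_clock * clock_mhz / 1000.0


ASSUMED_CLOCK_MHZ = 880  # placeholder value, not taken from this section

sp_gflops = peak_gflops(1536, ASSUMED_CLOCK_MHZ)
dp_gflops = sp_gflops / 4  # double precision at one quarter of the SP rate

print(f"Single precision peak: {sp_gflops:.0f} GFLOPS")
print(f"Double precision peak: {dp_gflops:.0f} GFLOPS")
```

These are paper numbers. Real applications such as Folding@Home will land well below them.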

The most interesting addition to this equation is Cayman’s ability to conduct asynchronous operations in a GPGPU environment. While other computing solutions allow a single application to spawn multiple threads that run on the GPU, AMD has found a way to allow multiple programs to address the GPU at the same time. Real-time load balancing allows each program to run on the GPU with its own separate batch of threads while concurrently sharing those same compute resources with any other application. This could allow something like Folding@Home to run at the same time as video transcoding.

Unfortunately, these asynchronous operations are currently only a glimmer in the eye of AMD’s development team since they are not natively supported in Windows or DirectCompute for that matter. There is some hope that some OpenCL developers will begin seeing some uses for this and write their applications accordingly, but until that day this feature will remain out of reach.
 


PowerTune: Keeping Consumption in Check


One of the largest challenges GPU manufacturers have been smashing into as of late is the rapid increase in the power consumption of their higher-end ASICs. NVIDIA’s solution to cut consumption and TDP in their GTX 500-series has been a combination of input current monitoring and upgraded heatsinks as well as application detection. AMD meanwhile is taking a different path with their PowerTune technology which uses a complex set of current calculations to determine on-the-fly TDP levels. It can then adjust clock speeds once the card reaches a pre-determined maximum thermal design power level.

The entire point of PowerTune is to allow AMD to strike a delicate balance between power consumption, thermals and clock speeds. If such a middle-man didn’t exist, the clock speeds of Cayman series products would have been significantly lower since there would have been nothing to keep TDP in check.


A typical GPU will likely be used for any number of applications but its primary focus will usually be upon one thing: entertainment. While there are several synthetic benchmarks which cause a graphics card to consume copious amounts of power, most typical games will never even begin to approach these levels. As such, AMD is focusing their PowerTune technology upon scenarios which put unrealistic loads upon the GPU rather than games. Since most of us don’t sit around all day benchmarking with 3DMark, this is good news.

Unfortunately, depending on their rendering methods there may still be the odd game which will be caught up in the crossfire and have its performance capped but we will be tackling this potential issue in a later section. It is just important to remember that AMD has tuned this technology to deliver the best gaming performance while weeding out potential power viruses.


As AMD describes it, this new technology is simply used to contain power consumption in such a way that the actual TDP of a given product will in effect determine clock speeds. Instead of letting the card run amok for the few seconds of absolute peak consumption that will likely occur every now and then, PowerTune caps power draw through clock speed modification. After the peak periods are concluded, clock speeds along with performance will return to normal.

This may all sound like doom and gloom for overall performance but PowerTune is actually designed for a worst-case scenario rather than a typical usage pattern. The algorithm to determine implied power consumption is based upon an extremely high leakage ASIC operating with 45 degree inlet temperature. Remember that high temperatures increase power draw in transistors so this ensures products are not artificially capped in lower temperature scenarios. Since TDP is the determining factor here, if you keep your card cool within a well ventilated case you should in theory never see PowerTune kick in while gaming.
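The behaviour described above can be sketched as a simple control loop: infer power draw, lower the clock while the cap is exceeded, and step back up once the peak has passed. This is our own simplified illustration rather than AMD's algorithm. The power model, step size, minimum clock and every number in the example are hypothetical.

```python
def powertune_step(estimated_power_w, clock_mhz, cap_w,
                   base_clock_mhz, step_mhz=10, min_clock_mhz=500):
    """Return the clock for the next interval (hypothetical step sizes)."""
    if estimated_power_w > cap_w:
        # Over the cap: pull the clock down to contain power draw.
        return max(min_clock_mhz, clock_mhz - step_mhz)
    # Under the cap: climb back toward the normal clock.
    return min(base_clock_mhz, clock_mhz + step_mhz)


def simulate(loads, cap_w, base_clock_mhz, watts_per_mhz_at_full_load):
    """Run the loop over a list of load factors between 0.0 and 1.0."""
    clock = base_clock_mhz
    history = []
    for load in loads:
        # Crude inferred power: scales with clock speed and load.
        estimated = clock * watts_per_mhz_at_full_load * load
        clock = powertune_step(estimated, clock, cap_w, base_clock_mhz)
        history.append(clock)
    return history


# Hypothetical numbers: a game-like load versus a power-virus-like load.
gaming = simulate([0.8] * 5, cap_w=250, base_clock_mhz=880,
                  watts_per_mhz_at_full_load=0.3)
throttled = simulate([1.0] * 5, cap_w=250, base_clock_mhz=880,
                     watts_per_mhz_at_full_load=0.3)
print("Gaming clocks:     ", gaming)
print("Power virus clocks:", throttled)
```

In this toy run the game-like load never touches the cap and stays at full speed, while the sustained worst-case load is stepped down interval by interval.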


Even in one of those power sucking scenarios like the Perlin Noise test in 3DMark Vantage, the cards are able to maintain a constant framerate whilst fluctuating the core clock. Of course this does tend to decrease the overall peaks and valleys normally seen in benchmark sequences’ performance but in AMD’s thinking, this is better than seemingly random application detection kicking in.


One of the beauties of this technology is the control the end user has over it. Within the Catalyst Control Center’s Overdrive panel, there is now a Power Control Setting slider that allows PowerTune to add some overhead to its calculations. This could also improve overclocking since it allows the core to loosen its grip on TDP.


While the current PowerTune cap for the HD 6970 is 250W and the HD 6950 is 200W, the Power Control Setting slider will allow for theoretical consumption limits of up to 300W and 240W respectively, which could increase performance if one is running into rendering limitations. However, setting additional overhead does not guarantee games will perform any better since, as we saw, there are very few (if any) applications that will be limited by the already-lax limits AMD has instituted. It should also be mentioned that since PowerTune is considered an overclocking tool, its usage will not be covered by certain board partners’ warranties.
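The headroom implied by those figures is easy to work out. The sketch below uses only the four wattages quoted above. The negative slider value at the end is a hypothetical setting of ours, included to show containment being tightened.

```python
# Default and raised PowerTune caps quoted in the text, in watts.
default_cap_w = {"HD 6970": 250, "HD 6950": 200}
raised_cap_w = {"HD 6970": 300, "HD 6950": 240}

# Fractional overhead implied by each pair of figures.
headroom = {card: raised_cap_w[card] / default_cap_w[card] - 1
            for card in default_cap_w}


def adjusted_cap(cap_w, slider_percent):
    """Cap after applying a slider value given in percent."""
    return cap_w * (100 + slider_percent) / 100


for card, fraction in headroom.items():
    print(f"{card}: {fraction:.0%} extra headroom")

# Hypothetical example of tightening the cap on an HD 6970.
print("HD 6970 at -20%:", adjusted_cap(250, -20), "W")
```

Both cards work out to the same proportional overhead, which suggests the slider scales the cap by a percentage rather than by a fixed wattage.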


Conversely, AMD has also allowed for the tightening of containment as well. There may not be much value in this for a typical hardcore gamer but if performance is above that magical 60 FPS mark, lowering the PowerTune setting could still net perfectly acceptable framerates coupled with lower power consumption and temperatures.
 


Image Quality Improvements Aplenty


AMD has also brought some new image quality enhancements to the table. The cornerstone of this push to increase IQ is the addition of a new anti-aliasing method which AMD calls Morphological AA.


Morphological AA Explained


Morphological AA is basically a new form of fullscreen anti-aliasing that delivers an image quality which is comparable to Super Sample AA, but can be implemented with a fraction of SSAA’s performance hit. The AA algorithms are calculated more efficiently by leveraging the GPGPU compute abilities of modern Radeon cards and the power of the DirectCompute API. Since the post-processing filtering is done by DirectCompute, the whole scene can be quickly analyzed so this AA method isn’t limited to only certain aspects of a given image.


One of the more interesting benefits of Morphological AA being done through a standalone API is the fact that it can be applied to both 2D and 3D scenes. It can be applied to things like video, Flash apps and more. In addition, since it is controlled directly through AMD’s Catalyst Control Center and makes use of DirectCompute, Morphological AA has the ability to be forced in any DX9, DX10 or DX11 game.
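To give a feel for what a post-process filter of this kind does, here is a deliberately crude sketch. It scans a finished grayscale image for sharp brightness steps and blends the pixels on either side of each step. This is NOT AMD's Morphological AA algorithm, which we have no implementation details for. It only illustrates why a filter that works on the final image can be applied to any content, whether it is a game, a video or a Flash app. The threshold and the 50/50 blend are arbitrary choices of ours.

```python
def edge_blend(image, threshold=0.25):
    """Blend pixels that sit next to a sharp brightness step.

    image: list of rows, each a list of floats between 0.0 and 1.0.
    Returns a new image and leaves the input untouched.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            contrasting = []
            for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and abs(image[ny][nx] - image[y][x]) > threshold:
                    contrasting.append(image[ny][nx])
            if contrasting:
                # Arbitrary 50/50 blend with the contrasting neighbours.
                average = sum(contrasting) / len(contrasting)
                out[y][x] = 0.5 * image[y][x] + 0.5 * average
    return out


# A hard vertical edge between a dark region and a bright region.
frame = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
smoothed = edge_blend(frame)
print(smoothed[0])
```

Because the filter only needs the finished frame, it has no dependency on how the scene was rendered, which is the property that lets this style of AA be forced from the driver.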


Enhanced Quality AA Makes an Entry


Enhanced Quality Anti Aliasing is another new image enhancement routine which AMD has implemented for Cayman-series cards. This is not DirectCompute-controlled and acts much like CSAA routines. Unlike Multi-Sample AA which uses an equal number of color and coverage samples per pixel, EQAA allows for each to be controlled independently with a maximum of 16 coverage samples per pixel. It is also compatible with existing AA modes which could further enhance image quality.


The main benefit of EQAA is the minimal impact it has upon performance. According to AMD’s own numbers, in most cases users will see less than a ten percent drop in framerates when enabling this feature.
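The split between color and coverage samples can be illustrated with a small sketch. It assumes, as a simplification on our part, that each coverage sample just records which stored color it belongs to, and that the final pixel is the coverage-weighted average of those colors. The actual hardware resolve may differ.

```python
def resolve_pixel(colors, coverage_indices):
    """Average the stored colors, weighted by coverage sample counts.

    colors:           the few stored color samples, as (r, g, b) tuples.
    coverage_indices: one entry per coverage sample, each an index into
                      colors (hypothetical layout for illustration).
    """
    count = len(coverage_indices)
    totals = [0.0, 0.0, 0.0]
    for index in coverage_indices:
        for channel in range(3):
            totals[channel] += colors[index][channel]
    return tuple(total / count for total in totals)


white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)

# Two stored colors, 16 coverage samples: 5 land on the white triangle.
pixel = resolve_pixel([white, black], [0] * 5 + [1] * 11)
print(pixel)  # (0.3125, 0.3125, 0.3125)
```

With only two color samples and no extra coverage data, the same edge pixel could only resolve to 0, 0.5 or 1. The extra coverage samples give finer gradients along edges without storing more colors, which is where the low performance cost comes from.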
 

AMD EyeSpeed & UVD3

AMD EyeSpeed



The Northern Islands family of GPUs has a whole stable of video playback features, and in order to keep them all under one banner AMD created the EyeSpeed brand. EyeSpeed will be the all-encompassing term given to all of the multimedia enhancing technologies such as pre / post processing, transcoding and HD playback within a unified ecosystem.

EyeSpeed is split into two main spheres of influence: leveraging parallel processing for improved performance and video decoding through AMD’s UVD3.


UVD3; UVD2 on Steroids



As many of you probably already know, AMD’s Universal Video Decoder has been around for years and is known as one of the most capable video processing platforms currently available. UVD is now taking the next logical step forward with an expanded list of accelerated codecs in addition to the ones which were already compatible with past iterations.

One of the main features which have been added to the newly minted UVD3 is the ability to decode videos which use MVC encoding. As part of the H.264 / MPEG-4 AVC codec, MVC is responsible for creating the dual video bitstreams which are essential for stereoscopic 3D output. Supporting this standard brings AMD’s GPUs the ability to process Blu-ray 3D movies through an HDMI 1.4a connector.

MPEG-4 Part 2 hardware acceleration for DivX and Xvid codecs has also been added but there is no mention made about the Nero Digital codec. For the time being, we’ll assume that Nero Digital acceleration will be added at a later date.


AMD’s main focus for these new graphics cards and for future products is to go beyond simply decoding HD content and instead add high end image quality improvements prior to the signal reaching the display. Through the use of the compute resources within a given system, additional pre and post-processing can be done before outputting an HD video stream.


AMD claims the additional processing their products can accomplish will significantly boost overall image quality whether it is for a simple upscaled standard definition image or a true high definition video stream. The HQV benchmark is a highly, highly, highly subjective metric by which to judge image quality but supposedly the HD 6000 series will simply destroy the competition once all of its video processing features are enabled.
 


HD3D: AMD Jumps into the Stereo 3D Pool


Stereoscopic 3D has been the talk of the town for the last few years and AMD is understandably anxious to get in on the game. Unfortunately, quite a few consumers have been turned off of 3D movies due to studios ramming them down our throats even though many fail to live up to the benchmark set by Avatar.

The Stereoscopic gaming market on the other hand has been dominated by NVIDIA’s 3D Vision which we have complimented again and again. It offers excellent driver support and through 3D Vision Play can be used on a large number of LCD / LED TV screens as well. AMD on the other hand is promoting their “HD3D” as an open standard that is compatible with a number of third party stereo 3D driver wrappers, active shutter glasses and monitors. And yes, just like NVIDIA, AMD has chosen to go the active shutter glasses route.


In order to build a structure in which stereo 3D games can be developed and 3D movies played on AMD’s hardware, a number of companies have been recruited.

Ensuring the availability of active shutter glasses will obviously be one of the main concerns for AMD but most certified monitors will come with at least one set of glasses. Considering nearly all of the big names in the active shutter glasses market seem to be supported through HD3D, there should be no shortage of high quality glasses from the likes of XpanD and Bit Cauldron.

AMD’s graphics cards will also be compatible with Bit Cauldron’s excellent HeartBeat technology that allows for the virtual elimination of the sync issues that sometimes plague active shutter glasses. Bit Cauldron’s BC5000 glasses are actually some of the first to boast AMD certification.


Since there aren’t native stereo 3D drivers available from AMD, middleware partners are expected to provide third party driver wrappers which enable support through programs that piggyback off of the Catalyst drivers. DDD and iZ3D have been releasing these compatible wrappers for quite some time now and together they provide support for some 400 games. Both allow stereo 3D to be added to games and movies which don’t natively support depth perception.

This is actually a significant risk since AMD has very little control over the quality and compatibility of third party software. These driver wrappers very rarely carry WHQL certification which could lead to additional conflicts as well. From our experience, DDD’s TriDef and iZ3D’s own software don’t play nice together so users will likely have to choose one or the other. In addition, the number of native stereo 3D games that support AMD’s solution at this point is precisely zero, which is why AMD needs to count on these driver wrappers from third parties. AMD simply has no plans to implement their own stereo 3D drivers.


One of the most important things to remember is that even though AMD is pimping HD3D, they are not actually the ones doing the vast majority of development. This has already led to both iZ3D and DDD releasing driver wrappers that work equally well with NVIDIA cards as they do with AMD’s products. One good example of this is the recently announced Viewsonic V3D241WM-LED which uses wired shutter glasses and needs a specific iZ3D driver wrapper to function properly with GeForce and Radeon products.

Being an open standard, HD3D can’t really be classified as AMD’s technology and believe it or not, support for Radeon graphics cards from DDD, iZ3D, XpanD and all the other companies listed above isn’t anything new. AMD is basically helping developers implement support for their graphics cards without introducing a proprietary standard. Due to a lack of content and quality control, there may be some issues with this approach but only time will tell whether or not it can be counted successful.
 


Eyefinity Gets Refined


One area in which AMD blazed a trail over the last year or so is in the surround gaming market. With the introduction of Eyefinity technology, gamers interested in using multi monitor setups no longer had to worry about spotty driver support or jumping through hoops to get things working properly. NVIDIA released their own Surround-branded multi monitor support as well but AMD hasn’t been standing still when it comes to updating Eyefinity.


Naturally, AMD sees fit to brag about their accomplishments in terms of accessibility for Eyefinity and in the coming months there will be yet more reasons to choose Eyefinity over NVIDIA Surround. Features such as a 5x1 portrait mode, enhanced bezel correction and more customization tools will soon be added but the most interesting addition will be on the Northern Islands cards themselves.


It isn’t quite business as usual on the backplate of the HD 6000 series since AMD has augmented the connector selection in order to better support Eyefinity. In order to make Eyefinity slightly more flexible, a pair of mini DisplayPort 1.2 connectors has been added in the place of the single large DP 1.1 connector which was seen on the HD 5000 series.


DisplayPort 1.2 brings one huge advantage to Eyefinity users: the ability to drive up to three monitors off of a single connector via a hub that will be sold separately. In addition, most of the upcoming monitors which utilize this new standard will have both DisplayPort inputs AND outputs so you can connect the primary display to the GPU and then use the output on the display to daisy chain other monitors together. This means a single HD 6800 series card can natively support up to 6 monitors.

Bandwidth shouldn’t be an issue either since the DisplayPort 1.2 standard effectively doubles the bandwidth of the current 1.1 standard to approximately 17 Gbit/s. This is enough to run up to four 1080P displays at a 60Hz refresh rate or two 2560 x 1600 displays off of a single connector and is sufficient to support 120Hz stereo 3D content to a single 120Hz display with a resolution up to 2560 x 1600. 3D is also supported through the included HDMI 1.4a connector.
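A back-of-the-envelope check shows why those display combinations fit. The sketch below counts only the visible pixel data at 24 bits per pixel and ignores blanking intervals and protocol overhead, both simplifying assumptions of ours, so real requirements are somewhat higher than these figures.

```python
LINK_GBIT_PER_S = 17.0  # approximate DisplayPort 1.2 figure from the text


def stream_gbit_per_s(width, height, refresh_hz, bits_per_pixel=24):
    """Raw pixel data rate for one display, ignoring blanking overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


scenarios = {
    "four 1080P displays at 60Hz": 4 * stream_gbit_per_s(1920, 1080, 60),
    "two 2560 x 1600 displays at 60Hz": 2 * stream_gbit_per_s(2560, 1600, 60),
    "one 2560 x 1600 display at 120Hz": stream_gbit_per_s(2560, 1600, 120),
}

for name, gbit in scenarios.items():
    verdict = "fits" if gbit <= LINK_GBIT_PER_S else "does not fit"
    print(f"{name}: {gbit:.1f} Gbit/s, {verdict}")
```

All three scenarios come in at roughly 12 Gbit/s of raw pixel data, which leaves some margin under the quoted link rate for the overhead this sketch leaves out.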
 


From ATI Stream to AMD APP Technology


As AMD is moving to all but abolish the ATI brand, a number of technologies are being renamed and rationalized. The Stream name was once given to ATI’s GPGPU compute initiative to differentiate it from NVIDIA’s highly successful CUDA environment. Things are about to change….a bit.


In order to carry on Stream’s torch, AMD Accelerated Parallel Processing or APP has been created. Yes, the whole “App” moniker has been used so much that it’s now becoming a bit of a cliché but in this case, it seems to be aptly translated. This is now an all-encompassing term which can be used for graphics cards as well as AMD’s new upcoming generation of Fusion APUs.

AMD’s new APP SDK v2.2 is about to hit developers’ doorsteps and with it will come the integration of OpenCL computing for both x86-based CPUs and GPUs. This is one of APP’s major benefits over NVIDIA’s competing solution as it can be leveraged for a heterogeneous environment where specific tasks are sent towards the processor which will complete them most efficiently.


From our understanding, Stream’s name may have changed but its goal to deliver high performance computing on the GPU through OpenCL, DirectCompute and other APIs is still very much alive and well. If anything, it has actually expanded now that AMD is able to leverage both their CPUs and GPUs under the same programming umbrella.
 