
AMD Radeon R9 290 4GB Review

SKYMTL
HardwareCanuck Review Editor
With the R9 290X released alongside the R9 280X, R9 270X and R7 260X, October 2013 has been a banner month for AMD’s graphics division. Not only was NVIDIA caught flat-footed with cards that were suddenly deemed overpriced (in some cases by a significant amount) but, with an impressive lineup of games being released this holiday season, the new Radeon lineup couldn’t be better positioned. Now we’re about to see yet another launch from AMD in the form of the R9 290 4GB and this one may be the most important of them all.

The R9 290 4GB is very much the lynchpin in AMD’s lineup since it targets gamers who can’t afford or don’t want to spend the $549 demanded by the R9 290X. It also acts as a bridge product between AMD’s Titan killer and the rebranded R9 280X which is slightly more efficient and boasts better overclocking headroom than its bigger brother. NVIDIA’s lineup is also ripe for the picking since, even with their latest price reductions, the GTX 770 and GTX 780 may still not live up to the price / performance ratio offered by the R9 290.

R9-290-REVIEW-2-8.jpg

The Hawaii core has undergone a few small revisions in order to create the R9 290. While it remains a 6.2 billion transistor behemoth based on a 28nm manufacturing process, AMD has cut out a quartet of Compute Units which should fractionally lower power consumption and improve yields. ROPs and their associated Render Back Ends, the L2 cache and memory controller allotment haven’t been touched. There has been a slight reduction of 64KB of L1 cache but that shouldn’t adversely affect performance in any way.

The ability to remove individual Compute Units without affecting a whole Shader Engine or other key parts of the architecture is an important aspect of AMD’s GCN core technology since it gives AMD the flexibility to tailor products for a given segment. With that being said, all of the enhancements rolled into the R9 290X have been carried over into this core. We’ve already covered the Hawaii architecture in detail within our R9 290X review and won’t be rehashing it here, so head on over there for additional technical details.

R9-290-REVIEW-2-57.jpg

With just four fewer Compute Units, the R9 290 4GB is still quite close to the R9 290X in terms of core specifications. It uses 2560 stream processors and 160 texture units, which is 256 and 16 fewer respectively than the fully enabled Hawaii core. It also comes with the same 4GB, 512-bit, 5Gbps memory interface as its sibling so there will be plenty of bandwidth for applications that require a large amount of texture memory or ultra high resolutions.

The real differentiator here is core clocks. While the R9 290X typically operates between 850MHz (or lower in some rare instances) and 1GHz and tended to average 865MHz in Silent Mode, the R9 290’s engine typically hovers between 875MHz and 925MHz, and we found speeds settling around 900MHz after continuous gameplay. That’s a particularly interesting figure since it points to the cut-down architecture having additional frequency headroom which the Silent Mode 290X didn’t have. To achieve this, AMD has increased fan speeds to 47%, which partly explains the higher figures and also points towards why they decided not to include an Uber Mode.

R9-290-REVIEW-2-7.jpg

With the R9 290X sitting at $549 and the R9 280X at $299, AMD didn’t really need to thread a needle with the R9 290’s price. At just $399 it is being launched into a particularly sweet spot within the gaming market while taking over the HD 7970 GHz Edition’s price point. Considering how well the 290X performed, this should be a pleasant surprise for anyone looking for a high performance graphics card for Battlefield 4 and other triple-A titles being released before Christmas.

With such close specification proximity, some may be wondering whether or not the R9 290X is now overpriced. As we'll see in this review, it's a valid question and there may be some reason for R9 290X buyers to worry.

The competing GeForce cards like the GTX 780 and GTX 770 have been largely supplanted by the superior pricing structure of AMD’s new lineup. Indeed, had this launch happened a week ago, it would have been like rubbing salt in NVIDIA’s open wounds since the R9 290 would have gone head to head against the GTX 770, a card that typically finds itself exchanging blows with AMD’s lower end R9 280X. That has since changed now that NVIDIA has finally given in to pressure and cut prices on their high end GTX-series products while including some excellent value-added gaming bundles.

With the GTX 780 now sitting at $499 and the GTX 770 at $329, the R9 290 finds itself in a relative sea of calm. NVIDIA’s cards do come with an excellent games bundle which may put them ahead in some respects but in terms of long-term value, it will be interesting to see where the R9 290 4GB ultimately resides.
 
A Closer Look at the R9 290


R9-290-REVIEW-2-1.jpg

The reference R9 290 looks exactly like the R9 290X, which is understandable considering how close both cards’ core architectures are to one another. It uses a black heatsink shroud with tasteful red accents, leveraging AMD’s newfound understated design language which looks great even though there aren’t any fancy side-mounted LEDs like the competition. The overall length is about 11.5” so the R9 290 won’t have any problem fitting into an ATX chassis.

R9-290-REVIEW-2-2.jpg

Once again we see that AMD has added small fan intakes around the shroud’s backmost edge. Supposedly, this will help with airflow when two cards are placed closely together in Crossfire.

R9-290-REVIEW-2-4.jpg

Speaking of Crossfire, you won’t find any of the usual “finger” connectors on this card since AMD is using their new XDMA engine for inter-card communication: a hardware DMA engine that streams dual and triple card traffic over the PCI-E bus rather than through an external connector. Not only does the new approach provide a bandwidth uplift over the previous solution but it also has the potential to improve latency.

Unlike the 290X, the 290 doesn’t ship with Silent and Uber BIOS modes. Rather, its BIOS switch is present simply because AMD has carried over the R9 290X’s PCB en masse. That doesn’t preclude board partners from adding their own custom BIOSes but the reference cards will only ship with a lone default setting.

R9-290-REVIEW-2-3.jpg
R9-290-REVIEW-2-5.jpg

Moving around to the card’s I/O connectors, there is a pair of DVI outputs (note that neither is compatible with a VGA adaptor) along with a DisplayPort with daisy chaining capabilities and an HDMI output. This meshes perfectly with AMD’s new Eyefinity groupings which allow a single card to natively support up to six displays.
 
AMD's Clock Gating Goes Dynamic


As AMD and NVIDIA have begun pushing the limits of the 28nm manufacturing process, both have struggled to optimize performance, power consumption and heat output. PowerTune was launched by AMD in an effort to address these three factors by constraining clocks at predetermined maximum limits which could only be modified via overclocking. This capped both power consumption and performance but, unlike NVIDIA’s GPU Boost, it didn’t take into account the myriad of other factors like temperature (a major contributing factor to efficiency) and available power overhead.

One good example of this is the HD 7970 GHz Edition’s approach to clock speeds in comparison to the GTX 780. The AMD card had a hard cap of 1050MHz regardless of how low the ASIC’s temperatures were. The GTX 780, on the other hand, had the capability to take advantage (within reason of course) of cooler conditions by boosting up to higher clock speeds, something AMD’s cards lacked. With the R9 290 cards, AMD is pushing aside their old, relatively archaic approach to on-die performance optimizations and implementing a whole new plan for balancing performance and efficiency.

R9-290X-R-23.png

While the baseline PowerTune equation has remained largely the same (i.e. performance being tied closely to power consumption), AMD has added a number of features to their refreshed architecture which allow for closer control over the ASIC’s power management and how it determines optimal frequencies. Many of these have been carried over from lessons learned within the APU segment.

As with PowerTune’s previous iteration, temperature and activity sensors are interspersed throughout the GPU core to determine how much power the GPU should be actively using. Much of this data was estimated rather than calculated at the source. However, AMD has now added a secondary group of sensors which actively monitor the actual amount of current being drawn from the board’s regulators. This data is then combined with the information accrued through the architecture’s power estimation engine to paint a much more accurate picture of the board’s power needs.

With the thermal sensors and power telemetry feeding into a common DPM arbitrator, the new PowerTune hardware control mechanism can begin making choices based on real-time information rather than assumptions. This should allow AMD to wring the best possible frequencies out of their architecture without worrying about smashing face first into a TDP wall. It also means there is no single clock for these new products since the functional relationship between power and clocks has to be maintained (i.e. constantly moving power metrics and temperatures determine frequencies).
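To make the idea of the arbitrator more concrete, here is a deliberately simplified, hypothetical sketch of how a DPM-style control loop could pick a clock state from combined thermal and current telemetry. The state table, thresholds, weighting and function names are our own illustrative assumptions and do not represent AMD's actual firmware logic.

```python
from dataclasses import dataclass

# Hypothetical DPM states: (core clock in MHz, core voltage in V).
# These values are invented for illustration only.
DPM_STATES = [(300, 0.85), (727, 0.95), (875, 1.05), (947, 1.15)]

TEMP_TARGET_C = 95.0    # thermal target cited for Hawaii
POWER_LIMIT_W = 250.0   # assumed board power limit

@dataclass
class Telemetry:
    temp_c: float             # on-die thermal sensor reading
    measured_power_w: float   # current actually drawn from the regulators
    estimated_power_w: float  # legacy activity-based power estimate

def blended_power(t: Telemetry, weight: float = 0.7) -> float:
    """Fuse measured and estimated power into a single figure,
    mirroring the idea of combining both data sources."""
    return weight * t.measured_power_w + (1.0 - weight) * t.estimated_power_w

def pick_dpm_state(t: Telemetry, current_idx: int) -> int:
    """Raise, lower or hold the DPM state based on real-time headroom."""
    power = blended_power(t)
    if t.temp_c >= TEMP_TARGET_C or power >= POWER_LIMIT_W:
        return max(0, current_idx - 1)                    # back off one state
    if t.temp_c < TEMP_TARGET_C - 5 and power < POWER_LIMIT_W * 0.9:
        return min(len(DPM_STATES) - 1, current_idx + 1)  # room to boost
    return current_idx                                    # hold steady

# Example: a core at 96C near the power limit gets pulled down one state.
state = pick_dpm_state(Telemetry(96.0, 245.0, 230.0), current_idx=3)
print(DPM_STATES[state])  # -> (875, 1.05)
```

In real hardware this loop runs continuously; the point here is simply that frequency becomes an output of live telemetry rather than a fixed cap.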

One of the most important factors of the new PowerTune equation is AMD’s commitment to total user customizability. If someone wants a cool running card that operates at higher frequencies by way of an increased thermal overhead, they can achieve just that through the Catalyst Control Center. The same thing goes for slightly lower clock speeds and absolute silence, or any point in between. In short, a gamer can now have the card they want regardless of their needs.

R9-290X-R-24.png

The second generation serial VID interface boasts an all-new voltage regulation controller which has the capability to balance all aspects of the core’s power spectrum. The controller can ensure the core remains within 0.5% of a given power target by switching clock and voltage inputs at a rate of up to 500 transitions per second. This allows for additional frequency granularity without the large peaks and valleys sometimes experienced in competing solutions. Best of all, much of this will be transparent to end users but it can still be controlled via different modifiers within AMD’s software stack.

R9-290X-R-25.png

Fan control also plays a large part in the new PowerTune equation. While previous solutions used standard fan speed tables, the integrated controllers within the GPU can now target an optimal operating temperature (which is 95°C on the R9 290X) via more precise methods. For example, the fan controller is now completely variable in nature and reacts in real time to changing conditions while also boasting a predictive element so it can determine needs down the road.

This new fan controller and its relationship to the SVID interface means fan speeds fluctuate in a strictly controlled manner rather than through drastic, noticeable upward and downward swings.
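As a rough illustration of what a temperature-targeting fan controller with a predictive element could look like, the toy model below ramps the duty cycle based on both the distance from the target temperature and the direction the temperature is heading. The gains, base duty and 47% ceiling are our own assumptions (the ceiling simply borrows the R9 290's default maximum fan speed) and this is not AMD's actual control algorithm.

```python
def fan_duty(temp_c: float, prev_temp_c: float,
             target_c: float = 95.0, base_duty: float = 0.20,
             kp: float = 0.03, kd: float = 0.10,
             max_duty: float = 0.47) -> float:
    """Toy temperature-targeting fan curve with a crude predictive term.

    The proportional term reacts to how far the core sits from its target
    temperature, while the trend term anticipates where the temperature is
    heading so the duty cycle ramps smoothly instead of jumping in large,
    audible steps. All constants are invented for illustration.
    """
    error = temp_c - target_c     # positive when running hot
    trend = temp_c - prev_temp_c  # positive when heating up
    duty = base_duty + kp * error + kd * trend
    return min(max_duty, max(base_duty, duty))

# Core at 94C and climbing slowly: the fan nudges up slightly.
print(f"{fan_duty(94.0, 93.5):.2f}")  # 0.22
# Core at 97C and still climbing: duty heads toward the 47% cap.
print(f"{fan_duty(97.0, 96.0):.2f}")  # 0.36
```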
 

Exploring the R9 290's Clock Speed Boundaries


A few days before the R9 290’s original launch date, AMD rolled out a new driver which boosted maximum fan speeds from 40% to 47%, allowing for better heat dissipation. Due to PowerTune’s enhanced algorithms, this effectively increases core frequencies in TDP-limited scenarios even though the effect upon acoustics tends to be pronounced. Without an “Uber Mode” to play with, we took readings at three separate settings within Hitman: 40% (what the card was originally supposed to ship with), 47% which is the new default, and 55%. The differences were eye-opening to say the least but remember that Hitman is a worst-case scenario while other applications will cause this card to react differently.

R9-290-REVIEW-2-60.jpg

As we can see, temperatures gradually climb to the usual 94°C as PowerTune endeavors to balance thermals and clock speeds when the fan is operating at lower levels. However, unlike the R9 290X’s Uber Mode, it seems like the cut-down architecture allows the 55% setting to keep the core below its thermal threshold.

R9-290-REVIEW-2-59.jpg

Drilling down into the effects of heat and fan speeds on core frequencies shows us an interesting set of results. At 40% fan speed, heat production becomes rampant and PowerTune has to throttle like mad, creating a rapid fluctuation in clock speeds with a 150MHz delta in some areas. The R9 290 actually bottomed out around 660MHz which is nearly 300MHz lower than AMD’s stated maximum. In certain games, it almost felt like the card was stuttering if you looked closely enough.

The R9 290’s default setting returned much better results with consistently high frequencies around the 900MHz mark. This is particularly interesting since it represents a point above the R9 290X’s Silent Mode averages and could, in the right situation, allow this less expensive card to match or even beat its higher priced sibling. Meanwhile, at 55% the heatsink was able to keep the R9 290’s core under its maximum operating temperature and thus PowerTune was able to maximize clock speeds.

R9-290-REVIEW-2-58.jpg

With all of the above taken into account, framerates tend to differ wildly from one scenario to another, with the “new” default setting delivering excellent results. However, if you’re looking for an extra performance boost, all that needs to be done is to increase the fan speed.
 
More Display Possibilities for Eyefinity


While Eyefinity may not be used by the majority of gamers, the few who use three or even six monitors compose a demanding group of enthusiasts. Unfortunately, in the past, using a single card for Eyefinity limited output options since the mini DisplayPort connector needed to be part of any monitor grouping. This meant either using a DisplayPort-equipped panel, buying an HDMI / DVI to DisplayPort adaptor or waiting for an Eyefinity version of the card to be released.

R7260X-REVIEW-18.jpg

On all new R7 and R9 series cards, a user will be able to utilize any combination of display connectors when hooking their card up to an Eyefinity grouping. While most cards won’t have the horsepower necessary to power games across three 1080P screens (the beastly R9 290X and R9 290 will likely be the exceptions to this), the feature will surely come in handy for anyone who wants additional desktop real estate.

These possibilities have been applied to lower-end R7 series cards as well. This is particularly important for content creation purposes where 3D gaming may not be required but workspace efficiency can be greatly increased by using multiple monitors.

R7260X-REVIEW-19.jpg

Most R-series cards will come equipped with two DVI connectors, a single HDMI 1.4 port and a DisplayPort 1.2 output. While there are a number of different display configurations available, most R9 280X cards will come with a slightly different layout: two DVIs, a single HDMI port and two mini DisplayPorts. In those cases, AMD’s newfound display flexibility will certainly come in handy.

While we’re on the subject of connectors, it should also be mentioned that the R9 290X and R9 290 lack the necessary pin-outs for VGA adaptors. It looks like the graphics market will finally see this legacy support slowly dwindle down with only lower-end cards featuring the necessary connections.

R7260X-REVIEW-20.jpg

For six monitor Eyefinity, the newer cards’ DisplayPort 1.2 connector supports multi-stream transport which allows for multiple display streams to be carried across a single connector. This will allow daisy-chaining up to three monitors together with another three being supported by the card’s other connectors.

The DP 1.2 connector’s multi streaming ability also allows MST hubs to be used. These hubs essentially take the three streams and then break them up into individual outputs, facilitating connections on monitors that don’t support daisy-chaining. After years of talk, there’s finally one available from Club3D but its rollout in North America isn’t guaranteed.
 

TrueAudio; A Revolution in Audio Technology?


When we think of gaming in relation to graphics cards, the first thing that likely comes to mind is in-game image fidelity and how quickly a given solution can process high graphical detail levels. However, realism and player immersion are only partially determined by how “good” a game looks and there are many other factors that contribute to how engaged a player will be in a game. Unfortunately, in the grand scheme of game design and the push towards higher end graphics, the soundstage is often overlooked despite its ability to define an environment and truly draw a gamer in.

Multi-channel positional audio goes a long way towards player immersion but the actual quality produced by current solutions isn’t usually up to the standards most expect. We’ve all heard it time and again: a multitude of sounds which get jumbled together, or a simple lack of ambient sound with the sole focus being put on the player’s gunshots or footsteps. Basically, it’s almost impossible to find a game with the high definition, visceral audio tracks found in today’s Hollywood blockbusters despite the fact that developers sink hundreds of millions into their titles.

R7260X-REVIEW-21.png

High quality, developer-generated audio tracks aren’t missing for lack of trying. Indeed, the middleware software and facilitators are already present in the marketplace but developers have a finite amount of CPU resources to work with. Typically those CPU cycles have to be shared with primary tasks such as game world building, compute, A.I., physics and simply running the game’s main programming. As you might expect, audio processing is relatively low in the pecking order and rarely gets the reserved CPU bandwidth many think it deserves. This is where AMD’s TrueAudio gets factored into the equation.

While sound cards and other forms of external audio renderers can take some load off the processor’s shoulders, they don’t actually handle the lion’s share of processing and sound production. TrueAudio, on the other hand, remains in the background, acting as a facilitator for audio processing and sound creation, and allows for ease of use from a development perspective, thus freeing up CPU resources for other tasks.

TrueAudio’s stack provides a highly programmable audio pipeline and allows for decoding, mixing and other features to be done within a versatile environment. This frees programmers from the constraints typically placed upon audio processing during the game creation process.

In order to give TrueAudio some context, let’s compare it to graphics engine development. Audio engineers and programmers usually record real-world sounds and then mix them down or modify layers to create a given effect. Does the player need to hear a gunshot at some point? Record a gunshot and mix accordingly. There is very little ground-up environmental modeling like game designers do with triangles and other graphics tools.

TrueAudio on the other hand allows audio teams to get a head start on the sound development process by creating custom algorithms without having to worry about CPU overhead. As a result, it could allow for more audio detailing without running headfirst into a limited allocation of processor cycles.

R7260X-REVIEW-15.png

According to AMD, one of the best features of TrueAudio is its transparency to developers since it can be accessed through the exact same means as the current audio stack. There aren’t any new languages to learn since it can be utilized through current third party middleware programs, making life for audio programmers easier and allowing for enhanced artistic freedom.

TrueAudio’s position within the audio stack enhances its perception as a facilitator since it runs behind the scenes, rather than attempting to run the show. Supporting game audio tracks are passed to TrueAudio, processed and then sent back to the main Windows Audio stack so it can be output as normal towards the sound card, USB audio driver or via the graphics processor’s HDMI / DisplayPort. It doesn’t take the place of a sound card but rather expands the possibilities for developers and works alongside the standard pipeline to ensure audio fidelity remains high.

R7260X-REVIEW-22.png

TrueAudio is implemented directly within supporting Radeon graphics cards (the R7 260X, R9 290 and R9 290X) via a set of dedicated Tensilica HiFi EP audio DSP cores housed within the GPU die. These cores are dedicated to in-game audio processing and feature floating point as well as fixed point sound processing, which gives game studios significantly more freedom than they currently have. It also allows for offloading the processing part of audio rather than remaining tied at the hip to CPU cycles.

In order to ensure quick, seamless access to routing and bridging, the DSPs have rapid access to local memory via onboard cache and RAM. There’s also shared instruction data for the streaming DMA engine and other secondary audio processing stages. More importantly, the main bus interface plugs directly into the high speed display pipeline and its frame buffer memory for guaranteed memory access at all times.

While TrueAudio ensures that processing can be done on dedicated DSP cores rather than on the main graphics cores, there can still be a CPU component here as well since TrueAudio is simply supplementing what the main processor is already tasked with doing. In some cases, these CPU algorithms can build upon the TrueAudio platform, enhancing audio immersion even more.

R7260X-REVIEW-23.png

One of the primary challenges for audio engineers has always been the creation of a three dimensional audio space through stereo headphones. In a typical setup, the in-game engine does the preliminary processing and then mixes down the tracks to simple stereo sound. Additional secondary DSPs (typically located on a USB headphone amp) then render the track into a virtual surround signal across a pair of channels, adding in the necessary reverberations, separation and other features to effectively “trick” a user into hearing a directionally-enhanced soundstage. The end result is typically less than stellar since the sounds tend to get jumbled up due to a lack of definition.

TrueAudio helps virtual surround sound along by offering a quick pathway for its processing. It uses a high quality DSP which ensures individual channels can be separated and addressed with their own dedicated, primary pipeline. AMD has teamed up with GenAudio to get this figured out and, from the presentations we’ve seen, it seems like they’ve made some incredible headway thus far.

R7260X-REVIEW-16.png

While nothing has to be changed from a developer standpoint since all third party applications and runtimes can work with TrueAudio, this new addition can be leveraged for more than just optimizing CPU utilization. Advanced effects, a richer soundstage, clearer voice tracks and more can all be enabled due to its lower overhead and broad-ranging application support. In addition, mastering limiters can allow individual sounds to come through without distortion.

Unlike some applications, TrueAudio isn’t an end-all-be-all solution since it can be used to target select, high bandwidth streams so not all sounds have to be processed through it. AMD isn’t cutting the CPU out of this equation and that’s important as they move towards a heterogeneous computing environment.

R7260X-REVIEW-14.png

As with all new initiatives, the failure or success of TrueAudio will largely depend on the willingness of developers to support it. While it feels like we've been down this road before with HD3D, Bullet Physics and other AMD marketing points from years past that never really got off the ground, we feel like TrueAudio can shine. Developers are already onboard and AMD has gone to great pains to make its development process easy.

Audio is one of the last frontiers that hasn’t already been addressed. Anything that improves the PC audio experience is welcome but don’t expect TrueAudio to work miracles. It will still only be as good as the endpoint hardware (in this case your headphones and associated sound card) but it should allow better speaker setups to shine, taking immersion to the next level. Will it fundamentally redefine what the PC graphics card can provide? We certainly hope so.
 

AMD's Mantle; A Possible Game Changer


In order to understand where Mantle is coming from, we need to go back in time and take the Playstation 3 as an example of how AMD wants to change the way games interact with a PC’s graphics subsystem. While the PS3’s Cell processor and its associated graphics core are extremely hard to program for, games like Uncharted 3 and The Last of Us boast visuals that are equal to if not better than some of today’s newest PC games which run on hardware that was unimaginable when Sony launched their console.

So how was this seemingly impossible feat accomplished? Consoles give developers easier access to the graphics subsystem without messy driver stacks, loads of API overhead, a cluttered OS and other unnecessary eccentricities eating up valuable resources. As a result, console games are able to fully utilize a given resource pool and allow programmers to do more with less. In some cases (the PS3 is another excellent example of this) the flow towards true utilization takes a bit longer as programmers have to literally relearn how to approach their trade but AMD's focus here is to streamline the whole process.

MANTLE-4.jpg

Mantle has been created to reduce the number of obstacles placed before developers when they’re trying to create new PC titles or port games over from consoles. In the past, things like CPU optimizations and efficient inter-component communication have largely been pushed aside as developers struggled to come to grips with the wide range of PC hardware configurations being used. This leads to multi core CPUs remaining idle, the GPU’s on-die resources being wasted and a real lack of optimal performance conditions on the PC, regardless of its advanced hardware.

There’s also a very heavy software component when programming for the PC environment since developers routinely have to contend with a predominantly heavy driver stack and slowly evolving primary level software. That’s a problem since it leads to the software / memory interaction becoming a rather stringent traffic light, bottlenecking the flow of information between the CPU and GPU, limiting throughput.

DirectX 10 and DX11 have gone a long way towards addressing some of these roadblocks but their overall performance is still hindered by their high-level nature. They keep communication between the API, GPU, game and CPU under strict control, something developers don’t want to wade through. When using them, transmitting a large number of draw calls leads to a CPU bottleneck, meaning today’s graphics architectures can never realize their full potential.

MANTLE-1.jpg

This is where Mantle gets factored into the equation; not as a direct replacement for DirectX or OpenGL but rather as a complementary force. It’s an API that focuses on “bare metal”, low level programming with a thin, lightweight driver that effectively manages resource distribution, grants additional control over the graphics memory interface and optimizes those aforementioned draw-calls. Think of Mantle like a low level strafing run that targets key components rather than high level carpet bombing that may or may not achieve a given objective.

With a more direct line of access to the GPU, AMD is hoping that GCN’s performance could drastically increase through rendering efficiencies rather than having to throw raw horsepower at problems. Opening up new rendering techniques which aren’t tied at the hip to today’s primary APIs is also a possibility. Theoretically, this could allow Mantle to process a ninefold increase in draw-calls and more importantly, it will ensure optimizations can be carried over from the console version of a game to the PC and vice versa.

MANTLE-2.jpg

There are some notable speedbumps to this approach as well. While the high-level API (in this case DirectX / Direct3D) will remain the same across multiple hardware and product classes, Mantle is only compatible with GCN. This is great news for anyone using a compatible graphics card or one of the new consoles but GeForce and Intel HD 4000 users may be left out in the cold since, for the time being at least, neither NVIDIA nor Intel have a comparable or compatible solution. Pre-GCN cards have also been cast aside in favor of forward progress. With all of this in mind, developers may be forced to bypass legacy support, add in Mantle as a selectable option, or simply ignore Mantle until incompatible products are discontinued.

It goes without saying that AMD has won the next generation console race with the Jaguar APU on both Xbox One and PS4 so leveraging those design wins is an integral part of their future strategy. But very little has been said about the high-level and lower-level APIs being used within those products, primarily the Xbox One. Direct3D 11.2 is a given but no one could point a finger at the low-level API. Microsoft has been forthcoming by saying that it isn't Mantle but the inclusion of native DirectX HLSL compatibility could go a long way towards making AMD’s cross-platform dreams come true.

In many ways, this approach reminds us of 3dfx’s Glide, another low-level application programming interface developed years ago but doomed to failure due to a lack of developer support and its parent company’s eventual demise.

MANTLE-3.jpg

One of the main challenges with trying to do anything outside of the typical DirectX / OpenGL environments is selling the idea to developers. Mantle may show limitless promise but we’ve seen plenty of other budding technologies pushed aside due to a lack of industry support. This time around, AMD has some major backers from day one.

Had AMD trotted out a small development studio to pimp their new wares, very few industry pundits would have taken it seriously. Instead, they achieved immediate street cred by gaining the support of Electronic Arts and, by extension, Dice; a combination responsible for the Battlefield series and a multi-billion dollar juggernaut of the gaming industry.

According to Johan Andersson, lead designer at Dice and a well respected evangelist of higher level PC technology, Mantle is a collaborative effort between AMD, EA, Dice and other studios. As a result, Dice’s Frostbite 3 engine will natively support the new low-level API. That’s a huge deal when you consider the number of triple-A titles that will be using it. Games like Need for Speed: Rivals, Dragon Age Inquisition, Star Wars Battlefront and Battlefield 4 will all utilize the Frostbite 3 engine and could potentially include a Mantle option. Other developers will quickly come onboard too since Mantle was created to fulfill many of their wishes and address their concerns when creating a PC title or port.

MANTLE-7.jpg

The first title to include Mantle support will be the eagerly anticipated Battlefield 4. However, Mantle will only be rolled out in a December patch, about two months after the initial release. This bodes well for the future since it proves Mantle can be patched into a game without affecting the underlying code structure, though we’d still like to see it as an in-game option much like the Source Engine allows switching between DirectX / OpenGL rendering modes. We will also be interested to see what kind of performance boost (if any) it will achieve over its competitors.

So with all of this taken into account, where does Mantle stand within the current API landscape? We’re not all that sure yet since the technology is very much in its infancy. Judging from the blogosphere’s reaction to its announcement, developers are already clamoring to get early access to the PC-centric version. That’s an encouraging sign. However, how Mantle evolves beyond its current iteration largely depends on AMD’s willingness to support it in the coming years and this is where the question mark lies. Due to its close association with consoles and an ability to facilitate console to PC porting, supporting it from the developer and hardware engineer standpoint will be a no-brainer so the hope for continued engagement is certainly there.

MANTLE-6.jpg

Mantle and its rollout tells us a number of things: AMD’s developer relation program is starting to pay some serious dividends, the inclusion of Graphics Core Next architecture into next generation consoles is a much bigger deal than NVIDIA initially made it out to be and AMD is dead serious about gaming as part of their core focus. Mantle is a common thread which binds so many initiatives together and it has the potential to be a watershed moment for everything in AMD’s portfolio, from the upcoming Kaveri APU to discrete graphics cards to consoles.

The move to Mantle won’t be something completed overnight. Last week’s unveiling is only the first step on what will be a long and hopefully rewarding journey. Stay tuned for more information and a complete in-depth look at Mantle during AMD’s Developer Summit in November.
 

Test System & Setup

Main Test System

Processor: Intel i7 3930K @ 4.5GHz
Memory: Corsair Vengeance 32GB @ 1866MHz
Motherboard: ASUS P9X79 WS
Cooling: Corsair H80
SSD: 2x Corsair Performance Pro 256GB
Power Supply: Corsair AX1200
Monitor: Samsung 305T / 3x Acer 235Hz
OS: Windows 7 Ultimate N x64 SP1


Acoustical Test System

Processor: Intel 2600K @ stock
Memory: G.Skill Ripjaws 8GB 1600MHz
Motherboard: Gigabyte Z68X-UD3H-B3
Cooling: Thermalright TRUE Passive
SSD: Corsair Performance Pro 256GB
Power Supply: Seasonic X-Series Gold 800W


Drivers:
NVIDIA 331.58 Beta
AMD 13.11 v8 Beta



*Notes:

- All games tested have been patched to their latest version

- The OS has had all the latest hotfixes and updates installed

- All scores you see are the averages after 3 benchmark runs

- All IQ settings were adjusted in-game and all GPU control panels were set to use application settings


The Methodology of Frame Testing, Distilled


How do you benchmark an onscreen experience? That question has plagued graphics card evaluations for years. While framerates give an accurate measurement of raw performance, there’s a lot more going on behind the scenes which a basic frames per second measurement by FRAPS or a similar application just can’t show. A good example of this is how “stuttering” can occur but may not be picked up by typical min/max/average benchmarking.

Before we go on, a basic explanation of FRAPS’ frames per second benchmarking method is important. FRAPS determines FPS rates by simply logging and averaging out how many frames are rendered within a single second. The average framerate measurement is taken by dividing the total number of rendered frames by the length of the benchmark being run. For example, if a 60 second sequence is used and the GPU renders 4,000 frames over the course of that time, the average result will be 66.67FPS. The minimum and maximum values meanwhile are simply two data points representing single second intervals which took the longest and shortest amount of time to render. Combining these values together gives an accurate, albeit very narrow snapshot of graphics subsystem performance and it isn’t quite representative of what you’ll actually see on the screen.
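As a quick sanity check on the arithmetic described above, the minimal sketch below reproduces the 60-second / 4,000-frame example; the variable names and the per-second sample data are our own and purely illustrative.

```python
total_frames = 4000       # frames rendered during the benchmark
benchmark_length_s = 60   # length of the capture in seconds

average_fps = total_frames / benchmark_length_s
print(f"{average_fps:.2f} FPS")  # 66.67 FPS

# Per-second frame counts (illustrative); the reported minimum and maximum
# are simply the single-second intervals with the fewest and most frames.
frames_per_second = [58, 72, 65, 81, 44, 70]
print(min(frames_per_second), max(frames_per_second))  # 44 81
```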

FCAT on the other hand has the capability to log onscreen average framerates for each second of a benchmark sequence, resulting in the “FPS over time” graphs. It does this by simply logging the reported framerate result once per second. However, in real world applications a single second is actually a long period of time, meaning the human eye can pick up on onscreen deviations much quicker than this method can report them. So what actually happens within each second? A whole lot, since each second of gameplay can consist of dozens or even hundreds (if your graphics card is fast enough) of frames. This brings us to frame time testing and where the Frame Time Analysis Tool gets factored into the equation.

Frame times simply represent the length of time (in milliseconds) it takes the graphics card to render and display each individual frame. Measuring the interval between frames allows for a detailed millisecond by millisecond evaluation of frame times rather than averaging things out over a full second. The larger the amount of time, the longer each frame takes to render. This detailed reporting just isn’t possible with standard benchmark methods.

We are now using FCAT for ALL benchmark results.


Frame Time Testing & FCAT

To put a meaningful spin on frame times, we can equate them directly to framerates. A constant 60 frames across a single second would lead to an individual frame time of 1/60th of a second or about 17 milliseconds, 33ms equals 30 FPS, 50ms is about 20FPS and so on. Contrary to framerate evaluation results, in this case higher frame times are actually worse since they would represent a longer interim “waiting” period between each frame.

With the milliseconds to frames per second conversion in mind, the “magical” maximum number we’re looking for is 28ms, or about 35FPS. If too much time is spent above that point, performance suffers and the in-game experience will begin to degrade.

Consistency is a major factor here as well. Too much variation in adjacent frames could induce stutter or slowdowns. For example, spiking up and down from 13ms (75 FPS) to 28ms (35 FPS) several times over the course of a second would lead to an experience which is anything but fluid. However, even though deviations between slightly lower frame times (say 10ms and 25ms) wouldn’t be as noticeable, some sensitive individuals may still pick up a slight amount of stuttering. As such, the less variation the better the experience.
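Putting those numbers into code, the short sketch below (our own, not part of FCAT or FRAPS) converts per-frame times in milliseconds into instantaneous framerates, flags frames that cross the 28ms / 35FPS line and reports the worst adjacent-frame swing as a crude consistency metric. The sample frame times are invented for illustration.

```python
frame_times_ms = [13.0, 16.7, 28.0, 13.0, 50.0, 17.0, 33.3]  # sample data

# 1000 ms divided by the frame time gives the instantaneous framerate:
# ~17 ms is 60 FPS, 28 ms is about 35 FPS, 33.3 ms is 30 FPS, 50 ms is 20 FPS.
fps = [1000.0 / t for t in frame_times_ms]

slow_frames = [t for t in frame_times_ms if t > 28.0]  # frames past 28 ms
worst_swing = max(abs(a - b) for a, b in zip(frame_times_ms, frame_times_ms[1:]))

print([round(f, 1) for f in fps])  # [76.9, 59.9, 35.7, 76.9, 20.0, 58.8, 30.0]
print(len(slow_frames), "frames slower than 28 ms")           # 2 frames slower than 28 ms
print(f"largest adjacent-frame delta: {worst_swing:.1f} ms")  # 37.0 ms
```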

In order to determine accurate onscreen frame times, a decision has been made to move away from FRAPS and instead implement real-time frame capture into our testing. This involves the use of a secondary system with a capture card and an ultra-fast storage subsystem (in our case five SanDisk Extreme 240GB drives hooked up to an internal PCI-E RAID card) hooked up to our primary test rig via a DVI splitter. Essentially, the capture card records a high bitrate video of whatever is displayed from the primary system’s graphics card, allowing us to get a real-time snapshot of what would normally be sent directly to the monitor. By using NVIDIA’s Frame Capture Analysis Tool (FCAT), each and every frame is dissected and then processed in an effort to accurately determine latencies, frame rates and other aspects.

We've also now transitioned all testing to FCAT which means standard frame rates are also being logged and charted through the tool. This means all of our frame rate (FPS) charts use onscreen data rather than the software-centric data from FRAPS, ensuring dropped frames are taken into account in our global equation.
 

Assassin’s Creed III / Crysis 3

Assassin’s Creed III (DX11)


Video: http://www.youtube.com/embed/RvFXKwDCpBI?rel=0

The third iteration of the Assassin’s Creed franchise is the first to make extensive use of DX11 graphics technology. Our benchmark sequence consists of a run-through of the Boston area which features plenty of NPCs, distant views and high levels of detail.


2560 x 1440

R9-290-REVIEW-2-38.jpg

R9-290-REVIEW-2-30.jpg


Crysis 3 (DX11)


Video: http://www.youtube.com/embed/zENXVbmroNo?rel=0

Simply put, Crysis 3 is one of the best looking PC games of all time and it demands a heavy system investment before even trying to enable higher detail settings. Our benchmark sequence for this one replicates a typical gameplay condition within the New York dome and consists of a run-through interspersed with a few explosions for good measure. Due to the hefty system resource needs of this game, post-process FXAA was used in place of MSAA.


2560 x 1440

R9-290-REVIEW-2-39.jpg

R9-290-REVIEW-2-31.jpg
 

Dirt: Showdown / Far Cry 3

Dirt: Showdown (DX11)


Video: http://www.youtube.com/embed/IFeuOhk14h0?rel=0

Among racing games, Dirt: Showdown is somewhat unique since it deals with demolition-derby type racing where the player is actually rewarded for wrecking other cars. It is also one of the many titles which falls under the Gaming Evolved umbrella so the development team has worked hard with AMD to implement DX11 features. In this case, we set up a custom 1-lap circuit using the in-game benchmark tool within the Nevada level.


2560 x 1440

R9-290-REVIEW-2-40.jpg

R9-290-REVIEW-2-32.jpg



Far Cry 3 (DX11)


Video: http://www.youtube.com/embed/mGvwWHzn6qY?rel=0

One of the best looking games in recent memory, Far Cry 3 has the capability to bring even the fastest systems to their knees. Its use of nearly the entire repertoire of DX11’s tricks may come at a high cost but with the proper GPU, the visuals will be absolutely stunning.

To benchmark Far Cry 3, we used a typical run-through which includes several in-game environments such as a jungle, in-vehicle and in-town areas.



2560 x 1440

R9-290-REVIEW-2-41.jpg

R9-290-REVIEW-2-33.jpg
 
