
The NVIDIA TITAN X 12GB Performance Review

SKYMTL

HardwareCanuck Review Editor
Staff member
Joined
Feb 26, 2007
Messages
12,861
Location
Montreal
The Pascal-based TITAN X has finally arrived, a lot sooner than many may have expected but nonetheless ready to lead NVIDIA’s current lineup into the future. The way this card was launched via an off-the-cuff announcement is unique, its specifications may leave you slack-jawed (as will the price!) and, based upon the GPU market’s current direction, the new TITAN X could very well reign supreme for a very long time indeed.

As with previous TITAN cards, this version of the TITAN X is meant to combine professional-oriented features with gaming performance that is sure to appeal to the top 1% of gamers. The professional side of that equation lies with the folks focusing on Deep Learning, where a staggering 44 TOPS of 8-bit integer throughput will be money in the bank. Before the new TITAN X, they would have had to spend upwards of $6,000 on a Tesla P100 to get remotely close to that performance.

Gamers will likely lambaste the Pascal-based TITAN X’s $1200 price but it’s tough to complain when there’s obviously a market for these types of ultra high end products. Many will likely make their way into the hands of facilities focused on the aforementioned deep learning but plenty of others will be found in systems from the likes of Maingear, Falcon Northwest and other builders. As a matter of fact, many may prefer to go this route over two GTX 1080s in SLI since they won’t have to worry about missing profiles or day-one support issues; the raw performance of TITAN X will help them power through nearly every scenario.

One of the other oddities of the TITAN X is the manner in which it will be sold: board partners simply won’t have access to this card, at least not initially. Instead, it will be offered exclusively through GeForce.com’s storefront and through those aforementioned boutique system builders. The Americas (including Canada) and the EU will have access to it on August 2nd while other regions will have sales open sometime in the future. That could complicate things for returns outside of the continental USA but NVIDIA is already well-versed when it comes to potential international RMAs.


In terms of raw specifications, the 16nm GP102 core beating at the new TITAN X’s heart is a pretty beastly thing with 28 Streaming Multiprocessors accounting for 3584 cores and 224 texture units spread across 12 billion transistors. For those keeping track at home, that’s a good 40% more cores and TMUs than NVIDIA’s previous flagship card, the GTX 1080. However, that 40% may not translate into a direct performance increase since, in order to keep TDP within a reasonable window (that being 250W), the TITAN X’s clock speeds are a bit lower than its little brother’s, clocking in with Base / Boost frequencies of 1417MHz and 1531MHz respectively.
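Those headline numbers can be sanity-checked with a bit of arithmetic. The sketch below assumes the usual Pascal layout of 128 CUDA cores and 8 texture units per SM; those per-SM figures are my own assumption for illustration, not something stated in the review:

```python
# Rough sketch: derive GP102 core/TMU counts from the SM count,
# assuming the usual Pascal layout of 128 cores and 8 TMUs per SM.
CORES_PER_SM = 128
TMUS_PER_SM = 8

def gp102_config(sm_count: int):
    """Return (CUDA cores, texture units) for a given SM count."""
    return sm_count * CORES_PER_SM, sm_count * TMUS_PER_SM

print(gp102_config(28))  # (3584, 224) -> TITAN X as shipped
print(gp102_config(30))  # (3840, 240) -> a fully enabled GP102
```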

Moving on to the memory, here we have a feature which is obviously meant for professional scenarios: 12GB of GDDR5X operating across a 384-bit memory interface at 10Gbps. The resulting bandwidth of about 480GB/s is nothing short of titanic (see what I did there? ;) ) but its real-world benefits for gaming scenarios will likely be quite limited. This is because even at 4K, today’s games rarely require more than 4GB and even the most memory-hungry applications seem to stop seeing performance benefits above 6GB. In that case, the TITAN X could unofficially be considered the first true “5K capable” graphics card.
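The quoted bandwidth figure follows directly from the bus width and the per-pin data rate; a minimal sketch (the function name is my own):

```python
# Peak memory bandwidth: bus width in bytes times per-pin data rate.
# Numbers come from the specs above (384-bit bus, 10Gbps GDDR5X).
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return (bus_width_bits / 8) * data_rate_gbps

print(memory_bandwidth_gbs(384, 10.0))  # 480.0 GB/s
```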

With all of this taken into account alongside the potential 25% to 30% improvements over the GTX 1080, I’ll once again have to raise the specter of NVIDIA’s pricing structure for the TITAN X. It costs 70% more but that’s actually less than buying two GTX 1080 Founders Edition cards. Yeah, I understand that is likely cold comfort and still bloody expensive regardless of how green-colored your glasses may be. However, the TITAN X does offer a unique set of features and performance throughput for those who are willing to pay that premium, regardless of whether they’re on the Deep Learning side of the equation or simply a gamer who wants the best available right now.


The GP102 itself is a pretty interesting creation since, even though it is all-powerful within the TITAN X, it actually isn’t fully enabled this time around. In order to increase yields (and “increase” is a relative term since there won’t be many TITAN Xs to begin with) a pair of SMs have been disabled. If NVIDIA ever does get around to launching GP102 in its fullest form, it would be a 3840-core, 240-texture-unit monster. However, with a distinct lack of competition on the horizon, we have to wonder if a fully enabled GP102 will ever see the light of day outside of the professional Quadro market.


Other than its black-colored heatsink shroud, the new TITAN X’s looks aren’t all that much different from the GTX 1080 or GTX 1070. Despite being a much more exclusive solution, it actually utilizes the exact same cooler as significantly lower-priced options. That’s something which may not sit all that well with anyone who was looking for a more bespoke solution but, as the saying goes: if it ain’t broke, don’t fix it.


Considering this is a card with a 250W TDP, the 8+6 pin power input layout shouldn’t come as any surprise. What is interesting though is that even though NVIDIA has been extremely careful about discussing the TITAN X’s deep learning focus, the heatsink shroud still has an LED-lit GeForce GTX logo emblazoned on it. This really does point towards some odd component recycling going on but I still think this is one of the best looking reference designs ever made.


The card’s underside is completely covered with a backplate which once again uses the GeForce GTX TITAN X moniker even though NVIDIA has chosen to officially market it as simply the “TITAN X”.


I/O connections are par for the course as well, with three DisplayPort 1.3 outputs, a single HDMI 2.0 connector and finally a legacy DVI port.
 

Test System & Setup




Processor: Intel i7 5960X @ 4.3GHz
Memory: G.Skill Trident X 32GB @ 3000MHz 15-16-16-35-1T
Motherboard: ASUS X99 Deluxe
Cooling: NH-U14S
SSD: 2x Kingston HyperX 3K 480GB
Power Supply: Corsair AX1200
Monitor: Dell U2713HM (1440P) / Acer XB280HK (4K)
OS: Windows 10 Pro


Drivers:
AMD Radeon Software 16.5.2
NVIDIA 368.14 WHQL
NVIDIA 368.14 (TITAN X) WHQL


Notes:

- All games tested have been patched to their latest version

- The OS has had all the latest hotfixes and updates installed

- All scores you see are the averages after 3 benchmark runs

- All IQ settings were adjusted in-game and all GPU control panels were set to use application settings


The Methodology of Frame Testing, Distilled


How do you benchmark an onscreen experience? That question has plagued graphics card evaluations for years. While framerates give an accurate measurement of raw performance, there’s a lot more going on behind the scenes which a basic frames per second measurement by FRAPS or a similar application just can’t show. A good example of this is how “stuttering” can occur but may not be picked up by typical min/max/average benchmarking.

Before we go on, a basic explanation of FRAPS’ frames per second benchmarking method is important. FRAPS determines FPS rates by simply logging and averaging out how many frames are rendered within a single second. The average framerate measurement is taken by dividing the total number of rendered frames by the length of the benchmark being run. For example, if a 60 second sequence is used and the GPU renders 4,000 frames over the course of that time, the average result will be 66.67FPS. The minimum and maximum values meanwhile are simply two data points representing single second intervals which took the longest and shortest amount of time to render. Combining these values together gives an accurate, albeit very narrow snapshot of graphics subsystem performance and it isn’t quite representative of what you’ll actually see on the screen.
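The FRAPS-style method described above boils down to a few lines of arithmetic; here is a minimal sketch, with data chosen to reproduce the 4,000-frames-over-60-seconds example (the function name and numbers in the call are mine):

```python
# Minimal sketch of FRAPS-style per-second FPS benchmarking:
# frame_counts[i] holds the number of frames rendered during second i.
def fps_stats(frame_counts):
    return {
        "average": sum(frame_counts) / len(frame_counts),
        "minimum": min(frame_counts),  # slowest one-second interval
        "maximum": max(frame_counts),  # fastest one-second interval
    }

# 4,000 frames spread over a 60 second sequence -> 66.67 FPS average,
# exactly as in the worked example above.
stats = fps_stats([66] * 40 + [68] * 20)
print(round(stats["average"], 2))  # 66.67
```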

FCAT on the other hand has the capability to log onscreen average framerates for each second of a benchmark sequence, resulting in the “FPS over time” graphs. It does this by simply logging the reported framerate result once per second. However, in real world applications a single second is actually a long period of time, meaning the human eye can pick up on onscreen deviations much quicker than this method can actually report them. So what actually happens within each second of time? A whole lot, since each second of gameplay can consist of dozens or even hundreds (if your graphics card is fast enough) of frames. This brings us to frame time testing and where the Frame Time Analysis Tool gets factored into this equation.

Frame times simply represent the length of time (in milliseconds) it takes the graphics card to render and display each individual frame. Measuring the interval between frames allows for a detailed millisecond by millisecond evaluation of frame times rather than averaging things out over a full second. The longer a frame takes to render, the lower the framerate you perceive at that instant. This detailed reporting just isn’t possible with standard benchmark methods.
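To make the idea concrete, here is a small sketch of per-frame analysis: each frame time is converted into an instantaneous FPS value, and unusually long frames are flagged as potential stutter. The 33.3ms threshold (a momentary dip below ~30 FPS) is my own illustrative choice, not a figure from the review:

```python
# Turn per-frame render times (ms) into instantaneous FPS and flag
# long frames that would be felt as stutter onscreen.
def analyze_frame_times(frame_times_ms, stutter_threshold_ms=33.3):
    fps_per_frame = [1000.0 / t for t in frame_times_ms]
    stutters = [t for t in frame_times_ms if t > stutter_threshold_ms]
    return fps_per_frame, stutters

# A sequence averaging close to 60 FPS can still hide a 50ms hitch
# that a one-second average would smooth over entirely:
fps, stutters = analyze_frame_times([16.7, 16.7, 50.0, 16.7])
print(stutters)  # [50.0]
```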

We are now using FCAT for ALL benchmark results in DX11.


DX12 Benchmarking


For DX12 many of these same metrics can be utilized through a simple program called PresentMon. Not only does this program have the capability to log frame times at various stages throughout the rendering pipeline but it also grants a slightly more detailed look into how certain API and external elements can slow down rendering times.

Since PresentMon outputs massive amounts of frametime data, we have decided to distill the information down into slightly more easy-to-understand graphs. Within them, we have taken several thousand datapoints (in some cases tens of thousands), converted the frametime milliseconds over the course of each benchmark run to frames per second and then graphed the results. This gives us a straightforward framerate over time graph. Meanwhile, the typical bar graph averages out every data point as it’s presented.
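The distillation step described above can be sketched roughly as follows: bucket per-frame times into one-second bins by cumulative timestamp, then count the frames in each bin to get a framerate-over-time series. This is my own simplified reconstruction of the process, not PresentMon's actual tooling:

```python
# Bucket per-frame times (ms) into one-second bins by cumulative
# timestamp, then count frames per bin to get an FPS-over-time series.
from collections import Counter
from itertools import accumulate

def fps_over_time(frame_times_ms):
    timestamps_s = (t / 1000.0 for t in accumulate(frame_times_ms))
    bins = Counter(int(ts) for ts in timestamps_s)
    return [bins[s] for s in sorted(bins)]

# Roughly two seconds of ~100 FPS followed by one second of ~50 FPS:
series = fps_over_time([10.0] * 200 + [20.0] * 50)
print(series)
```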

One thing to note is that our DX12 PresentMon results cannot and should not be directly compared to the FCAT-based DX11 results. They should be taken as a separate entity and discussed as such.
 

Analyzing Temperatures & Frequencies Over Time



Modern graphics card designs make use of several advanced hardware- and software-based algorithms in an effort to hit an optimal balance between performance, acoustics, voltage, power and heat output. Traditionally this leads to maximized clock speeds within a given set of parameters. Conversely, if one of those last two metrics (those being heat and power consumption) steps into the equation in a negative manner, it is quite likely that voltages and resulting core clocks will be reduced to ensure the GPU remains within design specifications. We’ve seen this happen quite aggressively on some AMD cards while NVIDIA’s reference cards also tend to fluctuate their frequencies. To be clear, this is a feature by design rather than a problem in most situations.

In many cases clock speeds won’t be touched until the card in question reaches a preset temperature, whereupon the software and onboard hardware will work in tandem to carefully regulate other areas such as fan speeds and voltages to ensure maximum frequency output without an overly loud fan. Since this algorithm typically doesn’t kick into full force in the first few minutes of gaming, the “true” performance of many graphics cards won’t be realized through a typical 1-3 minute benchmarking run. Hence the 10-minute warm-up period we use before all of our benchmarks.

The TITAN X is obviously a massively powerful card but it also carries a 250W TDP. That means it outputs a massive amount of heat despite using the same heatsink as the much more efficient GTX 1080. Does this spell trouble? Let’s find out.


Temperatures actually start off quite well and remain below the 85°C mark throughout the test with very little movement. However, as we have seen in the past, NVIDIA’s Boost algorithms are meant to effectively balance power consumption, temperatures, fan speeds and frequencies in an effort to achieve optimal overall performance. That means one or more of those elements are about to be sacrificed to ensure that 84°C is maintained.


The first sacrificial lamb is obviously fan speeds and acoustics (more about the noise output on the next page!) since the TITAN X’s rotational speeds exceed those of every other reference design we’ve seen during this generation. With that being said, the difference of about 500RPM isn’t all that much considering what’s at play here.


NVIDIA claims the TITAN X has Base and Boost clocks of 1417MHz and 1531MHz respectively. Even when running at full tilt, our sample was still able to hit about 1600MHz after hours of gameplay. While performance is indeed sacrificed to ensure normal operating parameters can be maintained, it’s hard to complain about what’s happening here. I can only imagine how this card would perform when put under water.


Raw and unadulterated framerates are the name of the game but, due to the slight dips in frequency from the beginning to the end of testing, performance isn’t quite maintained. It does however highlight exactly why we ensure that every test is done after a brief warm-up period.
 

Thermal Imaging / Acoustics & Power Consumption

Thermal Imaging



There are no extreme areas of concern on the TITAN X when it comes to thermals but, due to the high heat output, there are several surfaces that do get quite warm. I’d recommend letting the card cool down for a few minutes after an extended gaming session before handling it.


Acoustical Testing


What you see below are the baseline idle dB(A) results attained for a relatively quiet open-case system (specs are in the Methodology section) sans GPU along with the attained results for each individual card in idle and load scenarios. The meter we use has been calibrated and is placed at seated ear-level exactly 12” away from the GPU’s fan. For the load scenarios, Rise of the Tomb Raider is used to generate a constant load on the GPU(s) over the course of 15 minutes.


The TITAN X certainly isn’t a quiet card but it isn’t all that loud either. The fan does put out more noise than a GTX 1080’s but it does so in a controlled manner rather than belting out a banshee-like scream.


System Power Consumption


For this test we hooked up our power supply to a UPM power meter that will log the power consumption of the whole system twice every second. In order to stress the GPU as much as possible we used 15 minutes of Unigine Valley running on a loop while letting the card sit at a stable Windows desktop for 15 minutes to determine the peak idle power consumption.


Now this is an interesting result but not one that was completely unexpected. The advanced 16nm manufacturing process NVIDIA is using has proven to be extremely efficient and the GP102 within the TITAN X proves that once again. Despite the card packing 12 billion transistors and 12GB of GDDR5X memory, it only requires about 27W more than a GTX 980 Ti and 18W more than AMD’s Fury X.
 

DX11 / 1440P: Ashes of the Singularity / Fallout 4

Ashes of the Singularity


Ashes of the Singularity is a real time strategy game on a grand scale, very much in the vein of Supreme Commander. While this game is most known for its asynchronous workloads through the DX12 API, it also happens to be pretty fun to play. While Ashes has a built-in performance counter alongside its built-in benchmark utility, we found it to be highly unreliable and it often posts substantial run-to-run variation. With that in mind we still used the onboard benchmark since it eliminates the randomness that arises when actually playing the game, but utilized the PresentMon utility to log performance.




Fallout 4


The latest iteration of the Fallout franchise is a great looking game with all of its details turned up to their highest levels but it also requires a huge amount of graphics horsepower to run properly. For this benchmark we complete a run-through from within a town, shoot up a vehicle to test performance when in combat and finally end atop a hill overlooking the town. Note that VSync has been forced off within the game's .ini file.


 

DX11 / 1440P: Far Cry 4 / Grand Theft Auto V

Far Cry 4


This entry in Ubisoft’s Far Cry series takes up where the others left off by boasting some of the most impressive visuals we’ve seen. In order to emulate typical gameplay we run through the game’s main village, head out through an open area and then transition to the lower areas via a zipline.




Grand Theft Auto V


In GTA V we take a simple approach to benchmarking: the in-game benchmark tool is used. However, due to the randomness within the game itself, only the last sequence is actually used since it best represents gameplay mechanics.


 

DX11 / 1440P: Hitman / Rise of the Tomb Raider

Hitman (2016)


The Hitman franchise has been around in one way or another for more than a decade and this latest version is arguably the best looking. Playable under both DX11 and DX12 APIs, it has a ton of graphics options, some of which are only available under DX12.

For our benchmark we avoid using the in-game benchmark since it doesn’t represent actual in-game situations. Instead the second mission in Paris is used. Here we walk into the mansion, mingle with the crowds and eventually end up within the fashion show area.





Rise of the Tomb Raider


Another year and another Tomb Raider game. This time Lara’s journey continues through various beautifully rendered locales. Like Hitman, Rise of the Tomb Raider has both DX11 and DX12 API paths and incorporates a completely pointless built-in benchmark sequence.

The benchmark run we use is within the Soviet Installation level where we start at about the midpoint, run through a warehouse with some burning huts and then finish inside a fenced-in area during a snowstorm.


 

DX11 / 1440P: SW Battlefront / Division / Witcher 3

Star Wars Battlefront


Star Wars Battlefront may not be one of the most demanding games on the market but it is quite widely played. It also looks pretty good due to being based upon DICE’s highly optimized Frostbite engine.

The benchmark run in this game is pretty straightforward: we use the AT-ST single player level since it has predetermined events and it loads up on many in-game special effects.





The Division


The Division has some of the best visuals of any game available right now even though its graphics were supposedly downgraded right before launch. Unfortunately, actually benchmarking it is a challenge in and of itself. Due to the game’s dynamic day / night and weather cycle it is almost impossible to achieve a repeatable run within the game itself. With that taken into account we decided to use the in-game benchmark tool.




Witcher 3


Other than being one of 2015’s most highly regarded games, The Witcher 3 also happens to be one of the most visually stunning. This benchmark sequence has us riding through a town and running through the woods; two elements that will likely take up the vast majority of in-game time.


 

DX11 / 4K: Ashes of the Singularity / Fallout 4

Ashes of the Singularity


Ashes of the Singularity is a real time strategy game on a grand scale, very much in the vein of Supreme Commander. While this game is most known for its asynchronous workloads through the DX12 API, it also happens to be pretty fun to play. While Ashes has a built-in performance counter alongside its built-in benchmark utility, we found it to be highly unreliable and it often posts substantial run-to-run variation. With that in mind we still used the onboard benchmark since it eliminates the randomness that arises when actually playing the game, but utilized the PresentMon utility to log performance.




Fallout 4


The latest iteration of the Fallout franchise is a great looking game with all of its details turned up to their highest levels but it also requires a huge amount of graphics horsepower to run properly. For this benchmark we complete a run-through from within a town, shoot up a vehicle to test performance when in combat and finally end atop a hill overlooking the town. Note that VSync has been forced off within the game's .ini file.


 

DX11 / 4K: Far Cry 4 / Grand Theft Auto V

Far Cry 4


This entry in Ubisoft’s Far Cry series takes up where the others left off by boasting some of the most impressive visuals we’ve seen. In order to emulate typical gameplay we run through the game’s main village, head out through an open area and then transition to the lower areas via a zipline.




Grand Theft Auto V


In GTA V we take a simple approach to benchmarking: the in-game benchmark tool is used. However, due to the randomness within the game itself, only the last sequence is actually used since it best represents gameplay mechanics.


 