AMD Ryzen Threadripper 1920X & 1950X Review
I’m going to start this article off with a simple number: five. Not only is that the number of months it has taken AMD to effectively turn the x86 processor world on its ear, but it’s also the number of distinct model families they’ve introduced over that relatively short time. The rapid staccato of desktop releases with Ryzen 7, Ryzen 5, Ryzen 3, the new Athlon X4s and the Bristol Ridge-based A-9000 APUs has left Intel scrambling to find answers. People who perennially root for the underdog have seen their dreams of a balanced competitive landscape come true. Now the sixth piece and literal cornerstone of AMD’s burgeoning foundation is finally being lifted into place: Ryzen Threadripper has arrived.
With Threadripper, AMD hopes to recapture at least a portion of the key high-end desktop market, a space where they haven’t been able to compete since the K8 microarchitecture back in 2003. For those keeping track at home, that was more than a decade ago, but AMD hasn’t exactly been sitting idle in the meantime. Quite the opposite, actually. There were promising architectures like Phenom and Bulldozer, but their real-world performance fell short of their promise.
After the successes of previous Zen-based CPUs in the Ryzen lineup, plenty of high hopes are riding on Threadripper, since its siblings have already proven their worth. But going toe-to-toe with Intel in the entry-level to mid-range market is one thing; stepping into Intel’s heavily guarded high-end desktop monopoly is something else altogether. Remember, Intel only recently staked their claim with a top-to-bottom lineup renewal built on Skylake-X and Kaby Lake-X processors.
The opening (and surely not the last) salvo in AMD’s new HEDT barrage consists of three CPUs: the 1950X and 1920X – both of which are launching with availability today – and the 1900X that will be available sometime closer to the month’s end. We are expecting this lineup to expand both upwards and outwards as new products are launched to align better with Intel’s offerings.
Sitting at the top of AMD’s Ryzen Threadripper stack is the 1950X, which has a pretty lofty price of $999 and aligns perfectly (from a cost perspective at least) with the Core i9-7900X. Performance-wise, AMD might actually have a pretty significant edge since they are once again endeavoring to capitalize upon their core and thread count superiority. Whereas Intel’s current flagship features 10 cores and 20 threads, the 1950X takes things to obscene levels by featuring 16 cores and 32 threads operating at a base clock of 3.4GHz and a 4-core boost speed of up to 4.0GHz or higher with AMD’s Extended Frequency Range technology (XFR).
Stepping back a little bit brings us to the Ryzen Threadripper 1920X, which is another relatively expensive processor at $799, but it slots perfectly into a bracket where Intel doesn’t have a clear-cut alternative. The i9-7900X costs $200 more, and the i7-7820X’s 8/16 core/thread layout seems woefully underpowered in comparison to the mid-level Threadripper model’s 12 cores and 24 threads. According to AMD, they are hoping the 1920X proves to be the 7900X’s equal in all things. If that proves to be the case, then Intel had better look long and hard at their possible responses.
The Ryzen Threadripper 1900X’s full set of specifications may be a bit nebulous at this point, but what we do know about it is quite promising. With eight cores and sixteen threads, this is the point of entry into AMD’s new TR4 / X399 platform. Priced at $549, it is quite well positioned both as a step up from the Ryzen 7 lineup and as a competitor to several of Intel’s key CPUs. Basically, the 1900X’s price lands it smack in between the $599 i7-7820X – which also supports up to 16 threads – and the less expensive 12-thread $399 i7-7800X. This battle will certainly be an interesting one, I’ll tell you that.
There are, of course, a few common threads that run throughout AMD’s HEDT product stack, some of which are basic carryovers from the other Zen-based products. I’ll be discussing memory ad nauseam on Page 3, but even though Threadripper handles its quad-channel memory allocation in a very different way, its memory speed limitations are identical to those of Ryzen 7, 5 and 3. That means DDR4-2400 for optimal compatibility (which also happens to be the platform’s reference spec), whereas DDR4-2666 and DDR4-3200 are considered “overclock” frequencies. Anything over DDR4-2666 will be challenging for higher-density 64GB kits, and all but impossible for the time being if all eight DIMM slots are occupied.
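To put those memory speeds in context, here is a quick back-of-the-envelope calculation of theoretical peak bandwidth for a quad-channel setup. It assumes standard 64-bit (8-byte) DDR4 channels; these are paper figures only, and measured throughput will always land below them.

```python
# Theoretical peak DRAM bandwidth for a quad-channel DDR4 platform.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide; real-world throughput is lower.
BYTES_PER_TRANSFER = 8  # one 64-bit DDR4 channel
CHANNELS = 4            # Threadripper's quad-channel memory

def peak_bandwidth_gbs(mt_per_s: int, channels: int = CHANNELS) -> float:
    """Peak bandwidth in GB/s (10**9 bytes) for a DDR4 data rate given in MT/s."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9

for speed in (2400, 2666, 3200):
    print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

The same helper with `channels=2` gives the figure for a dual-channel Ryzen 7 at the same speed, which is simply half.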
Threadripper is also a great example of how there’s no such thing as a free lunch when it comes to the way physical cores impact overall power consumption. While AMD’s performance-per-watt ratio looks to be extremely good, cramming 16 Zen cores and their associated I/O connections onto a single package pushes the TDP to 180W.
As with all of the other Ryzen processors, there’s a bit more to Threadripper clock speeds than first meets the eye. On paper both of these CPUs have identical frequencies, other than a slight 100MHz uptick in the 1920X’s base clock. However, what really counts is how those clocks behave in real-world use. The numbers you see above compare AMD’s specifications with the actual speeds I observed in each scenario during testing.
These differences are likely due to a combination of temperatures, power consumption and other internal factors that the onboard microcontrollers take into account. Our 1950X sample never did reach its maximum All Core Boost or XFR frequencies. According to AMD this is normal, since the specifications are simply guidelines the chip strives to achieve rather than a law written in stone saying “your processor WILL reach these speeds”. Also remember that clocks can fluctuate in 25MHz increments in any application as the Precision Boost algorithms strive to maximize performance.
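The stepping behaviour described above can be illustrated with a deliberately simplified toy model. To be clear, this is not AMD’s Precision Boost algorithm, which weighs many more inputs; the temperature and power limits below are hypothetical numbers chosen purely for illustration. It only shows the two ideas from the text: clocks move in 25MHz increments, and they back off when headroom runs out.

```python
# Toy model only: NOT AMD's actual Precision Boost algorithm.
STEP_MHZ = 25  # clocks move in 25MHz increments, per the text

def next_clock(current_mhz: int, base_mhz: int, max_mhz: int,
               temp_c: float, power_w: float,
               temp_limit_c: float, power_limit_w: float) -> int:
    """Step the clock one 25MHz increment up or down based on available headroom."""
    if temp_c >= temp_limit_c or power_w >= power_limit_w:
        target = current_mhz - STEP_MHZ  # out of headroom: back off
    else:
        target = current_mhz + STEP_MHZ  # headroom left: creep upward
    # Clamp between the base clock and the boost/XFR ceiling.
    return max(base_mhz, min(max_mhz, target))

# Hypothetical sensor readings: (temperature in C, package power in W).
readings = [(45, 120), (50, 140), (70, 175), (66, 150)]
clock = 3400
history = []
for temp, power in readings:
    clock = next_clock(clock, base_mhz=3400, max_mhz=4000, temp_c=temp, power_w=power,
                       temp_limit_c=68, power_limit_w=180)  # hypothetical limits
    history.append(clock)
print(history)  # [3425, 3450, 3425, 3450]
```

Even in this crude form, a momentary temperature spike is enough to keep the chip well short of its 4.0GHz ceiling, which is one plausible way a sample can fail to hit its rated boost.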
The last distinguishing factor here is PCI-E lane allocation. Unlike Intel, AMD has decided not to lock out any of their chips’ interconnect bandwidth, so every one of these Threadripper processors comes with 60 PCI-E 3.0 lanes plus an additional four lanes for communication between the CPU and chipset. Compare and contrast this with Intel’s flaccid 28-lane i7-7800X / i7-7820X, along with the 44-lane i9-7900X, and you can see why AMD believes lane allocation could allow them to win big. This is also what allows AMD to justifiably charge more for the 1900X compared with the 20-lane Ryzen 7 1800X.
So there you have Threadripper in a nutshell. Now I know many of you will simply want to skip ahead to those juicy benchmarks, but I’d also encourage you to check out the other pages of this review. There’s some key information about the architecture, new memory modes, installation procedures and the X399 chipset that is worth a read. On we go!
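Counting lanes makes the difference concrete. The helper below is hypothetical and does nothing more than total the lane widths of a device list and compare them to a budget; it ignores how any real X399 or X299 board actually wires its slots (many boards share lanes or drop slots to x8), so treat it as arithmetic rather than a compatibility checker.

```python
# Hypothetical lane-budget arithmetic: sums device lane widths against a budget.
# Real motherboards decide slot wiring; this ignores lane sharing and bifurcation.
TR_USABLE_LANES = 60  # Threadripper lanes left after the 4-lane chipset link

def fits_lane_budget(devices: dict, budget: int = TR_USABLE_LANES) -> tuple:
    """devices maps a name to its lane width (e.g. 16 for a GPU, 4 for NVMe).
    Returns (fits, lanes_used)."""
    used = sum(devices.values())
    return used <= budget, used

build = {"gpu0": 16, "gpu1": 16, "gpu2": 16, "nvme0": 4, "nvme1": 4, "nvme2": 4}
print(fits_lane_budget(build))      # against Threadripper's 60 lanes
print(fits_lane_budget(build, 44))  # same devices against a 44-lane budget
```

Three full-width graphics cards plus three x4 NVMe drives consume exactly 60 lanes, which is the kind of configuration a 44-lane or 28-lane part simply cannot feed at full width.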
The Mother Of All Boxes & Installation Takes a Turn
There are specs and benchmarks aplenty to discuss, but Threadripper is also about the experience, and every aspect of it makes this feel like a premium product. Unlike Intel’s “boring as beige” approach to packaging, Threadripper’s box seems to have been dreamt up by a marketing savant crossed with an industrial design perfectionist. It is absolutely brilliant and shows AMD going above and beyond the call of duty in catering to enthusiasts and folks who will be “unboxing” these things on social media.
There’s a clear window into the box’s soul through which you can see the processor. Actually getting inside involves separating the acrylic box from the protective embrace of its foam holder and then turning an airlock-like knob at the rear. At this point you’d almost expect a dramatic “pssssht” followed by smoke effects straight out of a Han Solo carbonite scene.
The knob is actually part of a pretty elaborate chip holder that has a massive Ryzen Threadripper processor nestled within. But the unboxing process doesn’t end there either since the next step is to not-so-gently tug on the metallic spring mechanism which puts pressure on the clamshell casing and then finally press the two plastic tabs to lift off the outer protective shell. Yeah, AMD is asking you to jump through some hoops here but it sure as hell beats out what the other guys are offering.
The processor itself is not only massive but it also comes cradled in a secondary orange plastic sleeve which just adds to the visual bulk. However, don’t think this is just for show; the plastic bracket actually plays a key role during the installation process and should never be removed. More on that below.
Below the processor lies an installation manual (you’ll likely want to take a look at it since installing Threadripper is a unique experience) as well as two key tools. The first is a Torx-style screwdriver with an automatic stop function to ensure you don’t over-tighten the CPU retention plate. It is also used to loosen the three screws that hold down the socket bracket in preparation for installing the CPU.
An adapter bracket for most Asetek-sourced liquid coolers is also included. Head over to AMD’s microsite to check out which coolers are compatible. https://www.amd.
Due to its gargantuan size, Threadripper certainly isn’t easy to handle, so AMD took a page out of HP’s book. Whereas HP had its so-called “Smart Socket” to facilitate installation, Threadripper comes in a similar protective carrier that doubles as a sled, which slides into the TR4 retention bracket and then clicks into place.
The next step is to lower the Threadripper processor within its retention bracket and push down the two blue tabs until they too click into place.
The final step is to simply reuse the supplied Torx screwdriver to tighten down the main retention plate by following the simple-to-understand “1, 2, 3” order engraved on the bracket.
While this installation process may seem a bit convoluted and it certainly isn’t idiot-proof (some nimrods have already posted “how to” guides with the orange caddy removed), we have to appreciate AMD’s approach. Installing large processors can lead to unforeseen issues, and this clearly explained procedure should alleviate any hiccups.
Seeing Double; Inside Threadripper
Whereas many of AMD’s previous chip designs didn’t live up to understandably high expectations, Zen represents a fundamental shift on many levels. As we already described at length in the original Ryzen 7 article, this architecture was a complete rethink rather than simply an evolution of a previous effort. That’s an important distinction to make with a CPU series like Threadripper, since simply evolving wouldn’t have allowed AMD to even think of competing in the HEDT space. Plus, the results already speak for themselves in lower price brackets; this thing is the real deal, so let’s start at the top and work our way down.
The primary building block of any Ryzen-based processor is the Compute Complex, or CCX. Each of these has four Zen processing cores with 2MB of L2 cache (512KB per core), 8MB of shared L3 cache and the ability to process eight concurrent threads. As with other Zen-based processors, Threadripper also comes with the full suite of SenseMI technologies like Precision Boost, Pure Power, Extended Frequency Range, Neural Net Prediction and Smart Prefetch. You can check out our Ryzen 7 launch day article for more information about those.
Put two of these CCXs together, communicating with one another over AMD’s Infinity Fabric high-speed interconnect, and you have the baseline die layout of every Ryzen 7, 5 and 3 processor we’ve seen to date. What has been done here is quite interesting, for better or worse, since every die produced to date actually has eight cores. To create new SKUs, AMD has simply put all of their dies through a binning process that sorts each one into Ryzen’s 8-, 6- or 4-core variants.
Granted, the sheer number of transistors causes some challenges on the TDP and processing efficiency fronts, but the Infinity Fabric is supposed to be versatile enough to (somewhat) compensate. This approach has also allowed AMD to roll out a huge number of processors in a short amount of time, putting pressure on Intel’s entire lineup without having to drastically redesign new die packages. How this all translates to Ryzen Threadripper should be obvious by now, but according to AMD only the top 5% of dies actually make it into these high-end processors.
Threadripper takes that dual-CCX approach and turns it up to eleven by taking a pair of dual-CCX dies and installing them onto a single processor package. Think of this as two Ryzen 7 1800Xs melded together.
Those two dies communicate across yet another Infinity Fabric link, resulting in a trio of interconnects and a die-to-die bidirectional bandwidth of 102.22 GB/s. With that being said, this distinct die-based structure could very well lead to higher on-chip latencies and lower performance than a more traditional monolithic design. On the positive side, it has allowed AMD to move forward with an extremely scalable architecture that can be easily adapted for various usage scenarios.
A good example of this adaptability is how the Threadripper lineup was created. Whereas the 1950X has the full array of cores enabled across all four CCXs and two dies in a 4+4 / 4+4 pattern, the 1920X has four evenly distributed cores disabled, creating a 3+3 / 3+3 layout. Almost assuredly the eight-core 1900X has an even simpler 2+2 / 2+2 distribution.
Beyond the raw processing potential of AMD’s Threadripper, each of these massive processors also acts as a fully fledged system on a chip. Each has 60 PCIe Gen3 lanes that can be divided between graphics expansion slots and NVMe storage interfaces, along with up to eight USB 3.1 Gen1 connections through its high-speed I/O interface, plus an integrated high definition audio codec. Having those lanes hang directly off the CPU is supposed to alleviate bottlenecks for storage devices, which on some of Intel’s platforms have to vie for bandwidth over a limited DMI interface.
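The CCX layouts above translate directly into core, thread and cache totals. The helper below is an illustrative sketch, not anything AMD publishes: it parses a layout string and applies the per-core and per-CCX cache figures quoted earlier. One assumption is not stated in the text: that each CCX keeps its full 8MB of L3 when cores are fused off. The 1900X is left out because its layout is still speculation.

```python
# Derives core, thread and cache totals from a CCX layout string like "4+4 / 4+4".
# Figures from the text: 512KB L2 per core, 8MB L3 per CCX.
# Assumption (not stated in the text): each CCX keeps its full L3 when cores are disabled.
L2_KB_PER_CORE = 512
L3_MB_PER_CCX = 8

def summarize_layout(layout: str) -> dict:
    dies = [die.split("+") for die in layout.split("/")]
    ccx_cores = [int(c) for die in dies for c in die]  # int() tolerates stray spaces
    cores = sum(ccx_cores)
    return {
        "dies": len(dies),
        "ccx": len(ccx_cores),
        "cores": cores,
        "threads": cores * 2,  # SMT: two threads per Zen core
        "l2_mb": cores * L2_KB_PER_CORE / 1024,
        "l3_mb": len(ccx_cores) * L3_MB_PER_CCX,
    }

for name, layout in [("1950X", "4+4 / 4+4"), ("1920X", "3+3 / 3+3")]:
    print(name, summarize_layout(layout))
```

Both parts end up with four CCXs across two dies; disabling one core per CCX trims the 1920X to 12 cores, 24 threads and 6MB of L2 while, under the stated assumption, leaving the 32MB L3 pool untouched.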
Perhaps the most interesting aspect of this design is how it handles memory requests and I’ll get into that on the next page.
UMA & NUMA; Threadripper’s Memory Architecture
One of the inherent challenges that arises when working with multi-die CPU packages is memory access latency. We see this on multi-processor systems and, to a lesser extent, on some of Intel’s much larger Xeon CPUs. Simply put, everything from the cores’ physical size to interconnect length works against optimal speeds for memory transactions.
In the case of Threadripper, these effects are both multiplied and minimized in different ways. Each processor has two distinct sets of memory channels, with each set paired to one of the two dies. While this creates a quick 78ns link between each die and its “nearest” channels, distributing access across all memory channels can increase latency to a massive 133ns when a dual-CCX die is forced to communicate with the “furthest” channels.
The reason for this is pretty simple: Ryzen Threadripper processors aren’t native quad-channel designs. Rather, AMD has, for lack of a better term, spliced two processors together, with much of their key communication funneled through the Infinity Fabric. Since this can cause notable memory latency increases, a rather novel idea was implemented: giving the user access to a number of Memory Access Modes.
At this time AMD allows you to choose between Distributed and Local modes in the BIOS or the Ryzen Master software. The Auto setting corresponds to AMD’s default, Distributed Mode, which is the one they actually recommend for regular use. I’ll be testing these modes further a bit later in the review, but for the time being let’s get a quick rundown of what each is intended to accomplish.
Within Distributed Mode the system is placed into a Uniform Memory Access, or UMA, configuration. UMA endeavors to balance memory transactions equally across all DRAM channels, capitalizing upon the architecture’s quad-channel layout. In essence this significantly boosts available bandwidth and can benefit applications that require wide memory access like After Effects, Adobe Premiere, Blender and 3DS Max.
The downside to this configuration is that it sacrifices latency for maximum bandwidth, since each of those dies is trying to access even the furthest memory channels. This is where AMD’s Local Mode enters into our equation.
Local Mode is far more nuanced than Distributed since it places the system into a Non-Uniform Memory Access (NUMA) configuration, localizing memory access to the channels physically nearest to the cores processing the workload. In most situations where Threadripper has an advantage (read: heavily multi-threaded workloads) this will actually reduce performance since, while it lowers latency, it also reduces available bandwidth. But there is one key area that could benefit from this NUMA setup: games.
According to AMD, their research indicates that quite a few titles benefit more from lower latencies than from higher bandwidth. This is actually one of the reasons we tend to run our processor benchmarks at DDR4-2666 with very tight timings: those settings finely balance bandwidth and access times. But that’s not all; during their testing, AMD found out something else:
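To put rough numbers on that trade-off, here is a naive expected-value sketch using the 78ns “near” and 133ns “far” latencies quoted earlier. The access mix is our assumption, not AMD data: in Distributed mode we assume accesses are spread evenly over all four channels, so half land on the other die, while in Local mode we assume a workload that fits in its own die’s memory only ever touches the two near channels. Caches, queuing and contention are ignored entirely.

```python
# Naive model of the two memory access modes using the latencies quoted in the text.
# Assumptions (not AMD data): Distributed/UMA sends half of all accesses to the far die;
# Local/NUMA keeps a workload that fits in local memory entirely on its near channels.
NEAR_NS, FAR_NS = 78.0, 133.0

def expected_latency_ns(far_fraction: float) -> float:
    """Average access latency when `far_fraction` of requests go to the other die."""
    return (1 - far_fraction) * NEAR_NS + far_fraction * FAR_NS

modes = {
    "Distributed (UMA)": {"far_fraction": 0.5, "channels": 4},
    "Local (NUMA)": {"far_fraction": 0.0, "channels": 2},  # channels seen by one die
}
for name, m in modes.items():
    print(f"{name}: ~{expected_latency_ns(m['far_fraction']):.1f} ns average, "
          f"{m['channels']} channels feeding a single-die workload")
```

Under these assumptions Distributed mode averages about 105.5ns but lets a workload draw on all four channels, while Local mode sits at 78ns with only half the channels behind it, which is exactly the latency-versus-bandwidth trade described above.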
Further, Local (NUMA) hints to the OS scheduler that a modestly-threaded application should stay resident in one die and prefer the near-connected memory until full (at which point there is spillover to the next chunk of RAM). This die residency minimizes the chance that game threads (e.g. physics, AI, sound) with high synchronization requirements will be split to another die with longer round trip times. This, in addition to faster memory access, also improves performance specifically in games.
So it seems the Windows 10 scheduler should treat the NUMA setup as two distinct processing nodes, each with its own nearby memory, and keep a game’s tightly coupled threads together on one of them rather than scattering them across both dies. As a result it could (there is no guarantee of this) handle in-game workloads more efficiently.
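In AMD’s description the OS scheduler handles this on its own, but the same “die residency” idea can be done by hand on Linux, which may make the concept more concrete. The sketch below is our own Linux-only illustration, not something AMD describes: it reads the CPU list for one NUMA node from the standard sysfs location and pins the current process to those CPUs with `os.sched_setaffinity`. Whether a Threadripper system exposes one node or two depends on the memory mode selected in the BIOS.

```python
# Linux-only illustration of "die residency": pin a process to one NUMA node's CPUs.
# This is NOT how Windows 10's scheduler works internally; it only shows the principle.
import os

def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-7,16-23' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def pin_to_node(node: int = 0) -> set[int]:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Note that this only pins threads; memory placement is left to the kernel’s allocation policy. To bind both CPUs and memory to one node, the `numactl` utility (for example `numactl --cpunodebind=0 --membind=0 <program>`) is the more complete tool.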
But does this theory translate to reality? We’ll find out a bit later.