AMD Threadripper 2950X Performance Review

SKYMTL

HardwareCanuck Review Editor
Staff member
Joined
Feb 26, 2007
Messages
12,900
Location
Montreal
So here we are in what’s the first review on the Hardware Canucks website in an embarrassingly long time. What’s been going on behind the scenes will all be revealed soon but for the time being let’s get things started again with AMD’s newest introduction: the second generation Threadripper 2 processors.

By now you’ve likely seen our Explained video as well as our first build being put together but what was missing should have been obvious. There were no benchmarks other than a few AMD-provided performance results which showed their new processors in a particularly favorable light against both Skylake-X and previous generation Threadripper CPUs. Well this review will be used to fill in those blanks which the build and Explained video left out.

But this review will be just that; a benchmark-heavy affair with only a bit of time spent on the architectural differences which the Zen+ evolution brings to the table. If you want to see a fuller explanation about what makes Threadripper and Zen in general “tick”, then I recommend you go check out our original review and even our original Ryzen article. In essence everything has remained pretty much the same since AMD will continue to use their highly capable X399 platform as a foundation upon which to build their HEDT lineup.

Meanwhile, Zen+ represents an evolution (albeit an important one) of the game-changing Zen microarchitecture but there are several key differences this time around on both the core and software side of things. Those will be detailed on the next page but for now let’s take a look at what this new lineup brings to the table.


Let’s cut right to the heart of things by quickly introducing the four new processors that will be part of this refreshed lineup. Starting right at the top, there’s the big daddy 2990WX which is the first 32-core, 64-thread processor to be available for the desktop market. Its smaller sibling is the slightly cut down but nonetheless insanely capable 24-core, 48-thread 2970WX. Naturally, these CPUs will cost a king’s ransom at $1800 and $1300 respectively but when you compare them to Intel’s closest competitors (the $2,000, 36-thread i9-7980XE and the $1700, 32-thread i9-7960X) they seem to be a very, very good value indeed.
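To put that value claim into numbers, here is a quick sketch that divides each launch price quoted above by its thread count. The prices and thread counts come straight from this paragraph; the dictionary and function names are only illustrative.

```python
# Launch prices (USD) and thread counts as quoted in this review.
cpus = {
    "Threadripper 2990WX": (1800, 64),
    "Threadripper 2970WX": (1300, 48),
    "Core i9-7980XE": (2000, 36),
    "Core i9-7960X": (1700, 32),
}

def price_per_thread(price_usd, threads):
    """Return the launch price divided by the number of hardware threads."""
    return price_usd / threads

for name, (price, threads) in cpus.items():
    print(f"{name}: ${price_per_thread(price, threads):.2f} per thread")
```

On these figures the WX parts land under $30 per thread while both Intel chips sit above $50, though price per thread obviously says nothing about how fast each thread is.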

There’s a caveat here too and one that AMD has been extremely transparent about: the WX-series is absolutely not for gamers or even people who want to game while also processing a video and streaming at the same time. Those folks would be much better served by Ryzen 2, the X-series or Coffee Lake processors. The WX-series on the other hand is specifically targeted towards creative professionals, people working with parallelized virtual machines, high level visualization or other extremely multithreaded tasks. I can sympathize with AMD on this since they don’t want their bleeding edge CPUs’ already limited availability to be impacted by folks who can’t take full advantage of them.


Speaking of that X-series, the 2950X and 2920X will be more than capable enough to power through games, online streaming and other tasks all at the same time. Priced at $900 and $650 respectively, they’re well within reach for enthusiasts but still not that inexpensive for gamers. Again, people who just want to game should look towards Coffee Lake and Ryzen 2 while maximizing their GPU and storage purchases but you’ll get that hammered home later on in this review anyways.

One of the highlights of Zen+ is its move to an advanced 12nm manufacturing process. As a result, AMD has been able to leverage Threadripper 2 towards higher overall clock speeds without sacrificing power consumption. That’s a pretty important distinction since a boost in overall frequencies will allow these new CPUs to better compete against Intel’s Skylake-X family. Remember, each Skylake-X core has higher IPC rates than Zen (and now Zen+) which allows those processors to boast higher performance when identical core-count chips are compared. AMD is still able to offer more cores for higher level application execution but those clock speed uplifts will be a welcome addition nonetheless.


By focusing solely on the 2950X and 2920X we can see their baseline specifications haven’t changed all that much when compared to their predecessors. Other than the very minor uplift to the 2950X’s base clock, the real differences between generations are notable in the boost speeds and memory support. Naturally, some of that higher frequency is due to the aforementioned manufacturing process efficiencies but there’s also Precision Boost 2 and XFR 2 factored into this equation, both of which contribute to more clock speed overhead. More on those a bit later. There’s finally higher level memory support too with 2933MHz being a new target speed bin.

Something else which needs to be taken into account is the price at which these CPUs are launching. While there has been little to no movement in Skylake-X’s cost hierarchy since launch, AMD is moving into very aggressive territory with second generation Threadripper SKUs. The 2950X will go for a solid $100 less than the 1950X while the 2920X takes a massive $150 bite out of its predecessor’s initial cost.

Part of the latter’s large downwards shift is due to AMD’s decision to not launch a replacement for their 1900X. The reasoning behind this should be self-evident: with 8 core, 16 thread Ryzen 2 processors there was no reason to have an overlapping Threadripper part. Intel learned these lessons themselves with the ill-fated Kaby Lake-X series.


The ace up AMD’s collective sleeve this time around is the 32-core 2990WX which represents the highest number of cores AMD’s current dies can achieve and effectively matches AMD’s own EPYC series payload. So what you see is what you get folks; if AMD and users want more cores they’ll have to switch to a brand new architecture.

Here, instead of two dies being enabled as they are in the X-series, the processor’s full allotment of four dies and eight CCXs (each with four physical cores) gets kicked into action. The only difference between this layout and that of EPYC is that the 32-lane PCIe and dual channel memory controllers in the first and third dies are disabled since X399 doesn’t support eight channel memory or 128 PCIe lanes.
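The arithmetic behind that layout can be sketched in a few lines. The per-die figures (two CCXs of four cores, one dual channel memory controller, one 32-lane PCIe block) are taken from the description above; the function name and the idea of counting "dies with I/O enabled" separately are my own simplification rather than anything AMD publishes in this form.

```python
# Per-die resources as described above.
CCX_PER_DIE = 2
CORES_PER_CCX = 4
MEM_CHANNELS_PER_DIE = 2
PCIE_LANES_PER_DIE = 32

def package_totals(active_dies, dies_with_io):
    """Sum cores over active dies, and memory/PCIe over dies whose I/O is enabled."""
    return {
        "cores": active_dies * CCX_PER_DIE * CORES_PER_CCX,
        "memory_channels": dies_with_io * MEM_CHANNELS_PER_DIE,
        "pcie_lanes": dies_with_io * PCIE_LANES_PER_DIE,
    }

print("EPYC-style layout:", package_totals(active_dies=4, dies_with_io=4))
print("2990WX layout:    ", package_totals(active_dies=4, dies_with_io=2))
print("Two-die X-series: ", package_totals(active_dies=2, dies_with_io=2))
```

With two of the four dies contributing no memory or PCIe, the 2990WX ends up with all 32 cores but only half of EPYC's memory channels and lanes, which is why two of its dies have to reach memory through their neighbors.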

As usual, these dies communicate with one another over the Infinity Fabric high speed interconnect but due to this 4-die topology, support for distributed mode or Uniform Memory Access (UMA) isn’t possible. That means WX-series CPUs will default to a localized NUMA configuration which should be a slight bit better for gaming while suffering extremely small penalties in certain non-gaming applications. It should also be mentioned that die-to-die bandwidth decreases from 50GB/s in the X-series to 25GB/s here.


The last thing I wanted to mention before getting into the meat of this review is how AMD will be staggering this launch. Threadripper 2 products will be trickling out rather than being available all at the same time. The first one out of the gate will be the Threadripper 2990WX which starts shipping on August 13th (which happens to be today), followed by the 2950X on August 30th. For those of you wondering, presales of these two processors have been ongoing since last Monday. Then we’ll have to wait all the way until October for the 2970WX and 2920X. I have to wonder if these dates are being planned around Intel’s rumored launch schedule or if it’s just a matter of ensuring availability.

With all of that being said, this is an exciting time for AMD. Not only have they proven Zen can effectively be scaled upwards for the desktop market but also that continual evolution can yield a more competitive landscape. If they continue on this course, Intel could find themselves fighting an uphill battle against Zen 2 before this time next year. But until that point, let’s take a bit of a deeper (albeit brief) dive into what these new processors have to offer.
 
XFR 2 & Precision Boost 2 Explained


By this point in time it should be obvious that second generation Threadripper CPUs represent an evolutionary step forward for the Zen architecture rather than a revolution. Part of that entails a switch to GlobalFoundries’ 12nm LP manufacturing process but feature-level support has also been improved.

According to AMD, the so-called Zen+ architecture boasts up to 15% better cache latency and an improvement of about 2% to memory latency. These may not sound like a huge improvement but when combined with other advances, these new CPUs could achieve between 4% and 15% better performance than their predecessors.


One of the major contributors to the 2950X’s advantage over the 1950X is the inclusion of Precision Boost 2. Much like the original Precision Boost, this version uses an algorithm that monitors things like temperature, power consumption and electrical current in an effort to maximize clock speeds. But the first generation technology had some limitations.

First and foremost, it only recognized two distinct clock speed states, those being an “all core” boost and a “four core” boost. That meant a lot of potential performance was being left on the table when a workload didn’t precisely match one of those two states. For example, all of the cores could have been engaged in a workload but said workload may not have been taxing every thread to 100%. In that situation, Precision Boost would have downclocked the CPU to an all-core boost state even though there was likely clock speed headroom to be gained.

Precision Boost 2 on the other hand does away with those two arbitrary “states” and uses an algorithm which strives to achieve the highest possible frequency regardless of the number of threads. Naturally, it will still be limited by thermal or power constraints but it is much more graceful in its approach.

The chart above illustrates this point very well. The dashed white line represents the algorithm’s predetermined curve or target frequency whereas the orange line shows actual achieved clock speeds at various thread counts. Rather than the extremely jagged graph we originally saw in previous Ryzen and Threadripper CPUs, this one is more linear and closer to optimal even though it still utilizes 25MHz frequency increments.
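As a rough illustration of the difference between the two approaches, the sketch below compares a two-state boost with a graduated curve that steps in 25MHz increments. The clock targets are placeholder values chosen only to make the shapes visible, the linear curve is an assumption, and the real algorithm also weighs temperature, power and current, none of which is modeled here.

```python
STEP_MHZ = 25            # frequency granularity mentioned in the review
TOTAL_CORES = 16         # core count of a two-die X-series part
# Placeholder clock targets, for illustration only.
MAX_BOOST_MHZ = 4400
ALL_CORE_MHZ = 3500
FEW_CORE_LIMIT = 4       # first-generation "four core" boost state

def two_state_boost(active_cores):
    """First-generation style: one high state for light loads, one all-core state."""
    return MAX_BOOST_MHZ if active_cores <= FEW_CORE_LIMIT else ALL_CORE_MHZ

def graduated_boost(active_cores):
    """PB2-style sketch: slide from max boost to the all-core clock in 25MHz steps."""
    span = MAX_BOOST_MHZ - ALL_CORE_MHZ
    fraction = (active_cores - 1) / (TOTAL_CORES - 1)
    target = MAX_BOOST_MHZ - span * fraction
    return int(target // STEP_MHZ) * STEP_MHZ   # round down to the nearest step

for cores in (1, 4, 5, 8, 12, 16):
    print(f"{cores:2d} cores: two-state {two_state_boost(cores)} MHz, "
          f"graduated {graduated_boost(cores)} MHz")
```

The interesting row is the five-core one: the two-state model falls straight to the all-core clock, while the graduated curve gives up only a few steps.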


Another blast from the past is the Extended Frequency Range or XFR but this time around it has been massaged to deliver a higher level of overall performance. Whereas Precision Boost 2 is designed to opportunistically take advantage of thermal/load/power/electrical headroom, its built-in limits are still somewhat conservative. This is because they are designed with sub-optimal situations in mind, thus laying out baseline “minimum” frequency specifications.

XFR steps into the game by rewarding users who have better cooling setups by further boosting clock speeds (to a certain level) beyond the typical range of Precision Boost. Unlike the first generation technology which extended the range of a small number of cores, XFR 2 is able to further enhance the frequencies of any number of cores if given the right thermal operating conditions.

Supposedly that can lead to a performance boost of up to 16% in the 2990WX’s case. Unfortunately we didn’t receive information about the 2950X but the headroom will be somewhat more constrained due to its lower operating temperature.
 
Test Setups & Methodology


For this review, we have prepared a number of different test setups, representing many of the popular platforms at the moment. As much as possible, the test setups feature identical components, memory timings, drivers, etc. Aside from manually selecting memory frequencies and timings, every option in the BIOS was at its default setting.


For all of the benchmarks, appropriate lengths are taken to ensure an equal comparison through methodical setup, installation, and testing. The following outlines our testing methodology:

A) Windows is installed using a full format.

B) Chipset drivers and accessory hardware drivers (audio, network, GPU) are installed.

C) To ensure consistent results, a few tweaks are applied to Windows 10 and the NVIDIA control panel:
  • UAC – Disabled
  • Windows HPET – Disabled
  • Indexing – Disabled
  • Superfetch – Disabled
  • System Protection/Restore – Disabled
  • Problem & Error Reporting – Disabled
  • Remote Desktop/Assistance – Disabled
  • Windows Security Center Alerts – Disabled
  • Windows Defender – Disabled
  • Screensaver – Disabled
  • Power Plan – High Performance
  • V-Sync – Off
  • All BIOS-enabled performance enhancements – Disabled
 

System Benchmarks: AIDA64

AIDA64 Extreme Edition


AIDA64 uses a suite of benchmarks to determine general performance and has quickly become one of the de facto standards among end users for component comparisons. While it may include a great many tests, we used it for general CPU testing (CPU ZLib / CPU Hash) and floating point benchmarks (FPU VP8 / FPU SinJulia).


CPU PhotoWorxx Benchmark
This benchmark performs different common tasks used during digital photo processing. It performs a number of modification tasks on a very large RGB image:

This benchmark stresses the SIMD integer arithmetic execution units of the CPU and also the memory subsystem. The CPU PhotoWorxx test uses the appropriate x87, MMX, MMX+, 3DNow!, 3DNow!+, SSE, SSE2, SSSE3, SSE4.1, SSE4A, AVX, AVX2, and XOP instruction set extensions and it is NUMA, HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.




CPU ZLib Benchmark

This integer benchmark measures combined CPU and memory subsystem performance through the public ZLib compression library. CPU ZLib test uses only the basic x86 instructions but is nonetheless a good indicator of general system performance.
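For readers who want a feel for what a compression throughput test looks like, here is a minimal single-threaded sketch built on Python's standard zlib module. It is not AIDA64's code: the payload, compression level and round count are arbitrary assumptions, and the numbers it prints are not comparable to the charts in this review.

```python
import time
import zlib

def zlib_throughput(data, level=6, rounds=5):
    """Compress `data` `rounds` times; return (MB/s, compressed size in bytes)."""
    start = time.perf_counter()
    for _ in range(rounds):
        compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    megabytes = len(data) * rounds / 1e6
    return megabytes / max(elapsed, 1e-9), len(compressed)

# Hypothetical test payload: a few MB of repetitive text.
payload = b"Threadripper 2950X benchmark payload. " * 100_000
speed, size = zlib_throughput(payload)
print(f"{speed:.1f} MB/s, {len(payload)} -> {size} bytes")
```

A real benchmark would run one such worker per hardware thread and use less compressible data, since highly repetitive input like this flatters the result.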



CPU AES Benchmark

This benchmark measures CPU performance using AES (Advanced Encryption Standard) data encryption. In cryptography AES is a symmetric-key encryption standard. AES is used in several compression tools today, like 7z, RAR, WinZip, and also in disk encryption solutions like BitLocker, FileVault (Mac OS X), TrueCrypt. CPU AES test uses the appropriate x86, MMX and SSE4.1 instructions, and it's hardware accelerated on Intel AES-NI instruction set extension capable processors. The test is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.



CPU Hash Benchmark

This benchmark measures CPU performance using the SHA1 hashing algorithm defined in the Federal Information Processing Standards Publication 180-3. The code behind this benchmark method is written in Assembly. More importantly, it uses MMX, MMX+/SSE, SSE2, SSSE3, AVX instruction sets, allowing for increased performance on supporting processors.
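The same idea applies to hashing. The sketch below times SHA-1 over a stream of zero-filled blocks using Python's standard hashlib module; the block size and block count are arbitrary assumptions, and unlike AIDA64's hand-written assembly it runs on a single thread through whatever SHA-1 implementation the Python build ships with.

```python
import hashlib
import time

def sha1_throughput(block_size=1 << 20, blocks=64):
    """Hash `blocks` chunks of `block_size` zero bytes; return (MB/s, hex digest)."""
    chunk = bytes(block_size)
    hasher = hashlib.sha1()
    start = time.perf_counter()
    for _ in range(blocks):
        hasher.update(chunk)
    elapsed = time.perf_counter() - start
    return block_size * blocks / 1e6 / max(elapsed, 1e-9), hasher.hexdigest()

speed, digest = sha1_throughput()
print(f"SHA-1: {speed:.0f} MB/s (digest {digest[:12]}...)")
```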



FPU VP8 / SinJulia Benchmarks

AIDA’s FPU VP8 benchmark measures video compression performance using the Google VP8 (WebM) video codec Version 0.9.5 and stresses the floating point unit. The test encodes 1280x720 resolution video frames in 1-pass mode at a bitrate of 8192 kbps with best quality settings. The content of the frames is generated by the FPU Julia fractal module. The code behind this benchmark method utilizes MMX, SSE2 or SSSE3 instruction set extensions.

Meanwhile, SinJulia measures the extended precision (also known as 80-bit) floating-point performance through the computation of a single frame of a modified "Julia" fractal. The code behind this benchmark method is written in Assembly, and utilizes trigonometric and exponential x87 instructions.


 

System Benchmarks: Cinebench / PCMark 8 / WPrime

CineBench R15 64-bit


The latest benchmark from MAXON, Cinebench R15 makes use of all your system's processing power to render a photorealistic 3D scene using various different algorithms to stress all available processor cores. The test scene contains approximately 2,000 objects containing more than 300,000 total polygons and uses sharp and blurred reflections, area lights and shadows, procedural shaders, antialiasing, and much more. This particular benchmark can measure systems with up to 64 processor threads. The result is given in points (pts). The higher the number, the faster your processor.



PCMark 8


PCMark 8 is the latest iteration of Futuremark’s system benchmark franchise. It generates an overall score based upon system performance with all components being stressed in one way or another. The result is posted as a generalized score. In this case, we didn’t use the Accelerated benchmark but rather just used the standard Computational results which cut out OpenCL from the equation.




WPrime


wPrime is a leading multithreaded benchmark for x86 processors that tests your processor performance by calculating square roots with a recursive call of Newton's method for estimating functions, with f(x) = x^2 - k, where k is the number whose square root we're finding, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)^2 = k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum. This is a highly multi-threaded workload. Below are the scores for the 1024M benchmark.
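The procedure described above can be sketched as follows. This is my own reading of the description rather than wPrime's actual source: it uses a loop instead of recursion, the number of extra refinement passes and the iteration cap are assumptions, and the final check uses a floating point tolerance rather than exact equality.

```python
import math

def newton_sqrt(k, polish_steps=5, max_steps=200):
    """Estimate sqrt(k) for k > 0 with Newton's method on f(x) = x^2 - k."""
    x = k / 2                              # starting estimate of k/2
    prev_sign = 0
    for _ in range(max_steps):             # cap guards against stalling on rounding
        step = (x * x - k) / (2 * x)       # f(x) / f'(x)
        sign = (step > 0) - (step < 0)     # Sgn of the correction term
        if sign == 0 or (prev_sign != 0 and sign != prev_sign):
            break                          # exact hit, or the sign has flipped
        prev_sign = sign
        x -= step
    for _ in range(polish_steps):          # fixed number of refinement passes
        x -= (x * x - k) / (2 * x)
    return x

# Confirm the square of each result matches k, as the benchmark does.
for k in range(1, 11):
    root = newton_sqrt(k)
    assert math.isclose(root * root, k, rel_tol=1e-12)
    print(f"sqrt({k}) ~ {root:.12f}")
```

Each value of k is independent of every other, which is why the workload spreads so cleanly across a large number of threads.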

 

Single Thread Performance


Even though most modern applications have the capability to utilize more than one CPU thread, single threaded performance is still a cornerstone of modern CPU IPC improvements. In this section, we take a number of synthetic applications and run them in single thread mode.

 

Productivity Benchmarks: 7-Zip / Adobe Premiere Pro

7-Zip


At face value, 7-Zip is a simple compression/decompression tool much like the popular WinZip and WinRAR but it also has numerous additional functions that can allow encryption, decryption and other options. For this test, we use the standard built-in benchmark which focuses on raw multi-threaded throughput.



Adobe Premiere Pro CC


Adobe Premiere Pro CC is one of the most recognizable video editing programs on the market today as it is used by videography professionals and YouTubers alike. In this test we take elements of a 60-second 4K video file and render them out into a cohesive MP4 video via Adobe’s Media Encoder. Note that GPU acceleration is turned on.

 

Productivity Benchmarks: Blender / 3ds MAX Corona

Blender


Blender is a free-to-use 3D content creation program that also features an extremely robust rendering back-end. It boasts extremely good multi core scaling and even incorporates a good amount of GPU acceleration for various higher level tasks. In this benchmark we take a custom 1440P 3D image and render it out using the built-in tool. The results you see below list how long it took each processor to complete the test.



3ds MAX Corona Renderer


Autodesk’s 3ds MAX is currently one of the most-used 3D modeling, animation and rendering programs on the market, providing a creative platform for everyone from architects to industrial designers. Unfortunately its rendering algorithms leave much to be desired and third party rendering add-ons are quite popular. One of the newest ones is called Corona.

In this test we take a custom 3D scene of a room with global illumination enabled and render it out in 720P using Corona’s built-in renderer.


 

Productivity Benchmarks: GIMP / Handbrake

GIMP


While it may be open source, GIMP is actually one of the most popular free photo editors available right now. It uses both CPU and GPU acceleration for certain tasks. In this test we use an 8K image and use a script to run eight different filters in succession. This is considered a lightly threaded workload since the memory, CPU and storage drive can all play a role in performance.




Handbrake


Video conversion from one format to another is a stressful task for any processor and speed is paramount. Handbrake is one of the more popular transcoders on the market since it is free, has a long feature list, supports GPU acceleration and has an easy-to-understand interface. In this test we take a 6GB 4K MP4 and convert it to a 1080P H.264 video in an MKV container. GPU acceleration has been disabled. The results posted indicate how long it took for the conversion to complete.

 

Productivity Benchmarks: POV Ray / WinRAR

POV Ray 3.7


POV Ray is a complex yet simple to use freeware ray tracing program which has the ability to efficiently use multiple CPU cores in order to speed up rendering output. For this test, we use its built-in benchmark feature which renders a high definition scene. The rendering time to completion is logged and then listed below.



WinRAR


WinRAR is one of those free tools that everyone seems to use. Its compression and decompression algorithms are fully multi-core aware which allows for a significant speedup when processing files. In this test we compress a 3GB folder of various files and add a 256-bit encryption key. Once again the number listed is the time to completion.

 