
AMD Bulldozer FX-8150 Processor Review


MAC

Associate Review Editor
Joined
Nov 8, 2006
Messages
1,106
Location
Montreal
After countless years of gossip, presentations, leaks, and showcases, AMD's new Bulldozer processors are finally here. This is AMD's attempt at making an architecture with significant multi-threaded performance, but at a price that most people can afford.

Many believe that AMD’s last “great” architecture arrived back in the Socket 754 / 939 days when the Athlon 64 processors found themselves competing against and in many ways beating the Pentium 4 black and blue. In those days names like Clawhammer, Sledgehammer, Newcastle and San Diego came to the forefront but somehow their successors never again competed on a level footing against Intel’s flagship processors. AMD needed to get back into the game and their Bulldozer architecture promised exactly that but it didn’t come easy. There were several delays and transitioning to the 32nm manufacturing process seemed to have been particularly hard for AMD’s manufacturing partner. Nonetheless, here we stand with a new architecture primed and finally ready to go.

Bulldozer is actually a broad name for an architecture that will be with us for a long, long time and should eventually encompass several different product categories. On the desktop side we will see chips carrying the Zambezi codename along with a relaunch of the FX family of chips, which was last used in 2007 for the AMD Quad FX platform and Athlon 64 FX processors. Like past FX-series chips, these will ship unlocked which is pretty cool given that we know that they have plenty of frequency headroom. This is an effort to brand these new chips as premium, enthusiast-class products but we'll see if they have been successful.

Initially, four FX series Zambezi processors will be launched which offer between four and eight cores. True to AMD’s mantra over the last few years, they are once again eschewing the enthusiast level pricing and have gone straight to the budget friendly $115 to $245 price points. Now that may sound like the fight against high end Sandy Bridge processors has already been lost but Bulldozer supposedly has a few tricks up its sleeve to even things out a bit.

At this moment, AMD are years behind Intel when it comes to overall processor performance, specifically in lightly-threaded workloads. The situation is about to get worse with Sandy Bridge-E LGA2011 right around the corner and Ivy Bridge coming in early 2012. With years still to go on this new Bulldozer architecture, this is a pivotal moment for them to catch up or fall even further back. With 8 cores on tap, we think the initial FX-series offerings will certainly have the multi-threaded aspect covered. However, the real question is whether AMD has managed to increase IPC enough to give these new processors a chance at competing with Intel's all-powerful Sandy Bridge LGA1155 chips.

 


Codename "Zambezi" – AMD FX Series




Gulftown / Sandy Bridge / Phenom II / Zambezi

After being teased for oh so many years, Bulldozer is officially here. As mentioned in the introduction, AMD have seen fit to re-launch their vaunted FX series branding, which has historically only been used on the highest-end enthusiast-oriented hardware. While the consumer desktop chips are all part of the FX series of products, their codename is Zambezi, a name taken from one of the longest rivers in Africa. This is actually an apt moniker because, as you will read later on, the Bulldozer microarchitecture has some very long pipelines.

As you can see, the processor itself retains the same look and dimensions as the previous generations, but what lies underneath the integrated heatspreader is radically different. Before going into any specific details it should be highlighted that, as was the case with the Llano APUs, AMD simply could not have manufactured these Bulldozer parts without having made the move to GlobalFoundries' new 32nm manufacturing process. The Zambezi CPU die features over 2 billion transistors, which is a whopping number when compared to the 758 million of the Phenom II X4 and 904 million of the Phenom II X6. Just as a point of reference, Sandy Bridge comes in at a relatively svelte 995 million transistors, while the six-core/twelve-thread Gulftown clocks in at 1.17 billion transistors. Transistor counts are obviously not what most people are interested in, so let's dive right into the juicy specs outlined in the table below:



Clearly, on paper these new chips look terrific. They all have very high default clock speeds and/or can Turbo up to very high levels. Many of the models have gotten a small northbridge frequency bump (up from 2000MHz on Phenom II), they all sport a much faster HyperTransport Link interface (up from 4.0 GT/s), and they have native support for DDR3-1866 memory modules. Perhaps most impressively, these Zambezi processors have a truckload of cache (up to 16MB in total) and some fancy new instructions that are more advanced than anything that Intel will have to offer for the foreseeable future. The TDP numbers are about what we would expect, which is to say identical to previous generations, but then again AMD always overestimates heat output.

These new processors look even better when we focus on the price. $245 for the top-of-the-line 8-core CPU that can turbo up to 4.2GHz? Deal of the century, right? Not so fast. This clearly shows that AMD lacks a little faith in their flagship part, since it's not willing to go head-to-head price-wise with the four-core/eight-thread Core i7-2600K, which typically retails for about $315. It is, however, in direct competition with the $220 Core i5-2500K, a four-core/four-thread part. As you will see in the coming pages, they ultimately made the right decision...kind of. By the way, today AMD are launching the FX-4100, FX-6100, FX-8120, and of course the FX-8150 that is the focus of this review. Expect the other models to launch in the not too distant future.

Before we get into any discussions over the microarchitecture and performance, let's take a quick look at the new packaging for the FX-8150, as well as a closer look at the chip itself.



As you can see, AMD has totally overhauled their packaging when it comes to Zambezi, at least with regard to the flagship part. The FX-8150 will ship in this 5" x 5" x 3" metal container with a standard heatsink that looks quite similar to the one that shipped with the Phenom II X6 processors. As has been leaked in these past few weeks, AMD is indeed planning on releasing a SKU with a closed-loop liquid cooler, similar to the Corsair Hydro series, which should be available shortly after launch. We will be getting our hands on one in the coming days.



As you can see, our chip was manufactured in the 35th week of 2011. That is three full months later than our recently reviewed A6-3650 APU sample. This is some very fresh silicon. Like every other AMD processor in recent history, this part's CPU die was manufactured at Fab 1 in Dresden, Germany and assembled in Malaysia.

Although it would take an expert to spot it, these new processors actually have two additional pins when compared to Phenom IIs, yet they still fit in the standard AM3+ socket. Zambezi was designed to work with AMD's newest 9-series chipsets, and it should also work on socket AM3 motherboards, though that will obviously depend on motherboard manufacturers releasing compatible BIOSes...and on whether they properly designed their motherboards to deal with the higher power requirements of an 8-core chip. On a side note, those of you considering purchasing a Zambezi processor should take a look at this blog post from AMD, just to ensure that you have a painless upgrade process.



With Zambezi, the lowest idle clock speed has been increased from 800MHz to 1400MHz in order to lessen the latency when ramping up to a higher performance state. This is a neat little idea, and it should come in handy since the FX-8150 can ramp up to an impressive 4200MHz. As you can see, the chip's core voltage changes radically depending on the performance state, but our sample topped out at 1.380V, exactly the same as our Phenom II X4 but lower than the 1.44V level of our Phenom II X6s. Obviously every chip will be different, but this gives you an indication that AMD really put high frequency above lowering power consumption. Although our engineering sample doesn't have a revision listed, we believe that it is a C2, which is what all retail chips should be.

Next up let's take a closer look at the new Bulldozer microarchitecture that forms the basis for these Zambezi processors.
 


Clock-per-clock: Deneb vs. Zambezi



One of the questions that we have seen asked over and over during the last few years is how much faster will Bulldozer be clock-per-clock when compared with Phenom II. Given the radically new microarchitecture it was pretty much impossible for anyone to give a credible estimate, but today we are going to attempt to answer that very question.



Here is the duel: FX-8150 "Zambezi" versus Phenom II X4 980 "Deneb". To ensure that both chips were competing on a roughly equal playing field, we set identical frequencies and timings for both processors, and we also disabled the FX-8150's extra cores. However, we allowed the FX-8150 two advantages in the form of its faster 2200MHz northbridge frequency (vs. 2000MHz for Phenom II) and 2600MHz HyperTransport Link (vs. 2000MHz for Phenom II).

(*EDIT*: Check out the bottom of the page for updated and more accurate clock-per-clock results) Now you should take these results with a grain of salt. Unlike with Intel's chips, where it is easy to disable cores from within the BIOS, there is no such luxury with Bulldozer at this time. Therefore, we had to limit the FX-8150's number of cores from within the OS. This fact, and perhaps some peculiarities in how Windows 7 assigns workloads to the Bulldozer microarchitecture, might have caused exaggerated results. We will be better able to gauge C-P-C performance once we get our hands on a true four-core Zambezi chip.
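For readers who want to repeat this kind of comparison on their own hardware, a clock-per-clock result is just the ratio of two scores taken at the same frequency and core count. Here is a minimal sketch of that arithmetic. The function name is our own, and the numbers in the usage line are placeholders for illustration, not results from this review.

```python
def clock_per_clock_change(new_score, old_score, higher_is_better=True):
    """Percent change of the new chip relative to the old one.

    Both scores must come from runs at identical clock speed, core count
    and memory timings, otherwise the comparison is not clock-per-clock.
    """
    if new_score <= 0 or old_score <= 0:
        raise ValueError("scores must be positive")
    if higher_is_better:
        ratio = new_score / old_score
    else:
        # For timed tests (seconds), the lower number is the better result.
        ratio = old_score / new_score
    return (ratio - 1.0) * 100.0


# Placeholder numbers for illustration only.
print(f"{clock_per_clock_change(90.0, 100.0):+.1f}%")
```

A negative value means the newer chip does less work per clock than the older one in that test.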


Now looking at the above table will cause just about anyone who's been even casually awaiting Bulldozer to ask: what the hell happened? Nearly across the board we experienced a serious decline in performance. The only exception to this was WinRAR 4.0.1, which is probably making use of one of the new instruction sets.

It's impossible for us to tell whether the issue is the inherent performance-killing effect of the additional pipeline stages, whether the integer cores are waiting for access to resources and thus creating extra latency, or whether the much higher cache latencies are the main cause for this situation. Assuming our numbers are indeed correct, something has pretty clearly caused overall clock-per-clock performance to dive off a cliff.

Yes, Zambezi might take the lead in some specialized software that takes advantage of its AES, AVX, FMA4, and XOP instructions, but those are few and far between in the consumer software realm at the moment. As you will see in the coming pages, at full-strength the FX-8150 can deliver some impressive multi-threaded performance, but single and lightly-threaded performance has actually gotten worse despite the huge clock speed advantage that Zambezi brings over Phenom II.


EDIT:

As mentioned above, since we had serious doubts about the validity of our previous clock-per-clock results due to Windows 7's wonky (vis-à-vis Bulldozer anyways) scheduler, we decided to try another approach. Instead of telling the OS to simply ignore the other four cores, we manually set processor affinity from within Task Manager. Every time we opened a program, we set the processor affinity to cores 0, 2, 4 and 6 (which provide optimal performance according to AMD). This allowed us to diminish the negative impact the OS was having on our C-P-C tests. We had to remove WinRAR and DiRT 3 since we couldn't prevent them from using all eight cores.
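Setting affinity by hand for every program gets tedious, so here is a rough sketch of how the same core selection can be expressed in code. It builds the bitmask for cores 0, 2, 4 and 6, and includes an optional helper that pins the current process. The helper relies on `os.sched_setaffinity`, which Python only provides on Linux, so this is an illustration of the idea and not what we actually used on our Windows 7 test bench. Both function names are our own.

```python
import os


def affinity_mask(cores):
    """Build a bitmask with one bit set per allowed logical core."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def pin_current_process(cores):
    """Pin this process to the given cores where the OS call is available."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform (e.g. Windows)
    # Only request cores that the process is currently allowed to use.
    wanted = set(cores) & os.sched_getaffinity(0)
    if not wanted:
        return False
    os.sched_setaffinity(0, wanted)
    return True


# Cores 0, 2, 4 and 6, as used in the test above.
EVEN_CORES = [0, 2, 4, 6]
print(hex(affinity_mask(EVEN_CORES)))  # 0x55
```

On Windows, the same hexadecimal mask (without the `0x` prefix) can be given to the command prompt's `start /affinity` switch when launching a program.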


As you can see, our new approach made a sizeable difference in some instances. Having said that, many of the comments we made above still ring true with regard to Zambezi's performance shortcomings compared to the venerable Deneb core. Zambezi's performance is inconsistent, never impressive in lightly threaded workloads, but also sometimes lagging badly in highly multi-threaded programs.
 


Feature Test: Turbo CORE Technology



As you all know by now, the Turbo CORE technology that AMD first introduced on the Phenom II X6 processors has found its way onto Zambezi, and to great effect! So you want to know how AMD's implementation of Turbo CORE works? Well, let us direct you to the following chart:


Confused? Don't worry, we were too. Here is a very basic explanation.

Processor performance states (P-States) are effectively supported operating frequencies and voltages that the processor can switch between in order to manage power consumption and lower heat output. These states are controlled by the ACPI function in the operating system. Processors can move in and out of these P-states in a manner that is seamless to the user. The lower the P-state number the higher the processor speed.

As we mentioned previously, AMD is very conservative with their TDP estimates, and as such their processors usually have a considerable amount of TDP headroom. With that in mind, AMD have devised a two-tier Turbo implementation for Zambezi. There is Turbo Core and there is Max Turbo. Turbo Core can increase the frequency of all eight cores by up to 300MHz when there is extra TDP headroom. It's not infinite turbo though, since power consumption increases as the number of threads being used increases. Therefore, once the TDP limit has been attained, the AMD Power Manager will ramp down to a P-state that is within the TDP. Eventually, when additional headroom is once again available, it will Turbo up to a higher performing P-state.

Max Turbo is a new mode that is engaged on lightly-threaded workloads. It can increase the frequency of half the cores by up to 600MHz, and keep that higher P-state for a much longer time than previous turbo iterations.

Here is what the Turbo Core implementation looks like in real-time:


In our experience, with the FX-8150 you can expect between 3900-4200MHz when using 1 to 4 cores, and 3600-3900MHz when using 5 to 8 cores. Whether you get the high end or low end of that range really depends on the workload, specifically when more than 4 threads are needed. If the workload is "bursty" (e.g. most applications & games) then you will likely get the highest possible frequency, but if the workload is a static 100% load (which is rare outside of stress test apps) that is just hammering the cores, then expect the lower end. Either way, Zambezi's Turbo CORE seems to hold the turbo frequency better and longer than the previous iteration.
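To put those observations in one place, here is a tiny lookup that returns the frequency range we saw for a given number of busy cores. It is only a summary of our own FX-8150 observations, not a model of AMD's actual power management logic, and the constant and function names are ours.

```python
BASE_MHZ = 3600        # FX-8150 default clock
TURBO_CORE_MHZ = 3900  # all-core Turbo Core ceiling (+300MHz)
MAX_TURBO_MHZ = 4200   # Max Turbo ceiling on up to half the cores (+600MHz)


def expected_range_mhz(active_cores, total_cores=8):
    """Return the (low, high) frequency range observed for a core count."""
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    if active_cores <= total_cores // 2:
        # Lightly-threaded: Max Turbo can engage on half the cores.
        return (TURBO_CORE_MHZ, MAX_TURBO_MHZ)
    # Heavily-threaded: somewhere between default clock and Turbo Core.
    return (BASE_MHZ, TURBO_CORE_MHZ)


for cores in (1, 4, 5, 8):
    low, high = expected_range_mhz(cores)
    print(f"{cores} active cores: {low}-{high}MHz")
```

Where a workload lands inside the range depends on how bursty it is, as described above.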

Are there any worthwhile performance gains? Let's find out:



As you can see, the gains are quite evident when it comes to lightly-threaded workloads, with a performance increase of about 14-15%. This is pretty much in line with the 17% frequency boost that Max Turbo provides. However, in very highly-threaded applications the performance difference ranges from none at all to a little under 5%. Thankfully, if you do encounter a highly threaded workload, you do have eight highly clocked cores at your disposal.
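For anyone who wants to check the math, the snippet below works out the frequency boost of each turbo tier over the 3600MHz default clock, and how much of the Max Turbo boost shows up as real performance given the 14-15% gains we measured. The function names are our own, and this is a quick sanity check only.

```python
def boost_percent(base_mhz, turbo_mhz):
    """Frequency increase of a turbo state over the base clock, in percent."""
    return (turbo_mhz - base_mhz) / base_mhz * 100.0


def scaling_efficiency(measured_gain_percent, frequency_boost_percent):
    """Share of the frequency boost that turned into measured performance."""
    return measured_gain_percent / frequency_boost_percent * 100.0


max_turbo = boost_percent(3600, 4200)   # roughly 16.7%, the "17%" quoted above
turbo_core = boost_percent(3600, 3900)  # roughly 8.3%

print(f"Max Turbo boost:  {max_turbo:.1f}%")
print(f"Turbo Core boost: {turbo_core:.1f}%")
for gain in (14.0, 15.0):
    eff = scaling_efficiency(gain, max_turbo)
    print(f"{gain:.0f}% measured gain = {eff:.0f}% of the Max Turbo boost")
```

In other words, lightly-threaded tests recover most, but not all, of the extra clock speed.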
 


Feature Test: DDR3 Frequency Performance Scaling



One of the cool new features that AMD has implemented with their new 32nm processors, first on the Llano APUs and now on Zambezi, is native support for DDR3-1866 memory speeds. Phenom IIs really did not excel at hitting very high memory frequencies, and achieving close to DDR3-2000 was an immense struggle, but Llano can handle DDR3-2400 and above extremely easily. We aren't actually going to be testing those insane frequencies today, but we are interested in determining whether Zambezi benefits from higher memory bandwidth.



In order to do this test, we benchmarked the FX-8150 with a G.Skill RipJawsX 8GB memory kit running at DDR3-1333 7-7-7-1T, and with the same kit overclocked to DDR3-1866 9-11-9-1T. This represents two easily attainable and relatively affordable specifications.
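To give a feel for how different these two configurations are on paper, the sketch below converts each one into CAS latency in nanoseconds and theoretical peak bandwidth. The dual-channel, 64-bit-per-channel figures are assumptions on our part for a typical AM3+ setup, and the function names are ours.

```python
def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """Absolute CAS latency in nanoseconds.

    DDR memory transfers data twice per clock, so the real clock in MHz
    is half the advertised data rate (DDR3-1866 runs at 933MHz).
    """
    clock_mhz = data_rate_mt_s / 2
    return cas_cycles / clock_mhz * 1000.0


def peak_bandwidth_gb_s(data_rate_mt_s, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (assumes dual-channel, 64-bit)."""
    return data_rate_mt_s * bus_bytes * channels / 1000.0


for name, rate, cas in (("DDR3-1333 CL7", 1333, 7), ("DDR3-1866 CL9", 1866, 9)):
    print(f"{name}: {cas_latency_ns(rate, cas):.2f} ns CAS, "
          f"{peak_bandwidth_gb_s(rate):.1f} GB/s peak")
```

On paper the DDR3-1866 setting offers about 40% more bandwidth and slightly lower absolute CAS latency, which makes the results that follow all the more telling.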


Basically, no difference. What this tells us is that Zambezi parts are not being bottlenecked by the memory subsystem, so you can buy cheaper memory without worrying about impacting the FX-8150's performance. Sure, if you can find some 1866 8-9-8 or 1866 7-9-7 (or higher) memory kits you might see a small performance boost, but it's probably not worth the two to three-fold price premium for those kits.
 


Test Setups & Methodology



For this review, we have prepared six different test setups, representing all the popular platforms at the moment, as well as most of the best-selling processors. As much as possible, the six test setups feature identical components, memory timings, drivers, etc. Aside from manually selecting memory frequencies and timings, every option in the BIOS was at its default setting.

AMD Zambezi AM3+ Test Setup

AMD Llano FM1 Test Setup

AMD Phenom II AM3 Test Setup

Intel Core i5/i7 LGA1155 Test Setup

Intel Core i3/i5/i7 LGA1156 Test Setup

Intel Core i7 LGA1366 Test Setup

For all of the benchmarks, appropriate lengths are taken to ensure an equal comparison through methodical setup, installation, and testing. The following outlines our testing methodology:

A) Windows is installed using a full format.

B) Chipset drivers and accessory hardware drivers (audio, network, GPU) are installed.

C) To ensure consistent results, a few tweaks are applied to Windows 7 and the NVIDIA control panel:
  • UAC – Disabled
  • Indexing – Disabled
  • Superfetch – Disabled
  • System Protection/Restore – Disabled
  • Problem & Error Reporting – Disabled
  • Remote Desktop/Assistance - Disabled
  • Windows Security Center Alerts – Disabled
  • Windows Defender – Disabled
  • Screensaver – Disabled
  • Power Plan – High Performance
  • V-Sync – Off

D) Windows Update is then run until all available updates are installed.

E) All programs are installed and then updated, followed by a defragment.

F) Benchmarks are each run three to eight times, and unless otherwise stated, the results are then averaged.
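The averaging in step F can be sketched as follows. The run-to-run spread figure is an extra sanity check that we find handy and is not part of the methodology listed above. The function name and the sample numbers are placeholders for illustration.

```python
from statistics import mean, stdev


def summarize_runs(results):
    """Average a set of benchmark runs and report the spread in percent."""
    if not 3 <= len(results) <= 8:
        raise ValueError("expected between three and eight runs")
    average = mean(results)
    # Sample standard deviation relative to the mean, in percent.
    spread_percent = stdev(results) / average * 100.0
    return average, spread_percent


# Placeholder scores, not results from this review.
average, spread = summarize_runs([100.0, 102.0, 98.0])
print(f"average {average:.1f}, spread {spread:.1f}%")
```

A large spread is usually a hint that something in the background was interfering with a run and that the test should be repeated.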

Here is a full list of the applications that we utilized in our benchmarking suite:
  • 3DMark06 Professional v1.2.0
  • 3DMark Vantage Professional Edition v1.1.0
  • 3DMark11 Professional Edition v1.0.2
  • 7-Zip 9.22 beta 64-bit
  • AIDA64 Extreme Edition v1.85.1641 Beta
  • Cinebench R10 64-bit
  • Cinebench R11.529 64-bit
  • Civilization V 1.0.1.383
  • Crysis v1.2.1 64-bit
  • Crysis 2 v1.9 + DX11 Pack + HiRes Texture Pack
  • Deep-Fritz 12
  • DiRT 3 v1.2.0
  • Far Cry 2 v1.03
  • HyperPI 0.99b
  • Lame Front-End 1.0 (LAME 3.97 32-bit codec)
  • Left 4 Dead 2 v2.0.8.9
  • LuxMark v1.0
  • MaxxMEM² - PreView v1.90
  • PCMark 7 Professional Edition v1.0.4
  • Photoshop CS4 64-bit
  • POV-Ray v3.7 RC3 64-bit
  • SPECviewperf 11
  • Street Fighter IV Benchmark V1.0.0.1
  • Team Fortress 2 v1.1.7.6
  • TrueCrypt 7.1
  • Valve Particle Simulation Benchmark v1.0.0.0
  • WinRAR 4.0.1 64-bit
  • World in Conflict Demo v1.0.0.0
  • wPRIME version 2.05
  • x264 HD Benchmark 2.0
  • X3: Terran Conflict Demo v1.0

That is about all you need to know methodology wise, so let's get to the good stuff!
 


Synthetic Benchmarks: AIDA64 / MaxxMEM²




AIDA64 Extreme Edition 1.85 - CPU & FPU Benchmarks





AIDA64 Extreme Edition 1.85 - Cache Benchmark




AIDA64 Extreme Edition 1.85 - Memory Benchmarks





MaxxMEM² - Memory Benchmarks



 


Synthetic Benchmarks: SuperPI 32M / wPRIME 1024M



SuperPi Mod v1.5


When running the SuperPI 32M benchmark, we are calculating Pi to 32 million digits and timing the process. Obviously more CPU power helps in this intense calculation, but the memory sub-system also plays an important role, as does the operating system. We are running one instance of SuperPi via the HyperPi 0.99b interface. This is therefore a single-thread workload.
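For the curious, Super PI is generally reported to be built around the Gauss-Legendre algorithm. Taking that as an assumption, here is a small single-threaded sketch of the algorithm using Python's `decimal` module. It is nowhere near the 32 million digits of the real benchmark and is not SuperPi's actual code, but it shows why the workload is a serial chain of dependent multiplications and square roots.

```python
from decimal import Decimal, localcontext


def gauss_legendre_pi(digits):
    """Approximate pi to roughly `digits` decimal places."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # a few guard digits against rounding error
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        # Each pass roughly doubles the number of correct digits.
        for _ in range(max(1, digits.bit_length())):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        return (a + b) ** 2 / (4 * t)


print(str(gauss_legendre_pi(30))[:22])  # 3.14159265358979323846
```

Every pass depends on the result of the previous one, which is why extra cores do nothing for this test.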



wPRIME 2.03


wPrime is a leading multithreaded benchmark for x86 processors that tests your processor performance by calculating square roots with a recursive call of Newton's method for estimating functions, with f(x) = x^2 - k, where k is the number we're sqrting, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)^2 = k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum. This is a highly multi-threaded workload.
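The procedure described above can be sketched in a few lines. This is our own single-threaded reading of that description, not wPrime's source code. The function name, the number of refinement passes and the tolerance are all assumptions.

```python
def newton_sqrt(k, refine_passes=5, max_iterations=100):
    """Estimate sqrt(k) with Newton's method on f(x) = x^2 - k."""
    if k <= 0:
        raise ValueError("k must be positive")
    x = k / 2.0  # starting estimate, as in the description above
    previous_sign = None
    for _ in range(max_iterations):
        step = (x * x - k) / (2.0 * x)  # f(x) / f'(x)
        sign = (step > 0) - (step < 0)
        # Stop the first phase once the sign of the step flips.
        if previous_sign is not None and sign != previous_sign:
            break
        x -= step
        previous_sign = sign
    # Second phase: a fixed number of extra passes to tighten the result.
    for _ in range(refine_passes):
        x -= (x * x - k) / (2.0 * x)
    return x


# Verify the result for every number up to a small maximum.
for k in range(1, 1001):
    root = newton_sqrt(float(k))
    assert abs(root * root - k) < 1e-9 * k
print("all roots verified")
```

Because every number from 1 to the maximum can be handled independently, the real benchmark can split the range across as many threads as the processor offers.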

 


System Benchmarks: Cinebench R10 / Cinebench R11.5



Cinebench R10


Cinebench R10 64-bit
Test1: Single CPU Image Render
Test2: Multi CPU Image Render
Comparison: Generated Score


Developed by MAXON, creators of Cinema 4D, Cinebench R10 is built on the popular Cinema 4D software and designed to compare system performance in 3D animation and photo applications. There are two parts to the test: the first stresses only the primary CPU or core, while the second makes use of up to 16 CPUs/cores. Both render a realistic photo while utilizing various CPU-intensive features such as reflection, ambient occlusion, area lights and procedural shaders.



Cinebench R11.5


Cinebench R11.5 64-bit
Test1: CPU Image Render
Comparison: Generated Score


The latest benchmark from MAXON, Cinebench R11.5 makes use of all your system's processing power to render a photorealistic 3D scene using various different algorithms to stress all available processor cores. The test scene contains approximately 2,000 objects containing more than 300,000 total polygons and uses sharp and blurred reflections, area lights and shadows, procedural shaders, antialiasing, and much more. This particular benchmark can measure systems with up to 64 processor threads. The result is given in points (pts). The higher the number, the faster your processor.

 


System Benchmarks: Deep Fritz 12 / POV-Ray 3.7 RC3



Deep Fritz 12 - Chess Benchmark




POV-Ray 3.7 RC3


 