Westmere Microarchitecture - Gulftown Edition pt.2
The Nehalem microarchitecture, from which Westmere-based chips like Gulftown are derived, was highly touted as one of the most significant architectural overhauls ever. Although it shares significant roots with the original P6 microarchitecture that debuted in the Pentium Pro in 1995, and with the later Core/Penryn microarchitecture, Intel's engineers added significant performance-oriented features, like an integrated memory controller, a completely new system interconnect, and a multi-level shared cache. They also focused a great deal on the chip's power efficiency.
As discussed on the previous page, Gulftown has fundamentally the same features and technology as the Bloomfield chips. It has a triple-channel DDR3 memory interface, Hyper-Threading, and the same relatively mild implementation of Turbo Boost. The sole difference is one new instruction set, AES-NI.
Nevertheless, let's examine some of these features and technologies:
For the Nehalem architecture, Intel has forgone the legacy front side bus in favour of the QuickPath Interconnect (QPI). The QPI is a high-speed, low-latency point-to-point processor link. From a technical standpoint, the QPI is a bi-directional 20-bit wide bus that is integrated onto the processor itself. The result? An incredibly fast interconnect that improves overall bandwidth while reducing latency. This high-speed interface is used to access the distributed shared memory, it helps cores communicate with each other, and it also links up with the X58 northbridge, now known as the IO Hub (IOH).
Like the i7-965 Extreme Edition and i7-975 Extreme Edition, the new i7-980X is a high-end performance part, so it features the faster 6.4 Gigatransfers per second (GT/s) QPI link. This link has a theoretical maximum bandwidth of 25.6GB/s, which is equivalent to Nehalem's triple-channel DDR3-1066 memory bandwidth. The lower-end i7-920/930/940/950 models all feature a 4.8GT/s QPI interface with 19.2GB/s of bandwidth. The benefits of the faster QPI link are mostly seen in graphically intensive applications, specifically when multiple graphics cards are installed.
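For readers who want to see where the 25.6GB/s and 19.2GB/s figures come from, here is a minimal sketch of the arithmetic. It assumes that 16 of the link's 20 bits carry data on each transfer (2 bytes) and that the quoted figure counts both directions together. The function name and parameters are ours, purely for illustration.

```python
def qpi_bandwidth_gbs(transfer_rate_gts, data_bytes_per_transfer=2, directions=2):
    """Theoretical peak QPI bandwidth in GB/s.

    Assumptions (not stated in the article): 16 of the 20 link bits are
    data, i.e. 2 bytes per transfer, and both directions are summed.
    """
    return transfer_rate_gts * data_bytes_per_transfer * directions


# 6.4 GT/s link on the Extreme Edition parts, 4.8 GT/s on the lower-end models
for rate in (6.4, 4.8):
    print(f"{rate} GT/s -> {qpi_bandwidth_gbs(rate):.1f} GB/s")
```

Under those assumptions, the 6.4GT/s and 4.8GT/s links work out to the two bandwidth figures quoted above.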
- Integrated Memory Controller (IMC)
As we have come to expect from Nehalem-based chips, Gulftown features an integrated memory controller. As a result, the memory has a direct link to the processor, which not only means significantly lower latency, but much higher bandwidth as well. Current Core i7 processors feature a triple-channel memory interface, and each channel can support one or two DDR3 modules. This means that memory modules should be installed in sets of three, not two as has been the norm since the dual-channel memory architecture was first introduced back in 2003.
While there was much speculation that Intel would increase the stock memory frequency to DDR3-1333, it remains at DDR3-1066, which means 25.6GB/s of memory bandwidth. This amount of memory bandwidth proved to be overkill on Bloomfield processors, so it will be interesting to see if the situation is any different with Gulftown.
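The memory bandwidth figure can be reproduced the same way. The sketch below assumes a standard 64-bit (8-byte) DDR3 channel and uses the nominal DDR3-1066 rate of roughly 1066.67 million transfers per second. The helper name is hypothetical, and the DDR3-1333 line is only a what-if calculation, not a supported stock speed.

```python
def ddr3_bandwidth_gbs(mega_transfers_per_s, channels=3, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s.

    Assumes each DDR3 channel is 64 bits wide (8 bytes per transfer).
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


DDR3_1066 = 3200 / 3  # nominal 1066.67 MT/s
DDR3_1333 = 4000 / 3  # nominal 1333.33 MT/s, shown only for comparison

print(f"Triple-channel DDR3-1066: {ddr3_bandwidth_gbs(DDR3_1066):.1f} GB/s")
print(f"Triple-channel DDR3-1333: {ddr3_bandwidth_gbs(DDR3_1333):.1f} GB/s")
```

Dropping `channels` to 2 gives the equivalent dual-channel figure, which is a quick way to gauge how much headroom the third channel adds on paper.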
Building upon Penryn's implementation of SSE4.1, which focused on improving video encoding, image/video editing, 3D game physics, and so on, the Nehalem architecture adds 7 new instructions (SSE4.2). These include the Accelerated String and Text New Instructions (STTNI) and Application Targeted Acceleration (ATA), which focus on faster XML parsing, faster search and pattern matching, and other more obscure processor functions.
A brand new addition to the Westmere core is the Advanced Encryption Standard New Instructions (AES-NI). There are 12 new instructions designed to accelerate tasks that use the AES algorithm, such as whole disk encryption/decryption, internet security, VoIP, etc. Essentially, this allows the processor to do real-time high-security encryption/decryption with little to no effect on system performance.
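If you want to check whether a given system exposes these instruction sets, one approach on Linux is to look at the `flags` line of `/proc/cpuinfo`. The sketch below parses text in that format and looks for the `sse4_2`, `popcnt` and `aes` flag names that the Linux kernel reports. The function names and the shortened sample text are ours, for illustration only, and this method does not apply to other operating systems.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_features(cpuinfo_text, wanted=("sse4_2", "popcnt", "aes")):
    """Map each wanted flag name to True/False."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


# Shortened, made-up sample; on a real Linux system you would instead pass
# the contents of /proc/cpuinfo, e.g. open("/proc/cpuinfo").read()
SAMPLE = """processor : 0
model name : Example CPU
flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes
"""

print(has_features(SAMPLE))
```

A Bloomfield-class chip would be expected to report `sse4_2` and `popcnt` but not `aes`, while a Westmere-based chip like Gulftown should report all three.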
Nehalem also brought Hyper-Threading (HT) back from the dead, and it's a huge factor on Gulftown, turning a 6-core processor into a 12-thread workhorse. With HT enabled, each physical core is viewed by the operating system as two logical cores, so a processor with four physical cores appears to have eight. A core usually processes the pieces of different threads one after another, whereas an HT-enabled core can process two threads simultaneously. While Hyper-Threading did not perform particularly well on the Pentium 4, Nehalem's architecture was designed to remove many of the processing bottlenecks. Depending on the workload, and how effectively multi-threaded an application is, the performance increases could be 20% or higher.
Nehalem’s Power Control Unit (PCU) is an extremely innovative power management feature that uses an on-chip micro-controller to actively manage the power and performance of the entire processor with the help of numerous integrated power sensors. The PCU can dynamically alter the voltage and frequency of the CPU cores to lower power consumption or provide a performance boost in conjunction with the new Turbo Mode feature. Also, thanks to a development known as Power Gates, idle cores can be completely shut down and placed in a C6 sleep mode while other cores continue working. This is noteworthy because C6 mode had previously only been featured on mobile processors.
Turbo Boost is arguably the most discussed feature brought forth by Nehalem. Basically, all Core i7 LGA1366 processors come with two additional speed bins, which is to say that they have two higher multipliers that they can use under certain scenarios. For example, if you are using a single-threaded application, the PCU will down-clock or shut down the unused cores, thereby freeing up power and lowering heat output while "overclocking" that one core that is in use. If an application is multi-threaded and the cores are not running too hot, the PCU will overclock all the cores up one speed bin.
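To put numbers on the speed bins, here is a rough sketch of the behaviour described above. It assumes a 133.33MHz base clock and a stock multiplier of 25 for the i7-980X, neither of which is stated on this page, and it reduces the PCU's decision to a single "thermal headroom" flag, which is a big simplification of what the real hardware does. The function name is hypothetical.

```python
BCLK_MHZ = 400 / 3  # assumed 133.33 MHz base clock for LGA1366 parts


def turbo_frequency_mhz(base_multiplier, active_cores, has_thermal_headroom=True):
    """Estimate the core clock under the simple two-bin Turbo Boost scheme.

    Simplified model of the text above: +2 bins when only one core is
    busy, +1 bin when several cores are busy, +0 when there is no headroom.
    """
    if not has_thermal_headroom:
        bins = 0
    elif active_cores == 1:
        bins = 2
    else:
        bins = 1
    return (base_multiplier + bins) * BCLK_MHZ


I7_980X_MULTIPLIER = 25  # assumed stock multiplier (25 x 133.33 MHz)

for cores in (1, 6):
    mhz = turbo_frequency_mhz(I7_980X_MULTIPLIER, cores)
    print(f"{cores} active core(s): about {mhz:.0f} MHz")
```

Each bin is simply one multiplier step, so under these assumptions a bin is worth roughly 133MHz.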
If you are more visually-inclined, the following illustration should help explain the Turbo Boost implementation:
Taken as a whole, Gulftown doesn't really bring any major new updates, but there weren't really any glaring feature/technology omissions on Bloomfield to begin with. We would definitely have liked to see Lynnfield's more aggressive Turbo Boost implementation make its way to Gulftown, but since the i7-980X has unlocked CPU multipliers, we don't foresee many people being limited in their quest for extra performance.