Microarchitecture Dissected Continued
Despite sharing deep roots with the original P6 microarchitecture that debuted in the Pentium Pro in 1995, and being fundamentally derived from 'Penryn', the Nehalem microarchitecture represents one of the most significant overhauls ever. Intel's engineers added major performance-oriented features, like an integrated memory controller, a completely new system interconnect (QPI), and a multi-level shared cache, while still focusing a great deal on the chip's power efficiency capabilities.
With Lynnfield, Intel have kept what makes the current 'Bloomfield' Core i7 900 series processors great, while removing the aspects that really only catered to the Server/Workstation segment, such as the triple-channel memory interface and the QuickPath Interconnect (QPI). Compared to the current mainstream Core 2 Quad offerings, Intel have worked extensively to improve the power efficiency of this new processor family, while significantly increasing performance in most consumer-oriented applications, in part thanks to the very aggressive Turbo Boost technology. Furthermore, by integrating the PCIe controller onto the processor itself, Intel have been able to do away with the northbridge and create a 2-chip platform, which should help reduce motherboard prices and overall power consumption.
Let's examine some of these advancements:
- Integrated Memory Controller (IMC)
i5-750 on the right, i7-870 on the left.
As with the current Intel Core i7 900 series processors, the Lynnfield family features an integrated memory controller. The benefit of this design is that the memory is directly connected to the processor, which not only means significantly lower latency, but much higher bandwidth as well. Unlike the Bloomfield chips, which have a triple-channel memory interface, the Lynnfield chips feature a more standard dual-channel interface. While this may seem like a significant step back, the good news is that the supported DDR3 memory speed has increased from DDR3-1066 to DDR3-1333. When all is said and done, Lynnfield processors have a very respectable 21.3GB/s of theoretical memory bandwidth, compared to 25.6GB/s for Bloomfield. Furthermore, extensive testing of dual-channel versus triple-channel on the Core i7 900 series has shown that a dual-channel interface has noticeably lower latency. If this holds true with Lynnfield, we expect its memory subsystem performance to be quite good.
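The bandwidth figures above follow from simple arithmetic: transfer rate times bus width times channel count. Here is a minimal Python sketch, assuming the standard 64-bit (8-byte) DDR3 channel width and nominal transfer rates of 1333.33 and 1066.67 MT/s. The function name is ours, for illustration only.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, channels, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumes a 64-bit (8-byte) wide DDR3 channel by default.
    """
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


# Nominal transfer rates: DDR3-1333 is 1333.33 MT/s, DDR3-1066 is 1066.67 MT/s
lynnfield = peak_bandwidth_gbs(4000 / 3, channels=2)
bloomfield = peak_bandwidth_gbs(3200 / 3, channels=3)

print(f"Lynnfield  (2 x DDR3-1333): {lynnfield:.1f} GB/s")
print(f"Bloomfield (3 x DDR3-1066): {bloomfield:.1f} GB/s")
```

With these inputs the sketch lands at roughly 21.3GB/s and 25.6GB/s respectively. These are theoretical peaks, so measured throughput will be lower.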
- Integrated PCIe Controller
An industry first, Intel have moved the PCIe controller from the northbridge onto the processor itself, continuing the push towards a true System on Chip (SoC) design. This integrated PCIe controller supports 16 PCI-E 2.0 lanes, which can be directed towards a single PCI-E x16 slot or two mechanical PCI-E x16 slots in an x8/x8 configuration. While this is only half as many as the 32 PCI-E 2.0 lanes available on the Bloomfield/X58 platform, there is a latency advantage attributable to having the PCIe controller built into the CPU die.
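To put the lane count in perspective, here is a rough Python sketch of the per-slot bandwidth in each configuration. It assumes the PCIe 2.0 figure of 500MB/s per lane in each direction (5GT/s with 8b/10b encoding), which is not stated above, and the names are purely illustrative.

```python
PCIE2_MB_PER_LANE = 500  # assumption: PCIe 2.0, per lane, per direction
TOTAL_LANES = 16         # lanes provided by the on-die controller


def slot_bandwidth_gbs(lanes_per_slot):
    """Per-direction bandwidth in GB/s for each slot in a given lane split."""
    if sum(lanes_per_slot) > TOTAL_LANES:
        raise ValueError("lane split exceeds the 16 lanes available")
    return [lanes * PCIE2_MB_PER_LANE / 1000 for lanes in lanes_per_slot]


configs = {
    "single x16": [16],
    "dual x8/x8": [8, 8],
}
for name, split in configs.items():
    print(name, slot_bandwidth_gbs(split))
```

Under that assumption, a single card gets 8GB/s in each direction, while two cards get 4GB/s each.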
Building upon Penryn's implementation of SSE4.1, which was focused on video encoding, image/video editing, 3D game physics, etc., the Nehalem architecture adds 7 new instructions, collectively known as SSE4.2. These are grouped into Accelerated String and Text New Instructions (STTNI) and Application Targeted Acceleration (ATA), which focus on faster XML parsing, faster search and pattern matching, and other cryptic processor functions.
Keep in mind that with Penryn, the SSE4 instructions were responsible for the most significant performance increases, so we definitely look forward to seeing what Intel can accomplish with these latest instructions.
Although the i5-750 doesn't feature this technology, the Core i7 800 series chips support Hyper-Threading (HT). With HT enabled, a processor with four physical cores is viewed by the operating system as having eight logical cores. A core usually processes the pieces of different threads one after another, whereas an HT-enabled core can process two threads simultaneously. While Hyper-Threading did not perform particularly well on the Pentium 4, Nehalem's architecture was designed to remove many of the processing bottlenecks that had previously crippled the feature. Depending on the workload, and how effectively multi-threaded an application is, the performance increases can be 20% or higher.
Nehalem’s Power Control Unit (PCU) is an extremely innovative power management feature that uses an on-chip micro-controller to actively manage the power and performance of the entire processor with the help of numerous integrated power sensors. The PCU can dynamically alter the voltage and frequency of the CPU cores to lower power consumption or provide a performance boost in conjunction with the Turbo Mode feature. Also, thanks to a development known as Power Gates, idle cores can be completely shut down and placed in a C6 sleep mode while other cores continue working. This is noteworthy because C6 mode had previously only been featured on mobile processors. On Lynnfield, the PCU has been tweaked to further improve power efficiency, and Intel is claiming that the i5-750's idle power consumption is up to 50% lower than that of the Core 2 Quad series.
Lynnfield's more aggressive Turbo Boost technology has been highly advertised for months, and it is arguably one of this product's best selling points. Much like on the Bloomfield chips, Turbo Boost automatically overclocks the processor based on the workload demand. All Core i5 processors come with four additional speed bins, which is to say that they have four higher multipliers that they can use under certain scenarios. The Core i7 800 series have five additional speed bins, which equates to a roughly 666MHz speed boost. For example, if you are using a single-threaded application, the PCU will down-clock or shut down three cores, thereby freeing up power and lowering heat output while "overclocking" the one core that is in use. If an application is multi-threaded and the processor is not running too hot, the PCU will overclock all the loaded cores by 2-to-4 speed bins. The only limit to Turbo Mode is the power and thermal headroom, so keeping your processor cool is an even greater priority with Lynnfield chips.
If you are more visually-inclined, the following illustration should help explain the new Turbo Boost implementation:
While the Core i7 900 series can only provide a 133MHz (multi-thread) or 266MHz (single-thread) speed boost, Lynnfield can Turbo Boost up by 532MHz (i5-750) or 666MHz (i7-860/870). This is a pretty strong selling point, and as you will see shortly, the performance gains are impressive.
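The speed-bin arithmetic can be sketched in a few lines of Python. This assumes each bin is one step of the 133.33MHz base clock, and it uses stock multipliers of 20x (i5-750), 21x (i7-860) and 22x (i7-870), which are not listed in this section. The 532MHz and 666MHz figures quoted here come from rounding the bin to 133MHz. The helper names are hypothetical.

```python
BCLK_MHZ = 400 / 3  # assumption: 133.33 MHz base clock, one speed bin per step


def max_turbo_mhz(base_multiplier, extra_bins):
    """Highest clock reachable when every extra speed bin is available."""
    return (base_multiplier + extra_bins) * BCLK_MHZ


# (stock multiplier, extra Turbo Boost bins); multipliers are assumptions
chips = {
    "i5-750": (20, 4),
    "i7-860": (21, 5),
    "i7-870": (22, 5),
}

for name, (mult, bins) in chips.items():
    base = mult * BCLK_MHZ
    peak = max_turbo_mhz(mult, bins)
    print(f"{name}: {base:.0f} MHz -> up to {peak:.0f} MHz (+{peak - base:.0f} MHz)")
```

The top bin only applies when power and thermal headroom allow it, typically with a single active core, so treat these as upper bounds.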
As we have stated in the past, the new performance and energy-saving features are what truly distinguish Nehalem as a veritable next-generation microarchitecture. There are several little technologies at work that some users may never know exist, but which ultimately deliver a superior computing experience. With regard to Lynnfield, the downgrade from a triple-channel to a dual-channel memory interface is really nothing to fret about, especially given the higher supported memory speeds and the much more impressive Turbo Boost implementation. The integration of the PCIe controller into the chip is a novel idea, and we look forward to seeing whether there are any performance advantages/disadvantages to this approach. Overall, for a product marketed as 'mainstream', Lynnfield brings a surprisingly robust spec sheet to the table. While some have suggested that Lynnfield is 'Nehalem Lite', we suggest that on paper it appears to be Nehalem 2.0.