
The Intel i7-6700K Review; Skylake Arrives

SKYMTL

HardwareCanuck Review Editor
Staff member
Joined
Feb 26, 2007
Messages
13,421
Location
Montreal
While it may not have been apparent due to their dominance in the x86 processor market, Intel has been facing some unique challenges as of late. Their definitive tick / tock cadence of process technology shrinks followed by base architectural changes has been slowing down. 22nm architectures with 3D transistors like Ivy Bridge and Haswell were initially quite difficult to produce while 14nm Broadwell parts were delayed and only launched outside the mobile market over the last few weeks. Now they are already moving on to Skylake, Broadwell’s successor but this time around the DIY PC segment will get their hungry little mitts on these processors first.

Intel’s approach to Skylake’s launch is indeed an interesting one since unlike previous releases which focused on the notebook and small form factor markets for their initial rollouts, this one will benefit enthusiasts and gamers before everyone else. However, additional details about the core architecture, integrated graphics subsystems and feature support will only be discussed in a few weeks at IDF. That means this review will showcase raw performance but the nuts and bolts which allow these 14nm chips to go about their business will need to remain under wraps for the time being.

According to Intel’s own documents, PC gaming has changed from niche market status into something that now drives the PC platform as a whole. With games like Starcraft, DOTA and other titles gaining in popularity and featuring championships that frequently pack stadiums full of people, it’s no wonder that Intel has taken notice. With over 1.2 billion people classifying themselves as PC gamers and exponential year over year growth there’s certainly money to be made by launching products which embrace gaming and overclocking.


While we will get into both new 6th generation Core series processors in a bit, it’s important to mention the changes which will affect users who want to upgrade. For starters Skylake CPUs boast a new architecture that’s a significant departure from the Haswell and Broadwell generations of yesteryear. Like Haswell, that means a new socket (in this case one with 1151 pins) and a new motherboard will be required. The Z170 series boards will be replacing the venerable Z97 and they will also bring forth a whole raft of upgrade possibilities that we’ll go over on the next page.

Gone too is DDR3 as Intel moves to DDR4 across their product range. Not only will this bring higher bandwidth to the table but it should also give the significantly upgraded processor graphics a whole lot more room to operate. DDR4 prices may be high but expect the advent of Skylake and Z170 to usher in a day of more affordable dual channel kits.


For the time being the Skylake lineup will be restricted to a pair of unlocked SKUs: the i7-6700K and i5-6600K. These will be followed up in Q3 2015 by a full range of additional parts to fill out other price points. Expect non-K series CPUs, i3, Pentium and mobile CPUs to quickly arrive after IDF and eventually cascade down into all segments.

The i7-6700K is Intel’s new flagship outside their ultra high-end Haswell-E lineup. It boasts a quartet of cores, eight threads courtesy of Intel’s Hyper Threading technology and 8MB of Smart Cache. The base clock rings in at an even 4GHz while the Turbo frequency of 4.2GHz remains a bit behind the i7-4790K. Intel has worked hard to bring the Base and Turbo speeds of this chip into close proximity to one another, offering consistent performance regardless of how many threads are being utilized. Like its predecessors, the i7-6700K will be priced at around $350.

Intel’s i5-6600K is a more affordable 4-core, quad thread alternative to its higher end sibling and we expect this is the chip many will gravitate towards due to its $250 price point. With that being said, the $100 more expensive processor’s additional threads may come in handy within a DX12 environment. Its architectural specifications do however mirror the i7-6700K with the exception of a lack of Hyper Threading and 6MB of Smart Cache in the place of a full 8MB. This CPU is meant to replace the outgoing i5-4690K and boasts identical clock speeds to boot.

Behind the x86 processing stages of these two new chips lies Intel’s revamped HD 530-series integrated graphics. While the actual architectural details behind them will remain a mystery for the time being, we do know they will operate at 1150MHz and offer full DX12 compatibility. However, the next generation of Intel’s extremely impressive Iris Pro won’t be offered on these unlocked SKUs since most people using them will rely on dedicated graphics cards rather than higher end integrated graphics. This situation will likely lead to the Broadwell-series i7-5775C and i5-5675C remaining the solutions of choice for some system integrators while the more expensive K-series Skylake chips will continue to address the needs of enthusiasts and overclockers.

With their expanded graphics stages versus Haswell / Devil’s Canyon and higher clock speeds than 14nm Broadwell chips, the i7-6700K and i5-6600K feature TDPs of 91W. That actually represents a very minimal increase over Devil’s Canyon while still remaining untouchable in the performance per watt bracket by the best AMD has to offer.


As Intel goes about their tick / tock approach, we have seen a continual 10% performance improvement from generation to generation. Skylake is no different but more significantly, users of slightly older CPUs like the 4770K and 3770K will likely see larger speedups of 20% and 30% respectively while Sandy Bridge processors will likely be beaten by a good 50%. None of these numbers are anything to sneeze at since they’ve been achieved without vast frequency increases but they aren’t exactly earth-shattering either. Much of Intel’s hesitation to really let their hair down is likely due to an utter and complete lack of competition from AMD in the $200+ price bracket, not to mention the 14nm manufacturing process' limits. Until we see something different from AMD other than failed GPU compute initiatives, Intel will likely continue down this route.
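To see how a steady 10% per-generation uplift stacks up into the larger numbers quoted above, here is a rough Python sketch of the compounding arithmetic. The step counts per chip are an assumption made for illustration (the review does not spell them out), so treat the output as a sanity check rather than a measurement.

```python
def compounded_gain(per_generation_gain: float, steps: int) -> float:
    """Return the cumulative fractional gain after `steps` equal uplifts."""
    return (1.0 + per_generation_gain) ** steps - 1.0


# Assumed number of ~10% steps between each older chip and the i7-6700K.
# These counts are illustrative and not taken from the review.
steps_to_skylake = {
    "i7-4770K (Haswell)": 2,
    "i7-3770K (Ivy Bridge)": 3,
    "Sandy Bridge": 4,
}

for chip, steps in steps_to_skylake.items():
    gain = compounded_gain(0.10, steps) * 100
    print(f"{chip}: roughly {gain:.0f}% cumulative gain")
```

Under those assumptions the compounded figures land close to the 20%, 30% and 50% estimates mentioned above.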

Alongside the benefits brought about by higher IPC numbers, Intel is also touting a 20-40% graphics improvement over previous generations. This points towards a significant baseline architectural change but there’s certainly not enough for these initial Skylake CPUs to compete against the i7-5775C or other desktop Broadwell processors.


With Skylake, overclockers have been thrown a few more bones in the form of finer-grained adjustments. For example, Haswell and earlier generations had their Base Clock tied to fixed straps of 100MHz, 125MHz or 166MHz, which limited BCLK to settings around one of those points. Unlocked K-series Skylake chips on the other hand feature 1MHz increments so a full range of speeds can be achieved. Memory adjustments have also been given an upgrade.
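As a quick illustration of why finer BCLK steps matter, the sketch below treats core frequency as BCLK multiplied by the CPU ratio and looks for the BCLK that lands closest to a target clock. It is a simplified model: the three strap points are treated as the only coarse choices, and the 100MHz to 166MHz span for the 1MHz steps, the 4600MHz target and the 40x ratio are all assumptions picked for the example.

```python
def core_clock_mhz(bclk_mhz: int, multiplier: int) -> int:
    """Core frequency is the base clock multiplied by the CPU ratio."""
    return bclk_mhz * multiplier


def closest_bclk(target_mhz: int, multiplier: int, bclk_choices) -> int:
    """Pick the BCLK whose resulting core clock is nearest the target."""
    return min(
        bclk_choices,
        key=lambda bclk: abs(core_clock_mhz(bclk, multiplier) - target_mhz),
    )


coarse_straps = (100, 125, 166)   # strap points named in the review
fine_steps = range(100, 167)      # hypothetical 1MHz steps over the same span

target, ratio = 4600, 40          # example values, not from the review
for label, choices in (("straps", coarse_straps), ("1MHz steps", fine_steps)):
    bclk = closest_bclk(target, ratio, choices)
    print(f"{label}: BCLK {bclk}MHz -> {core_clock_mhz(bclk, ratio)}MHz")
```

With only the strap points available the nearest result overshoots the target by 400MHz, while the 1MHz steps can hit it exactly without touching the multiplier.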

Skylake looks like an interesting architecture even though we don’t know all that much about its inner workings yet. It is certainly heartening to see Intel cater to the wants of overclockers and gamers after the late launch of Broadwell. We’re also going to see Skylake on the market for longer than previous architectures since the upcoming 10nm node shrink has seen its own delays and has been pushed back to sometime in 2017. For those of you wondering what challenges with the 14nm manufacturing process will do to availability, there doesn’t seem to be any reason to worry since these processors and their accompanying motherboards will be on retailers’ shelves right away.
 
The Z170 Chipset; PCI-E Lanes Aplenty



Much of the news about Skylake will be focused on the processors themselves but the Z170 chipset is arguably the more interesting aspect of this launch. It will usher in a whole new generation of motherboards, ones which are far more capable of adapting to upcoming connectivity technologies. That is particularly important since it looks like this chipset in one form or another may be with us for the better part of two years.

With so many devices moving over to the PCI-E bus, the outgoing Z97 chipset’s eight integrated Gen 2.0 lanes were proving to be a major hindrance for future expansion opportunities. Granted, some of the Flex I/O ports could be configured as additional lanes but motherboard manufacturers were still dealing with a primary layout that dated back to the P67 Express days. With NVMe, M.2, USB 3.1 and even the ill-fated SATA Express interface all eating up significant amounts of bandwidth, those eight lanes proved to be woefully inadequate and some boards actually “stole” lanes from the graphics slots to feed higher end storage devices. The situation needed to change in a big way if Intel had any hope of providing a modicum of future-proofing for their newest processors.


Even though a number of design components haven’t been officially announced or detailed yet, a top-level look at the new Z170 PCH and its interface with Skylake processors still provides some interesting insights. First and foremost, the baseline graphics PCI-E capabilities of Skylake processors haven’t changed from previous generations; there’s still a total of 16 PCI-E 3.0 graphics lanes that can be configured in 1x16, 2x8 or 1x8 + 2x4 formats. There’s also support for three display outputs alongside HDMI 2.0 and DisplayPort 1.2 certifications.

Those CPU-bound display and PCI-E lanes seem to be some of the only elements carried over from previous designs. In the place of the old DDR3 controller the Skylake processors boast a new dual-use controller that can switch between standard high speed DDR4 or low voltage DDR3L for SFF and mobile applications. The Direct Media Interface which is charged with communication between the CPU and PCH has also been thoroughly upgraded to a third generation format and boasts drastically increased bandwidth of 8GT/s in order to better handle the chipset’s new capabilities.
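For a sense of what 8GT/s means in practice, the rough calculation below converts a per-lane transfer rate into usable one-direction bandwidth. The four-lane link width and the line encodings (8b/10b for the older 5GT/s DMI 2.0, 128b/130b for the new 8GT/s link) are assumptions carried over from how PCI-E 2.0 and 3.0 work; the review itself only quotes the 8GT/s figure.

```python
def link_bandwidth_gbs(transfer_rate_gt: float, lanes: int,
                       payload_bits: int, total_bits: int) -> float:
    """Usable one-direction bandwidth in GB/s for a serial link.

    transfer_rate_gt is the per-lane rate in GT/s; payload_bits/total_bits
    is the line-encoding efficiency (for example 8/10 or 128/130).
    """
    return transfer_rate_gt * lanes * (payload_bits / total_bits) / 8


# Assumed link parameters (not stated in the review).
dmi2 = link_bandwidth_gbs(5.0, 4, 8, 10)       # previous generation
dmi3 = link_bandwidth_gbs(8.0, 4, 128, 130)    # Skylake / Z170

print(f"DMI 2.0: about {dmi2:.2f} GB/s")
print(f"DMI 3.0: about {dmi3:.2f} GB/s")
```

Under those assumptions the CPU to PCH link roughly doubles, which matters because everything hanging off the chipset's lanes ultimately shares it.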

Moving down to the Z170 itself, things start to get really interesting. In the place of those aforementioned outdated PCI-E 2.0 lanes there’s now a grand total of 20 PCI-E Gen 3.0 lanes. While Intel hasn’t exactly been forthcoming about how this copious number of lanes is doled out through their Flex I/O interface, some will likely pull duty for Ethernet, SATA and USB connectivity while the remainder can be dedicated towards additional PCI-E slots or PCI-E-based storage devices. This flexibility is why all of the SATA and USB figures are given an “up to” number as motherboard manufacturers are relatively free to spec their own layouts.

The availability of PCI-E 3.0 lanes for high bandwidth storage support is certainly a big step in the right direction but native USB 3.1 and Thunderbolt support are both still missing in action. Those interfaces can be added through third-party controllers and will access the chipset via the PCI-E 3.0 lanes so Intel’s partners can certainly add them if the situation dictates.

Aside from the PCI-E bonanza there are a few other changes to the chipset as well. For example, Intel’s RST service has been brought forward into the PCI-E storage sphere, allowing RAID arrays to be built with today’s fastest drives. Intel’s Smart Cloud Technology is being integrated here too.


ASUS’ Z170 Deluxe represents an excellent example of what this new platform is capable of since it combines a host of third-party controllers with the chipset’s native capabilities to deliver a full array of connectivity options in a great looking (albeit expensive) package. There is a quartet of both USB 3.0 and USB 2.0 outputs dedicated to the board’s front panel connectors while two additional ports are housed on the board’s rear I/O area for the BIOS Flashback and Keybot features. Meanwhile, audio is handled by a Realtek ALC1150 codec, a single PCI-E lane feeds into an Intel 219 LAN network module and there’s a pair of native SATA 6Gbps ports.

High speed internal storage is handled in two different ways. There are two PCI-E lanes and a pair of SATA ports coupled together to provide bandwidth for a single SATA Express port (the two SATA connectors can be used separately as well) while four PCI-E lanes and those same two native SATA interconnects provide communication between the PCH and the onboard NVMe-enabled M.2 slot.


An additional four PCI-E lanes feed into an ASMedia 1480 switch, feeding a pair of additional SATA 6Gbps ports and the single additional PCI-E 3.0 x16 slot (which operates in x2 mode). If a user needs more bandwidth to this slot for higher end storage solutions it can be set to use all four lanes via a simple BIOS input, thereby disabling the attached SATA_5 and SATA_6 ports. Take this little foible into account when installing a x4 PCI-E SSD alongside SATA-based devices.

Things start to get really interesting when looking at how ASUS implemented the Deluxe’s tertiary functions. There’s a single ASMedia 1187e PCI-E switch that takes a single lane and multiplies it into seven individual x1 connections. These are then used to provide bandwidth to the four Gen2 x1 slots as well as an Intel i211 Ethernet controller and a single eSATA port. The final lane provided by this switch is converted to a USB 2.0 interconnect that feeds ASUS’ onboard Bluetooth / WiFi module.


The final part of this somewhat complicated puzzle is the way the Deluxe handles USB 3.1. There are three pairs of PCI-E lanes, each of which funnels its bandwidth towards a dedicated ASMedia 1142-1 PCI-E to USB 3.1 switch. Together, those three switches provide sufficient bandwidth for this board’s five USB 3.1 Type-A connectors and its single Type-C connector.
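Adding up the lane assignments described over the last few paragraphs gives a feel for how quickly the chipset's 20 lanes get spoken for. The tally below is a back-of-the-envelope sketch that only counts what is mentioned in this write-up; Flex I/O allocations for native SATA and USB ports are not included, and the dictionary labels are just descriptive names.

```python
Z170_PCH_LANES = 20  # PCI-E 3.0 lanes offered by the chipset

# Lane counts as described for the ASUS Z170 Deluxe in this review.
deluxe_lane_usage = {
    "Intel 219 LAN": 1,
    "SATA Express port": 2,
    "M.2 slot": 4,
    "ASMedia 1480 switch (extra x16 slot, SATA_5/SATA_6)": 4,
    "ASMedia 1187e switch (x1 slots, i211 LAN, eSATA, WiFi)": 1,
    "ASMedia 1142-1 USB 3.1 switches (3 x 2 lanes)": 6,
}

used = sum(deluxe_lane_usage.values())
remaining = Z170_PCH_LANES - used

for device, lanes in deluxe_lane_usage.items():
    print(f"{lanes:>2} lane(s): {device}")
print(f"Total used: {used} of {Z170_PCH_LANES}, leaving {remaining}")
```

Even on a board this loaded there is a little headroom left on paper, though the real allocation depends on how ASUS mapped the remaining Flex I/O ports.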

While the Deluxe takes an “everything but the kitchen sink” approach to device support, ASUS’ other boards like the Z170-A go down a slightly more straightforward route. While we will still see plenty of support for the likes of USB 3.1, SATA Express, M.2 and PCI-E SSDs, the number of third party switches will be drastically reduced, simply providing fewer ports rather than eliminating any key functions.
 
Test Setups & Methodology



For this review, we have prepared a number of different test setups, representing many of the popular platforms at the moment. As much as possible, the test setups feature identical components, memory timings, drivers, etc. Aside from manually selecting memory frequencies and timings, every option in the BIOS was at its default setting.


For all of the benchmarks, appropriate lengths are taken to ensure an equal comparison through methodical setup, installation, and testing. The following outlines our testing methodology:

A) Windows is installed using a full format.

B) Chipset drivers and accessory hardware drivers (audio, network, GPU) are installed.

C) To ensure consistent results, a few tweaks are applied to Windows 8.1 and the NVIDIA control panel:
  • UAC – Disabled
  • Indexing – Disabled
  • Superfetch – Disabled
  • System Protection/Restore – Disabled
  • Problem & Error Reporting – Disabled
  • Remote Desktop/Assistance – Disabled
  • Windows Security Center Alerts – Disabled
  • Windows Defender – Disabled
  • Screensaver – Disabled
  • Power Plan – High Performance
  • V-Sync – Off

D) Windows Update is then run until all available updates are installed.

E) All programs are installed and then updated.

F) Benchmarks are each run three to eight times, and unless otherwise stated, the results are then averaged.

G) All processors had their energy saving options / C-states enabled.
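The averaging in step F can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the run-count check and the sample scores are made up for the example and do not reflect any tooling used for the review.

```python
from statistics import mean


def average_runs(results, min_runs: int = 3, max_runs: int = 8) -> float:
    """Average a set of benchmark runs, enforcing the three-to-eight run rule."""
    if not min_runs <= len(results) <= max_runs:
        raise ValueError(
            f"expected between {min_runs} and {max_runs} runs, got {len(results)}"
        )
    return mean(results)


# Hypothetical scores from four runs of the same benchmark.
sample_scores = [871.0, 874.0, 869.0, 872.0]
print(f"Reported score: {average_runs(sample_scores):.1f}")
```

Rejecting result sets with too few runs keeps a single outlier from being reported as the final number.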
 

System Benchmarks: AIDA64

AIDA64 Extreme Edition


AIDA64 uses a suite of benchmarks to determine general performance and has quickly become one of the de facto standards among end users for component comparisons. While it may include a great many tests, we used it for general CPU testing (CPU ZLib / CPU Hash) and floating point benchmarks (FPU VP8 / FPU SinJulia).


CPU PhotoWorxx Benchmark

This benchmark performs different common tasks used during digital photo processing. It performs a number of modification tasks on a very large RGB image.

This benchmark stresses the SIMD integer arithmetic execution units of the CPU and also the memory subsystem. The CPU PhotoWorxx test uses the appropriate x87, MMX, MMX+, 3DNow!, 3DNow!+, SSE, SSE2, SSSE3, SSE4.1, SSE4A, AVX, AVX2, and XOP instruction set extensions and it is NUMA, HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.





CPU ZLib Benchmark

This integer benchmark measures combined CPU and memory subsystem performance through the public ZLib compression library. CPU ZLib test uses only the basic x86 instructions but is nonetheless a good indicator of general system performance.
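For readers who want a feel for what a ZLib-style throughput test measures, here is a minimal sketch using Python's built-in zlib module. It is not AIDA64's test: the payload, the compression level and the MB/s reporting are all assumptions chosen for the example, and the numbers it prints are not comparable to the charts.

```python
import time
import zlib


def zlib_throughput_mbs(payload: bytes, level: int = 6) -> float:
    """Compress `payload` once and return the input throughput in MB/s."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    # Round-trip check so a broken run is never reported as a result.
    if zlib.decompress(compressed) != payload:
        raise RuntimeError("round-trip mismatch")
    return len(payload) / (1024 * 1024) / elapsed


# Hypothetical test data: a few megabytes of repetitive text.
sample = b"The quick brown fox jumps over the lazy dog. " * 100_000
print(f"ZLib compression: {zlib_throughput_mbs(sample):.1f} MB/s")
```

Because the work is plain integer code shuffling data through memory, results from a test like this tend to track general system performance rather than any one instruction set.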




CPU AES Benchmark

This benchmark measures CPU performance using AES (Advanced Encryption Standard) data encryption. In cryptography AES is a symmetric-key encryption standard. AES is used in several compression tools today, like 7z, RAR, WinZip, and also in disk encryption solutions like BitLocker, FileVault (Mac OS X), TrueCrypt. CPU AES test uses the appropriate x86, MMX and SSE4.1 instructions, and it's hardware accelerated on Intel AES-NI instruction set extension capable processors. The test is HyperThreading, multi-processor (SMP) and multi-core (CMP) aware.




CPU Hash Benchmark

This benchmark measures CPU performance using the SHA1 hashing algorithm defined in the Federal Information Processing Standards Publication 180-3. The code behind this benchmark method is written in Assembly. More importantly, it uses MMX, MMX+/SSE, SSE2, SSSE3, AVX instruction sets, allowing for increased performance on supporting processors.
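The same kind of measurement can be sketched for SHA1 with Python's hashlib. Again, this is only an illustration of the concept: hashlib does not use AIDA64's hand-written Assembly paths, and the buffer size and round count below are arbitrary choices for the example.

```python
import hashlib
import time


def sha1_throughput_mbs(payload: bytes, rounds: int = 20) -> float:
    """Hash `payload` repeatedly and return throughput in MB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha1(payload).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return rounds * len(payload) / (1024 * 1024) / elapsed


buffer = bytes(4 * 1024 * 1024)  # 4MB of zeros, an arbitrary test buffer
print(f"SHA1 hashing: {sha1_throughput_mbs(buffer):.1f} MB/s")
```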




FPU VP8 / SinJulia Benchmarks

AIDA’s FPU VP8 benchmark measures video compression performance using the Google VP8 (WebM) video codec Version 0.9.5 and stresses the floating point unit. The test encodes 1280x720 resolution video frames in 1-pass mode at a bitrate of 8192 kbps with best quality settings. The content of the frames is generated by the FPU Julia fractal module. The code behind this benchmark method utilizes MMX, SSE2 or SSSE3 instruction set extensions.

Meanwhile, SinJulia measures the extended precision (also known as 80-bit) floating-point performance through the computation of a single frame of a modified "Julia" fractal. The code behind this benchmark method is written in Assembly, and utilizes trigonometric and exponential x87 instructions.
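To give an idea of the kind of math involved, the sketch below computes one small frame of the classic Julia fractal (z -> z*z + c). It is not the "modified" variant AIDA64 uses, it has none of the trigonometric x87 work, and Python's complex type uses 64-bit doubles rather than 80-bit extended precision. The constant c, the frame size and the iteration limit are arbitrary values picked for the example.

```python
def julia_escape_count(z: complex, c: complex, max_iter: int = 100) -> int:
    """Count iterations of z -> z*z + c before |z| exceeds 2."""
    for iteration in range(max_iter):
        if abs(z) > 2.0:
            return iteration
        z = z * z + c
    return max_iter


def render_frame(width: int, height: int, c: complex, max_iter: int = 100):
    """Return a height x width grid of escape counts over [-1.5, 1.5]^2."""
    rows = []
    for row in range(height):
        imag = -1.5 + 3.0 * row / (height - 1)
        line = []
        for col in range(width):
            real = -1.5 + 3.0 * col / (width - 1)
            line.append(julia_escape_count(complex(real, imag), c, max_iter))
        rows.append(line)
    return rows


frame = render_frame(40, 20, complex(-0.8, 0.156))
print(f"Rendered {len(frame)} rows of {len(frame[0])} points")
```

Every point is independent and the inner loop is pure floating point arithmetic, which is why fractal workloads make convenient FPU stress tests.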




The AIDA64 tests show an interesting combination of good and spectacular results. The best outings for the Skylake processor seem to be in situations which use newer instruction sets like AVX2.
 
System Benchmarks: Cinebench / PCMark 8 / WPrime

CineBench R15 64-bit


The latest benchmark from MAXON, Cinebench R15 makes use of all your system's processing power to render a photorealistic 3D scene using various different algorithms to stress all available processor cores. The test scene contains approximately 2,000 objects containing more than 300,000 total polygons and uses sharp and blurred reflections, area lights and shadows, procedural shaders, antialiasing, and much more. This particular benchmark can measure systems with up to 64 processor threads. The result is given in points (pts). The higher the number, the faster your processor.




PCMark 8


PCMark 8 is the latest iteration of Futuremark’s system benchmark franchise. It generates an overall score based upon system performance with all components being stressed in one way or another. The result is posted as a generalized score. In this case, we didn’t use the Accelerated benchmark but rather just used the standard Computational results which cut out OpenCL from the equation.



WPrime


wPrime is a leading multithreaded benchmark for x86 processors that tests your processor performance by calculating square roots with a recursive call of Newton's method for estimating functions, with f(x) = x^2 - k, where k is the number whose square root is being taken, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set number of times to increase the accuracy of the results. It then confirms that n(k)^2 = k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum. This is a highly multi-threaded workload. Below are the scores for the 1024M benchmark.
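The procedure described above can be approximated in a few lines of Python. This is a loose sketch and not wPrime's actual code: it uses a loop instead of recursion, runs on a single thread, and the refinement count, iteration cap and tolerance are assumptions added to keep the example safe and short.

```python
import math


def newton_step(x: float, k: float) -> float:
    """Newton correction f(x)/f'(x) for f(x) = x^2 - k."""
    return (x * x - k) / (2.0 * x)


def sign(value: float) -> int:
    """Return -1, 0 or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def newton_sqrt(k: float, refine_steps: int = 8, max_steps: int = 200) -> float:
    """Estimate sqrt(k) starting from k/2, as described for wPrime."""
    x = k / 2.0
    previous_sign = None
    # Phase 1: iterate until the sign of the correction changes.
    # max_steps is a safety cap that is not part of the description.
    for _ in range(max_steps):
        step = newton_step(x, k)
        current_sign = sign(step)
        if previous_sign is not None and current_sign != previous_sign:
            break
        x -= step
        previous_sign = current_sign
    # Phase 2: a fixed number of extra iterations to tighten the result.
    for _ in range(refine_steps):
        x -= newton_step(x, k)
    return x


def run(maximum: int) -> int:
    """Compute and verify roots for 1..maximum, returning the verified count."""
    verified = 0
    for k in range(1, maximum + 1):
        root = newton_sqrt(float(k))
        if math.isclose(root * root, k, rel_tol=1e-12):
            verified += 1
    return verified


print(f"Verified {run(10_000)} of 10000 roots")
```

Since every root is computed independently, the real benchmark can split the range from 1 to the maximum across as many threads as the processor offers.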



Moving on from AIDA, we see a repetition of the results. The improvements over previous generations are right in line with Intel's expectations: about 10% better than Devil's Canyon and 15% to 22% better than Haswell processors. Not bad at all, especially if you are on an older system.
 
Productivity Benchmarks: 7-Zip / Blender / Handbrake

7-Zip


At face value, 7-Zip is a simple compression/decompression tool much like popular applications such as WinZip and WinRAR but it also has numerous additional functions that can allow encryption, decryption and other options. For this test, we use the standard built-in benchmark which focuses on raw multi-threaded throughput.




Blender


Blender is a free-to-use 3D content creation program that also features an extremely robust rendering back-end. It boasts extremely good multi core scaling and even incorporates a good amount of GPU acceleration for various higher level tasks. In this benchmark we take a custom 1440P 3D image and render it out using the built-in tool. The results you see below list how long it took each processor to complete the test.




Handbrake


Video conversion from one format to another is a stressful task for any processor and speed is paramount. Handbrake is one of the more popular transcoders on the market since it is free, has a long feature list, supports GPU acceleration and has an easy-to-understand interface. In this test we take a 6GB 4K MP4 and convert it to a 1080P H.264 video in an MKV container. GPU acceleration has been disabled. The results posted indicate how long it took for the conversion to complete.



Real world testing once again shows a yin and yang effect. In programs like Blender that use newer instruction sets, the i7-6700K provides excellent speedups. Meanwhile, in other situations that use legacy optimizations or could be bottlenecked by other system components, the differences between Skylake and older CPUs begin to narrow.
 
Productivity Benchmarks: POV Ray / WinRAR

POV Ray 3.7


POV Ray is a complex yet simple to use freeware ray tracing program which has the ability to efficiently use multiple CPU cores in order to speed up rendering output. For this test, we use its built-in benchmark feature which renders a high definition scene. The rendering time to completion is logged and then listed below.




WinRAR


WinRAR is one of those free tools that everyone seems to use. Its compression and decompression algorithms are fully multi-core aware which allows for a significant speedup when processing files. In this test we compress a 3GB folder of various files and add a 256-bit encryption key. Once again the number listed is the time to completion.



We don't see any real surprises here but what's evident is that WinRAR doesn't seem to take full advantage of these newer processors.
 
Single Thread Performance



Even though most modern applications have the capability to utilize more than one CPU thread, single threaded performance is still a cornerstone of modern CPU IPC improvements. In this section, we take a number of synthetic applications and run them in single thread mode. The only addition to our normal benchmarks is Dolphin which uses a simple Nintendo GameCube emulation test on a single core.



Single thread performance benchmarks are always interesting since they typically cut out any multi threaded scheduler optimizations built into newer architectures. As we can see, with those cast aside Skylake's lead shrinks quite a bit but still remains in the 5% to 8% range versus the i7-4790K.
 
Accelerated Graphics Performance



With many CPUs using integrated graphics processors, parallel co-processing has become a hot topic of conversation for the last few years. While many were hoping to see a revolution in this field, thus far OpenCL and DirectCompute have failed to gain all that much traction in the development community. However, there are still a few programs that put GPU computing to good use.

In this section, we will be benchmarking a number of applications which support (or claim to support) GPU compute in an effort to highlight the performance benefits which come with this technology. All of these tests are conducted on a system WITHOUT a discrete GPU installed.



AIDA64 GPGPU Benchmark


AIDA64’s integrated graphics benchmark runs the integrated GPU through a number of different tests. In this benchmark we focus on the simple 32-bit and 64-bit integer processing which stresses both the graphics engine and its associated memory.




PCMark 8 - Accelerated


PCMark 8 is the latest iteration of Futuremark’s system benchmark franchise. It generates an overall score based upon system performance with all components being stressed in one way or another. In this test the Accelerated OpenCL path was used.




Blender


Blender is a free-to-use 3D content creation program that also features an extremely robust rendering back-end. It boasts extremely good multi core scaling and even incorporates a good amount of GPU acceleration for various higher level tasks. GPU acceleration was enabled.




Handbrake


Video conversion from one format to another is a stressful task for any processor and speed is paramount. Handbrake is one of the more popular transcoders on the market since it is free, has a long feature list, supports GPU acceleration and has an easy-to-understand interface. In this test we take a 6GB 4K MP4 and convert it to a 1080P H.264 video in an MKV container. GPU acceleration has been enabled in this case. The results posted indicate how long it took for the conversion to complete.


To call our outing into GPU acceleration an unmitigated disaster is being generous. Quite frankly, the OpenCL support structure in many applications is either missing entirely or broken so we were limited to four benchmarks, one of which (Blender) failed on all systems. Luckily, Intel’s implementation wins the day by seamless incorporation of Quick Sync video into Handbrake and respectable results in the other two tests. Intel’s integrated GPUs aren’t supported yet in Blender.

AMD however, which lives and dies by its GPUs’ capabilities, is in serious trouble since their APU continually crashed in Blender, isn’t yet supported by Handbrake (despite their marketing team talking about it for the last two years) and was hobbled by slow processor cores in PCMark.
 
Gaming Benchmarks (720P) – Discrete GPU






 
