ECC Memory & AMD’s Ryzen – A Deep Dive
One of the more interesting aspects of the pre- and post-Ryzen launch was how many people were wondering about support for ECC memory, otherwise known as error-correcting code memory. Every forum had at least one thread and most articles with a comments section had people talking about this relatively rare feature. It all started when leaks and then online product pages for AM4 motherboards appeared, since their specifications lists showed support for both ECC and non-ECC DDR4 memory.
The reason that ECC is much more popular among AMD consumers is that it’s a feature that the company has never blocked. While Intel artificially limits ECC support to pricey Xeon processors or low-end Pentium models that no one would ever use for serious work, AMD have always quietly supported ECC memory. It might not be advertised, and it’s certainly not validated, but it’s been there on AM1/AM2/AM3 platforms for the adventurous to make use of and there’s a die-hard contingent that do.
While we are not going to be talking about the merits of ECC memory, nor going into great technical detail about how it works – Crucial has a concise article available – the primary thing that ECC memory protects you from is what is known as a single-bit error (one DRAM bit cell that is stuck or flipped), due to cosmic radiation, electromigration, physical contact issues, or just some type of hardware failure. Uncorrected, bit flips can cause programs to terminate, they can result in errors in output data, or they can have no impact at all if the flipped bit is in a fortunate place. Nowadays, you also have to worry about a row hammer attack, which is a rapid number of malicious accesses that can cause bit flips and give a skilled hacker read-write access to all physical memory. ECC can prevent that from happening.
There are perfectly valid reasons for home-users to be using ECC; some people are running a DIY NAS, others a home lab, some dabble with scientific tasks like LAMMPS, while others just do work that is personally important, and they are all willing to spend a small premium to ensure the reliability of their data and of their results. Those who run DIY NAS software – like FreeNAS – prefer to use ECC memory since it is one element that can help ensure file integrity. Some people believe that ECC is especially important when using file systems like ZFS or btrfs that perform ‘scrubbing’, which is a process that reads all of the data and metadata to see if it still matches the file system checksum. If errors have been introduced, and the checksums no longer match, things can go sideways in a hurry. It is exceedingly rare though, and similar data corruption can happen on any file system.
Dr. Lisa Su is the President and CEO of AMD, Robert Hallock is Head of Global Technical Marketing, and James Prior is a CPU Product Manager. Clearly, these were three people that knew what they were talking about when it comes to Ryzen.
Based on the above, it is pretty clear that AMD has not done any quality assurance testing, and they don’t want to have to provide any official support for ECC on this mainstream consumer platform. However, we would be surprised to not see them validate it on the upcoming high-end desktop platform (HEDT), which will obviously have workstation roots. Instead of pulling an Intel move and simply disabling ECC support altogether, they have opted to allow motherboard manufacturers to implement the feature as they see fit.
While that Reddit AMA confirmed many peoples hopes and dreams, it also created a bunch of other questions, and that’s where we decided to step in. On a side, you will want to check out the numerous hyperlinks that we embedded throughout this article, since most of them are zoomable images, while others are just useful links.
We reached out to Crucial – the consumer arm of semiconductor giant Micron Technology – to see if they could hook us up with some ECC memory, and look what we received in the mail:
These are two modules of Crucial CT8G4WFS824A unbuffered ECC memory. They are based on the ECC UDIMM form factor – otherwise known as EUDIMM – with memory speeds of DDR4-2400, timings of 17-17-17-39, and a default voltage of 1.20V. We specifically requested these modules because they were faster than regular DDR4-2133 ECC modules, and because they were single rank. While using dual rank modules would have worked as well, we just wanted to ensure guaranteed compatibility. Crucial does offer larger dual rank 16GB DDR4-2400 ECC modules, as well as faster dual rank 8GB DDR4-2666 ECC modules, though it seems unlikely that you could run four modules at that latter speed given Ryzen’s current (but constantly improving) memory limitations.
As you can see, these modules are manufactured with Micron D9TBH ICs, which are aren’t exclusively used on ECC memory modules. You can actually find them in various conventional DDR4-2400 memory kits. However, what makes these Crucial ECC modules special is that the ICs are binned better, they have embedded thermal sensors that popular programs like AIDA64 and HWinfo can read, and they feature one additional IC compared to the usual eight. The extra IC is what makes ECC possible. Whereas a standard DRAM is 64-bit (8×8 bit), ECC is 72-bit, and those extra 8 bits of data are used by the error detection and correction mechanisms to prevent the single bit errors that we discussed above.
As you can see in the screenshots, these modules will boot up with 17-17-17-39-56-1T timings, or more accurately 17-17-17-17-39-56-1T timings since that’s how timings are presented in the UEFI on the ASRock X370 Taichi motherboard that we used for this article. These are at least five JEDEC DDR4-2400 profiles loaded onto these modules to ensure maximum compatibility, so no matter what you will be able to run at that frequency.