ECC Memory & AMD's Ryzen - A Deep Dive Comment Thread

Mastakilla

Member
Joined
Oct 22, 2019
Messages
13
I've just received most of the parts for my new desktop (the Asrock Rack system I was previously testing ECC on will become my NAS). It is an MSI MEG X570 Unify mobo + Ryzen 9 3900X.

I did a quick test (on Windows 10 only) with the ECC memory from my NAS on this MSI mobo as well:
  • MSI does run with the ECC memory
  • But it doesn't support the ECC functions at all. All programs that previously reported functioning ECC memory on the Asrock Rack say there is no error correction on the MSI. AIDA64 is the most precise and says "ECC: Supported, Disabled"
  • It also (logically) didn't report any memory errors after running prime95 with unstable memory settings
  • Anandtech actually reported the same in their MSI x570 Godlike review (which should be similar to the Unify and Ace)
Although this result seems even worse, I actually think it is better to have ECC disabled than to have it enabled but not working (in effect pretending to have it).
Asus (Pro WS X570-ACE, for example) and Gigabyte (Aorus Pro, for example) say some of their boards have full ECC support. Asus says "depending on the CPU", but nowhere specifies which CPUs. Gigabyte says Ryzen 3000 and Ryzen 2000 Pro (which is weird, because according to AMD there is no difference in ECC capability between Pro and non-Pro). Anyway, I don't have an Asus or Gigabyte board, so I can't test those...

Would be nice if someone could... :)
With all the knowledge I have gathered so far (and shared in this thread), it was less than a day's work.
 

Mastakilla

Member
Joined
Oct 22, 2019
Messages
13
I have some good news, which should make reproducing (and validating after fixing) the issue a lot easier for Asrock Rack! It seems like Passmark have updated their MemTest86 product from version 8.2, which didn’t fully support Ryzen 3000, to version 8.3, which does fully support Ryzen 3000 (they forgot to put it in the changelog though).

This is very interesting, as MemTest86 Pro (not the Free version) supports ECC Injection:

ECC injection: Enabled/Disabled (Pro version only) - if ECC detection/correction is supported/enabled and ECC injection is supported by the system, this option enables/disables injection of ECC errors to simulate how the system responds to real ECC errors. ECC errors are injected at the start of each individual test. If ECC injection is successful, the details of the ECC error shall be reported and displayed on screen as if an actual ECC error was detected.

Note: Although ECC injection may be supported by your hardware, it may be locked by the BIOS. Some BIOS may allow you to unlock the ECC injection feature in the BIOS setup.

And Asrock Rack did very well in that regard, as there is an option in the BIOS called “Disable Memory Error Injection”:
1577239993891.png

After setting this BIOS setting to false and enabling “ECC Injection” in MemTest86:
1577240006469.png

I ran MemTest86 and it reproduces the issue perfectly:
1577240034567.png

As you can see, it successfully injects ECC errors, but doesn’t detect them, which is exactly the same as I was seeing when trying to trigger memory errors using unstable settings.
https://www.passmark.com/forum/memtest86/5984-how-do-you-verify-ecc-error-injection-working

I am also very curious whether this is only a Ryzen 3000 issue, as the motherboard was initially designed for Ryzen 1000 and Ryzen 2000 CPUs alone. Perhaps ECC does work for those older CPUs. Unfortunately I don’t have such a CPU to try this on (feel free to send me one for testing).

I’ve forwarded this info to Asrock Rack…
 


Mastakilla

Member
Joined
Oct 22, 2019
Messages
13
Bad news, I’m afraid… I’ve received a response from Asrock Rack with an "official statement" from AMD regarding ECC on this mobo (and AM4 in general):
Dear Mastakilla,

So many thanks for you detail experience.
We will share this information to RD 

However we got AMD official respond today

* AM4 support ECC function
* AM4 does not support ECC error reporting function

Here is the conclusion:
AM4 platform CPU (Ryzen 1000,2000,3000 series) can all support ECC correction, but not ECC report function

Best regards,
Kevin Hsiueh
Asrock Rack Incorporation
To which I responded:
Hi Kevin,

Thanks for getting back to me!

That is very unfortunate news…

Does this mean that the sensors for “DRAM ECC Error A1/A2/B1/B2” in the IPMI Event Log are unused and always will remain empty, even if memory errors do occur?
Do you know why these sensors exist on this board, then? Were they simply copied over from an existing Intel / TR4 / Epyc board without testing them? Or were they added explicitly, but you weren’t aware of this missing feature (and also didn’t test it)?

Kind regards,

Mastakilla
And their response:
Dear Mastakilla,

According to AMD, X470 is desktop MB, and our QT won’t test ECC report function on desktop MB.
We follow AMD POR to writes specification.
In order to prevent misunderstanding, we will also remove ”DRAM ECC Error A1/A2/B1/B2” in the IPMI Event Log”.
Thanks for doing so many test and kind remind, and we will pay more attention on similar case in the future.

Best regards,
Kevin Hsiueh
Asrock Rack Incorporation
So no ECC reporting is supported…

Not entirely sure of this, but doesn’t this mean that:
  • there is no way to know for sure ECC is actually doing something or to validate that it actually works (even for Asrock Rack or AMD themselves).
  • there is no way to know if your memory is stable or not (ECC might be correcting errors all the time without you knowing about it). This is especially relevant if you want to overclock it.
I’m also not entirely sure all of this is true. Wendell told me he knew about people who reported logged error corrections on Ryzen. Perhaps AMD / Asrock Rack told me this to stop asking annoying questions about it? I certainly hope so ;) (please prove me wrong)
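For anyone who wants to check the reporting side on Linux rather than Windows, here is a minimal sketch. It assumes a Linux install where the kernel's EDAC driver (amd64_edac on AMD) has loaded and exposes per-memory-controller counters under /sys/devices/system/edac/mc. None of this was tested in this thread, and the function name read_edac_counts is made up for illustration. If the directory is missing or the counters stay at zero, that alone does not tell you whether ECC is off or whether reporting is simply not wired up.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count = corrected errors, ue_count = uncorrected errors.
    An empty dict means no EDAC memory controllers were found under root
    (assumption: the EDAC driver is not loaded or reporting is unavailable).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    # Memory controllers show up as mc0, mc1, ...
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Counter file missing or unreadable: record it as unknown.
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found; ECC reporting is not visible here.")
    for mc, entry in found.items():
        print(mc, "corrected:", entry["ce_count"], "uncorrected:", entry["ue_count"])
```

Running this before and after a deliberately unstable memory test (or an ECC injection run) and comparing the corrected-error count would be one way to see whether corrections are being reported at all.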
 

Mr. Friendly

Well-known member
Joined
Nov 21, 2007
Messages
6,802
Location
British Columbia
that really makes no sense, when ECC support was one of the boasting rights. it had made the homelab community so excited. much disappoint. :(
 

Entz

Well-known member
Joined
Jul 17, 2011
Messages
1,870
Location
Kelowna
Mr. Friendly said:
that really makes no sense, when ECC support was one of the boasting rights. it had made the homelab community so excited. much disappoint. :(
It doesn't mean it's not working; the BMC/Windows just can't log that it is.

It's one of those things you will just have to trust is working if it's enabled and go with it. Logged corrections are exceptionally rare as it is (one server of mine has logged 1 correction in 5 years).
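
To illustrate the difference between correcting and reporting, here is a toy sketch using a Hamming(7,4) code. It is only an illustration under stated assumptions, not what the memory controller actually does (real ECC DIMMs use a wider SECDED code over 64 data bits plus 8 check bits), and the function names are made up. The decoder fixes a single flipped bit and also returns where the flip was; if the caller throws that position away, the data is still correct but nothing gets logged.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Return (data_bits, error_position).

    error_position is 0 when no error was seen, otherwise the 1-based
    position of the single bit that was flipped back.
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the bad bit
    if position:
        c[position - 1] ^= 1  # the correction step
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    original = [1, 0, 1, 1]
    stored = hamming74_encode(original)
    stored[4] ^= 1  # simulate a single bit flip in memory

    # "Correction without reporting": keep the data, drop the position.
    data, _ = hamming74_decode(stored)
    print("data intact:", data == original)

    # "Correction with reporting": same data, but the event is visible.
    data, position = hamming74_decode(stored)
    print("corrected bit position:", position)
```

Both calls hand back the same corrected data; the only difference is whether anyone looks at the second return value, which is roughly the situation described above.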
 

Entz

Well-known member
Joined
Jul 17, 2011
Messages
1,870
Location
Kelowna
Of course you need a board that supports it. It's not transparent; there are lots of things that need to be in place, including the traces on the motherboard for the check bit pins. Intel is no different, though they force you to go with a new, functionally identical and most likely way more expensive chipset.

However, if the board supports it and the ECC support is showing as enabled, it is working. You just don't get reporting. That much is confirmed.

I do agree it is super disappointing though, and pretty stupid, but I am still holding out hope this is more of an AGESA/BIOS issue that AMD can fix. Will is the bigger problem: there are like 2 AM4 "server" boards. I just don't think it's a market they care about or likely can win in. They'd be better served putting out a 4C8T or 8C16T Threadripper.

I wonder if this is why Tyan all but abandoned their EX S8015. There was a lot of fanfare, then poof. Feedback was likely that it just doesn't work in the way customers want it to.

/Soapbox
You should never overclock ECC memory or any server stuff anyway; the whole point is 24/7 stability, not "maybe stable". If you're gonna play Russian roulette with your data, I don't see why you would even go to the trouble of using ECC. Plug and pray and hope your backups are good.
 

Korenad

New member
Joined
Feb 15, 2020
Messages
1
Mastakilla said:
I’m also not entirely sure all of this is true. Wendell told me he knew about people who reported logged error corrections on Ryzen. Perhaps AMD / Asrock Rack told me this to stop asking annoying questions about it? I certainly hope so ;) (please prove me wrong)
It's definitely not true for all AM4. What about the initial article? It shows screenshots of reported error messages. Perhaps AMD's answer refers only to IPMI logging?

It is noteworthy that the BIOS options for ECC, which were present in the X370 and X470 versions, have disappeared from the ASRock X570 Taichi manual.

But they are still present on the ASUS X570.

I don’t have any AM4 motherboard. I’m just looking for a motherboard with ECC support to buy for a Ryzen 3900-3950, which is how I came across this discussion.
 

diversity

Member
Joined
Apr 8, 2020
Messages
6
@Mastakilla, exactly which server-grade X470 Asrock Rack mobo did you mention in your first message in this thread?

I have been trying now with an ASRock Rack X470D4U (latest BIOS with ECC enabled, ECC injection enabled, Platform First Error Handling disabled) with a Ryzen 9 3950X, with no luck.
I am using PassMark MemTest86 Pro 8.4 RC2 Build 1001, but PassMark is still troubleshooting on the basis of my debug logs.

AMD is assuring me that a previous setup I tried (Asrock X570 Creator, AM4 socket, AMD X570 chipset, with a Ryzen 9 3950X) does support ECC correction and reporting, but I only started using MemTest86 Pro after I had returned that setup.
I am willing to try that setup again if the audience here would find that useful.

Anyway, could it be that the problem is not the CPU or BIOS per se, but perhaps other components?
I have for one on my X470D4U mobo the following:
Processor System
CPU - AMD AM4 Socket Ryzen™ PRO/ Ryzen™ 2nd and 3rd generation series processors
Socket - AM4 PGA 1331
Chipset - AMD Promontory X470
 

clshades

Well-known member
Joined
May 18, 2011
Messages
4,277
Location
Big White Ski Resort
Humbly, I barely understand half of what you guys are talking about. However, if I'm to build a video editing rig, is ECC still better even if I'm not getting reports?
 
