ECC Memory & AMD's Ryzen - A Deep Dive Comment Thread

Mr. Friendly (Well-known member; joined Nov 21, 2007; 6,798 messages; location: British Columbia)

Yeah, I am not sure what it would take to simulate. You need to get the timings such that writes work perfectly fine and just a few reads fail. Too many errors, or too big an error (uncorrectable), and the system will crash.

I have never overclocked ECC RAM, as that is kind of counterproductive, so I am not sure what it would take.

Assuming it is even working at all. If it is a driver issue, I would expect the errors to show up on the IPMI side rather than in the OS, and if that isn't working, it likely isn't catching them. Or you're just extremely lucky and every write reads back correctly until you hit a speed where nothing works.
There's no real reason to overclock now that CPUs support 3200 MHz RAM and you can buy 3200 MHz ECC.

Also, I wonder if Windows 10 for Workstations or Enterprise would report ECC functionality correctly?
 

Entz (Well-known member; joined Jul 17, 2011; 1,870 messages; location: Kelowna)

True.

Mr. Friendly said: "Also, I wonder if Windows 10 for Workstations or Enterprise would report ECC function correctly?"

Even if it were a Windows issue, it should show up in the BMC and/or in Linux.

ECC is really hard to test, and you can go months if not years without a correction happening naturally. If it says it's enabled, I would trust that it is. That's all you can really do.
 

Mastakilla (Member; joined Oct 22, 2019; 13 messages)

Mr. Friendly said: "You need to get the timings such that writes work perfectly fine and just a few reads will fail."

I'll try messing with the timings a little more... Advice on which timings I should try loosening or tightening is very welcome ;)

Mr. Friendly said: "No real reason to overclock now that CPUs support 3200 MHz RAM and you can buy 3200 MHz ECC."

I wasn't able to find 16 GB ECC DIMMs at 3200 here. The highest I found was actually 2666.

Entz said: "ECC is really hard to test and you can go months if not years without a correction naturally. If it says it's enabled, I would trust that it is."

Then why is there an article like the one linked at the start of this thread? :D How did they manage it? I suppose this method has been tested successfully on another platform at least once, no?
 

Entz

This is the only article I have ever seen that recommended, or even tried, making your system unstable in the hope that the RAM will catch an error without the system crashing.
 

Entz

If I had to guess, I would look at the refresh timings maybe, or something like tRCD.

Basically, you need a write to succeed but then partially fail when read back, with one or more bits flipping in the time between the write and the read.

Does the BIOS have a way of doing scrubbing?
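To make the "write succeeds, read comes back with flipped bits" case a bit more concrete, here is a toy single-error-correct, double-error-detect (SECDED) code in Python: an extended Hamming(8,4) code over 4 data bits. It is only an illustration of the principle and not what AMD's memory controller literally implements (ECC DIMMs store 8 check bits per 64 data bits, and the exact code a given controller uses may differ). It shows why one flipped bit can be fixed and counted as a corrected error, while two flipped bits can only be flagged as uncorrectable. The names `encode` and `decode` are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Tuple


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1 values)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant data bit
    code = [0] * 8  # index 0 is the overall parity bit, 1..7 form a Hamming(7,4) word
    code[3], code[5], code[6], code[7] = d  # data bits go in the non-power-of-two slots
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2  # makes the total number of 1s even
    return code


def decode(code: list[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected" | "uncorrectable", data or None)."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    odd_parity = sum(c) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # An odd number of flips: treat it as a single-bit error and fix it.
        # A zero syndrome here means the overall parity bit itself flipped.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but a non-zero syndrome: two bits flipped, cannot be fixed.
        return "uncorrectable", None
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)


if __name__ == "__main__":
    word = encode(0b1011)
    one_flip = word.copy()
    one_flip[5] ^= 1
    two_flips = word.copy()
    two_flips[2] ^= 1
    two_flips[6] ^= 1
    print(decode(word))       # ('ok', 11)
    print(decode(one_flip))   # ('corrected', 11)
    print(decode(two_flips))  # ('uncorrectable', None)
```

In the scenario above, a flip like `one_flip` is the kind that should read back as correct data while still being logged as a corrected error somewhere (IPMI, EDAC, the Windows event log).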
 

Entz

The issue I have with these tests is that unless you have a way of knowing for sure, you can't tell what is going on. If you're not getting any errors, then either it's not unstable enough or it's not reporting, which is maddening 😔

That's why most people just assume it's working if it's reported as working.
 

Mastakilla

Here are the scrubbing options from the BIOS. To disable ECC, should I mess with these as well?
[Screenshots: BIOS memory scrubbing options]

Edit:
I just spent a large part of the day lowering timings and trying to trigger errors, but still no luck...

I've increased the frequency from 1333 MHz to 1500 MHz (1533 MHz doesn't POST at all). The screenshot on the left below shows the 1500 MHz frequency with the default timings for 1333 MHz.
On top of that, I've tightened the first 5 timings, as you can see in the screenshot on the right below.
[Screenshots: left, 1500 MHz with the default 1333 MHz timings; right, the first five timings tightened]
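For reference, a 1333 MHz or 1500 MHz setting is usually the memory clock, and DDR transfers data on both clock edges. Assuming that is what the BIOS shows here (it matches the 2666 DIMMs mentioned earlier), these correspond to DDR4-2666 and DDR4-3000. Since timings are counted in clock cycles, raising the clock while keeping the same cycle counts already makes them shorter in nanoseconds. The small sketch below just does that arithmetic; the function names are hypothetical and CL19 is an illustrative value, not one taken from this thread.

```python
def cycle_time_ns(memory_clock_mhz: float) -> float:
    """Duration of one memory clock cycle in nanoseconds."""
    return 1000.0 / memory_clock_mhz


def data_rate_mts(memory_clock_mhz: float) -> float:
    """DDR transfers data on both clock edges, so the data rate is twice the clock."""
    return 2 * memory_clock_mhz


def latency_ns(cycles: int, memory_clock_mhz: float) -> float:
    """Absolute duration of a timing that is specified in clock cycles."""
    return cycles * cycle_time_ns(memory_clock_mhz)


if __name__ == "__main__":
    # CL19 is an illustrative value, not a setting reported in this thread.
    for clock in (1333, 1500):
        print(f"{clock} MHz = DDR4-{data_rate_mts(clock):.0f}: "
              f"CL19 = {latency_ns(19, clock):.2f} ns")
```

With these numbers, the same CL19 shrinks from roughly 14.25 ns at 1333 MHz to roughly 12.67 ns at 1500 MHz before any timing is touched.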

But even with those tighter timings, I'm still not seeing any memory errors at all in IPMI or in Windows / Linux. I'm already in the second memtester loop and it still hasn't crashed or reported an error. The system is either rock-solid stable (with the above settings) or doesn't POST at all and requires a CMOS reset (when I lower those timings one more step).

I wanted to try lowering the RAM voltage, but I can't find which voltage to lower (or where it is hidden in the BIOS).
 

Mastakilla

After a lot of testing, I've finally mastered RAM overclocking to the point where I can switch between stable and unstable settings. The trick is lowering the voltage to create instability, as tightening the timings or raising the frequency too far will usually stop the system from booting instead of making it unstable.

After figuring this out, I've done a lot of testing, ranging from barely bootable to slightly unstable, using MemTest86, memtester (on Fedora Rawhide with kernels 5.4.0.0.rc3 and 5.4.0.2) and prime95 / aida64_bench / Ryzen_Master_test (on a fully updated Windows 10 Pro, first with the amd_software_1.09.27.1033.zip chipset drivers and later with amd_chipset_software_1.11.22.454.zip).

To give you an idea of the testing I've done, here is an Excel I've created to keep track of things:
[Screenshot: Excel sheet used to keep track of the tests]

In the meantime, I've had millions of memory errors in total, under very varied conditions. It seems almost impossible to me that not a single one of these millions of errors was a single-bit or two-bit error.
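One way to check that intuition, assuming you can get the written and read-back value of each failing word (memtester, for example, prints both values when a comparison fails), is to count the flipped bits with an XOR and a popcount. A hedged caveat: if ECC were correcting errors, single-bit flips should be fixed before the tester ever sees them, so the mismatches a tester reports would then mostly be multi-bit ones or errors introduced outside the ECC-protected path, and the corrections themselves would only be visible in the error logs. The helper names are hypothetical, and the pairs would have to be extracted from the tester's output with your own parser.

```python
from collections import Counter
from typing import Iterable, Tuple


def flipped_bits(expected: int, actual: int) -> int:
    """Count the bits that differ between the written and the read-back value."""
    return bin(expected ^ actual).count("1")


def classify(expected: int, actual: int) -> str:
    """Say what a SECDED-style ECC word could do with this particular mismatch."""
    n = flipped_bits(expected, actual)
    if n == 0:
        return "no error"
    if n == 1:
        return "correctable"
    if n == 2:
        return "detectable but uncorrectable"
    return "beyond SECDED guarantees"


def summarize(mismatches: Iterable[Tuple[int, int]]) -> Counter:
    """Tally (expected, actual) pairs, e.g. parsed from a memory tester's failure lines."""
    return Counter(classify(expected, actual) for expected, actual in mismatches)


if __name__ == "__main__":
    # Hypothetical mismatches for illustration, not results from this thread.
    sample = [
        (0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFE),  # one bit flipped
        (0x5555555555555555, 0x5555555555555550),  # two bits flipped
        (0x0000000000000000, 0x00000000000000F0),  # four bits flipped
    ]
    print(summarize(sample))
```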

But... unfortunately, I couldn't find any report of a corrected or logged memory error in the IPMI Event Log, Linux edac-util or the Windows Event Viewer (even though all of these report ECC as active and correctly configured; see my posts above).
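On Linux, edac-util gets its numbers from the EDAC sysfs tree, so the raw counters can also be checked directly. A minimal sketch, assuming an EDAC driver (amd64_edac on AMD systems) has registered memory controllers under /sys/devices/system/edac/mc: `ce_count` and `ue_count` are the per-controller totals of corrected and uncorrected errors, and `sdram_scrub_rate` may simply be absent if the driver doesn't expose the scrub rate. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Standard location of the Linux EDAC memory-controller directories.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_value(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_int(path: Path) -> Optional[int]:
    """Parse a sysfs counter as an integer, or None if missing or unparsable."""
    value = read_value(path)
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def edac_counts(root: Path = EDAC_ROOT) -> dict[str, dict]:
    """Collect error counters for every registered memory controller (mc0, mc1, ...)."""
    result: dict[str, dict] = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        result[mc.name] = {
            "mc_name": read_value(mc / "mc_name"),
            "ce_count": read_int(mc / "ce_count"),  # corrected errors
            "ue_count": read_int(mc / "ue_count"),  # uncorrected errors
            "sdram_scrub_rate": read_int(mc / "sdram_scrub_rate"),  # may be absent
        }
    return result


if __name__ == "__main__":
    counts = edac_counts()
    if not counts:
        print("No EDAC memory controllers found; is an EDAC driver such as amd64_edac loaded?")
    for name, info in counts.items():
        print(name, info)
```

If no `mc*` directories exist at all, the OS isn't exposing ECC errors through EDAC in the first place, which is a different problem from "ECC enabled but nothing logged".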

Now, I know that doesn't mean no memory errors were corrected, but correction is only half of what ECC functionality is. Reporting / logging these corrections is at least as important as the correcting itself (how else can you know your RAM is dying or unstable? That's like having a RAID5 array that doesn't notify you when one of your disks is dead :p).

So it seems to me that ECC is not working on this motherboard with a Ryzen 3000 CPU (I don't have the older Ryzen CPUs for testing).

I've reported this to Asrock Rack and they've sent me the following response:

Dear Mastakilla,

Due to X470 belongs to desktop series
It’s not like server MB has native support of ECC report.
We are checking with RD and AMD if X470 can support ECC report.
We will reply to you ASAP

Best regards,
Kevin
Asrock Rack Incorporation

I've replied to this with:

Hi Kevin,

Thanks a lot for looking into this! That is greatly appreciated…

I understand that the X470 is indeed a desktop chipset. Also, none of the AM4 CPUs have ECC support officially validated by AMD (although AMD confirmed that it wasn’t disabled).
So you could argue that non-validated, half-working (not reporting / logging) ECC support is acceptable. And I agree with that for consumer brands like Asrock, Asus, MSI, etc.
However, if a brand like Asrock Rack or SuperMicro creates an X470 motherboard with “Supports 4x DDR4 ECC and non-ECC UDIMM, max. 128 GB” in the specifications, and the IPMI Event Log contains sensors for “DRAM ECC Error A1/A2/B1/B2”, then people (like myself) will assume that it is actually working and validated. In that case, I don’t think it is acceptable for it not to work 100%, as people buying these brands actually expect it to fully work. I don’t think that is the reputation you are looking for as a brand called “Asrock Rack” 😊

Please let me know if there is anything else I can do to assist.

Kind regards,

Mastakilla

The response from Asrock Rack seems to admit that ECC is currently not fully supported; however, it could also just mean that Kevin is not sure about it... So I'm hoping for a decent response from their R&D.

It would be nice if someone could try some testing with a Ryzen 1000 or Ryzen 2000, to see if ECC works with those CPUs...
 

Entz

Nice work either way.

It would be a shame if it isn't enabled. Why even list ECC support if it doesn't actually work? Sure, it will "boot", but that's still a big fail.

So hopefully you are correct and this is just a CSR thing, but based on what you have found, I have my doubts that it actually works.
 
