HardwareCanucks said:
That is an uncorrected error (UE), otherwise known as a multi-bit error or a hard error. Multi-bit errors cannot be corrected by ECC memory. What is supposed to happen when they occur is that they should be detected, logged and the system should be immediately halted.
I disagree. An uncorrectable error does not mean that the system must be halted.
A machine check exception (MCE) will be raised and the operating system will be informed about the error. The operating system can then look up how the affected memory region is used, and take action based on this.
If it is kernel memory or a dirty page-cache page, then usually the system must be halted to prevent further corruption.
If it is unused memory, then nothing needs to happen.
If the memory belongs to a process, then that process can be terminated.
If it is a clean (non-dirty) page-cache page, then the page can simply be discarded, since an intact copy still exists on disk and can be re-read when needed.
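
To make the cases above concrete, here is a rough sketch of that decision as a small Python function. It only illustrates the policy I described; it is not how any particular kernel implements it, and the names PageUse, Action and action_for_uncorrectable_error are made up for this example.

```python
from enum import Enum, auto


class PageUse(Enum):
    """How the memory hit by the error is currently used (hypothetical categories)."""
    KERNEL = auto()
    DIRTY_CACHE = auto()
    CLEAN_CACHE = auto()
    PROCESS = auto()
    UNUSED = auto()


class Action(Enum):
    """What the operating system does in response (hypothetical names)."""
    HALT_SYSTEM = auto()
    KILL_PROCESS = auto()
    DISCARD_PAGE = auto()
    NOTHING = auto()


def action_for_uncorrectable_error(use: PageUse) -> Action:
    """Map the use of the affected memory to a recovery action."""
    if use in (PageUse.KERNEL, PageUse.DIRTY_CACHE):
        # The only good copy of the data is gone; usually halt to
        # prevent further corruption.
        return Action.HALT_SYSTEM
    if use is PageUse.PROCESS:
        # Only the owning process is affected, so it can be terminated.
        return Action.KILL_PROCESS
    if use is PageUse.CLEAN_CACHE:
        # The cached page is not dirty, so it can be discarded.
        return Action.DISCARD_PAGE
    # Unused memory: nothing needs to happen.
    return Action.NOTHING


if __name__ == "__main__":
    for use in PageUse:
        print(f"{use.name:12} -> {action_for_uncorrectable_error(use).name}")
```

Only two of the five cases end in a halt, which is the point: the response depends on what the memory was being used for.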