mcelog reports AMD DRAM Parity Error?

Bill Bogstad bogstad-e+AXbWqSrlAAvxtiuMwx3w at public.gmane.org
Thu Nov 18 12:06:21 EST 2010


On Thu, Nov 18, 2010 at 10:44 AM, Jarod Wilson <jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org> wrote:
> On Nov 18, 2010, at 10:30 AM, Derek Atkins wrote:
>
>> Hey,
>>
>> Back onto my mcelog issue from a while ago..
>
> Crap, I apologize, I'd meant to follow up on this, and it fell
> through the cracks... So I jumped right on it right now.
>
>> I finally updated to the
>> newly released mcelog.x86_64 2:1.0-0.1.pre3.fc13 and when I ran mcelog
>> I got this output:
>>
>> HARDWARE ERROR. This is *NOT* a software problem!
>> Please contact your hardware vendor
>> MCE 0
>> CPU 0 4 northbridge TSC 24b8cb30a62636
>> MISC c008000001000000 ADDR 3c5e80c80
>>  Northbridge DRAM Parity Error
>>       bit34 = err cpu2
>>       bit43 = L3 subcache in error bit 1
>>       bit46 = corrected ecc error
>>       bit59 = misc error valid
>>  memory/cache error 'generic read mem transaction, generic transaction, level generic'
>> STATUS 9c294834001d011b MCGSTATUS 0
>> SOCKETID 0
>>
>> Does this mean I have a busted CPU?  Or busted RAM?
>
> RAM. However, it's not a fatal error, it's simply a corrected
> ecc error. I'm told this is all a single event here, and the
> event was the corrected ecc error, anyway. So you might want
> to replace some memory at some point, but hey, it's ecc memory
> doing what it's designed to do here.
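
As a sanity check, the bits mcelog lists can be read straight off the
STATUS value. A minimal Python sketch (set_bits and REPORTED are names
made up for illustration; the labels are copied from the output above,
not taken from any AMD documentation):

```python
# STATUS value copied from the mcelog output quoted above.
STATUS = 0x9c294834001d011b


def set_bits(value, width=64):
    """Return the positions of all bits set in value, lowest first."""
    return [i for i in range(width) if (value >> i) & 1]


# Bit labels exactly as mcelog printed them; nothing else is assumed.
REPORTED = {
    34: "err cpu2",
    43: "L3 subcache in error bit 1",
    46: "corrected ecc error",
    59: "misc error valid",
}

bits = set_bits(STATUS)
for bit, label in sorted(REPORTED.items()):
    state = "set" if bit in bits else "clear"
    print(f"bit{bit} ({label}): {state}")
```

All four come back as set, so bit46 (corrected ecc error) really is
present in that STATUS word. Other bits are set as well; the sketch
makes no claim about what those mean.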

Could be neither, actually. Random radiation can flip bits occasionally...

http://en.wikipedia.org/wiki/Soft_error

Unfortunately, over the years I've never seen any numbers on what the
expected rate for such radiation-induced memory errors should be.

Bill Bogstad