
BLU Discuss list archive



decoding MCE Logs? Possible hardware issue?



Ok,

On Tue, September 28, 2010 9:47 pm, Jarod Wilson wrote:
> On Tue, Sep 28, 2010 at 9:48 AM, Derek Atkins <warlord-3s7WtUTddSA at public.gmane.org> wrote:
>> I noticed the following in my mcelog, and I was hoping someone could
>> help me decode this. My google fu has not led me to an answer.
>>
>> I'm running a Supermicro H8DA3-2 with two Quad-Core AMD Opteron(tm)
>> Processor 2378 and 16GB of RAM (8 sticks of ACTICA DDR2 667 2GB ECC REG)
>> purchased with the machine in Jan, 2009.
>>
>> Is this a memory issue?
>
> At first glance, it looks to be a bad cpu l3 cache, but hard to say for
> sure...
>
> $ <paste your log into file 'log'>
> $ mcelog --k8 --ascii < log
>
> mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
> MCE 0
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> MISC c008000001000000 ADDR 1c88309c0
> STATUS 9c6cc450001d017b MCGSTATUS 0
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 0 0 data cache MISC c008000001000000 ADDR 1c88309c0
>   Data cache ECC error (syndrome d9)
>        bit42 = L3 subcache in error bit 0
>        bit46 = corrected ecc error
>        bit59 = misc error valid
>   memory/cache error 'evict mem transaction, generic transaction, level generic'
> STATUS 9c6cc450001d017b MCGSTATUS 0
> (Fields were incomplete)
>
> I'd run mcelog with root privs on that machine itself and without the
> --k8 flag (I ran on an Intel box) to make sure it's got the right cpu
> type and access to /dev/mem for more accurate results...

tail -6 /var/log/mcelog  | mcelog --k8 --ascii
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MISC c008000001000000 ADDR 234909fc0
STATUS 9c524484001d011b MCGSTATUS 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache MISC c008000001000000 ADDR 234909fc0
  Data cache ECC error (syndrome a4)
       bit34 = err cpu2
       bit42 = L3 subcache in error bit 0
       bit46 = corrected ecc error
       bit59 = misc error valid
  memory/cache error 'generic read mem transaction, generic transaction, level generic'
STATUS 9c524484001d011b MCGSTATUS 0
(Fields were incomplete)
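
For reference, the "bitNN" lines in both decodes can be reproduced directly from the raw STATUS value. Below is a minimal Python sketch, not a replacement for mcelog. The bit labels are copied from the mcelog output in this thread and nothing else is decoded. The placement of the syndrome in bits 47 through 54 is an assumption on my part; it does agree with both syndromes printed above (d9 and a4). The function names are made up for illustration.

```python
# Labels copied verbatim from the mcelog output in this thread.
# Any other set bit in STATUS is simply not described here.
KNOWN_BITS = {
    34: "err cpu2",
    42: "L3 subcache in error bit 0",
    46: "corrected ecc error",
    59: "misc error valid",
}


def set_bits(status):
    """Return the positions of all set bits in a 64-bit STATUS value."""
    return [bit for bit in range(64) if (status >> bit) & 1]


def describe(status):
    """Return mcelog-style 'bitNN = label' lines for the bits we know."""
    return [
        f"bit{bit} = {KNOWN_BITS[bit]}"
        for bit in set_bits(status)
        if bit in KNOWN_BITS
    ]


def syndrome(status):
    """Extract the ECC syndrome.

    Assumption: the syndrome is the 8-bit field at bits 47..54.
    """
    return (status >> 47) & 0xFF


if __name__ == "__main__":
    # The two STATUS values posted in this thread.
    for status in (0x9C6CC450001D017B, 0x9C524484001D011B):
        print(f"STATUS {status:016x} syndrome {syndrome(status):02x}")
        for line in describe(status):
            print("  " + line)
```

The only difference between the two events, as far as these labels go, is bit 34, which is set in my STATUS value and clear in the one from the original log.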

So what does this mean?

> Jarod Wilson
> jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org

-derek








BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
