Jarod,

On Tue, September 28, 2010 10:10 pm, Jarod Wilson wrote:
> On Tue, Sep 28, 2010 at 10:02 PM, Derek Atkins <derek-CrUh67yIh4IAvxtiuMwx3w at public.gmane.org> wrote:
>> Ok,
>>
>> On Tue, September 28, 2010 9:47 pm, Jarod Wilson wrote:
>>> On Tue, Sep 28, 2010 at 9:48 AM, Derek Atkins <warlord-3s7WtUTddSA at public.gmane.org> wrote:
>>>> I noticed the following in my mcelog, and I was hoping someone could
>>>> help me decode this. My google fu has not led me to an answer.
>>>>
>>>> I'm running a Supermicro H8DA3-2 with two Quad-Core AMD Opteron(tm)
>>>> Processor 2378 and 16GB of RAM (8 sticks of ACTICA DDR2 667 2GB ECC
>>>> REG) purchased with the machine in Jan, 2009.
>>>>
>>>> Is this a memory issue?
>>>
>>> At first glance, it looks to be a bad cpu l3 cache, but hard to say for
>>> sure...
[snip]
>> tail -6 /var/log/mcelog | mcelog --k8 --ascii
>> MCE 0
>> HARDWARE ERROR. This is *NOT* a software problem!
>> Please contact your hardware vendor
>> MISC c008000001000000 ADDR 234909fc0
>> STATUS 9c524484001d011b MCGSTATUS 0
>> HARDWARE ERROR. This is *NOT* a software problem!
>> Please contact your hardware vendor
>> CPU 0 0 data cache MISC c008000001000000 ADDR 234909fc0
>>  Data cache ECC error (syndrome a4)
>>        bit34 = err cpu2
>>        bit42 = L3 subcache in error bit 0
>>        bit46 = corrected ecc error
>>        bit59 = misc error valid
>>  memory/cache error 'generic read mem transaction, generic transaction,
>> level generic'
>> STATUS 9c524484001d011b MCGSTATUS 0
>> (Fields were incomplete)
>>
>> So what does this mean?
>
> Well, mcelog seems to think you have a bad CPU, but I'd have to talk
> to some of the hardware folks at work to get a better idea exactly
> what's up. Seems possible it's just an ecc memory error too though, and
> one that was corrected. Do you have any edac modules loaded? Not sure
> if that box needs edac_amd64 or something else, and/or when exactly it
> was that edac_amd64 finally got merged upstream (and therefore into
> the Fedora kernels). Yeah, lemme (try to remember to) poke some folks
> who actually work on this code and know the hardware better
> tomorrow...

Thanks. It *is* ECC memory. I'd much rather replace my 2yo ECC RAM
than replace my CPU. In either case it's annoying.

I temporarily downgraded from 2.6.34.6-54 to 2.6.27.41-170.2.117 in
order to keep my VMs from dying, and this seems to be helping. Running
2.6.34 I'd have periodic cases where VMs would spin, md_raid would spin,
and the network would drop to all my VMs, and sometimes the VMs would
report ATA/SCSI disk errors. I didn't have any of those issues prior to
upgrading, and haven't had them since rebooting into the older kernel.
But I suspect there's still really a hardware problem somewhere. :(

> Jarod Wilson
> jarod-ajLrJawYSntWk0Htik3J/w at public.gmane.org

-derek
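
For anyone who wants to check the decode by hand, here is a minimal Python sketch (not part of mcelog) that pulls the flagged bits out of the STATUS value quoted above. Bits 63 through 57 are the standard x86 machine-check status flags; bits 46, 42 and 34 are labeled exactly as mcelog printed them. The syndrome position (bits 54:47) is an assumption about the K8-style register layout; it does reproduce the "syndrome a4" that mcelog reported. The function name `decode_status` is made up for this illustration.

```python
# Sketch: decode a few fields of a machine-check STATUS value.
# The labels for bits 46, 42 and 34 are copied from the mcelog output in this
# thread; the syndrome field position (bits 54:47) is an assumption.

FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (misc error valid)",
    58: "ADDRV (address valid)",
    57: "PCC (processor context corrupt)",
    46: "corrected ecc error",
    42: "L3 subcache in error bit 0",
    34: "err cpu2",
}


def decode_status(status: int) -> dict:
    """Return the set flag bits, the assumed ECC syndrome, and the error code."""
    set_bits = [bit for bit in sorted(FLAGS, reverse=True) if (status >> bit) & 1]
    return {
        "set_bits": set_bits,
        "flags": [FLAGS[bit] for bit in set_bits],
        "syndrome": (status >> 47) & 0xFF,  # assumed field: bits 54:47
        "error_code": status & 0xFFFF,      # low 16 bits: error code
    }


if __name__ == "__main__":
    info = decode_status(0x9C524484001D011B)
    for bit, label in zip(info["set_bits"], info["flags"]):
        print(f"bit{bit} = {label}")
    print(f"syndrome   = {info['syndrome']:02x}")
    print(f"error code = {info['error_code']:04x}")
```

Note that UC (bit 61) and PCC (bit 57) are both clear in this value, which is consistent with mcelog calling it a corrected error.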