
BLU Discuss list archive



fault detection



You could have faulty memory. The memory test that runs when the BIOS or any OS starts is very limited.

If you have a faulty memory chip, you increase the chances of "hitting it" by running some high-load benchmarks or system tests. 
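To make the idea concrete, here is a rough Python sketch of the write-then-verify loop that memory testers are built around. The names pattern_test and PATTERNS are my own, and the patterns and buffer size are arbitrary choices. It only touches whatever pages the OS hands to this one process, so treat it as an illustration of the technique and not as a substitute for a dedicated tester.

```python
# Illustrative user-space sketch only: it exercises the pages the OS gives
# this process, not all physical RAM, so it cannot replace a real tester.

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits


def pattern_test(size_bytes, patterns=PATTERNS):
    """Fill a buffer with each pattern, read it back, and return mismatches.

    Returns a list of (offset, expected, actual) tuples. An empty list
    means no mismatch was seen in this buffer on this pass.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        # Write phase: set every byte to the pattern.
        for i in range(size_bytes):
            buf[i] = pattern
        # Read phase: verify every byte still holds the pattern.
        for i in range(size_bytes):
            if buf[i] != pattern:
                errors.append((i, pattern, buf[i]))
    return errors


if __name__ == "__main__":
    bad = pattern_test(1024 * 1024)  # 1 MiB; the size is an arbitrary choice
    print("mismatches:", len(bad))
```

A clean pass here proves very little; an intermittent fault may only show up after many passes under load, which is why the heavier tools are worth the disruption.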

These tests are disruptive to a production server: cpuburn, memtest, and the Linux Test Project are all pretty rough on the kernel and hardware. But if the problem is intermittent, they could help you triage it more quickly than waiting for the next crash.
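Separately, the manual check of /var/log/messages described in the original question can be scripted. The sketch below flags every boot that is not preceded by an orderly shutdown. The function name find_unclean_boots is my own, and the two marker strings are assumptions about what the kernel and syslogd write; compare them with real entries in your log before trusting the result.

```python
import sys

# Marker strings are assumptions; check them against real entries in your log.
BOOT_MARKER = "kernel: Linux version"   # assumed kernel banner logged after a boot
CLEAN_MARKER = "exiting on signal 15"   # assumed syslogd line on orderly shutdown


def find_unclean_boots(lines, boot_marker=BOOT_MARKER, clean_marker=CLEAN_MARKER):
    """Return (line_before_boot, boot_line) pairs for boots with no clean shutdown.

    line_before_boot is simply the entry logged just before the boot marker.
    It may be the logger's own startup message, so read a few lines further
    back in the file for the last thing the system did before going down.
    """
    suspects = []
    saw_clean = False
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        if clean_marker in line:
            saw_clean = True
        elif boot_marker in line:
            # A boot with earlier log entries but no shutdown marker is suspect.
            if previous is not None and not saw_clean:
                suspects.append((previous, line))
            saw_clean = False
        previous = line
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for before, boot in find_unclean_boots(log):
            print("unclean restart?")
            print("  last entry:", before)
            print("  boot entry:", boot)
```

This only narrows down when the box went down and what it logged last. It cannot say why, which is where the stress tests come in.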





-----Original Message-----
From: FRamsay at castelhq.com [mailto:FRamsay at castelhq.com]
Sent: Monday, August 19, 2002 3:42 PM
To: discuss at blu.org
Subject: fault detection


Does anyone know of any tools to help figure out why a box rebooted?
One of our client boxes rebooted over the weekend for no apparent
reason.  The client claimed there was no power outage, and a quick look
over the logs verifies the UPS didn't shut the computer down.  Also I
didn't see a shutdown or reboot request in /var/log/messages.  So what
tools do people use to figure out why a Linux system crashed?

the system is running Redhat 7.2  kernel 2.4.9-13

               -fjr


Frank Ramsay
Systems Programmer
Castel, Inc
14 Summer St, 3rd Floor
Malden, MA 02148
(781) 324-0140 (voice)
(781) 324-0277 (fax)
Email: framsay at castel.com


_______________________________________________
Discuss mailing list
Discuss at blu.org
http://www.blu.org/mailman/listinfo/discuss




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
