fault detection

Scott Prive Scott.Prive at storigen.com
Mon Aug 19 17:51:26 EDT 2002


You could have faulty memory. The memory test run when the BIOS or any OS starts is a very limited test.

If you have a faulty memory chip, you increase the chances of "hitting it" by running some high-load benchmarks or system tests. 

Such tests are disruptive to a production server, but cpuburn, memtest, and the Linux Test Project all put heavy stress on the kernel and hardware. If the problem is intermittent, they could help you triage it more quickly than waiting for the next crash.
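
To make the "hitting it" idea concrete, here is a minimal sketch in Python of a write-then-read-back pattern pass. It is only an illustration and not how memtest or cpuburn actually work: the name pattern_test, its parameters, and the chosen bit patterns are my own assumptions. It also runs in user space, so the OS decides which physical pages it touches and CPU caches may satisfy the reads, which makes it far weaker than a standalone tester.

```python
def pattern_test(size_bytes=16 * 1024 * 1024, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern, read it back, and
    return a list of (offset, expected, actual) mismatches.

    Hypothetical helper for illustration only; it exercises whatever
    pages the OS hands to this process, not all of physical RAM.
    """
    buf = bytearray(size_bytes)
    errors = []
    for p in patterns:
        expected = bytes([p]) * size_bytes
        buf[:] = expected              # write pass
        if buf != expected:            # read-back pass (fast bulk compare)
            # Only walk the buffer byte by byte when something differs.
            errors.extend((i, p, b) for i, b in enumerate(buf) if b != p)
    return errors


if __name__ == "__main__":
    # Repeating the pass raises the odds of catching an intermittent fault.
    for run in range(3):
        bad = pattern_test()
        print(f"pass {run}: {len(bad)} mismatches")
```

Any nonzero mismatch count would be a strong hint of bad hardware, but a clean result from something this simple proves very little, so I would still run the real tools in a maintenance window.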





-----Original Message-----
From: FRamsay at castelhq.com [mailto:FRamsay at castelhq.com]
Sent: Monday, August 19, 2002 3:42 PM
To: discuss at blu.org
Subject: fault detection


Does anyone know of any tools to help figure out why a box rebooted?
One of our client boxes rebooted over the weekend for no apparent
reason.  The client claimed there was no power outage, and a quick look
over the logs verifies the UPS didn't shut the computer down.  Also I
didn't see a shutdown or reboot request in /var/log/messages.  So what
tools do people use to figure out why a Linux system crashed?

The system is running Red Hat 7.2, kernel 2.4.9-13.

               -fjr


Frank Ramsay
Systems Programmer
Castel, Inc
14 Summer St, 3rd Floor
Malden, MA 02148
(781) 324-0140 (voice)
(781) 324-0277 (fax)
Email: framsay at castel.com


_______________________________________________
Discuss mailing list
Discuss at blu.org
http://www.blu.org/mailman/listinfo/discuss


