
BLU Discuss list archive



fault detection



And as I noted in an email to Frank, don't forget the odd and very unpredictable hardware glitches that may be due to loose or defective chips other than memory. My LX-based systems have odd crashes that I can never explain, including random freezes and reboots that seem to occur with no regular pattern, although my mouse seems to set off a freeze on occasion. I did once discover that the SCSI bus was not properly terminated, and that a heavy disk I/O load would then cause a freeze, but these crashes are not that kind of problem, as they usually occur with no disk I/O or even significant processor load.
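
On the original question of checking the logs, a rough sketch like the following might help spot restarts that were not preceded by a shutdown request. It assumes a syslog-style file such as /var/log/messages with "Mon DD HH:MM:SS" timestamps. The marker strings and the find_reboots helper are illustrative assumptions, not anything a particular distribution guarantees, so adjust them to what your own logs contain. A syslogd "restart" line may also appear when the daemon is merely signaled (for example at log rotation), so treat each hit as a hint rather than proof of a crash.

```python
import re
from datetime import datetime

# Assumed marker for an orderly shutdown or reboot request; check your own logs.
CLEAN_MARKERS = ("shutdown:",)

# Matches a leading syslog timestamp such as "Aug 19 03:41:17" (no year).
TS = re.compile(r"^([A-Z][a-z]{2}\s+\d+\s\d\d:\d\d:\d\d)\s")


def parse_ts(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = TS.match(line)
    if not m:
        return None
    stamp = " ".join(m.group(1).split())  # collapse padding in "Aug  9"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_reboots(lines, year):
    """List each syslogd restart with the last timestamp seen before it
    and whether a shutdown marker was logged since the previous restart."""
    events = []
    last_ts = None
    clean = False
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue
        # Assumption: syslogd announces itself with a "restart" line at startup.
        if "syslogd" in line and "restart" in line:
            if last_ts is not None:
                events.append({"last_seen": last_ts, "booted": ts, "clean": clean})
            clean = False
        elif any(marker in line for marker in CLEAN_MARKERS):
            clean = True
        last_ts = ts
    return events


if __name__ == "__main__":
    # Syslog lines carry no year, so it has to be supplied by hand.
    with open("/var/log/messages", errors="replace") as fh:
        for ev in find_reboots(fh, 2002):
            kind = "clean" if ev["clean"] else "UNEXPLAINED"
            print(f'{ev["booted"]}  {kind}  (last entry before it: {ev["last_seen"]})')
```

The gap between "last_seen" and "booted" only bounds when the box went down. It says nothing about the cause, so the hardware suspects above still apply.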

Scott Prive wrote:

> You could have faulty memory. The memory test that the BIOS or any OS runs at startup is a very limited test.
>
> If you have a faulty memory chip, you increase the chances of "hitting it" by running some high-load benchmarks or system tests.
>
> These are disruptive to a production server, but cpuburn, memtest, and the Linux Test Project are all pretty rough on the kernel & hardware. If there is an intermittent problem, these could help you triage more quickly than waiting for the next crash event.
>
> -----Original Message-----
> From: FRamsay at castelhq.com [mailto:FRamsay at castelhq.com]
> Sent: Monday, August 19, 2002 3:42 PM
> To: discuss at blu.org
> Subject: fault detection
>
> Does anyone know of any tools to help figure out why a box rebooted?  One of
> our client boxes rebooted over the weekend for no apparent reason.  The client
> claimed there was no power outage, and a quick look over the logs verifies the
> UPS didn't shut the computer down.  Also I didn't see a shutdown or reboot
> request in /var/log/messages.  So what tools do people use to figure out why a
> Linux system crashed?
>
> the system is running Redhat 7.2  kernel 2.4.9-13
>
>                -fjr
>
> Frank Ramsay
> Systems Programmer
> Castel, Inc
> 14 Summer St, 3rd Floor
> Malden, MA 02148
> (781) 324-0140 (voice)
> (781) 324-0277 (fax)
> Email: framsay at castel.com
>
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://www.blu.org/mailman/listinfo/discuss





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org