
BLU Discuss list archive



Strange log indications



On Thu, Mar 24, 2011 at 10:25 AM, Jerry Feldman <gaf-mNDKBlG2WHs at public.gmane.org> wrote:
> I have a system where the NICs tend to go offline every few days
> (probably a couple of weeks). I've been looking at the logs for a
> possible indication of problems, but I'm sure it's a motherboard issue.
> One thing I'm seeing in the logs is when the NICs fail (I have both NICs
> with IP addresses to see if one NIC fails and the other stays up) but
> both fail simultaneously.
>
> The relevant log entries are below. The first at 00:01:41 indicates the
> failure, but the second one 6 minutes later indicates a successful NTP
> sync. The next 2 log entries just confirm the NICs have failed. I have a
> script running on that box to give me some additional info, but it did
> not give me what I want. Note that I have VMWare server 2.0 running on
> this box, but we are planning to move VMWare off to another dedicated
> machine that is on order. My script is just a simple script that does a
> ping and logs success or failure. Rather than fill up the logs, the
> script edits the log with the intent I want to know the time of the
> first and most recent failure.
>
> Mar 24 00:01:41 boslc06 automount[4263]: host bosnas2: lookup failure 2
> Mar 24 00:07:02 boslc06 ntpd[4465]: synchronized to 64.73.32.134, stratum 2
> Mar 24 00:45:06 boslc06 automount[4263]: set_tsd_user_vars: failed to get passwd info from getpwuid_r
> Mar 24 00:45:37 boslc06 automount[4263]: host bosnas2: lookup failure 2

It's not clear whether you want to investigate this further, but you
might try modifying your ping script to gather more information when a
failure occurs.  Perhaps an "arp -an" to see what is in the ARP cache,
and "tcpdump -w" or "tshark -w" to capture any packets that are
traversing that interface.  If you use "-c", you can limit the number
of packets saved so you won't fill up the disk.  This might tell you
whether the failure is in both directions or in just one.  Use
mii-diag, mii-tool, or ethtool to capture the state of Ethernet
speed/duplex negotiation from the perspective of the host.  You don't
report any kernel messages about actual interface errors, which is a
bit odd.  That implies the kernel thought it was successful on outgoing
packets.  Try running "dmesg" as well from your script to check on this.
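
Something along these lines might be a starting point.  It is only a
rough sketch, not your actual script: the names TARGET, IFACE, OUTDIR,
run, collect_diagnostics, and check_and_collect are all made up for
illustration, and the default address is a documentation placeholder.
It assumes a Linux ping (where -W is a timeout in seconds), that
ethtool and the coreutils "timeout" command are installed, and that
the script runs as root so tcpdump can open the interface.

```shell
#!/bin/sh
# Hypothetical settings; none of these come from the original script.
TARGET="${TARGET:-192.0.2.1}"        # placeholder address to ping
IFACE="${IFACE:-eth0}"               # interface to inspect (assumed name)
OUTDIR="${OUTDIR:-/var/tmp/nicdiag}" # where reports and captures go

# Print a header line, then run the command, keeping stderr with stdout.
run() {
    echo "### $*"
    "$@" 2>&1
    echo
}

# Only the tail of the kernel log, to keep each report small.
recent_kernel_messages() {
    dmesg | tail -n 50
}

# Save ARP cache, link state, and kernel messages to a timestamped
# report, then capture a bounded number of packets.
collect_diagnostics() {
    ts=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$OUTDIR" || return 1
    {
        run date
        run arp -an
        run ethtool "$IFACE"
        run recent_kernel_messages
    } > "$OUTDIR/diag-$ts.txt"
    # -c caps the packet count; timeout stops tcpdump after 60 seconds
    # in case nothing at all is arriving on a dead link.
    timeout 60 tcpdump -n -i "$IFACE" -c 200 \
        -w "$OUTDIR/capture-$ts.pcap" 2>/dev/null
    echo "$OUTDIR/diag-$ts.txt"
}

# Collect diagnostics only when the ping fails; print the report path.
check_and_collect() {
    if ping -c 3 -W 2 "$TARGET" >/dev/null 2>&1; then
        return 0
    fi
    collect_diagnostics
}
```

Calling check_and_collect from the existing ping loop (or from cron)
would leave one small report and one capture file per failure.  You
could swap mii-tool in for ethtool if that is what the box has.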

Good Luck,
Bill Bogstad





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org