UNIX process monitor
Tom Metro
blu at vl.com
Mon Nov 21 21:46:06 EST 2005
Over the weekend I received some unusual-looking email from one of the
monitoring tools I run on my mail server, and while investigating it I
discovered that a bunch of instances of a program I use to download
email from a Yahoo! account were stuck in endless loops and filling up
my process table (due to a data-provoked bug). (The alert email I
received had nothing directly to do with the hung processes.)
That made me think that I probably should be running a program to
monitor the process list and spit out a warning when something looks
unusual, given that this is a lightly used system, and I rarely have
occasion to look at the process list myself.
A Freshmeat.Net search did turn up a couple of tools:
Process Change Detection System
http://doornenburg.homelinux.net/scripts/pcds/
Procwatch
http://freshmeat.net/projects/procwatch/
but they don't quite do what I want.
Procwatch notes all changes (process start/stop) and outputs that data
in a format suitable for logging. If you then ran a sophisticated log
monitor tool, you could probably get it to trigger alerts only when
things looked strange. (And given that a proper security setup should
include such a log monitor anyway, maybe this is the way to go.)
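Conceptually that amounts to diffing successive snapshots of the
process table. A rough Python sketch of the idea (not procwatch's
actual code; the 60-second interval is arbitrary):

    # Rough sketch of the procwatch concept (not its actual code): diff
    # successive snapshots of the process table and report starts/stops.
    import os, time

    def snapshot():
        # Map pid -> command name, parsed from "ps -eo pid,comm".
        procs = {}
        for line in os.popen("ps -eo pid,comm").readlines()[1:]:
            pid, comm = line.split(None, 1)
            procs[int(pid)] = comm.strip()
        return procs

    prev = snapshot()
    while True:
        time.sleep(60)                     # sampling interval
        cur = snapshot()
        for pid in set(cur) - set(prev):
            print("started: %s %s" % (pid, cur[pid]))
        for pid in set(prev) - set(cur):
            print("stopped: %s %s" % (pid, prev[pid]))
        prev = cur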
PCDS is a bit closer in concept to what I envisioned. You run it once to
establish a baseline for your system. It generates a file with process
names and the count for each. You can then manually edit it to change
the numbers to ranges, if you wish (e.g. httpd 5-10).
Then on subsequent runs, it generates a report of how things differ from
the baseline. But as you might expect, this would lead to a lot of
unnecessary noise.
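To make the idea concrete, here's a rough Python paraphrase of that
kind of baseline check (not PCDS's code; the baseline.txt file name is
just for illustration):

    # My paraphrase of the PCDS idea (not its code): compare current
    # per-process counts against a baseline file of "name count" or
    # "name min-max" lines, e.g. "httpd 5-10".
    import os

    def current_counts():
        counts = {}
        for line in os.popen("ps -eo comm").readlines()[1:]:
            name = line.strip()
            counts[name] = counts.get(name, 0) + 1
        return counts

    def load_baseline(path):
        baseline = {}
        for line in open(path):
            name, spec = line.split()
            if "-" in spec:
                lo, hi = spec.split("-")
            else:
                lo = hi = spec
            baseline[name] = (int(lo), int(hi))
        return baseline

    counts = current_counts()
    baseline = load_baseline("baseline.txt")   # file name just for illustration
    for name in baseline:
        lo, hi = baseline[name]
        n = counts.get(name, 0)
        if n < lo or n > hi:
            print("%s: %d running, expected %d-%d" % (name, n, lo, hi))
    for name in set(counts) - set(baseline):
        print("%s: %d running, not in baseline" % (name, counts[name]))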
So I started down the path of coding up my own tool (borrowing ideas
from both of the scripts above) which would do things like trigger an
alert when the total number of processes changes by more than X percent
between sampling periods, or when the quantity of a single process type
(e.g. httpd) changes by more than X percent between sampling periods,
as well as checking for hard limits on the maximum number of processes
and process types.
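In rough Python terms, I was picturing checks along these lines (the
thresholds are placeholders I made up):

    # Sketch of the checks described above; the thresholds are made up.
    # Alert on a large percentage swing in the total or in any one
    # process type between samples, and on hard maximums.
    MAX_TOTAL = 300         # hard limit on total processes
    MAX_PER_TYPE = 50       # hard limit on any single process type
    PCT_THRESHOLD = 25.0    # percent change between samples worth an alert

    def pct_change(old, new):
        if old == 0:
            if new == 0:
                return 0.0
            return 100.0
        return abs(new - old) * 100.0 / old

    def check(prev_counts, cur_counts):
        # Both arguments are dicts of process name -> instance count,
        # e.g. as built by current_counts() in the earlier sketch.
        alerts = []
        prev_total = sum(prev_counts.values())
        cur_total = sum(cur_counts.values())
        if cur_total > MAX_TOTAL:
            alerts.append("total %d over limit %d" % (cur_total, MAX_TOTAL))
        if pct_change(prev_total, cur_total) > PCT_THRESHOLD:
            alerts.append("total changed %d -> %d" % (prev_total, cur_total))
        for name in set(prev_counts) | set(cur_counts):
            old = prev_counts.get(name, 0)
            new = cur_counts.get(name, 0)
            if new > MAX_PER_TYPE:
                alerts.append("%s: %d over limit %d" % (name, new, MAX_PER_TYPE))
            if pct_change(old, new) > PCT_THRESHOLD:
                alerts.append("%s changed %d -> %d" % (name, old, new))
        return alerts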
But percentage change doesn't work so hot for a small group of processes
that are all the same type. Going from 4 instances of smbd to 6 is a big
percentage change, but in reality not all that noteworthy.
It really needs to be smarter. What I'd really like is a program that
runs for a week or so in learning mode, develops a database of what is
"normal", and then sends alerts when it notices unusual behavior.
Does anyone know of a tool that does this? I'm sure there are intrusion
detection tools that incorporate this, but following the UNIX
philosophy, I'd rather use a tool that specifically addresses this need.
Alternatively, do you have thoughts on what data should be recorded by
such a tool? For example, would it be useful to track how often process
X is seen (and how many instances) during a day, and then at the end of
the day calculate an average for it? Maybe do the same for week and
month periods. Then you could say that it is "normal" for process X to
appear Y times over Z period, and thus be able to detect when things
are abnormal.
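In code, I imagine the bookkeeping looking something like this (just a
sketch; the 3-sigma test and the 10-sample minimum are arbitrary):

    # Sketch of the "learning mode" statistics: accumulate per-process
    # counts over many samples, then flag counts that fall well outside
    # the observed mean. Counts come from ps as in the earlier sketches.
    import math

    class Profile:
        def __init__(self):
            self.samples = {}   # process name -> list of observed counts

        def record(self, counts):
            # Record one sample; absent processes are recorded as 0.
            for name in set(self.samples) | set(counts):
                self.samples.setdefault(name, []).append(counts.get(name, 0))

        def unusual(self, counts, nsigma=3.0):
            alerts = []
            for name in counts:
                history = self.samples.get(name, [])
                if len(history) < 10:      # arbitrary warm-up threshold
                    alerts.append("%s: never (or rarely) seen before" % name)
                    continue
                mean = sum(history) / float(len(history))
                var = sum((x - mean) ** 2 for x in history) / len(history)
                sigma = math.sqrt(var)
                # The 0.5 floor keeps a zero-variance history from
                # alerting on every one-process wiggle.
                if abs(counts[name] - mean) > nsigma * max(sigma, 0.5):
                    alerts.append("%s: %d running, normally about %.1f"
                                  % (name, counts[name], mean))
            return alerts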
What I'd like to avoid is triggering an alert when I happen to be doing
some infrequent maintenance work, while still quickly catching that
there are too many fetchyahoo processes running (the program that had
the bug), or that there's an unexpected sshd running (possible
backdoor). (Though I don't see this tool as being focused on intrusion
detection.)
-Tom
--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/