Doug wrote:
> Thinking about the future, does anyone regularly monitor their hard
> disk SMART (Self-Monitoring, Analysis, and Reporting Technology)
> information?

Yes, I use smartmontools on all systems. (I don't own anything that runs OS X, so I can't comment on its use on that platform.)

smartmontools includes the smartd daemon, which runs continuously and monitors your drive's SMART data. I have it set to alert me via email for critical problems, and I also have GUI desktop notifiers installed that work with it ('smart-notifier' on Linux). Additionally, logwatch (a log file monitoring tool) reports on anything interesting logged by smartd.

> Are there scripts anyone uses that go through the same data and/or run
> the short and long tests?

The smartd daemon, when configured to do so, can run tests on a regular schedule. I use the following config on my laptop, for example:

  /dev/sda -a -I 194 -W 4,45,55 -R 5 -s (L/../../6/03|S/../.././05) -m root -M exec /usr/share/smartmontools/smartd-runner

That breaks down as:

  /dev/sda - the drive
  -a - turns on a bunch of common options, like reporting of errors and self-test results
  -I - ignore a specific attribute
  -W - set temperature limits (here: log changes of 4C or more, informational limit 45C, critical limit 55C)
  -R - monitor the raw value of a specific attribute
  -s (L/../../6/03|S/../.././05) - schedule a weekly long test (Saturdays at 3 AM) and a daily short test (5 AM)
  -m root - send emails to root
  -M exec /usr/... - run this script on errors

(smartd-runner is a script bundled with the package that just iterates through all scripts in /etc/smartmontools/run.d/.)

I've forgotten the specifics of why I have the -I and -R switches set. I probably have notes on other systems where I've used those. You do sometimes need to tune the parameters for specific drives: ignoring an attribute here, or explicitly monitoring an attribute there.

smartd does support a DEVICESCAN option where it will find all the drives on your system and monitor them using defaults.
That has some low-maintenance appeal (if you add/remove drives, it'll automatically adjust), but I tend to have better luck when I explicitly list each drive, and that also lets me stagger the self-test times.

> The long tests can last 3 hours.

It depends on the drive. The drive will suspend the self-test if there is too much I/O activity. (At least that's what my current drives do.)

> ...the replacement...disk is running...48C versus 41C. What temp is
> BAD?

I googled this myself not long ago, because a recently replaced drive on a laptop has been frequently exceeding the 45C max, and occasionally exceeding the 55C critical limit.

The postings I turned up had no definitive answers. Some mentioned that the drives are typically specified by the manufacturers to have a 60C max temperature. Others referenced a Google whitepaper that said temperatures were optimally kept between 30 and 40C, while temperatures below 20C actually increased failure rates. (This is all second-hand info. Check primary sources before relying on these numbers.)

So clearly greater than 60C is bad. But will greater than 50C reduce the lifespan of your drive? Perhaps.

With the space constraints in a laptop, I'm not sure what you can do about it, as long as the airways are free of dust, the area around the machine is clear, and the fans are functioning. I suppose you could hack the fan controls to boost the fan speed.

> Now that the test is done, the hot disk is down to 38C.

Four of the five times the drive I mentioned above hit the critical limit, it was at 3:32 AM, which suggests some scheduled job is triggering it. The long self-test does run at 3 AM, but not on the days when the over-temp happened, so it must be something else.

What I'd really like to see is a GUI tool that would read in the SMART logs and show temperature graphs over time and average temperature. I'm not concerned at all if the drive is only hitting 55C for a matter of minutes.
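Short of a GUI tool, a few lines of Python can at least pull an average and a count of over-limit readings out of the logs. This is only a rough sketch, not a tested tool. It assumes smartd logs temperature changes in a form like "Temperature changed +4 Celsius to 56 Celsius", which you should verify against your own syslog. The sample lines, the host name, and the function names are made up for illustration, and the 45/55 thresholds are simply the -W values from the config above.

```python
import re
from statistics import mean

# Assumed smartd log wording; verify against your own syslog before relying on it.
TEMP_RE = re.compile(r"Temperature changed [+-]?\d+ Celsius to (\d+) Celsius")


def temperatures(lines):
    """Return the temperature readings (in C) found in the given log lines."""
    return [int(m.group(1)) for line in lines if (m := TEMP_RE.search(line))]


def summarize(temps, warn=45, crit=55):
    """Summarize readings; 'over' here means at or above the given limit."""
    return {
        "samples": len(temps),
        "average": mean(temps),
        "max": max(temps),
        "over_warn": sum(t >= warn for t in temps),
        "over_crit": sum(t >= crit for t in temps),
    }


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = [
    "Jan  5 14:02:11 laptop smartd[1234]: Device: /dev/sda, Temperature changed -4 Celsius to 41 Celsius",
    "Jan  5 19:32:11 laptop smartd[1234]: Device: /dev/sda, Temperature changed +7 Celsius to 48 Celsius",
    "Jan  6 03:32:11 laptop smartd[1234]: Device: /dev/sda, Temperature changed +8 Celsius to 56 Celsius",
    "Jan  6 04:02:11 laptop smartd[1234]: Device: /dev/sda, starting scheduled Short Self-Test.",
    "Jan  6 06:02:11 laptop smartd[1234]: Device: /dev/sda, Temperature changed -18 Celsius to 38 Celsius",
]

if __name__ == "__main__":
    print(summarize(temperatures(SAMPLE_LOG)))
```

Note that this averages over log entries, not over time, so a drive that sits at one temperature for hours counts the same as a brief spike. Weighting by the timestamps would be the next step toward a real graph.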
Jason Normand wrote:
> You should be able to setup smartmontools to run as a crown job...

Typically you'd use smartd, which has a built-in scheduler.

Scott Ehrlich wrote:
> Long story short, smart is simply not reliable.

Sure, in the sense that SMART is not guaranteed to indicate a failure before it happens, but that's hardly a reason not to use it.

The Google whitepaper on drive reliability has some interesting stuff to say about SMART monitoring. (The paper has been mentioned on Discuss before. Check the archives.)

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/
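As a footnote on smartd's built-in scheduler: the -s expression from the config quoted earlier in this message can be sanity-checked offline. As I understand the smartd.conf man page, smartd matches the regular expression against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday through 7 = Sunday, hour). The sketch below rebuilds that string in Python and tests it against the same expression. The function name due_tests is my own invention, and using re.fullmatch is only an approximation of how smartd applies the pattern, so treat the result as a rough check.

```python
import re
from datetime import datetime

# The -s expression from the example smartd.conf line.
SCHEDULE = r"(L/../../6/03|S/../.././05)"


def due_tests(when, schedule=SCHEDULE, types="LS"):
    """Return the test types (L = long, S = short) the schedule selects at 'when'."""
    due = []
    for test_type in types:
        # Assumed smartd format: T/MM/DD/d/HH, with d = 1 (Monday) .. 7 (Sunday).
        stamp = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due


if __name__ == "__main__":
    print(due_tests(datetime(2024, 1, 6, 3)))  # a Saturday, 3 AM
    print(due_tests(datetime(2024, 1, 8, 5)))  # a Monday, 5 AM
    print(due_tests(datetime(2024, 1, 8, 3)))  # a Monday, 3 AM
```

This is handy when staggering self-test times across several explicitly listed drives: give each drive its own expression and check that no two long tests land in the same hour.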
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.