BLU Discuss list archive
[Discuss] ATA Access Errors For Spinning Disk
- Subject: [Discuss] ATA Access Errors For Spinning Disk
- From: jbk at kjkelra.com (jbk)
- Date: Tue, 19 Dec 2023 08:06:31 -0500
- In-reply-to: <20231217210539.6f085ceb@mydesk.domain.cxm>
- References: <e9882f74-8f25-407a-8efa-bf9554515e32@kjkelra.com> <20231217210539.6f085ceb@mydesk.domain.cxm>
Thanks, Steve. I may incorporate some of the recommendations in the
future. For the present I'm going to look at the power connection to the
disk. I'm also going to grab a newish spare and connect it up to see if
it throws similar errors.

Jimk

On 12/17/23 21:05, Steve Litt wrote:
> jbk said on Sun, 17 Dec 2023 10:13:36 -0500
>
>> I periodically get access errors for a specific spinning
>> disk that I have done these things to diagnose:
>> Changed SATA cable
>> Switched SATA bus on MB
>> Run e2fsck on the 3 formatted ext4 partitions w/ no errors found
>> Run smartctl -a: all results within norms
>> Run smartctl -t short: No errors found
>>
>> Disk operation age is about 7.5 years with around a couple
>> hundred starts. It has been in continuous operation for over
>> 8 years except during vacations. On occasion the disk
>> partitions will become unmounted and a mount -a will remount
>> the partitions as a different device from let's say sda to
>> sdd. I've not lost any data and I do regular backups to
>> another device that's rotated out of system.
>>
>> I seem to have always had these errors present on this MB
>> that is maybe 4 or 5 years in operation. Any thoughts on the
>> cause of this issue? Do others see this behavior on occasion
>> on systems they manage?
>>
>> On this same system my Rocky OS on an SSD is showing no
>> issues at all. Same operation age as the spinner.
> I really like the troubleshooting strategy you've pursued in trying to
> find the root cause of this intermittent problem. As we all know,
> intermittents are much more difficult to diagnose than reproducible
> symptoms. If you look at the Universal Troubleshooting Process (UTP) on
> Troubleshooters.Com, you'll see that UTP step 5, Corrective
> Maintenance, is extremely powerful and necessary with intermittent
> problems. I have some suggestions for Corrective Maintenance and
> further diagnostic tests...
>
> * You get occasional disk errors, any of which could cause data
>   corruption.
>   To prevent things from getting worse, boot a rescue
>   distro and ddrescue your current disk to a larger disk, and if you
>   ever mount that backup disk, mount it read-only.
>
> * Lubricate all electronic contacts for all cables, daughter cards, RAM
>   sticks, switches with associated cables, and jacks and plugs for all
>   peripherals. Apply the lubricant to conductive surfaces on both plug
>   (male) and jack (female), then insert and remove twenty times to bust
>   off all corrosion. Please take 10 minutes to read this 20 year old
>   discussion of electronic lubrication:
>   http://troubleshooters.com/tpromag/200310/200310.htm
>   I've used transmission fluid, WD-40, Lube-Job electronics lubricant,
>   Breakthrough CLP, Deoxit Gold, Superslick Slick Stuff, and CRC
>   QD Contact Cleaner, and was very satisfied with all of them. I
>   currently use mostly Superslick Slick Stuff. The important thing is
>   that there's residual lubrication to prevent build-up of fretting
>   corrosion. Stabilant 22 and Deoxit Gold are the safest to prevent
>   damage to non-metals and prevent conduction between non-mating
>   surfaces, but they're pretty expensive. My experience has been that
>   as long as I carefully limit application to the mating conductors.
>   Lubricating all mating electronic contacts takes 2 or 3 hours, but
>   doing so can save you weeks of frustration if an intermittent is
>   being caused by fretting corrosion between electronic contacts. I do
>   complete electronic contact lubrication during the initial build of
>   all my computers. Because you've observed this intermittent since you
>   bought the mobo several years ago, lubricating the RAM stick contacts
>   is especially important, as it's likely those sticks have been in
>   place since you bought the mobo.
>
> * Run a complete RAM test overnight by booting a memtest86 CD or thumb
>   drive. Get rid of any sticks with errors. Intermittents are too
>   expensive to try to limp along with RAM errors.
>   Note that if you're
>   not using UEFI, you'll need an older version of memtest86.
>
> * Temporarily swap in a known good power supply, use for several days,
>   and see whether the problem has gone away. If so, use the known good
>   power supply or a known good newly purchased power supply. If the
>   problem persists, put back the original power supply at the
>   conclusion of troubleshooting.
>
> * Power switches and reboot switches can go intermittent and cause
>   hangs and spontaneous reboots. If I have suspicions of these things,
>   I disconnect the reboot switch (you can always unplug the computer
>   for an abrupt shutdown), and temporarily disconnect the power switch,
>   starting and stopping the computer by CAREFULLY shorting the power
>   switch pins with a screwdriver. I then run the machine for about 3
>   days to see if the problem really went away. If the problem appears
>   to be the power switch, I replace it with a cheap, wired, no light, 2
>   contact doorbell switch, available at home warehouse stores. If you
>   can't find it there (most doorbell switches are now lighted), I'm
>   pretty sure that this is what you need:
>   https://www.ebay.com/itm/155929670486 . You might need extra wire so
>   your front panel can be removed enough to service the front parts
>   without needing to disconnect the power button leads and fish them
>   around the motherboard and through the chassis.
>
> * If you're overclocked, roll it back to the non-overclocked
>   frequencies. Often simply telling the BIOS to reset to its factory
>   state is a great way to rule out a whole bunch of BIOS caused
>   problems. As always, test for several days to make sure the
>   intermittent symptom really went away.
>
> * Use various sensor programs to check various CPU temperatures and
>   disk temperatures. If temperatures even begin to approach maximum
>   specs, take
>
> * Try to observe whether this intermittent symptom occurs significantly
>   more when running a specific set of software, and act accordingly.
>
> * Boot a radically different distro, use for several days, and see if
>   the intermittent symptom still occurs. If so, you've for the most
>   part ruled out your distro, software, and config settings. If not,
>   investigate your software and configs.
>
> * If none of the preceding works, you need to consider how much time,
>   money and energy you're willing to throw at this intermittent problem.
>   If you have a known good spinning rust hard disk bigger than the
>   current one, you could ddrescue the current one onto the new, bigger
>   one, test for a few days, and if the symptom doesn't recur, the hard
>   disk had a problem not detected by smartctl.
>
> * If none of the preceding works, you need to consider how much time,
>   money and energy you're willing to throw at this intermittent
>   problem. Personally, at this point, I'd bite the bullet and buy a new
>   motherboard, RAM and processor and processor heat sink. Be sure to
>   use high quality thermal heat sink compound between processor and heat
>   sink, be sure to remove any labels the manufacturer stupidly put on
>   the processor where it should be mating with the heat sink, and clean
>   all label adhesive residue before applying heat sink compound. Don't
>   cheap out on the heat sink: A lot of times the heat sink packaged
>   with the processor is great for email and light web browsing, but
>   allows overheating in intense operations like compiling a kernel.
>   Remember, you want this new setup to last for many years.
>
> * If you're going to buy a new mobo, CPU and RAM anyway, it costs you
>   nothing to take the very risky step of updating your BIOS. Who knows,
>   it might work. Because of risks involved in BIOS updates, I don't
>   recommend them except in cases where your symptom is a well known
>   effect of your specific BIOS version, or else when you're about to
>   throw the mobo in the trash anyway.
>   Be sure to run the computer on a
>   known good uninterruptible power supply when updating your BIOS so
>   your electric company's problems don't brick your computer.
>
> I'm very aware of the time and energy the preceding steps require. Your
> computer is now 8 years old and probably anemic by today's standards.
> If your current computer has enough capability for your needs, you
> could probably buy a whole new computer of equal capability for under
> $700. If you want to replace it with a modern computer with huge
> capacity, you can probably do it for between $1500 and $2300. Remember,
> the alternative is all the troubleshooting steps I listed (and probably
> other people can think of even more).
>
> HTH,
>
> SteveT
>
> Steve Litt
>
> Autumn 2023 featured book: Rapid Learning for the 21st Century
> http://www.troubleshooters.com/rl21
> _______________________________________________
> Discuss mailing list
> Discuss at lists.blu.org
> http://lists.blu.org/mailman/listinfo/discuss

--
Jim KR
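
One symptom in the original post is that the partitions come back under a different device name after a remount (sda becoming sdd). A rough way to confirm and track that is to record which device node each filesystem UUID points to, then compare snapshots taken before and after an incident. The sketch below is only an illustration, not something either poster used: the function names uuid_to_device and moved_devices are hypothetical, and it assumes a Linux system where udev maintains the /dev/disk/by-uuid symlinks.

```python
import os


def uuid_to_device(by_uuid_dir="/dev/disk/by-uuid"):
    """Map each filesystem UUID to the device node its symlink resolves to.

    Assumes the udev-maintained symlink directory on Linux; returns an
    empty dict if the directory does not exist.
    """
    mapping = {}
    if not os.path.isdir(by_uuid_dir):
        return mapping
    for name in sorted(os.listdir(by_uuid_dir)):
        link = os.path.join(by_uuid_dir, name)
        # Only symlinks are meaningful here; skip anything else.
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping


def moved_devices(before, after):
    """Return {uuid: (old_node, new_node)} for UUIDs whose node changed."""
    return {
        uuid: (before[uuid], after[uuid])
        for uuid in before
        if uuid in after and before[uuid] != after[uuid]
    }


if __name__ == "__main__":
    # Print the current snapshot; save it and diff against a later run.
    for uuid, node in uuid_to_device().items():
        print(f"{uuid} -> {node}")
```

Run once while the disk is healthy and again after the partitions drop and remount; passing the two snapshots to moved_devices lists any partition that changed device node. Because the UUID stays the same across such a rename, /etc/fstab entries written as UUID=... keep resolving to the same filesystem even when the kernel hands out a new sdX name.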