BLU Discuss list archive
[Discuss] ATA Access Errors For Spinning Disk
- Subject: [Discuss] ATA Access Errors For Spinning Disk
- From: slitt at troubleshooters.com (Steve Litt)
- Date: Sun, 17 Dec 2023 21:05:39 -0500
- In-reply-to: <e9882f74-8f25-407a-8efa-bf9554515e32@kjkelra.com>
- References: <e9882f74-8f25-407a-8efa-bf9554515e32@kjkelra.com>
jbk said on Sun, 17 Dec 2023 10:13:36 -0500

>I periodically get access errors for a specific spinning
>disk that I have done these things to diagnose:
>Changed Sata Cable
>Switched Sata bus on MB
>Run E2fsck on the 3 formatted ext4 partitions w/ no errors found
>Run smartctl -a: all results within norms
>Run smartctl -t short: No errors found
>
>Disk operation age is about 7.5 years with around a couple
>hundred starts. It has been in continuous operation for over
>8 years except during vacations. On occasion the disk
>partitions will become unmounted and a mount -a will remount
>the partitions as a different device from lets say sda to
>sdd. I've not lost any data and I do regular backups to
>another device that's rotated out of system.
>
>I seem to have always had these errors present on this MB
>that is maybe 4 or 5 years in operation. Any thoughts on the
>cause of this issue? Do others see this behavior on occasion
>on systems they manage?
>
>On this same system my Rocky OS on an SSD is showing no
>issues at all. Same operation age as the spinner.

I really like the troubleshooting strategy you've pursued in trying to find the root cause of this intermittent problem. As we all know, intermittents are much more difficult to diagnose than reproducible symptoms. If you look at the Universal Troubleshooting Process (UTP) on Troubleshooters.Com, you'll see that UTP step 5, Corrective Maintenance, is extremely powerful and necessary with intermittent problems. I have some suggestions for Corrective Maintenance and further diagnostic tests...

* You get occasional disk errors, any of which could cause data corruption. To prevent things from getting worse, boot a rescue distro and ddrescue your current disk to a larger disk, and if you ever mount that backup disk, mount it read-only.

* Lubricate all electronic contacts for all cables, daughter cards, RAM sticks, switches with associated cables, and jacks and plugs for all peripherals.
Apply the lubricant to the conductive surfaces of both plug (male) and jack (female), then insert and remove the connector twenty times to bust off all corrosion. Please take 10 minutes to read this 20-year-old discussion of electronic lubrication: http://troubleshooters.com/tpromag/200310/200310.htm I've used transmission fluid, WD-40, Lube-Job electronics lubricant, Breakthrough CLP, Deoxit Gold, Superslick Slick Stuff, and CRC QD Contact Cleaner, and was very satisfied with all of them. I currently use mostly Superslick Slick Stuff. The important thing is that there's residual lubrication to prevent build-up of fretting corrosion. Stabilant 22 and Deoxit Gold are the safest in terms of not damaging non-metals and not conducting between non-mating surfaces, but they're pretty expensive. My experience has been that the less expensive lubricants work fine as long as I carefully limit application to the mating conductors. Lubricating all mating electronic contacts takes 2 or 3 hours, but doing so can save you weeks of frustration if an intermittent is being caused by fretting corrosion between electronic contacts. I do complete electronic contact lubrication during the initial build of all my computers. Because you've observed this intermittent ever since you bought the mobo several years ago, lubricating the RAM stick contacts is especially important, as those sticks have likely been in place since you bought the mobo.

* Run a complete RAM test overnight by booting a memtest86 CD or thumb drive. Get rid of any sticks with errors. Intermittents are too expensive to try to limp along with RAM errors. Note that if you're not using UEFI, you'll need an older version of memtest86.

* Temporarily swap in a known good power supply, use it for several days, and see whether the problem goes away. If so, keep using the known good power supply or buy a known good new one. If the problem persists, put the original power supply back at the conclusion of troubleshooting.

* Power switches and reboot switches can go intermittent and cause hangs and spontaneous reboots. If I suspect them, I disconnect the reboot switch (you can always unplug the computer for an abrupt shutdown) and temporarily disconnect the power switch, starting and stopping the computer by CAREFULLY shorting the power switch pins with a screwdriver. I then run the machine for about 3 days to see whether the problem really went away. If the problem appears to be the power switch, I replace it with a cheap, wired, unlighted, two-contact doorbell switch, available at home warehouse stores. If you can't find one there (most doorbell switches are now lighted), I'm pretty sure this is what you need: https://www.ebay.com/itm/155929670486 . You might need extra wire so the front panel can be pulled out far enough to service its parts without disconnecting the power button leads and fishing them around the motherboard and through the chassis.

* If you're overclocked, roll back to the non-overclocked frequencies. Often, simply telling the BIOS to reset to its factory defaults is a great way to rule out a whole bunch of BIOS-caused problems. As always, test for several days to make sure the intermittent symptom really went away.

* Use sensor programs to check CPU and disk temperatures. If temperatures even begin to approach maximum specs, take

* Try to observe whether this intermittent symptom occurs significantly more often when running a specific set of software, and act accordingly.

* Boot a radically different distro, use it for several days, and see whether the intermittent symptom still occurs. If so, you've for the most part ruled out your distro, software, and config settings. If not, investigate your software and configs.

* If you have a known good spinning rust hard disk bigger than the current one, you could ddrescue the current disk onto the bigger one and test for a few days. If the symptom doesn't recur, the old disk had a problem that smartctl didn't detect.

* If none of the preceding works, you need to consider how much time, money and energy you're willing to throw at this intermittent problem. Personally, at this point, I'd bite the bullet and buy a new motherboard, RAM, processor, and processor heat sink. Be sure to use high-quality thermal compound between the processor and heat sink, be sure to remove any labels the manufacturer stupidly put on the processor where it should mate with the heat sink, and clean off all label adhesive residue before applying the compound. Don't cheap out on the heat sink: a lot of times the heat sink packaged with the processor is great for email and light web browsing, but allows overheating during intense operations like compiling a kernel. Remember, you want this new setup to last for many years.

* If you're going to buy a new mobo, CPU and RAM anyway, it costs you nothing to take the very risky step of updating your BIOS. Who knows, it might work. Because of the risks involved in BIOS updates, I don't recommend them except when your symptom is a well-known effect of your specific BIOS version, or when you're about to throw the mobo in the trash anyway. Be sure to run the computer on a known good uninterruptible power supply when updating your BIOS so your electric company's problems don't brick your computer.

I'm very aware of the time and energy the preceding steps require. Your computer is now 8 years old and probably anemic by today's standards. If your current computer has enough capability for your needs, you could probably buy a whole new computer of equal capability for under $700. If you want to replace it with a modern computer with huge capacity, you can probably do it for between $1500 and $2300.
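For the step about checking CPU and disk temperatures, here is a minimal Python sketch that reads the Linux hwmon sysfs interface directly, which is where lm-sensors gets much of its data. It assumes a Linux system where each `/sys/class/hwmon/hwmonN/tempM_input` file holds a reading in millidegrees Celsius; SATA disk temperatures show up there only if the kernel's `drivetemp` driver is loaded (otherwise `smartctl -A` reports them). The 70 °C threshold (`WARN_CELSIUS`) and the function names are hypothetical; substitute margins based on your own CPU and disk datasheets.

```python
import glob
import os

# Hypothetical alert threshold in degrees Celsius. Replace it with a margin
# below the maximum rating in your CPU's or disk's datasheet.
WARN_CELSIUS = 70.0


def _read_text(path):
    """Return the stripped contents of a small sysfs file."""
    with open(path) as f:
        return f.read().strip()


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Map (chip name, sensor label) to degrees Celsius for every temp*_input file.

    The Linux hwmon interface reports temperatures in millidegrees Celsius.
    """
    readings = {}
    for chip_dir in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        name_path = os.path.join(chip_dir, "name")
        chip = _read_text(name_path) if os.path.exists(name_path) else os.path.basename(chip_dir)
        for input_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            base = input_path[: -len("_input")]  # e.g. .../hwmon0/temp1
            label_path = base + "_label"
            # Fall back to the raw sensor name (temp1, temp2, ...) if unlabeled.
            label = _read_text(label_path) if os.path.exists(label_path) else os.path.basename(base)
            try:
                readings[(chip, label)] = int(_read_text(input_path)) / 1000.0
            except (OSError, ValueError):
                # Some drivers expose sensors that can't be read; skip them.
                continue
    return readings


def hot_sensors(readings, warn_celsius=WARN_CELSIUS):
    """Return the sensors at or above the alert threshold, hottest first."""
    hot = [key for key, temp in readings.items() if temp >= warn_celsius]
    return sorted(hot, key=lambda key: readings[key], reverse=True)


if __name__ == "__main__":
    temps = read_temperatures()
    for (chip, label), celsius in sorted(temps.items()):
        print(f"{chip:12} {label:16} {celsius:5.1f} C")
    for chip, label in hot_sensors(temps):
        print(f"WARNING: {chip} {label} is at {temps[(chip, label)]:.1f} C")
```

Running it from cron every few minutes during the multi-day tests would show whether the disk errors line up with temperature peaks.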
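To see whether the ATA errors cluster around particular times (and therefore around particular workloads), a sketch like the following can tally suspicious kernel messages per device and per hour. It assumes the log was saved with `journalctl -k -o short-iso > kernel.log`, so each line starts with an ISO 8601 timestamp (by default `-k` covers only the current boot). The keyword list in `ATA_TROUBLE` is a heuristic, not an exhaustive catalog of libata messages, and the function and file names are hypothetical.

```python
import re
from collections import Counter

# Assumption: lines come from `journalctl -k -o short-iso`, so each one starts
# with an ISO 8601 timestamp such as 2023-12-17T10:13:36.
HOUR_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2})")
# libata messages name the port ("ata3:") or port.device ("ata3.00:").
ATA_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")
# Block-layer failures name the disk ("I/O error, dev sda, sector ...").
IO_ERR_RE = re.compile(r"I/O error, dev (\w+)")
# Heuristic words that mark a libata line as trouble rather than routine
# link-up chatter; extend the tuple as your own logs require.
ATA_TROUBLE = ("exception", "failed command", "error", "link down", "resetting")


def scan_kernel_log(lines):
    """Count suspicious disk events per device and per hour of the day."""
    per_device = Counter()
    per_hour = Counter()
    for line in lines:
        device = None
        io_match = IO_ERR_RE.search(line)
        ata_match = ATA_RE.search(line)
        if io_match:
            device = io_match.group(1)
        elif ata_match and any(word in line.lower() for word in ATA_TROUBLE):
            device = ata_match.group(1)
        if device is None:
            continue  # Not a disk-trouble line.
        per_device[device] += 1
        hour = HOUR_RE.match(line)
        if hour:
            per_hour[hour.group(1)] += 1
    return per_device, per_hour


if __name__ == "__main__":
    import sys

    # Usage: journalctl -k -o short-iso > kernel.log
    #        python3 scan_ata.py kernel.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log_file:
            by_device, by_hour = scan_kernel_log(log_file)
        print(f"== {path}")
        for device, count in by_device.most_common():
            print(f"{device:8} {count} events")
        for hour, count in sorted(by_hour.items()):
            print(f"{hour}:00  {count} events")
```

Comparing the busiest hours against what the machine was doing at those times is one way to act on the "specific set of software" step.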
Remember, the alternative is all the troubleshooting steps I listed (and probably other people can think of even more).

HTH,

SteveT

Steve Litt
Autumn 2023 featured book: Rapid Learning for the 21st Century
http://www.troubleshooters.com/rl21