BLU Discuss list archive
[Discuss] Help with I/O errors on RAID array?
- Subject: [Discuss] Help with I/O errors on RAID array?
- From: bogstad at pobox.com (Bill Bogstad)
- Date: Tue, 14 Oct 2025 18:34:24 -0400
- In-reply-to: <26860.17140.637176.437320@gargle.gargle.HOWL>
- References: <mailman.1.1749052801.15856.discuss@lists.blu.org> <26860.17140.637176.437320@gargle.gargle.HOWL>
On Sun, Oct 12, 2025 at 8:08 PM Daniel Barrett <dbarrett at blazemonger.com> wrote:
>
> Hi folks. A few files on my Ubuntu RAID-1 partition (two M.2 SSDs) are
> producing input/output errors when read, like this:
>
> $ cat myfile
> cat: myfile: Input/output error
>
> I can't seem to figure out what to fix. I've run fsck, badblocks,
> smartctl, and mdadm, as shown below, and none of them reports any
> errors. I'd appreciate any advice, especially (1) what to do next, and
> (2) how to figure out which (if any) of the two SSDs is faulty.

It's been quite a while since I dealt with this kind of thing, so I
don't remember specific commands, but these ideas might help.

1. There is a "try really hard" copy command. It copies a file in
large chunks until it gets an error, then uses seeks and smaller
block-size reads to recover as much of the file as possible. It also
retries the bad single-block reads multiple times in case the problem
is temporary. (That is more likely to work for magnetic media.) Since
this is an SSD, retrying is unlikely to cause any further damage to
the drive. Generally I would do this first, to recover as much data as
possible from the problematic file(s) before making any changes.

2. Back up whatever is important on the drive. Since it passes fsck,
the metadata blocks are okay, so you can easily mv the problematic
files to a directory at the root of the filesystem and then skip that
directory while backing up the rest of the disk. You might even
consider stopping at this point: leave the bad file in its new
location and ignore it. The file you show is under 300 MB, so it's
not that much of a loss.

3. Most filesystems have a command that tells you which blocks on the
disk are allocated to a specific file. You can then overwrite the raw
blocks allocated to the file, which might clear the error. This can be
done with dd and judicious use of the seek, bs, and count options when
writing to the whole partition. There is probably a specialized tool
to do this as well, but I don't remember it. In step #1 you will
already have copied all the good data out. The same command from step
#1 will probably give you the block numbers within the file, so you
might be able to overwrite just the bad blocks.

Obviously MD (or LVM) complicates this, and as others have suggested
you will probably need to break the mirror and handle each drive
separately. Booting from a rescue USB drive is a good idea in general,
and probably required if we are talking about the root of the whole
filesystem tree.

I've never wrapped my head around exactly how NVMe SSDs work, but they
seem to be way more complicated than the essentially magnetic-drive
emulation that SATA SSDs use. You might get some ideas by digging into
the NVMe specs and specialized tools. There is something called
"nvme-cli" that might be helpful.

Good luck,
Bill Bogstad
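
For what it's worth, the "try really hard" copy in item 1 can be
sketched in a few lines of Python. This is only an illustration of the
idea, not the tool I was thinking of (GNU ddrescue is one real program
that works along these lines). The function names salvage and
salvage_file, the chunk sizes, and the retry count are all made up for
the example. It assumes a regular file on a Unix system, and it leaves
unreadable ranges as zero-filled holes in the copy.

```python
import os


def read_with_retry(read_at, offset, length, retries):
    """Call read_at(offset, length) up to `retries` times; None if all fail."""
    for _ in range(retries):
        try:
            return read_at(offset, length)
        except OSError:
            continue
    return None


def salvage(read_at, write_at, size, big=1 << 20, small=4096, retries=3):
    """Copy `size` bytes; return the unreadable (offset, length) byte ranges."""
    bad = []
    pos = 0
    while pos < size:
        want = min(big, size - pos)
        try:
            data = read_at(pos, want)  # fast path: one large read
        except OSError:
            data = None
        if data is not None:
            if not data:  # source ended earlier than expected
                break
            write_at(pos, data)
            pos += len(data)
            continue
        # Slow path: walk the failed chunk in small blocks, retrying each.
        end = pos + want
        while pos < end:
            n = min(small, end - pos)
            block = read_with_retry(read_at, pos, n, retries)
            if block:
                write_at(pos, block)
                pos += len(block)
            elif block is None:
                # Record the bad block, merging it with an adjacent bad range.
                if bad and bad[-1][0] + bad[-1][1] == pos:
                    bad[-1] = (bad[-1][0], bad[-1][1] + n)
                else:
                    bad.append((pos, n))
                pos += n
            else:  # empty read: source ended earlier than expected
                return bad
    return bad


def salvage_file(src, dst, **options):
    """Apply salvage() to a regular file using positional reads and writes."""
    fd_in = os.open(src, os.O_RDONLY)
    fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        size = os.fstat(fd_in).st_size
        os.ftruncate(fd_out, size)  # unreadable ranges stay zero-filled
        return salvage(
            lambda off, n: os.pread(fd_in, n, off),
            lambda off, data: os.pwrite(fd_out, data, off),
            size,
            **options,
        )
    finally:
        os.close(fd_in)
        os.close(fd_out)
```

Calling salvage_file("myfile", "/some/other/disk/myfile.partial") would
return a list of (offset, length) byte ranges within the file that
could not be read. Those offsets are relative to the file, not the
disk, so for item 3 they would still have to be translated through the
filesystem's block map before anything is overwritten with dd.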
