Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month, online, via Jitsi Meet.

BLU Discuss list archive



[Discuss] Help with I/O errors on RAID array?



> On October 13, 2025, markw at mohawksoft.com wrote:
>>Try journalctl | grep -i err
>>It should spew a lot.
>
> Thanks! Among the 3930 lines of output, I did find one relevant error:
>
>   kernel: Buffer I/O error on dev md1p1, logical block 4199909, async page read

Yes, that looks like the error. When you break the mirror, do both disks
show the same error in the same place?


Try this on each drive:

dd if=/dev/DRIVE of=/dev/null bs=4096

Let it run through the drive and see what happens.
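
A plain dd run stops at the first read error. If you would rather get a
list of every block that fails, a read-only scan along these lines might
help. This is only a sketch: scan_blocks is a made-up name, the 4096-byte
block size just matches the dd command above, and reads that are served
from the page cache may hide an error on the device itself.

```python
import os
import sys


def scan_blocks(path, block_size=4096):
    """Read every block of `path`; return (total_blocks, unreadable_indices)."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file and,
        # on Linux, of a block device as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        total = (size + block_size - 1) // block_size
        for index in range(total):
            try:
                os.pread(fd, block_size, index * block_size)
            except OSError:
                # Remember the failing block and keep going.
                bad.append(index)
        return total, bad
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, bad = scan_blocks(sys.argv[1])
    print(f"{total} blocks read, {len(bad)} unreadable: {bad}")
```

Run it as root with the device path as the only argument, once per drive,
and compare the two lists.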

If this is an SSD, you may be mostly OK. SSDs have a much lower write life
than spinning disks, but good-quality SSDs have a lot of spare blocks that
worn or failing blocks can be remapped to.

I don't remember the process, but it is youtube-able. Find the blocks your
file is on, and write a quick program to write to those blocks. If they
have a heavy wear count, the drive will remap them.
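
Once you know the block numbers, the "quick program" could look roughly
like this. It is a sketch under assumptions: rewrite_blocks is a made-up
name, block numbers are taken to be in 4096-byte units relative to the
start of whatever path you open, and whether the drive actually remaps
anything is up to its firmware. It reads each block and writes the same
bytes back, so it only works on blocks that are still readable. A block
that fails to read raises OSError.

```python
import os


def rewrite_blocks(path, first_block, count, block_size=4096):
    """Write each block in the range back with its own data; return how many."""
    fd = os.open(path, os.O_RDWR)
    try:
        done = 0
        for index in range(first_block, first_block + count):
            offset = index * block_size
            data = os.pread(fd, block_size, offset)
            if not data:
                break  # ran past the end of the device or file
            os.pwrite(fd, data, offset)
            done += 1
        # Push the writes out of the page cache and down to the device.
        os.fsync(fd)
        return done
    finally:
        os.close(fd)
```

Try it on a scratch file first. Pointing it at a live device is at your
own risk.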

If you don't mind nuking the contents of the drive, write a program that
reads the drive a few blocks at a time. If a read succeeds, go on to the
next. If it gets an error, write over those blocks and that should clear it.
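
Something like the following is what I have in mind. It is a sketch, not
tested on real hardware: scrub and its reader parameter are made-up names
(reader is only there so a failing read can be faked when trying it out),
and it works one 4096-byte block at a time. It DESTROYS the data in any
block it cannot read, so only use it on a drive you are willing to wipe.

```python
import os


def scrub(path, block_size=4096, reader=os.pread):
    """Read every block; zero the ones that fail. Returns their indices."""
    overwritten = []
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, block_size):
            try:
                reader(fd, block_size, offset)
            except OSError:
                # Unreadable: overwrite with zeros so the drive gets a
                # chance to remap the block.
                length = min(block_size, size - offset)
                os.pwrite(fd, bytes(length), offset)
                overwritten.append(offset // block_size)
        # Flush the overwrites down to the device before reporting.
        os.fsync(fd)
        return overwritten
    finally:
        os.close(fd)
```

With the mirror broken you would run this against one drive, then re-add
it to the array and let the resync restore the overwritten blocks from
the other drive.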


>
> It's from a week ago and appears twice (well, two sets of 4 identical
> messages). Does this point to a culprit?
>
> Also FYI, I ran the SMART long test on both SSDs:
>
>   $ sudo nvme device-self-test /dev/nvme0 -n 1 -s 2
>   $ sudo nvme device-self-test /dev/nvme1 -n 1 -s 2
>
> I believe they both passed (based on "Operation Result" = 0):
>
>   $ sudo nvme self-test-log /dev/nvme0n1
>   Device Self Test Log for NVME device:nvme0n1
>   [...]
>   Self Test Result[1]:
>     Operation Result             : 0
>     Self Test Code               : 1
>     Valid Diagnostic Information : 0
>     Power on hours (POH)         : 0x1c99
>     Vendor Specific              : 0 0
>
>   $ sudo nvme self-test-log /dev/nvme1n1
>   Device Self Test Log for NVME device:nvme1n1
>   [...]
>   Self Test Result[1]:
>     Operation Result             : 0
>     Self Test Code               : 1
>     Valid Diagnostic Information : 0
>     Power on hours (POH)         : 0x1bf6
>     Vendor Specific              : 0 0
>
> Dan
>








Boston Linux & Unix / webmaster@blu.org