[Discuss] Help with I/O errors on RAID array?
markw at mohawksoft.com
Mon Oct 13 15:07:22 EDT 2025
Try journalctl | grep -i err
It should spew a lot.
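If that's too noisy, journalctl can filter by priority and limit itself
to kernel messages (see journalctl(1)); for example:

$ journalctl -k -p err -b

-k limits output to kernel messages, -p err drops anything below err
priority, and -b restricts it to the current boot. A failing drive
usually shows up there as I/O or controller errors.
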
> On October 13, 2025, markw at mohawksoft.com wrote:
>>Don't grep for nvm; grep for errors, e.g.
>>dmesg | grep -i err
>>grep -i err /var/log/messages
>
> $ sudo dmesg | grep -i err
> [ 0.000000] unchecked MSR access error: RDMSR from 0xc00102f1 at rIP:
> 0xffffffffa01d3ba3 (mce_setup+0x153/0x190)
> [ 1.916195] ACPI: Using IOAPIC for interrupt routing
> [ 2.001896] ACPI: PCI: Interrupt link LNKA configured for IRQ 0
> [ 2.001983] ACPI: PCI: Interrupt link LNKB configured for IRQ 0
> [ 2.002061] ACPI: PCI: Interrupt link LNKC configured for IRQ 0
> [ 2.002154] ACPI: PCI: Interrupt link LNKD configured for IRQ 0
> [ 2.002239] ACPI: PCI: Interrupt link LNKE configured for IRQ 0
> [ 2.002310] ACPI: PCI: Interrupt link LNKF configured for IRQ 0
> [ 2.002380] ACPI: PCI: Interrupt link LNKG configured for IRQ 0
> [ 2.002451] ACPI: PCI: Interrupt link LNKH configured for IRQ 0
> [ 2.095564] AMD-Vi: Interrupt remapping enabled
> [ 2.108016] pcieport 0000:00:01.2: DPC: error containment capabilities:
> Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
> [ 2.108924] pcieport 0000:20:03.1: DPC: error containment capabilities:
> Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
> [ 2.109806] pcieport 0000:40:01.1: DPC: error containment capabilities:
> Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
> [ 2.973938] acpi_cpufreq: overriding BIOS provided _PSD data
> [ 3.366715] igb 0000:44:00.0: Using MSI-X interrupts. 2 rx queue(s), 2
> tx queue(s)
> [ 19.010884] EXT4-fs (md1p1): re-mounted. Opts: errors=remount-ro. Quota
> mode: none.
> [ 29.632127] Bluetooth: hci0: FW download error recovery failed (-110)
>
> $ grep err /var/log/syslog
>
> Nothing suspicious in this output either.
>
> Dan
>
>
>>
>>If an operation reports an error, it should be in these logs.
>>
>>> On October 13, 2025, markw at mohawksoft.com wrote:
>>>>Look at /var/log/messages and/or run "dmesg" and look for I/O errors.
>>>>It should show you which drive is failing.
>>>
>>> Thanks. dmesg hasn't provided any information about any disk errors so
>>> far:
>>>
>>> $ sudo dmesg|grep nvm
>>> [ 3.279193] nvme nvme0: pci function 0000:01:00.0
>>> [ 3.280193] nvme nvme1: pci function 0000:43:00.0
>>> [ 3.286725] nvme nvme1: missing or invalid SUBNQN field.
>>> [ 3.286725] nvme nvme0: missing or invalid SUBNQN field.
>>> [ 3.287359] nvme nvme0: Shutdown timeout set to 8 seconds
>>> [ 3.287821] nvme nvme1: Shutdown timeout set to 8 seconds
>>> [ 3.316978] nvme nvme0: 32/0/0 default/read/poll queues
>>> [ 3.317747] nvme nvme1: 32/0/0 default/read/poll queues
>>> [ 3.324548] nvme0n1: p1
>>> [ 3.326850] nvme1n1: p1
>>>
>>> $ sudo dmesg|grep -w md1
>>> [ 4.207982] md/raid1:md1: active with 2 out of 2 mirrors
>>> [ 4.223497] md1: detected capacity change from 0 to 3906762752
>>> [ 4.224742] md1: p1
>>>
>>> /var/log/syslog has no errors either, except for the usual "I don't
>>> understand the SMART code you just sent me" errors that happen on
>>> every boot:
>>>
>>> Oct 12 11:57:55 myhost smartd[2197]: Device: /dev/nvme0, number of
>>> Error Log entries increased from 176 to 179
>>> Oct 12 11:57:55 myhost smartd[2197]: Device: /dev/nvme1, number of
>>> Error Log entries increased from 143 to 146
>>>
>>> Dan
>>>
>>
>
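
One more thought: those smartd lines may not be pure noise. The NVMe
error-log counts are increasing on both drives, and you can dump the
devices' own error logs to see what they are complaining about, e.g.:

$ sudo smartctl -l error /dev/nvme0
$ sudo nvme error-log /dev/nvme0     # if nvme-cli is installed

The entries include the NVMe status code and, for media errors, the LBA
involved, which may point at the actual I/O errors even when nothing
reaches dmesg.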