BLU Discuss list archive

[Discuss] Help with I/O errors on RAID array?

Subject: [Discuss] Help with I/O errors on RAID array?
From: dbarrett at blazemonger.com (Daniel Barrett)
Date: Mon, 13 Oct 2025 14:43:12 -0400
References: <mailman.1.1749052801.15856.discuss@lists.blu.org> <26860.17140.637176.437320@gargle.gargle.HOWL> <c39dd97e417baa7d7d47cd7695aebe63.squirrel@mail.mohawksoft.com> <26861.17148.608254.92623@gargle.gargle.HOWL> <601f94869958cda82910f884752d6fca.squirrel@mail.mohawksoft.com>

On October 13, 2025, markw at mohawksoft.com wrote:
>Don't grep for nvm, grep for errors, i.e.
>dmesg | grep -i err
>grep -i err /var/log/messages

$ sudo dmesg | grep -i err
[    0.000000] unchecked MSR access error: RDMSR from 0xc00102f1 at rIP: 0xffffffffa01d3ba3 (mce_setup+0x153/0x190)
[    1.916195] ACPI: Using IOAPIC for interrupt routing
[    2.001896] ACPI: PCI: Interrupt link LNKA configured for IRQ 0
[    2.001983] ACPI: PCI: Interrupt link LNKB configured for IRQ 0
[    2.002061] ACPI: PCI: Interrupt link LNKC configured for IRQ 0
[    2.002154] ACPI: PCI: Interrupt link LNKD configured for IRQ 0
[    2.002239] ACPI: PCI: Interrupt link LNKE configured for IRQ 0
[    2.002310] ACPI: PCI: Interrupt link LNKF configured for IRQ 0
[    2.002380] ACPI: PCI: Interrupt link LNKG configured for IRQ 0
[    2.002451] ACPI: PCI: Interrupt link LNKH configured for IRQ 0
[    2.095564] AMD-Vi: Interrupt remapping enabled
[    2.108016] pcieport 0000:00:01.2: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[    2.108924] pcieport 0000:20:03.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[    2.109806] pcieport 0000:40:01.1: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 6, DL_ActiveErr+
[    2.973938] acpi_cpufreq: overriding BIOS provided _PSD data
[    3.366715] igb 0000:44:00.0: Using MSI-X interrupts. 2 rx queue(s), 2 tx queue(s)
[   19.010884] EXT4-fs (md1p1): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[   29.632127] Bluetooth: hci0: FW download error recovery failed (-110)

$ grep err /var/log/syslog

Nothing suspicious in this output either.
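
A side note on the pattern itself: a bare "err" also matches inside "Interrupt", "interrupts", "overriding" and the mount option "errors=remount-ro", which is why most of the lines above show up at all. A narrower filter may be easier to read. The sketch below is only an illustration: storage_errs is a hypothetical helper name, and the pattern list is an assumption about typical kernel wording for block, NVMe, md and ext4 failures, not a complete catalogue.

```shell
# Hypothetical helper: keep only log lines that look like storage failures.
# The alternatives below are assumed message shapes, not an exhaustive list.
# Usage: sudo dmesg | storage_errs
storage_errs() {
    grep -Ei 'i/o error|medium error|nvme.*qid.*timeout|md/raid.*disk failure|ext4-fs error'
}
```

Piping the same dmesg output through this filter should drop the ACPI, pcieport and igb lines, since none of them contain any of the listed phrases. An empty result only means that none of these particular phrases occurred, not that the hardware is healthy.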

Dan

>
>If an operation reports an error, it should be in these logs.
>
>> On October 13, 2025, markw at mohawksoft.com wrote:
>>>Look at /var/log/messages and/or run "dmesg" and look for I/O errors. It
>>>should show you which drive is failing.
>>
>> Thanks. dmesg hasn't provided any information about any disk errors so
>> far:
>>
>>   $ sudo dmesg|grep nvm
>>   [    3.279193] nvme nvme0: pci function 0000:01:00.0
>>   [    3.280193] nvme nvme1: pci function 0000:43:00.0
>>   [    3.286725] nvme nvme1: missing or invalid SUBNQN field.
>>   [    3.286725] nvme nvme0: missing or invalid SUBNQN field.
>>   [    3.287359] nvme nvme0: Shutdown timeout set to 8 seconds
>>   [    3.287821] nvme nvme1: Shutdown timeout set to 8 seconds
>>   [    3.316978] nvme nvme0: 32/0/0 default/read/poll queues
>>   [    3.317747] nvme nvme1: 32/0/0 default/read/poll queues
>>   [    3.324548]  nvme0n1: p1
>>   [    3.326850]  nvme1n1: p1
>>
>>   $ sudo dmesg|grep -w md1
>>   [    4.207982] md/raid1:md1: active with 2 out of 2 mirrors
>>   [    4.223497] md1: detected capacity change from 0 to 3906762752
>>   [    4.224742]  md1: p1
>>
>> /var/log/syslog has no errors either, except for the usual "I don't
>> understand the SMART code you just sent me" errors that happen on
>> every boot:
>>
>>   Oct 12 11:57:55 myhost smartd[2197]: Device: /dev/nvme0, number of Error Log entries increased from 176 to 179
>>   Oct 12 11:57:55 myhost smartd[2197]: Device: /dev/nvme1, number of Error Log entries increased from 143 to 146
>>
>> Dan
>>
>
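
One way to check whether the smartd counters quoted above only move at boot is to pull the per-device increase out of syslog and compare timestamps. This is a rough sketch, assuming each smartd message sits on a single line in the log (unlike the wrapped copy in this email); smart_errlog_deltas is a hypothetical helper name.

```shell
# Hypothetical helper: print "<device> <increase>" for each smartd
# "Error Log entries increased" message read from standard input.
# Usage: smart_errlog_deltas < /var/log/syslog
smart_errlog_deltas() {
    awk '/smartd/ && /number of Error Log entries increased/ {
        dev = ""; from = 0; to = 0
        for (i = 1; i < NF; i++) {
            if ($i == "Device:") dev  = $(i + 1)
            if ($i == "from")    from = $(i + 1)
            if ($i == "to")      to   = $(i + 1)
        }
        sub(/,$/, "", dev)    # strip the comma that follows the device path
        print dev, to - from
    }'
}
```

If the increase is the same small number on every boot and never changes in between, that would fit the "unsupported command at startup" explanation. Assuming a smartmontools build with NVMe support, "smartctl -l error /dev/nvme0" should show what the logged entries actually are.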
