[Discuss] btrfs tip

Matthew Gillen me at mattgillen.net
Sun Jul 4 16:58:21 EDT 2021



On 7/4/2021 12:46 PM, Ian Kelling wrote:
> 
> Matthew Gillen <me at mattgillen.net> writes:
> 
>> Came back from vacation to a dead disk in my main linux server.  Disk
>> had been dying for a while, and we had a power outage at some point
>> while I was gone which must have finished it off.  Kernel had been stuck
>> trying to initialize the disks on boot for 3 days.
>>
>> Protip: if you use btrfs as your root filesystem, and use a
>> RAID1/mirroring configuration, then add the 'degraded' flag to your grub
>> kernel command line.  That will allow it to mount (and boot) even if a
>> disk dies.  Don't know why this isn't the default...
> 
> Will you know if a disk dies in that case? A major problem with linux
> filesystems I've had is that they just try to keep working without any
> notification of things going wrong until they simply don't work, as if
> everyone has some dmesg parsing daemon that magically notifies them of
> anything important, but no one does. If that is the case, then it makes
> sense for it to not be the default.

Good point, but it isn't really different from the disk dying while
the system is up.  I'm not certain what happens when it dies online, but
I suspect you'd need some cron job (or logcheck) checking status and
hooked up to a notification mechanism.  The linux software RAID
mechanism has its own e-mail notification system that doesn't rely on
watching syslog or polling /proc/mdstat; I don't think btrfs has
anything like that yet.  (There are some nice cron-friendly scripts here:
  https://github.com/kdave/btrfsmaintenance
but they don't seem to cover the case of monitoring mirrors.)  I can
kind of understand why: RAID-1/mirroring is only one of many things you
can do with btrfs, whereas the software-RAID system does exactly one
thing.
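
(For comparison, the mdadm notification is basically just a line like
   MAILADDR root
in /etc/mdadm/mdadm.conf, which the 'mdadm --monitor' daemon (usually
started automatically by the distribution when arrays exist) uses to
mail you when an array degrades.)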

The problem is that unless you have hot-swap HD chassis in your server,
you /have/ to bring the system down to replace the disk, and you can't
do the 'btrfs replace' operation until you have the new disk online.  So
at some point you're going to need to mount with the 'degraded' option
in order to fix it.
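
For concreteness, once the replacement disk is physically installed, the
repair would go roughly like this (device names and the devid are made
up; 'btrfs filesystem show' tells you the real ones, and the mount point
is wherever the degraded filesystem ends up, whether that's / on a
degraded boot or /mnt from a rescue environment):

  # mount the surviving half of the mirror, tolerating the missing device
  mount -o degraded /dev/sda2 /mnt
  # rebuild onto the new disk, naming the missing device by its devid (2 here)
  btrfs replace start 2 /dev/sdb2 /mnt
  # check progress
  btrfs replace status /mnt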

A cron job to check status could look something like
  btrfs filesystem show | grep missing
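
A slightly fleshed-out version that mails you instead of depending on
cron's output handling might look like the following (untested sketch;
assumes a working local MTA so that 'mail' actually goes somewhere):

  #!/bin/sh
  # complain if any btrfs filesystem reports a missing device
  if btrfs filesystem show 2>/dev/null | grep -qi missing; then
      btrfs filesystem show 2>&1 \
          | mail -s "btrfs: device missing on $(hostname)" root
  fi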

---

Further investigation turned up this thread that explains why the
default is to not allow degraded mounts:
https://www.reddit.com/r/btrfs/comments/kguqsg/degraded_boot_with_systemd/

The key comment is by 'cmmurf', and the issue is about the timing of
device discovery (the current age of multiple hot-plug interfaces makes
this harder than it used to be).  I'm not sure how to interpret his
comment, though; it sounds as if 'degraded' at boot time always runs the
risk of a 'split brain' situation (each half of the mirror getting
mounted and written on its own, so the copies diverge), which seems
unfortunate.  It would mean that the only safe way to replace a drive on
a btrfs root filesystem involves booting from a rescue environment.

I would have to understand how udevadm applies
/usr/lib/udev/rules.d/64-btrfs.rules to say for sure whether the 'split
brain' is a problem with the udev rules as they are, but I am now less
certain that 'degraded' is a good thing to put on your default grub
kernel command line.  'cmmurf' did explain why my system was happily
waiting for three days for the disk to become ready, though... it was
those very udev rules, there to prevent problems for btrfs, that held up
the boot.
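
(For reference, the tip at the top presumably boils down to something
like
   GRUB_CMDLINE_LINUX="... rootflags=degraded"
in /etc/default/grub, plus regenerating grub.cfg with update-grub or
grub2-mkconfig.  Given the above, whether to leave that in place
permanently is now an open question for me.)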

Matt

