[Discuss] On Btrfs raid and odd-count disks
Derek Atkins
warlord at MIT.EDU
Thu Apr 11 09:28:34 EDT 2013
Richard Pieri <richard.pieri at gmail.com> writes:
> In retrospect, if you're looking at file systems as a means to prevent
> write holes with RAID 5/6 then you're going about it wrong. Write holes
> happen with every RAID level. They happen with RAID 5 and 6. They happen
> with RAID 1 and RAID 10. Do not believe anyone who says that write holes
> are unique to RAID 5/6 and their derivatives. They are mistaken. Any two
> or more storage devices in a RAID set that are not atomically locked
> together can suffer write holes. They can even happen with ZFS.
The reason I'm looking at a filesystem here is that the WAY writes occur
can affect the write holes you get in RAID5 and RAID6. For example, ZFS
does not overwrite the existing block; it writes to a new block and
changes the block pointer only after that write succeeds.
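
As a rough illustration of that write-then-repoint ordering, here is a
toy in-memory sketch. CowStore and its fields are hypothetical names
made up for this example; this is not how ZFS is actually implemented,
only the general copy-on-write idea:

```python
# Toy copy-on-write store: data is never overwritten in place.
# All names here are illustrative assumptions, not ZFS structures.
class CowStore:
    def __init__(self, data):
        self.blocks = {0: data}   # block number -> contents
        self.pointer = 0          # which block holds the live copy
        self.next_free = 1

    def read(self):
        return self.blocks[self.pointer]

    def write(self, data, crash_before_commit=False):
        new_block = self.next_free
        self.next_free += 1
        self.blocks[new_block] = data   # step 1: write to a new block
        if crash_before_commit:
            return                      # simulated power loss: pointer untouched
        self.pointer = new_block        # step 2: switch the pointer


store = CowStore(b"old")
store.write(b"new", crash_before_commit=True)
after_crash = store.read()    # still the old, consistent contents
store.write(b"new")
after_commit = store.read()   # new contents, visible only after the switch
```

An interrupted write leaves the old block and the old pointer intact, so
a reader sees either the complete old data or the complete new data.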
> This is not a RAID issue. RAID is about making the hardware tolerant to
> faults. RAID does not care about the integrity of your data.
And *THAT* is the problem. I want fault-tolerance *AND* data integrity.
Which is why I'm looking towards ZFS and BTRFS as potential solutions
that provide both.
> Write holes happen when power to the storage devices is lost during
> write operations. UPS and redundant power are the primary ways of
> preventing write holes. If the server doesn't lose power, or it has time
> to perform a graceful shutdown when mains fail, then no holes appear in
> the data it holds.
Or power to the CPU (assuming software RAID) in the middle of a write.
See above as to how ZFS works around this problem. Note, however, that
ZFS assumes that *MEMORY* is not corrupted, so you definitely need to
use ECC RAM.
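
For anyone following along, an interrupted in-place write on a parity
stripe can be sketched like this. It is a toy model with one stripe of
two data blocks plus XOR parity; the block contents and variable names
are made up for illustration and do not come from any real RAID
implementation:

```python
def xor(a, b):
    """XOR two equal-length byte strings (single-parity arithmetic)."""
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe: two data blocks and their parity, all consistent.
d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)

# Sanity check: with consistent parity, d1 can be rebuilt from d0.
rebuilt_before = xor(d0, parity)

# In-place update of d0. Power is lost before the parity block is
# rewritten, so parity still describes the OLD d0.
d0 = b"CCCC"

# Later the disk holding d1 fails and is rebuilt from d0 and parity.
rebuilt_after = xor(d0, parity)
hole = rebuilt_after != d1   # True: the rebuilt block is silently wrong
```

Note that d1 was never being written, yet it is the block that comes
back wrong after the rebuild.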
> Battery-backed cache is the second line of defense against write holes.
> The battery prevents cache loss if redundant and backup power fail.
> Non-volatile cache (SSD) is becoming a popular alternative to
> battery-backed cache, although flash has its own set of power-related
> problems.
>
> The last line of defense against corruption is a good backup history.
>
> ZFS and Btrfs will detect and if possible correct single-bit errors.
> They may be able to prevent write holes if they can reliably control
> every piece of I/O cache in the data stream. This includes the write
> acceleration cache found on most modern disks' on-board controllers. Not
> all of these reliably honor cache flush instructions from the host and
> because of this they cannot be relied upon to maintain data integrity
> under power fault conditions.
When the drives lie to you it's hard to work around that, sure.
I *do* have a UPS with a good deal of uptime available, and I plan to
get a secondary power backup (which I will probably have installed
before I even get to build my new spiffy NAS), so power shouldn't be a
problem, just potential hardware faults.
-derek
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available