LVM + RAID follow up
Derek Atkins
warlord at MIT.EDU
Tue Nov 7 15:32:01 EST 2006
"Rich Braun" <richb at pioneer.ci.net> writes:
>> I use RAID-1 ... I don't use LVM on those servers;
>> I just don't see the point. It seems to add complexity to what I view
>> as little gain.
>
> To me it doesn't seem complex. It's a proven, reliable technology dating back
> at least to the AIX O/S I was using back in '91. The point of LVM, when
> combined with RAID, is that you can hot-swap new hardware in place, sync up
> the new drive, and then assign additional storage to your existing
> filesystems. It provides non-stop computing in a commercial environment, and
> easier upgrades in a personal environment.
Perhaps I've found it complex because I tried doing things that LVM
wasn't really designed to do. I was using LVM on a VMware VM, and
then I wanted to make the disk bigger; LVM didn't like this, and I had
to work really hard to get LVM to accept the larger "physical" disk
size.
So I guess to ME it's neither proven nor reliable.. But perhaps
that's only because I didn't use it on AIX (and I found AIX extremely
confusing back in the early 90s when we got it at MIT)... Perhaps
it's my BSD-4.3 and AOS-4 and Ultrix and SunOS background?
I guess that in my case all I tend to want to do is increase
the size of filesystems when I increase the disk size. This
works great for the "last partition on the disk" without LVM...
But then again that doesn't use RAID.
>> I guess this was part of my question (and confusion).. Do I want LVM
>> over RAID or RAID over LVM? Or do I want LVM over RAID over LVM?
>
> RAID is at the bottom of the food chain. LVM lives on top of it. I suppose
> you could do it differently a la the AFS (Andrew) technology of the late 1980s
> but I don't see a benefit.
No, Andrew doesn't do what I want here -- it doesn't give me
read-write data on multiple disks all in a single filesystem. But
your explanation of how to build it later on does make sense..
>> Also, if I want to do a RAID5 for / but RAID-1 for /boot, how do I
>> want to lay that out? With RAID-5 do all my drives/partitions need to
>> be the same size like they do with RAID-1?
>
> You would create a /boot filesystem the same way you do now, small partitions
> of the same size on two of your drives. The partition size for each RAID5
> volume element should be the same size. If your drives are of different sizes
> then you should create multiple RAID5 devices (/dev/md2 etc) so as to take
> advantage of your available physical storage.
I think that makes sense... So I build the MD devices from the
physical drives/partitions and then layer LVM on top of that to build
the actual filesystem "partitions".
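A minimal sketch of that layering, written as a Python helper that only assembles the command strings and does not run them. The device names, the volume group name `vg0`, the logical volume name `root` and the 100G size are illustrative assumptions, not values from this thread.

```python
def layering_commands(md_device, members, vg_name, lv_name, lv_size):
    """Return, in order, the commands for RAID5 -> LVM -> ext3.

    Nothing is executed here; the strings are meant to be reviewed first.
    All names passed in are hypothetical examples.
    """
    return [
        # 1. Build the RAID5 array from equal-sized partitions.
        f"mdadm --create {md_device} --level=5 "
        f"--raid-devices={len(members)} {' '.join(members)}",
        # 2. Mark the array as an LVM physical volume.
        f"pvcreate {md_device}",
        # 3. Create a volume group containing that physical volume.
        f"vgcreate {vg_name} {md_device}",
        # 4. Carve a logical volume out of the volume group.
        f"lvcreate -L {lv_size} -n {lv_name} {vg_name}",
        # 5. Put an ext3 filesystem on the logical volume.
        f"mkfs.ext3 /dev/{vg_name}/{lv_name}",
    ]


if __name__ == "__main__":
    for cmd in layering_commands(
        "/dev/md1", ["/dev/sda2", "/dev/sdb2", "/dev/sdc2"], "vg0", "root", "100G"
    ):
        print(cmd)
```

/boot would stay on its own small RAID-1 pair outside of this stack, as described above.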
>> And then what's my upgrade path if I decide to swap out to larger
>> drives? Let's say that 3-5 years from now I decide to swap out my
>> 400G drives with 1TB drives -- what would be the right process to do
>> that such that my main raid partition size increases? (Then I can
>> resize my ext3 FS).
>
> Well one possibility is that you start out with eight 400 GB units and at your
> first upgrade you decide you want to buy four 1 TB units, leaving four of the
> old ones in place. Ignoring the bit of space given to /boot, let's say you
> set aside one of the 400 GB units as a spare and configure the first RAID5
> array as 7 partitions of 400 GB. You'd still have 2.4 TB available in the
> first array. You'd have four 600 GB partitions to configure in the second
> array, which provides 1.8 TB of storage (without setting aside a spare).
> Total 4.2 TB. Then a year later, with prices still coming down, you swap out
> the 400 GB drives for a set of four 1.8 TB drives. Apply the same logic, you
> get available storage of 6 times 1 TB plus 3 times 0.8 TB equals 8.4 TB.
I see, so I'd start with RAID(8x400GB). Then I'd convert to
RAID(8x400GB)+RAID(4x600GB) (where these two RAID arrays would
effectively get concatenated by LVM into a single filesystem).
Then later I get RAID(8x400GB)+RAID(4x600GB)+RAID(4x1.4TB)?
Or would I get R(8x400)+R(8x600)+R(4x800)?
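The capacity arithmetic in the upgrade scenario can be checked with a few lines of Python. This is only a sketch: it assumes a RAID5 array gives up one member's worth of space to parity, that a spare contributes no capacity, and it ignores /boot and metadata overhead. The function name is made up for illustration.

```python
def raid5_usable_gb(partitions, partition_gb, spares=0):
    """Usable size of one RAID5 array built from equal-sized partitions.

    One active member's worth of space holds parity; spares add nothing.
    """
    active = partitions - spares
    if active < 3:
        raise ValueError("RAID5 needs at least 3 active members")
    return (active - 1) * partition_gb


# First upgrade: four 400 GB drives kept, four 1 TB drives added.
# Array 1: eight 400 GB partitions, one held back as a spare.
# Array 2: the leftover 600 GB on each of the four 1 TB drives.
first = raid5_usable_gb(8, 400, spares=1) + raid5_usable_gb(4, 600)

# Second upgrade: the remaining 400 GB drives replaced by 1.8 TB drives.
# Array 1: eight 1 TB partitions, one spare.
# Array 2: the leftover 800 GB on each of the four 1.8 TB drives.
second = raid5_usable_gb(8, 1000, spares=1) + raid5_usable_gb(4, 800)

print(first, second)  # 4200 8400
```

That is 2.4 TB + 1.8 TB = 4.2 TB after the first upgrade, and 6 TB + 2.4 TB = 8.4 TB after the second.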
Another option is to choose some partition quantum (100GB or 200GB or..)
and then only buy drives whose size is a multiple of that quantum.. and then I
just keep adding into the RAID Array.. So I could do something like:
R(16x200) -> R(28x200) -> R(56x200).. But of course this makes it
much more confusing about which drive is the "spare"..
Man, this is just confusing..
I would think that after a while you'd have a LOT of MDn devices..
and eventually you wind up in this similar situation where you're
breaking down your drives into many "small" partitions.
> If you do this without LVM, you have to save/restore your data after
> re-creating the new filesystems. With LVM, you just make the partitions and
> extend the filesystems into the new space.
Do you? If I assume that the RAID5 is all one partition, then if the
partition size increases I can just run resize2fs to increase the size
of the file system. The only question is the RAID5 part: will it
be able to build a filesystem out of a 3x400GB + 3x800GB partition?
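A rough sketch of the two growth paths, again as Python that only assembles command strings. It assumes md sizes a RAID5 array by its smallest member, so a mixed 3x400GB + 3x800GB array would only use 400GB of each member until every small one has been replaced. Device and volume names are hypothetical, and whether resize2fs can grow the filesystem while it is mounted depends on the kernel and e2fsprogs versions in use.

```python
def raid5_mixed_usable_gb(member_sizes_gb):
    """Usable RAID5 size when members differ: every member is treated
    as if it were as small as the smallest one (assumption noted above)."""
    return (len(member_sizes_gb) - 1) * min(member_sizes_gb)


def grow_with_lvm(new_md, vg_name, lv_path):
    """Add a second array to the volume group and extend into it."""
    return [
        f"pvcreate {new_md}",                # new array becomes a physical volume
        f"vgextend {vg_name} {new_md}",      # add it to the existing volume group
        f"lvextend -l +100%FREE {lv_path}",  # give all free extents to the LV
        f"resize2fs {lv_path}",              # grow ext3 to fill the LV
    ]


def grow_without_lvm(md_device):
    """Grow one array in place after ALL members have been replaced
    with larger partitions, then grow the filesystem on it."""
    return [
        f"mdadm --grow {md_device} --size=max",
        f"resize2fs {md_device}",
    ]


if __name__ == "__main__":
    print(raid5_mixed_usable_gb([400] * 3 + [800] * 3))  # 2000
    for cmd in grow_with_lvm("/dev/md2", "vg0", "/dev/vg0/root"):
        print(cmd)
    for cmd in grow_without_lvm("/dev/md1"):
        print(cmd)
```

Under those assumptions the no-LVM path does work with resize2fs alone, but only once the last small member is gone; the LVM path lets the extra space on the larger drives be used earlier, as a second array.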
>> I don't trust reiserfs for a server -- it's not designed to handle
>> catastrophic failures like a power outage. Yeah, you could setup a
>> UPS with auto-shutdown, but why take the extra chance with a fragile
>> filesystem?
>
> Haven't heard Reiser described as "fragile", especially compared to the
> previous-generation Linux filesystems that I used before it came out (and
> compared to NTFS on my XP boxes), but your observation leads me to ask: what
> do you use instead? Can you point to some reading on the relative reliability
> of Reiser vs. alternatives? Thanks!
I've always used ext2/3. Reiser is really bad about making sure your
metadata is actually flushed to disk. Go ahead, go pull out the power
cord on your reiserfs-based server and see what happens! An ext3
filesystem is much more likely to survive.
> -rich
-derek
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available