Rich Braun wrote:
> - Two multi-core systems with two NIC cards and 4-6 SATA drives, 16-32GB of RAM.
> - Xen or VirtualBox virtualization.
> - Two ethernet switches, at least one of which is jumbo-frame capable (for use
>   as a SAN switch).
> - Open-source SAN with automatic failover of all the storage contained in
>   these systems.
> - Virtual machines capable of running on either of the two hosts.

I spec'ed out something similar to that at a previous job, but never got the go-ahead to build it. (Actually, I did finally get the go-ahead, but the economy was crumbling and the company couldn't get credit approval on the equipment lease.)

My idea was two identical multi-core machines, each with a lot of RAM and a lot of disk capacity (at least 4 physical disks in each box, half from one manufacturer and half from another), each using one ethernet port to talk to the world and a second gigabit ethernet port doing jumbo packets to talk to the other machine over a crossover cable (no switch needed for just two machines). Possibly two gigabit links between the machines, if DRBD secondary writes proved to be a bottleneck.

My intended stack was:

- KVM
- DRBD, split between local disk as primary and remote as secondary
- local disk -and- crossover cable to the other box
- LVM, to let me resize and snapshot things (I wish btrfs were more mature and distributed)
- disks used half for local storage, half for remote storage, RAID 10 on everything

I don't think there was any AoE left in my design once I discovered DRBD.

A given virtual machine could run on either box. The local disk would be marked primary in DRBD, so reads would be fast. Writes would go to both the local DRBD device and the remote DRBD device (the secondary) via the crossover cable. For a conservative VM, DRBD would be configured to block until both the local and remote writes completed, but for some VMs it might be okay to let the remote disk lag. (Faster writes, more data lost in a failure, but still the ability to migrate.) The first sketch below shows the two configurations.

To migrate, apparently both sides get set to primary, you tell KVM to migrate, and then you switch the old machine back to secondary (second sketch below).

The Supermicro boxes I had selected had dual hot-swappable power supplies, ECC RAM, and hot-pluggable disks. I would separate the two machines physically as far apart as rack space permitted.

My host OS install would have dual "/" partitions, with grub set up to let me boot from either, a script to identify which "/" is current, and a script to rsync the current "/" across to the other "/" (third sketch below). Before any risky OS manipulation, rsync the current "/" to the other; after the risky work, leave everything alone, in case a problem is discovered hours or days later--the other "/" is usually lagging. If each machine were capable of running everything, then host OS updates could happen at the expense of not being able to migrate during the maintenance window, but nothing would be required to stop.

I think I would keep all the live VMs running on a single machine, leaving the other machine in warm-spare mode with plenty of CPU available, making it a good place to stage changes to specific VMs.
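First sketch: to make the conservative-vs.-lagging distinction concrete, a minimal per-VM DRBD resource (DRBD 8.x syntax; the hostnames, volume names, and addresses are invented for illustration). Protocol C blocks until both disks have acknowledged the write; protocol A lets the remote side lag:

    resource vm1 {
        protocol C;          # block until local AND remote disks ack the write
                             # ("protocol A;" would let the remote lag instead)
        on hostA {
            device    /dev/drbd0;
            disk      /dev/vg0/vm1;    # LVM volume underneath DRBD
            address   10.0.0.1:7788;   # the jumbo-frame crossover link
            meta-disk internal;
        }
        on hostB {
            device    /dev/drbd0;
            disk      /dev/vg0/vm1;
            address   10.0.0.2:7788;
            meta-disk internal;
        }
    }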
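Second sketch: the migration dance, assuming libvirt's virsh is driving KVM and the resource above (DRBD 8 needs "allow-two-primaries;" in the resource's net section for the brief dual-primary window):

    hostB$ drbdadm primary vm1       # target side is now primary too
    hostA$ virsh migrate --live vm1 qemu+ssh://hostB/system
    hostA$ drbdadm secondary vm1     # demote the old host once the guest is over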
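Third sketch: the dual-"/" sync could be about this simple (an illustration only; it assumes the two root partitions are /dev/sda1 and /dev/sda2 and that the spare mounts at /mnt/altroot):

    #!/bin/sh
    # Figure out which "/" we booted from, then copy it onto the spare.
    CUR=$(awk '$2 == "/" { dev = $1 } END { print dev }' /proc/mounts)
    if [ "$CUR" = /dev/sda1 ]; then ALT=/dev/sda2; else ALT=/dev/sda1; fi
    mount "$ALT" /mnt/altroot
    rsync -aHAX -x --delete / /mnt/altroot/   # -x: don't cross filesystems
    umount /mnt/altroot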
The result was very few single points of failure:

- crossover cable (breaks redundancy, but doesn't bring down services, except in the split-brain case if there's an automatic HA setup or the admins are dim)
- software bugs in the host OS, configuration, or custom scripts (hard to avoid in a tightly coupled HA installation; be careful)
- power can fail (physically route the dual power feeds carefully; possibly run your own local UPSes on one leg, if permitted)
- the facility could burn down, flood, overheat, collapse in an earthquake, be robbed, vandalized, or sabotaged (inside job or not)
- you could be manually shut down over an unpaid bill, an administrative mistake, DMCA stupidities, or a court order
- fratricide: one of your boxes could fail in a pyrotechnic way (put at least a few feet of distance between the two)
- unified administration makes a single fat finger dangerous (have defensive procedures, for example: use sudo; keep the other "/" unmounted or mounted read-only; have tested scripts for common operations that might be easy to do manually but also easy to get slightly wrong; have clear and precise upgrade/rollback plans and checklists before plunging in; use two sets of eyes to vet commands before "enter" is pressed; have common and emergency procedures documented and maintained; maintain documentation of your configuration's vital statistics; keep a maintenance log; etc. Be as extreme as you need to be, depending on how much paranoia is warranted.)

For not that much money one can build a pair of boxes that are more reliable than their admins. Cheap enough that a complete second, non-production staging copy is a pretty cheap way to add maybe almost another nine. (Not for sure. A complete second copy is a safe place to do destructive things...except when an admin accidentally types into the wrong window. Distinct prompts sound like a good start, maybe plus some strict rules and mechanisms prohibiting logins to both sites at the same time.)

> It would be a bit of a challenge to build this using /four/ machines (a pair
> each for storage and for virtualization) but doing this on two would make it a
> killer-app platform.

Last I looked, DRBD was commercial for more than two nodes, but it looked like a good product and likely worth it.

> I say it's "tantalizing", though, after getting various pieces to work
> individually but not quite integrated: AoE (ATA-over-Ethernet), OCFS2, DRBD,
> VirtualBox.

I had OCFS2 in my design, but it was pointed out to me that I can just serve up a /dev/drbdX device directly to a VM (sketch below). If only one instance of that VM is running (and primary) on that device at a time, then no OCFS2-like layer is needed. Yes, if you have an application layer that wants a shared file system with another instance, then OCFS2 is useful, but that sounds like a larger-than-one-machine cluster, whereas a set of VMs is probably smaller-than-one-machine. Even if different VMs have some files they want to share, that still sounds like a single shared volume for that purpose, not everything.
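Handing the raw DRBD device to the guest is a one-liner; for example, with plain qemu-kvm (names carried over from the earlier sketches; a libvirt <disk type='block'> stanza would do the same):

    qemu-kvm -m 1024 -drive file=/dev/drbd0,if=virtio,cache=none

-kb, the Kent who would still like to build such a box.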