[Discuss] Redundant array of inexpensive servers: clustering?

Bill Ricker bill.n1vux at gmail.com
Mon Mar 31 15:20:23 EDT 2014


On Mon, Mar 31, 2014 at 11:03 AM, Richard Pieri <richard.pieri at gmail.com> wrote:

> Bill Ricker wrote:
>
>> I've seen a big-name commercial block-replication solution duplicate
>> trashed data to the cold spare ... wasn't pretty !
>>
>
> Another great example of how replication is not backup.


Exactly.

Extra copies of blocks in the local or remote SAN don't help if the app,
the block device driver, or the multipath software mangles the bits before
any of the copying happens.

It was actual backups, restored to a non-replicated test system, that got
those users on-line again.
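
To make the distinction concrete, here is a toy sketch in Python. It is not
any vendor's product; ReplicatedVolume and its methods are hypothetical
names invented for illustration. It shows only that a mirror faithfully
copies a bad write, while a point-in-time backup taken earlier does not.

```python
import copy


class ReplicatedVolume:
    """Hypothetical toy model: every write is mirrored to a replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, block, data):
        self.primary[block] = data
        # Replication copies whatever was written, good or bad.
        self.replica[block] = data

    def backup(self):
        # A backup is a point-in-time copy, detached from later writes.
        return copy.deepcopy(self.primary)


vol = ReplicatedVolume()
vol.write(0, b"good data")
nightly = vol.backup()           # taken before anything went wrong
vol.write(0, b"\x00garbage")     # something in the stack mangles the bits

replica_is_trashed = vol.replica[0] == b"\x00garbage"
backup_is_intact = nightly[0] == b"good data"
```

Both flags come out True: the replica has the trashed block, and only the
backup still has the good one.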

(FWIW, that was not at my last shop, but a related firm running the same
application. *Our* copy of the app used transaction replication, not block
replication, and only for 2nd-site disaster recovery. HA for ours was a
heartbeat-triggered restart on a 2nd local node, pulling vDisks over
multipath SAN. The SAN controller served as the 3rd party to avoid split
brain; the 2nd node could successfully request vDisk reassignment only if
the controller recognized that the primary was disconnected. We had an
extra redundancy option in the SAN too, which might have been more trouble
than it was worth.)
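
A rough sketch of that arbitration rule, in Python. This is my own toy
model, not the SAN vendor's API: SanController, request_reassignment and
standby_on_heartbeat_loss are hypothetical names. The point is only that
the standby's missed heartbeat is not enough by itself; the controller
must also agree the primary is gone.

```python
class SanController:
    """Hypothetical stand-in for the SAN controller acting as arbiter."""

    def __init__(self, owner):
        self.owner = owner          # node currently holding the vDisks
        self.connected = {owner}    # nodes the controller can currently see

    def mark_connected(self, node):
        self.connected.add(node)

    def mark_disconnected(self, node):
        self.connected.discard(node)

    def request_reassignment(self, requester):
        """Grant the vDisks to requester only if the current owner is gone."""
        if requester == self.owner:
            return True
        if self.owner in self.connected:
            return False            # primary still attached: refuse
        if requester not in self.connected:
            return False            # cannot assign to a node we cannot see
        self.owner = requester
        return True


def standby_on_heartbeat_loss(controller, standby):
    """What the standby does when it stops hearing the primary's heartbeat."""
    if controller.request_reassignment(standby):
        return "restart app on " + standby
    return "stay passive"


san = SanController("node1")
san.mark_connected("node2")
# Heartbeat link is down, but the controller still sees node1: no takeover.
first_try = standby_on_heartbeat_loss(san, "node2")
# Now node1 really has dropped off the SAN: takeover is allowed.
san.mark_disconnected("node1")
second_try = standby_on_heartbeat_loss(san, "node2")
```

The first attempt is the split-brain case (a dead heartbeat link with a
live primary), and the controller's refusal is what keeps two nodes from
writing to the same vDisks.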

(Split-brain is why I've avoided remote auto-restart. If you need
distributed HA, you need to architect for hot-hot distributed
load-balancing -- not easily retrofitted to monolithic legacy apps!)

My two cents: I saw more failures caused by multipath software interacting
with the rest of the stack, exposing inadequately tested edge cases, than
I saw failures averted by multipath.

-- 
Bill
@n1vux bill.n1vux at gmail.com


