BLU Discuss list archive


[Discuss] raid issues

Subject: [Discuss] raid issues
From: jack at coats.org (Jack Coats)
Date: Thu, 19 Jun 2014 20:23:50 -0500
In-reply-to: <53A3856A.7020204@stephenadler.com>
References: <53A3856A.7020204@stephenadler.com>

Such is the problem with having any single point of failure. ... Current
technologies still have single points of failure in many places, but they
can be overcome.  Basically, do a SAN with multiple connections from
different controllers.  ... For most situations, since the round, brown &
spinning things (the disks) are the most likely items to fail, they are all
we make redundant.  The controller failure you found is less likely, but it
does happen.  So do memory issues, motherboard/backplane failures, fan
failures causing overheating, etc.

It would be possible to build multiple storage servers and raid the sets of
data across them (yes, I have actually done 2 controllers with raid on each,
then mirrored the data between the controllers/raid sets).  That still does
not remove the need for good backups.
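
As a rough illustration of why the drive-to-controller layout matters, here
is a small Python sketch.  The 16-drive, four-controller box and the names
c0..c3 are made up for the example; the only real rule it encodes is that a
RAID 5 group tolerates one lost drive and a RAID 6 group tolerates two.

```python
from collections import Counter

def controller_loss_report(raid_groups, tolerance):
    """Report, per controller, whether every RAID group survives its loss.

    raid_groups: list of groups; each group lists the controller that
                 each member drive hangs off (hypothetical names).
    tolerance:   drives a group may lose (1 for RAID 5, 2 for RAID 6).
    """
    controllers = sorted({c for group in raid_groups for c in group})
    report = {}
    for failed in controllers:
        # A group survives if the failed controller takes out no more
        # drives than the RAID level can tolerate.
        report[failed] = all(
            Counter(group)[failed] <= tolerance for group in raid_groups
        )
    return report

# One 16-drive RAID 6 group, four drives per controller.
one_big_raid6 = [["c0"] * 4 + ["c1"] * 4 + ["c2"] * 4 + ["c3"] * 4]

# Two 8-drive RAID 6 groups, each with two drives per controller.
striped_raid6 = [["c0", "c0", "c1", "c1", "c2", "c2", "c3", "c3"]] * 2

print(controller_loss_report(one_big_raid6, tolerance=2))
print(controller_loss_report(striped_raid6, tolerance=2))
```

In the first layout any controller failure takes out four drives of a group
that can only lose two.  In the second layout each group loses exactly two
drives, so it survives, but with no margin left for a real disk failure
during the outage.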

If your data needs are large enough, you could build such systems or go to
EMC or similar to purchase a solution.  To roll your own, you might look
into the open source designs for Backblaze pods (or OpenStoragePod.org).
You still need to do the raid/mirroring/etc on top of the bulk data, and
then you need to look into using redundant NICs (bonded?) to ensure your
network connections don't become the single point of failure.

Map it out, see that you truly have no single points of failure, and set up
monitoring / checking to ensure it all stays working.  This goes all the way
to power supplies, UPSes, building transformers, generators, and power
feeds.  Do the same thing for the outside network if that is a requirement
too.
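
One way to "map it out" is to write the path from clients to data down as a
list of links and check which single component, if removed, cuts the path.
The sketch below does that in Python; the component names (nic0, ctrl0, and
so on) are hypothetical, and a real map would also include power, cooling,
and the outside network.

```python
def single_points_of_failure(links, source, target):
    """Return components whose removal disconnects source from target.

    links: list of (a, b) pairs meaning a and b are directly connected.
    """
    nodes = {n for link in links for n in link} - {source, target}

    def still_reachable(removed):
        # Build the adjacency map without the removed component.
        adjacent = {}
        for a, b in links:
            if removed in (a, b):
                continue
            adjacent.setdefault(a, set()).add(b)
            adjacent.setdefault(b, set()).add(a)
        # Depth-first walk from the source.
        seen, stack = {source}, [source]
        while stack:
            for nxt in adjacent.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return target in seen

    return sorted(n for n in nodes if not still_reachable(n))

# Hypothetical layout: two NICs and two controllers, but one server.
example_links = [
    ("clients", "nic0"), ("clients", "nic1"),
    ("nic0", "server"), ("nic1", "server"),
    ("server", "ctrl0"), ("server", "ctrl1"),
    ("ctrl0", "disks"), ("ctrl1", "disks"),
]

print(single_points_of_failure(example_links, "clients", "disks"))
```

Here the NICs and controllers are doubled up, so the only component the
check flags is the server itself.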

I remember doing this as a part of a task force back in the dinosaur
mainframe days.  It did increase the overall reliability of the center, but
was not fast, cheap, or easy to accomplish.  The good thing is that it can
be done; planning, time, and a reasonable budget are your friends.

For true redundancy, you would also need to look into your business
requirements.  Consider HA clusters and application failover to other
servers.  5-9's reliability (99.999%) means roughly 5 minutes of unscheduled
downtime per year for all reasons.  It can be done, but is not easy or cheap.
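
The downtime budget for a given number of nines is simple arithmetic: the
allowed unavailability is 10^-N of the year.  A quick Python sketch, assuming
a 365.25-day year (the function name is just for illustration):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumes an average 365.25-day year

def allowed_downtime_seconds(nines):
    """Yearly downtime budget for an availability of N nines.

    Three nines is 99.9%, five nines is 99.999%, and so on.
    """
    unavailability = 10 ** -nines
    return SECONDS_PER_YEAR * unavailability

for n in (3, 4, 5, 6):
    seconds = allowed_downtime_seconds(n)
    print(f"{n} nines: {seconds:10.1f} s  ({seconds / 60:7.2f} min)")
```

Five nines works out to a bit over 5 minutes a year; a budget of about 30
seconds a year corresponds to six nines.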

On Thu, Jun 19, 2014 at 7:50 PM, Stephen Adler <adler at stephenadler.com>
wrote:

> Guys,
>
> I'm in the process of ordering a new server for my work and I want to get
> one which can handle 16 to 24 drives. But with so much disk capacity I was
> wondering what people do to make the file systems secure. In the past I've
> used either raid 5 or raid 6 arrays. The idea being that if 1 (raid 5) or 2
> (raid 6) drives died, you were OK, all you had to do was replace the drives
> and the rebuild would occur and you were fine.
>
> But then one day I had 4 drives fail at once. It wasn't the drives that
> failed, but the disk controller. I had added a 4 port SATA card to my
> server so that I could add four more drives to my server. So now, with
> getting a server with such large drive capacity, I'm wondering if all this
> raid stuff just gives one the warm fuzzies, but in fact you are just as
> vulnerable since your controllers can go and knock out half your drives (or
> whatever).
>
> Any comments on how to deal with, say, 16 disks, and what's the current
> lore on making large redundant disk arrays?
>
> Thanks in advance!
>
> Cheers. Steve.
>
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://lists.blu.org/mailman/listinfo/discuss
>

-- 
><> ... Jack

"Whatever you do, work at it with all your heart"... Colossians 3:23
"If you are not part of the solution, you are part of the precipitate" -
Henry J. Tillman
"Anyone who has never made a mistake, has never tried anything new." -
Albert Einstein
"You don't manage people; you manage things. You lead people." - Admiral
Grace Hopper, USN
"a nanosecond is the time it takes electrons to propagate 11.8 inches" -
http://youtu.be/JEpsKnWZrJ8
"Life is complex: it has a real part and an imaginary part." - Martin Terma


BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.