BLU Discuss list archive
[Discuss] raid issues
- Subject: [Discuss] raid issues
- From: jack at coats.org (Jack Coats)
- Date: Thu, 19 Jun 2014 20:23:50 -0500
- In-reply-to: <53A3856A.7020204@stephenadler.com>
- References: <53A3856A.7020204@stephenadler.com>
Such is the problem with having any single point of failure. ... Current technologies still have single points of failure in many places, but they can be overcome. Basically, do a SAN with multiple connections from different controllers. ...

For most situations, since the round, brown, and spinning parts (the disks) are the items most likely to fail, they are all we make redundant. The controller failure you found is less likely, but it does happen. So do memory issues, motherboard/backplane failures, fan failures causing overheating, etc.

It would be possible to build multiple storage servers and RAID the sets of data across them (yes, I have actually done two controllers with RAID on each, then mirrored the data between the two controller/RAID sets). That still does not remove the need for good backups.

If your data needs are large enough, you could build such systems, or go to EMC or a similar vendor to purchase a solution. To roll your own, you might look into the open-source designs from Backblaze pods (or OpenStoragePod.org). You still need to do the RAID/mirroring/etc. on top of the bulk data. Then you need to look into using redundant NICs (bonded?) to ensure your connections don't fail. Map it out, check that you truly have no single points of failure, and set up monitoring/checking to ensure it all stays working.

All of this goes all the way down to power supplies, UPSes, building transformers, generators, and power feeds, and you do the same thing for the outside network if that is a requirement too.

I remember doing this as part of a task force back in the dinosaur mainframe days. It did increase the overall reliability of the center, but it was not fast, cheap, or easy to accomplish. The good thing is that it can be done; planning, time, and a reasonable budget are your friends.

For true redundancy, you would also need to look into your business requirements. Consider HA clusters and application failover to other servers. Five-nines (99.999%) reliability means only about five minutes of unscheduled downtime per year, for all reasons combined.
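As a rough check on availability targets like five nines, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not something from the original thread: the function names are made up for the example, a year is taken as 365.25 days, and the redundancy formula assumes the copies fail independently. That last assumption is exactly what a shared controller, power feed, or NIC violates.

```python
# Back-of-the-envelope availability arithmetic (illustrative sketch only).
# Assumptions: a year is 365.25 days, and redundant copies fail independently.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def series_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of N independent copies where any single copy is enough."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    for n in range(2, 7):
        minutes = downtime_budget_seconds(n) / 60.0
        print(f"{n} nines: {minutes:.2f} minutes of downtime per year")

    # Hypothetical numbers: one controller at 99% in front of a 99.9% array,
    # versus two independent 99% controllers in front of the same array.
    single = series_availability(0.99, 0.999)
    mirrored = series_availability(redundant_availability(0.99, 2), 0.999)
    print(f"single controller:   {single:.5f}")
    print(f"mirrored controller: {mirrored:.5f}")
```

The series calculation is the reason to map out every layer: anything left unduplicated in the chain caps the availability of the whole system, however redundant the disks are.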
It can be done, but it is not easy or cheap.

On Thu, Jun 19, 2014 at 7:50 PM, Stephen Adler <adler at stephenadler.com> wrote:

> Guys,
>
> I'm in the process of ordering a new server for my work and I want to get
> one which can handle 16 to 24 drives. But with so much disk capacity I was
> wondering what people do to make the file systems secure. In the past I've
> used either raid 5 or raid 6 arrays. The idea being that if 1 (raid 5) or 2
> (raid 6) drives died, you were OK, all you had to do was replace the drives
> and the rebuild would occur and you were fine.
>
> But then one day I had 4 drives fail at once. It wasn't the drives that
> failed, but the disk controller. I had added a 4 port SATA card to my
> server so that I could add four more drives to my server. So now with
> getting a server with such large drive capacity, I'm wondering if all this
> raid stuff just gives one the warm fuzzies, but in fact you are just as
> vulnerable since your controllers can go and knock out half your drives (or
> whatever).
>
> Any comments on how to deal with say 16 disks and what's the current
> lore on making large redundant disk arrays?
>
> Thanks in advance!
>
> Cheers. Steve.
>
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://lists.blu.org/mailman/listinfo/discuss
>

--
><> ... Jack
"Whatever you do, work at it with all your heart"... Colossians 3:23
"If you are not part of the solution, you are part of the precipitate" - Henry J. Tillman
"Anyone who has never made a mistake, has never tried anything new." - Albert Einstein
"You don't manage people; you manage things. You lead people." - Admiral Grace Hopper, USN
"a nanosecond is the time it takes electrons to propagate 11.8 inches" - " - http://youtu.be/JEpsKnWZrJ8
"Life is complex: it has a real part and an imaginary part." - Martin Terma