
BLU Discuss list archive



[Discuss] Dev Ops - architecture (local not cloud)



> From: Daniel Feenberg [mailto:feenberg at nber.org]
> 
> On Sun, 8 Dec 2013, Edward Ned Harvey (blu) wrote:
> > A single disk's performance is about 1 Gbit/s.  So you need to make your
> > storage network something much faster.  The next logical step up would
> > be 10Gb ether, but in terms of bang for buck, you get a LOT more if you
> > go to InfiniBand or Fibre Channel instead.
> 
> How do you then share the disk among multiple machines? We went to 10Gb
> Ethernet so that multiple computers could access the same file system on a
> NAS box. With a Fibre Channel SAN, I couldn't figure out how to share the
> file system, except to have one of the SAN clients be an NFS server, which
> means we'd need 10GbE to get the good performance anyway. Was I wrong?

It sounds like what you're looking for is a unified namespace for a filesystem shared amongst multiple machines.  There are several ways to do that.  The NFS solution is relatively easy to deploy and tune, but as you describe, it carries performance overhead.  You can claw some of that back with 10GbE, either with LACP bonding or with multiple interfaces on separate subnets, so that each client gets some dedicated bandwidth.

The other route is the one you describe: Fibre Channel or InfiniBand, which present block devices to the client machines.  If you do that, you need a clustered filesystem; you cannot, for example, share an ext4 volume on a single block device amongst multiple simultaneous clients.  Which clustered filesystem depends on the clients involved.  Since you're currently using NFS, I assume the clients are Linux, so GFS2 would be the likely choice.  A clustered filesystem costs you some performance in synchronization overhead, but compared to NFS over Ethernet I think you'll find it works out to your advantage on the whole.  I'm personally biased toward InfiniBand over Fibre Channel, but I don't have solid metrics to back that up.
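
For the block-device route, a minimal GFS2 sketch might look like this (the cluster name, filesystem name, journal count, and device path below are placeholders, and it assumes the usual cluster stack, corosync and dlm, is already running on every node):

    # format the shared FC/IB LUN once, with one journal per node that will mount it
    mkfs.gfs2 -p lock_dlm -t mycluster:scratch -j 4 /dev/mapper/shared_lun

    # then on each node
    mount -t gfs2 /dev/mapper/shared_lun /mnt/scratch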

Also, given the current state of multipath and SAS, there is most likely some way to attach the storage directly over a shared SAS bus.  That would probably be considerably faster than Ethernet and comparable to FC and IB, while being cheaper than either.  I think it's worth exploring.

You can present block devices over Ethernet as well (iSCSI, for example), but you should expect substantially more overhead from the way Ethernet handles switching, signaling, and DMA, so the performance would likely be several times lower than FC or IB.
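
If you do want to experiment with block devices over Ethernet, iSCSI is the usual way; here is a rough initiator-side sketch with open-iscsi (the portal IP and IQN are made up for illustration):

    # discover targets exported by the storage box
    iscsiadm -m discovery -t sendtargets -p 192.0.2.10

    # log in; the LUN then shows up as an ordinary /dev/sdX block device
    iscsiadm -m node -T iqn.2013-12.org.example:scratch -p 192.0.2.10 --login

The resulting block device still needs a clustered filesystem (GFS2, as above) if more than one client will mount it at once.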

Another option would be Ceph or Gluster.  By default these are distributed, redundant filesystems aimed at high availability rather than high performance, so in the default configuration I would expect performance to be worse than NFS.  But a few months ago BLU had a talk by ... I forget his name, a Red Hat engineer who develops Gluster, and he confirmed that you can tune it so that the unified namespace still spans all the machines while each node reads and writes its local disk by default, with no redundant copies, for maximum performance and maximum distribution.
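
As a rough idea of what that looks like, a plain distribute-only Gluster volume (no "replica" count, so no redundant copies) spreads files across the bricks while presenting a single namespace; the server names and brick paths below are placeholders:

    # on one server, after peer-probing the others
    gluster volume create scratch transport tcp server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1
    gluster volume start scratch

    # on each client
    mount -t glusterfs server1:/scratch /mnt/scratch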


> We also have a strong need for a very fast /tmp local to each machine. I
> put 2 new Samsung SSD drives in a RAID 0, but for long sequential data
> (our situation) the performance was similar to local 7,200 RPM drives.
> They were attached to the motherboard SATA ports - would a RAID controller
> make any difference? Would more drives make a difference? Would SAS make a
> difference? The NAS box is much faster, but I don't want to overload the
> network with all the /tmp traffic.

Correct.  For sustained throughput, SSDs are comparable to HDDs.  The spindle speed also makes little to no difference.  You would think higher RPM would mean higher throughput, because more surface passes under the head per second, but in practice the bandwidth is limited by the frequency response of the head, which is around 1 Gbit/s regardless of RPM.  Higher RPM does reduce rotational latency, but at 7,200 RPM the average rotational latency (half a revolution, roughly 60/7200/2 seconds) is only about 4 ms, already less than a typical head seek, so there isn't much to gain there either.
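
If you want to sanity-check the sustained numbers yourself, a quick and crude sequential-read test looks like this (the device name is a placeholder; run it as root against a disk whose data you don't care about):

    # drop the page cache so the reads actually hit the disk
    sync; echo 3 > /proc/sys/vm/drop_caches

    # read 4 GiB sequentially, bypassing the cache
    dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct

hdparm -t /dev/sdb gives a similar rough figure.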

The SSD performance advantage comes from eliminating head seeks.  In fact, if your data set is small enough that you can "short stroke" the HDDs (confine the data to a narrow band of tracks so the heads barely move), HDD performance can come surprisingly close to SSDs.  Generally speaking that's not realistic, but in some cases it's possible.
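
If you ever want to try it, short stroking is usually just a matter of partitioning only the outer edge of the disk and putting the filesystem there (the device name and the 10% figure are arbitrary):

    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart fast ext4 0% 10%   # outermost tracks only
    mkfs.ext4 /dev/sdb1

The heads then stay within a narrow band, so every seek is short, at the cost of giving up 90% of the capacity.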

If you only need high sequential throughput, I would recommend RAID-5 or similar; if you also need high random IO performance, RAID-10 or similar.  In your case, with SSDs, it won't make much difference whether you use a RAID card or soft RAID in the OS.  In fact, for sequential IO it won't make much difference even with HDDs.  A RAID card does matter for random IO on HDDs.  But *more* importantly, if you run a COW filesystem (btrfs or ZFS), you get the best performance, better than a RAID card, by presenting the disks as a JBOD and letting the filesystem handle the RAID itself.
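
For what it's worth, here is a soft-raid sketch of both approaches (the device names are placeholders, and these commands destroy whatever is on the disks):

    # md RAID-10 across four SSDs, with a conventional filesystem on top
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.ext4 /dev/md0

    # or hand the bare disks (JBOD) to a COW filesystem and let it do the raid-10 itself
    mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde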





