[Discuss] Distributed file systems

Rich Braun richb at pioneer.ci.net
Sun Jul 5 11:32:09 EDT 2015


> I think the leaders in this space are glusterfs, and ceph.

I set up my home email server as a pair of LXC instances on top of GlusterFS last year. After almost a year of working more-or-less OK, I ditched it for an old-school design: unison running under cron every 5 minutes.

I found 3 problems trying to make GlusterFS work: it burns nearly 1 second of time per inode on every fopen, it creates tens of thousands of sync-status files in a bushy tree under .glusterfs, and over the course of a year I've had about 3 tough-to-diagnose split-brain situations, one of which went undetected for weeks.
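
To check the first two points on a given setup, a rough Python sketch like the one below could be used. It times how long each file open takes under a mounted volume and counts the entries under a brick's .glusterfs tree. The two paths and the function names are placeholders made up for illustration, not taken from the setup described here.

```python
import os
import time


def time_opens(root, limit=100):
    """Open up to `limit` files under `root`; return open latencies in seconds."""
    latencies = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(latencies) >= limit:
                return latencies
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                # Only the open/close is timed; no data is read.
                with open(path, "rb"):
                    pass
            except OSError:
                # Skip files that cannot be opened (permissions, dangling links).
                continue
            latencies.append(time.perf_counter() - start)
    return latencies


def count_files(root):
    """Count non-directory entries found while walking `root`."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(root))


if __name__ == "__main__":
    # Hypothetical paths: adjust to the actual client mount and brick directory.
    mount_point = "/mnt/gluster/mail"
    brick_meta = "/data/brick1/.glusterfs"

    latencies = time_opens(mount_point)
    if latencies:
        avg = sum(latencies) / len(latencies)
        print(f"opened {len(latencies)} files, "
              f"avg {avg:.3f}s, max {max(latencies):.3f}s")
    else:
        print(f"no readable files found under {mount_point}")
    print(f"{count_files(brick_meta)} entries under {brick_meta}")
```

Running the same timing against a local directory gives a baseline to compare the mounted volume against.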

Documentation is scant, performance is poor for many routine operations like rsync, and monitoring tools are nonexistent. Its main benefit is relative ease of setup, and if you're a licensed RHEL user, you can get support.

I'm back to square 1 on distributed/clustering solutions. At home I have notes on moosefs, cephfs and others I've tried.

The 8000-lb gorilla is weighing in on this, with ultimate vendor lock-in: AWS is rolling out file-storage solutions that will be tempting for many enterprises, and costly to move off of once large data sets are in place. My employer is merrily going down that road, with petabytes already stored there. AWS's latest offering provides a mountable volume, but is missing basics like snapshots, quotas, ACLs and monitoring (and probably always will, because those are user-space concepts that AWS punts to the user).

-rich

