[HH] Your chance to determine the future of HPC...

Mark Komarinski mkomarinski at wayga.org
Tue Jul 15 17:00:44 EDT 2014


Not completely surprising.  IMO the issue is that building large scale systems like this is starting to hit its limit of usefulness vs. cost.  Building even a modest 5000-core cluster is millions of dollars in capital expenses just to get the hardware, and then you have the massive ongoing operating expenses for power/cooling/floor space/system administrators.  When the cluster isn’t at 100% utilization, you’re wasting money.  When it’s at 100% capacity, users are queued waiting for resources, and users *never* like waiting.  The research I did for a former employer showed that a compute node by itself was only about 1/4-1/3 of the TCO over its lifetime (4-5 years), while the really useful lifetime of that hardware is about 2-3 years - the logical extension of Moore’s law is still in place, and systems purchased 36 months ago deliver fewer cores per watt than what is available now.
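To make the arithmetic concrete, here’s a back-of-the-envelope sketch in Python.  The dollar figures are placeholders I made up for illustration, not the numbers from that study:

    # Back-of-the-envelope node TCO.  All figures are hypothetical
    # placeholders, just to illustrate the ratio.
    node_capex = 5000.0      # purchase price of one compute node, USD
    annual_opex = 2500.0     # power/cooling/floor space/admin per node, per year
    lifetime_years = 5

    tco = node_capex + annual_opex * lifetime_years
    print("node share of TCO: %.0f%%" % (100 * node_capex / tco))
    # ~29% with these placeholders, i.e. right in the 1/4-1/3 range above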

What you’re starting to see is people just taking their individual jobs to the cloud, usually AWS.  The free tier gets you 750 instance-hours per month, which can be split however you want: 100 instances for 7.5 hours each, or 1 instance for 750 hours.  For jobs that are CPU-intensive but rarely run, this winds up being a lot easier for the end user than waiting on a cluster and competing for resources with others.

Those with a bit of budget will just go with AWS for their computing needs, pay a few thousand dollars to spin up something with StarCluster, do their processing, and then shut it down.
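For anyone who hasn’t tried StarCluster, the whole exercise is roughly one config file and a handful of commands.  A minimal sketch (on top of your [aws info] and [key] sections; the AMI ID, key name, cluster size, and instance type below are placeholders, not recommendations):

    [cluster smallcluster]
    KEYNAME = mykey
    CLUSTER_SIZE = 10
    CLUSTER_USER = sgeadmin
    NODE_IMAGE_ID = ami-xxxxxxxx     # a StarCluster-compatible AMI
    NODE_INSTANCE_TYPE = c3.xlarge

    $ starcluster start smallcluster
    $ starcluster put smallcluster data.tar.gz /home/sgeadmin/
    $ starcluster sshmaster smallcluster    # submit jobs via SGE from here
    $ starcluster terminate smallcluster    # stop paying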

This is going to put a lot of pressure on the IT staff: most of the research computing group is no longer needed, but they’ll have to spend a good deal of money on a big fat pipe out to the Internet so their data can get out for processing.  Some organizations may choose to go the OpenStack route and build their own cloud, but the result for the user is the same.
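The pipe matters more than people expect.  A quick sanity check, again with made-up numbers:

    # Time to push a dataset out for cloud processing.
    # Figures are hypothetical, just to show the arithmetic.
    dataset_tb = 10.0    # dataset size, terabytes
    link_gbps = 1.0      # usable outbound bandwidth, gigabits/sec

    seconds = (dataset_tb * 1e12 * 8) / (link_gbps * 1e9)
    print("transfer time: %.1f hours" % (seconds / 3600))
    # ~22 hours at 1 Gbps -- hence the big fat pipe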

Large scale HPC as we know it today is dying, but it’s going to be a few more years before the effects are felt elsewhere.  Sure, there will be three letter organizations that still have them, and maybe some supercomputing facilities, but the cost to keep those operating is going to remain flat or increase, while Amazon and the like can just drop a shipping container full of compute hardware wherever bandwidth and electricity are cheapest.

-Mark

On Jul 14, 2014, at 3:33 PM, Kurt L Keville <kkeville at MIT.EDU> wrote:

> I don’t know if you have been reading the trades lately but HPC is in trouble. The rate of supercomputer improvement has flattened out, IBM, the 800-pound gorilla that it once was, appears to be in freefall in this space, and you can expect “negative growth” in traditional system sales for the foreseeable future…
> http://www.hpcwire.com/2014/06/23/breaking-detailed-results-top-500-fastest-supercomputers-list/
>  
> that is, unless, you want to get off your lazy duff and do something about it. The OCP HPC launch happens on the 21st at UNH…
> http://www.opencompute.org/community/events/ocp-engineering-workshop-university-of-new-hampshire-21-july
> not sure if this is quite “hacking” since this is more of an effort to deregulate some proprietary ideas into the open-source community through an economy-of-scale initiative rather than legitimizing off-label repurposing of COTS… but I have been to a couple of these and there are usually a lot of good, meaty, breakout sessions…
> _______________________________________________
> Hardwarehacking mailing list
> Hardwarehacking at blu.org
> http://lists.blu.org/mailman/listinfo/hardwarehacking
