Getting the most out of your cluster is always important. But how exactly is that done? Do you really need to dissect your code and analyze every instruction to get optimal performance? Do you need to build custom kernels? Not necessarily. By testing some basic assumptions, you may be able to eke ten-node performance out of an eight-node cluster. Here’s how.
Cluster computing is great, or so it’s said. Cobble together a few thousand commodity servers, wire the machines together with Ethernet, grab some freely available software, and with comparatively little expense, you can assemble a machine capable of calculating the meaning of life, the universe, and everything.
Or choose a problem that remains unsolved. According to the Hitchhiker’s Guide to the Galaxy, the computer Deep Thought has already solved the meaning of life, the universe, and everything, taking only a scant seven and a half million years to do so. Next problem, please.
But what if someone had optimized Deep Thought by 20 percent? Besides sparing oodles of generations a lot of waiting, what other gnarly questions could have been solved in those extra 1.5 million years? How about, “Why are hot dogs sold in packs of six, while hot dog buns are sold in packs of eight?”
And what about your cluster? Do precious processor cycles lie undiscovered in your system? Even if you’re not solving intergalactic quandaries or designing planetary-scale supercomputers, the right cluster optimizations can improve jobs of all sizes.
Follow the Money
Optimization is like a journey. So, let’s start where all journeys begin: getting money.
If your budget were unlimited, there’d be little reason to optimize. With wads of cash, you could just throw money at performance problems, buying mass quantities of the fastest hardware man has ever known. However, since most researchers have a budget, constraints apply. Optimization, therefore, is an ongoing attempt to extract the biggest “bang for your buck,” because you only have so many bucks.
In January 2005, ClusterWorld Magazine began a series of articles entitled “The Value Cluster” (http://www.clusterworld.com/value_cluster.shtml) about the design, construction, and operation of an eight-node cluster built from commodity parts for less than $2,500. [You can also read about “Extreme Linux” columnist Forrest Hoffman’s own low-cost cluster at http://www.linuxdls.com/trails/low-cost/.] The three-part “Value Cluster” series ended with the cluster running some tools, such as LAM/MPI, the Sun Grid Engine (SGE), and some benchmarks.
While some may scoff at the thought of building an entire cluster for less than the cost of a single, “real cluster” node, much can be learned (and computed) on such a minute cluster. (All hail Moore’s Law.) Indeed, the optimization problems found in a diminutive cluster are identical to those found in the largest clusters.
The Value Cluster
Of course, before you can do anything with a cluster, you need to name it. After the system was built (see Figure One), it evoked an old science fiction movie called KRONOS (http://www.bmoviecentral.com/bmoviecentral/reviews/kronos.html). While Hollywood’s KRONOS robot came from outer space and destroyed things, the Kronos cluster came in separate parts delivered by brown trucks.