Welcome to ClusterIt 2.5!

ClusterIt 2.5 has been released! Download it here.

Continued maintenance of ClusterIt is made possible by the generous support of Mach1 Computing, LLC.

This is a collection of clustering tools, to turn your ordinary everyday pile of UNIX workstations into a speedy parallel beast.

Initially this work was based on IBM's PSSP, and copied heavily from the ideas there. It's also lightly based on the work pioneered in GLUnix. I've decided to both simplify and complexify it, however:

GLUnix is a monstrosity. It allows better control over the individual nodes, and much better load sharing. However, I'm convinced a lot of the speed advantages of having a parallel cluster are lost to the incredible overhead of running the GLUnix master and daemon services on a host. GLUnix does, however, offer a real parallel programming environment, something which is totally beyond the scope of this package.

PSSP is also a very powerful set of tools. Though not much more than a bunch of staples written in Perl, they provide an incredible tool for tying an unwieldy number of UNIX machines into one fast demon of an MPP.

The advantage of both systems is central control of a large number of machines. Unfortunately, they both have drawbacks, as does my solution.

What my solution provides:

*Fast* parallel execution of remote commands.
C vs. Perl. You do the math.
Heterogeneous cluster makeup.
This makes it very easy to administer a large number of machines of varying architectures and operating systems. The fact that my tools are completely architecture independent makes it possible to dsh commands out to machines that aren't even running the same OS! This can be useful for a variety of mass administration tasks an admin may have to undertake.
Choice of authentication.
IBM forces you to use Kerberos 4 for authentication on the SPs. This is actually fine for a closed environment like an SP, but for something to be run on just a stack of otherwise useful boxes, you need more freedom. This suite allows you to do whatever you like: ssh, Kerberos, .rhosts. Whatever suits your security needs best.
Sequential node and random node execution
The idea here is that these dsh-like programs allow you to do something akin to load balanced scripting. For example one could set up an NFS shared build directory, and issue the command:
make -j4 CC="seq 'cd /usr/src/foo ; gcc'"
		

This would execute a build in parallel on 4 nodes in your cluster, assigning processes to each node in sequence. The run command instead picks a node at random; it is equivalent to saying: "I don't care where you run, just run and tell me how things turned out." Generally speaking, the run command will achieve better results as the size of the cluster increases. If you have only three nodes, the odds of getting the same node over and over are fairly good.
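To make the difference between sequential and random placement concrete, here is a minimal simulation in Python. It is only a sketch of the idea, not ClusterIt's code: the host names and function names are hypothetical, and it assumes the random case picks nodes uniformly.

```python
import random
from collections import Counter
from itertools import cycle

def assign_sequential(nodes, jobs):
    """Hand each job to the next node in order, wrapping around (seq-style)."""
    ring = cycle(nodes)
    return [next(ring) for _ in range(jobs)]

def assign_random(nodes, jobs, rng):
    """Hand each job to a uniformly random node (run-style, assumed uniform)."""
    return [rng.choice(nodes) for _ in range(jobs)]

def prob_all_distinct(node_count, jobs):
    """Chance that `jobs` uniform random picks all land on different nodes."""
    if jobs > node_count:
        return 0.0
    p = 1.0
    for i in range(jobs):
        p *= (node_count - i) / node_count
    return p

nodes = ["node1", "node2", "node3"]  # hypothetical host names
print(assign_sequential(nodes, 4))   # wraps back to node1 for the 4th job
print(Counter(assign_random(nodes, 4, random.Random(0))))
print(round(prob_all_distinct(3, 3), 3))   # 3 jobs on a 3-node cluster
print(round(prob_all_distinct(30, 3), 3))  # 3 jobs on a 30-node cluster
```

With three jobs on three nodes, random placement spreads them over three distinct nodes only about 22% of the time; on thirty nodes that rises to roughly 90%, which is why random placement improves as the cluster grows.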
Job sequencing
Using this package, it is possible to schedule processes on the remote machines so that no more than one process per machine is active at any one time. This was designed to combat problems with using seq for parallel builds.

When building in parallel with seq, it is possible that a node receives a task that will take it much longer than the other nodes to complete. It is also possible that, as other nodes finish their jobs faster, the node which has been bogged down is handed another job. When performing large parallel builds, very slow machines will eventually stall the entire build, as they are attempting to compile many objects at once, and are usually at this point near death from swapping.

The Job Scheduling in ClusterIt can prevent this in two ways. First, the job scheduling will not allow a node to process more than one command at a time. If more commands than nodes are requested, the excess commands will block until a node has freed up. Second, the scheduler has the ability to register a benchmark number of some sort for each node. This allows the scheduler to always give out the fastest of the remaining nodes whenever one is requested, which lets a parallel build utilize a heterogeneous cluster more efficiently.
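The two scheduling rules above can be sketched in a few lines of Python. This is an illustration only, not ClusterIt's scheduler: the class and method names are hypothetical, it assumes a higher benchmark number means a faster node, and it returns None at the point where a real scheduler would block.

```python
import heapq

class NodeScheduler:
    """Toy scheduler: one job per node, fastest free node handed out first."""

    def __init__(self, benchmarks):
        # Assumed convention: higher benchmark = faster node.
        # Scores are negated because heapq is a min-heap.
        self._scores = dict(benchmarks)
        self._free = [(-score, name) for name, score in benchmarks.items()]
        heapq.heapify(self._free)
        self._busy = set()

    def acquire(self):
        """Return the fastest free node, or None where a real scheduler would block."""
        if not self._free:
            return None
        _, name = heapq.heappop(self._free)
        self._busy.add(name)
        return name

    def release(self, name):
        """Mark a node as finished so it can be handed out again."""
        self._busy.remove(name)
        heapq.heappush(self._free, (-self._scores[name], name))

# Hypothetical node names and benchmark numbers.
sched = NodeScheduler({"fast": 100, "medium": 60, "slow": 10})
handed_out = [sched.acquire() for _ in range(4)]
print(handed_out)       # the fourth request finds no free node
sched.release("fast")
print(sched.acquire())  # the freed node is available again
```

Because a busy node is never in the free heap, a slow machine cannot be handed a second job while it is still grinding through the first.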

Barrier sync for shell scripting.
This is a new idea. The barrier mechanism consists of a daemon run on a host, and a client which is used to synchronize against it. An example of use would be:
#!/bin/sh
do something
barrier -h host -k token -s 5
do something else
	

You would then dsh the execution of this script to your hosts. The barrier makes sure that all hosts have completed the first "something" before they continue on to the next something. The -s option is the level of parallelism for the script, i.e. how many processes to wait for before continuing.
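The guarantee a barrier gives can be shown with a small in-process analogy. The sketch below uses Python threads and threading.Barrier as a stand-in for the barrier daemon and client; it does not talk to ClusterIt, the token is not modeled, and the host names are hypothetical. The party count of 5 mirrors the -s 5 in the script above.

```python
import threading

PARTIES = 5                      # plays the role of "-s 5"
barrier = threading.Barrier(PARTIES)
log = []
log_lock = threading.Lock()

def script(host):
    """Stand-in for the shell script run on one host."""
    with log_lock:
        log.append(("first", host))    # "do something"
    barrier.wait()                     # block until all PARTIES have arrived
    with log_lock:
        log.append(("second", host))   # "do something else"

threads = [threading.Thread(target=script, args=(f"host{i}",))
           for i in range(PARTIES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for phase, _ in log]
print(phases)
```

Whatever order the threads happen to run in, every "first" entry is logged before any "second" entry, which is exactly the property the barrier provides to the dsh'd scripts.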

Distributed Virtual Terminals

This is a parallel interactive execution environment. The user is given a window for each host in the cluster, and a central management window. Keystrokes typed in the central management window will be relayed to all of the subordinate windows. This allows the user to vi a file on 20 machines simultaneously, for example. You can also select a window and use it like a normal xterm, to perform actions on just that host.


What my solution does not provide:
A parallel programming API
Use MPI, PVM, or whatever for that; that's outside the scope of this suite.

Articles about Clusterit


ClusterIt Manpages


ClusterIt Support

ClusterIt is now hosted on SourceForge. For general information about the ClusterIt project, go to the SourceForge Project Page.

In addition, I am now tracking bugs through the SourceForge Bug Tracking Page. If you have any problems with ClusterIt, I ask that you please fill out a bug report there, as it will record them and keep them from getting lost in my email.

There are some simple forums provided by SourceForge located here.

Here are quick links to some of the useful features:


Download ClusterIt

Clusterit CVSWEB

ClusterIt releases are still provided on SourceForge. Copies of the files are still available on my server, to avoid breaking any links, but I ask that you please use the links below in the future.

The newest release of ClusterIt can always be found at the ClusterIt Download page hosted at SourceForge.

ClusterIt is known to work on all arches of NetBSD 1.3 and later, Solaris 2.5.1 and 2.6, AIX 4.2.1 and 4.3, and most versions of Linux. Reports of other successes (and any patches needed) would be greatly appreciated.

ClusterIt is Free software, with a standard BSD-style License. You are encouraged to download this, work with it, enhance it, or whatever suits your needs. Redistribution can take place if the license stays intact.

Please send any bug reports, enhancements, bricks to: Tim Rightnour

