Introduction

Setting up the cluster involves the following steps:

  1. Add a CD-ROM drive to the master node (alternatively, the master node can be set up to boot over PXE)
  2. Set up a netboot environment using PXE; this requires configuring DHCP and adding tftpd
  3. Boot the slave nodes over PXE and set them up for fully automated installation
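Step 2 above boils down to pointing DHCP clients at a TFTP server. As a rough sketch, assuming the cluster network is 192.168.1.0/24 with the master node at 192.168.1.1 (adjust these values to your network), the relevant fragment of the DHCP server configuration (/etc/dhcp/dhcpd.conf, or /etc/dhcp3/dhcpd.conf depending on the package version) would look like:

```
# dhcpd.conf fragment -- illustrative addresses, adjust to your network
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.200;
  option routers 192.168.1.1;
  next-server 192.168.1.1;        # the TFTP server (the master node)
  filename "pxelinux.0";          # boot loader served by tftpd-hpa
}
```

The pxelinux.0 boot loader and its configuration then live under the tftpd-hpa root directory on the master node.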

We follow these guides to set up the Linux cluster:

  1. master node setup
  2. slave node setup
  3. automated installation

Setting up the Master Node

The Box cluster machine does not have a CD-ROM drive. One option is to attach a CD/DVD drive to a SATA slot; the other is to boot this node over the network (PXE), which requires another machine to serve the boot images. In our case, we use the first option, viz. adding a CD-ROM drive.

Software that we need for a basic setup

We are using Ubuntu 12.04 for the cluster. As a first step, we install the server edition of Ubuntu on the master node. Additionally, we install the following software:

apt-get install ssh dhcp3-server ruby1.9.1 nfs-common nfs-kernel-server tftpd-hpa apache2 apt-cacher
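Among these, apt-cacher lets the slave nodes fetch packages through a cache on the master node, so each package is downloaded from the Internet only once. Assuming apt-cacher's default port (3142) and that the master node is reachable as "master" (both assumptions; adjust to your setup), each slave can be pointed at the cache with an APT proxy setting such as:

```
# On each slave node, e.g. in /etc/apt/apt.conf.d/01proxy (hypothetical file name)
Acquire::http::Proxy "http://master:3142/";
```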

The purpose and configuration of these packages are described in detail here.

Setting up the Slave Nodes

The setup of the slave nodes mainly involves two steps: a) keyless SSH access and b) NFS mounting of the home directory from the master node. We also develop some scripts to manage users and software packages on all the nodes from the master node.
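For the NFS part, a minimal sketch (assuming the master node exports /home to a 192.168.1.0/24 cluster network and is reachable as "master"; adjust to your setup):

```
# On the master node, in /etc/exports:
/home 192.168.1.0/24(rw,sync,no_subtree_check)

# On each slave node, in /etc/fstab:
master:/home  /home  nfs  defaults  0  0
```

After editing /etc/exports, run exportfs -ra on the master node to apply the export.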

Keyless SSH access

Log in to each node and set the root password using the following steps:

sudo su -
passwd

The next steps are performed on the master node. First log in as root and create an SSH key, then copy .ssh/authorized_keys to all the nodes over SSH. This can be done with the following set of commands:

su -
ssh-keygen
cp .ssh/id_rsa.pub .ssh/authorized_keys
for NODE in `cat /etc/machines`
do
    ssh $NODE mkdir -p .ssh
    scp .ssh/authorized_keys $NODE:.ssh/authorized_keys
done
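To confirm that the key distribution worked, a small check like the following can be run from the master node. This is a sketch: check_keyless is our own hypothetical helper name, and the machines-file argument defaults to the /etc/machines list used above.

```shell
# Check keyless root SSH access to every node listed in a machines file
# (one hostname per line, as in /etc/machines).
check_keyless() {
    MACHINES_FILE="${1:-/etc/machines}"
    while read -r NODE; do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        if ssh -o BatchMode=yes "root@$NODE" true 2>/dev/null; then
            echo "$NODE: ok"
        else
            echo "$NODE: NO keyless access"
        fi
    done < "$MACHINES_FILE"
}
```

Running check_keyless as root should print an "ok" line for every node.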

Keyless access for users

Users can set up keyless access for themselves by performing the following steps:

ssh-keygen
cp .ssh/id_rsa.pub .ssh/authorized_keys

Managing the system

The system is managed through software called puppet. This makes the management of the different nodes uniform and easy. Puppet can be used to add new software or to configure different aspects of the system from a single file.

Adding new software using puppet

Let us assume that we want to add the editor vim on all the nodes. To do this, perform the following steps as superuser:

vi /etc/puppet/manifests/nodes.pp
Edit the file and add a line that says "include vim"
mkdir -p /etc/puppet/modules/vim/{files,templates,manifests}
vi /etc/puppet/modules/vim/manifests/init.pp

In the above file, add the following lines

class vim {
  package { 'vim':
    ensure => present,
  }
}

That is it! Puppet agents run on each of the nodes. Every 30 minutes, they check with the master node to see whether there is a new request and, if so, apply it on each node. In our case, this installs the package vim on all the nodes, including the master node.
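The 30-minute interval is Puppet's default run interval; it can be tuned per node in the agent configuration. A sketch, assuming the Puppet 2.7 series shipped with Ubuntu 12.04:

```
# /etc/puppet/puppet.conf on each node
[agent]
runinterval = 1800   # seconds between check-ins with the master
```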

Let us add some programming languages and related software using puppet. To do this, we define a module called "programming" as follows.

vi /etc/puppet/manifests/nodes.pp
Edit the file and add a line that says "include programming"
mkdir -p /etc/puppet/modules/programming/{files,templates,manifests}
vi /etc/puppet/modules/programming/manifests/init.pp

In the above file, add the following to install gcc, gfortran, and openmpi:

    class programming {
      package { 'gcc':
        ensure => present,
      }

      package { 'gfortran':
        ensure => present,
      }

      package { 'openmpi-bin':
        ensure => present,
      }
    }

It is as easy as that! If you do not want to wait until the puppet agents wake up the next time, run the following:

sudo cluster_run_puppet

This will run the puppet agents on all the nodes and install the software for you right away.
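We do not reproduce cluster_run_puppet here, but conceptually it is just a loop like the key-distribution one above: a hypothetical sketch, assuming keyless root SSH and the puppet agent command from the Puppet 2.7 series on Ubuntu 12.04 (run_puppet_on_all is our own illustrative name).

```shell
# Sketch of a cluster_run_puppet-style wrapper: trigger an immediate
# puppet agent run on every node listed in the machines file.
run_puppet_on_all() {
    MACHINES_FILE="${1:-/etc/machines}"
    while read -r NODE; do
        echo "== $NODE =="
        ssh "root@$NODE" puppet agent --test
    done < "$MACHINES_FILE"
}
```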

Managing users

Please do not manage users on the system using the regular utilities such as useradd. Instead, please use:

/usr/local/sbin/cluster_useradd # to add a new user
/usr/local/sbin/cluster_passwd # to change password of a given user
/usr/local/sbin/cluster_run_puppet # to run the puppet agents in all the nodes for instant reconfiguration of the nodes
   
/usr/local/sbin/check_nodes # to see which nodes are alive