Tuesday 30 September 2014

Installing Intel TrueScale Fabric HCA Host Software

Background: I have a Haswell system with CentOS 7.0 on which I would like to have InfiniBand software installed.

Removing Previous OFED installation

Before installing the Intel OFED, get the latest OFED from the OpenFabrics Alliance and run its install.pl script to uninstall any previously installed OFED software.

After a clean uninstall, reboot the machine, then manually remove any IB modules that are still loaded.
For example, I would do:
$ rmmod ib_qib
$ rmmod ib_mad
$ rmmod ib_core
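
Which modules are left over will differ from machine to machine, so it can help to list them first. The following is a minimal sketch, assuming the leftover modules all have names starting with "ib_" (as in the example above). The function name list_ib_rmmod is my own, and it only prints the rmmod commands rather than running them. rmmod refuses to unload a module that another module still depends on, so the printed order may need adjusting by hand.

```shell
#!/bin/sh
# Print an rmmod command for every loaded module whose name starts with "ib_".
# Expects text in /proc/modules format on stdin (module name is the first field).
# Dry run only: nothing is unloaded here.
list_ib_rmmod() {
    awk '$1 ~ /^ib_/ { print "rmmod " $1 }'
}

# Usage (run the printed commands as root after reviewing them):
#   list_ib_rmmod < /proc/modules
```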

Next, install the OFED software by running
$ ./install.pl -k <kernel-version> -s /lib/modules/<kernel-version>/build --umad-dev-rw
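
To avoid typing the kernel version twice, the placeholders can be filled in from a single string. This is only a sketch: the helper name ofed_install_cmd is hypothetical, and it echoes the command so that it can be inspected before being run.

```shell
#!/bin/sh
# Build the install.pl command line shown above for one kernel release string.
# The command is printed, not executed.
ofed_install_cmd() {
    kver=$1
    printf './install.pl -k %s -s /lib/modules/%s/build --umad-dev-rw\n' "$kver" "$kver"
}

# Usage, for the currently running kernel:
#   ofed_install_cmd "$(uname -r)"
```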

Reboot.

Installing the Intel OFED Software

Grab the installer from the Intel Download Center.

Since I am using CentOS 7, I will choose IntelIB-Basic.RHEL7-x86_64.7.3.0.0.26.tgz

Run the INSTALL script provided with the package with the --force option (otherwise the script complains because my OS is CentOS 7).

OFED components that I installed include:
  • IPoIB and MPI over uDAPL
  • ib_qib #Intel TrueScale cards
  • libibumad
  • OpenSM #Subnet Manager
  • SDP (Sockets Direct Protocol)
  • SRP
  • Perftest
  • Intel MPI
  • Debug info
NOTE: Do not install iWARP.

Monday 15 September 2014

Setting up a two-node HPC cluster with InfiniBand

In this post, I will share how to build a two-node cluster with an InfiniBand (IB) interconnect.

I have installed CentOS 7.0 (server with GUI version) with IB and iWARP support.

To install the necessary software

The following are to be done on every node, unless stated otherwise.

$ yum groupinstall "Infiniband Support"
$ yum install infiniband-diags perftest qperf opensm

OpenSM is the subnet manager.
 
Extra steps (optional):

Edit /etc/default/opensm so that it lists the port GUIDs:
$ cat /etc/default/opensm
PORTS="0x00117500007005aa 0x0011750000700c2a"

The port GUIDs can be obtained with:
$ ibstat -p
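
The PORTS line can be assembled from those GUIDs instead of being typed by hand. A minimal sketch, assuming the GUIDs arrive one per line on stdin (which is how I expect ibstat -p to print them); the function name opensm_ports_line is my own.

```shell
#!/bin/sh
# Join port GUIDs (one per line on stdin) into a single PORTS="..." line
# in the same format as the /etc/default/opensm example above.
# Blank lines are skipped.
opensm_ports_line() {
    awk 'NF { guids = guids sep $1; sep = " " }
         END { printf "PORTS=\"%s\"\n", guids }'
}

# Usage (prints the line; copy it into /etc/default/opensm after checking it):
#   ibstat -p | opensm_ports_line
```

Since ibstat -p only reports the ports of the local node, GUIDs from other nodes would have to be collected separately and fed in together.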

Activate the services:
$ chkconfig rdma on
$ chkconfig opensm on #only on master node

$ service rdma start
$ service opensm start #only on master node
$ shutdown -r now

After the services are started (or after the reboot), ibstat should show State "Active" and Physical state "LinkUp".
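
This check can be scripted. The sketch below assumes ibstat prints one "State: ..." and one "Physical state: ..." line per port, which is the layout I see on my nodes; the function name ib_links_up is hypothetical. It succeeds only if there is at least one port and every port is Active and LinkUp.

```shell
#!/bin/sh
# Read ibstat output on stdin. Exit status 0 if at least one port is listed
# and every port reports State "Active" and Physical state "LinkUp";
# non-zero otherwise.
ib_links_up() {
    awk -F': *' '
        BEGIN { ports = 0; active = 0; up = 0 }
        /^[ \t]*State:/          { ports++; if ($2 == "Active") active++ }
        /^[ \t]*Physical state:/ { if ($2 == "LinkUp") up++ }
        END { exit !(ports > 0 && active == ports && up == ports) }
    '
}

# Usage:
#   ibstat | ib_links_up && echo "all IB ports are up"
```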

To check network connectivity

Display all switches
$ ibswitches

Display all hosts visible in the network
$ ibhosts

Report link info
$ iblinkinfo

Testing with ibping
$ ibping -S #on one server
$ ibping -G 0x0011750000700c2a #on another server

Replace the port GUID above with yours.

Sunday 14 September 2014

Setting up a two-node ethernet cluster

In this post I would like to share how to set up a two-node cluster.

First: Configuring the ethernet connection

For every node, we have to configure the network script for the ethernet card, /etc/hosts, /etc/hostname, /etc/resolv.conf, /etc/hosts.allow (and /etc/hosts.deny), and iptables. The goals are the following:
  • Assign a private static IP to every node.
  • Assign a hostname to every node.
  • Assign a virtual gateway and nameserver.
  • Allow communication between the nodes through iptables (if applicable) and hosts.allow.

Let's say we have two nodes and we would like to name them node1 (head node) and node2.

Note: On node1 (chosen as the head node), enp3s0f0 is connected to the internet. So I used enp3s0f1 for the local network.
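
As an illustration of the static IP and hostname goals, here is a small sketch that prints matching /etc/hosts lines for node1, node2 and so on. The 192.168.1 prefix in the usage line is only an assumed example of a private network, not necessarily the one used on my nodes, and cluster_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Print /etc/hosts lines for nodes node1..nodeN on a private /24 network.
#   $1 = first three octets of the network (assumed example: 192.168.1)
#   $2 = number of nodes
cluster_hosts() {
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s.%d\tnode%d\n' "$prefix" "$i" "$i"
        i=$((i + 1))
    done
}

# Usage (review the output, then append it to /etc/hosts on every node):
#   cluster_hosts 192.168.1 2
```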

Tuesday 9 September 2014

Compiling Linux Kernel 3.16.1 for CentOS 7.0 on Haswell Server

Background: I would like to upgrade the existing kernel for CentOS 7.0 (3.10.0-123.6.3.el7.x86_64) to 3.16.1 (latest kernel from kernel.org - not available in repo yet).


Download the latest kernel from www.kernel.org into /usr/src/kernels. Untar it and change into the kernel directory (in this case, linux-3.16.1).

Install the necessary packages:
$ yum groupinstall "Development Tools"
$ yum install ncurses ncurses-devel

Do a proper cleanup:
$ make mrproper

Copy current kernel config (in /boot) to .config (in working dir) as a base to use.
Note that 'make mrproper' deletes .config
$ cp /boot/config-3.10.0-123.el7.x86_64 .config

Edit the configuration further if needed:
$ make menuconfig

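To adjust compiling to the number of your CPU cores (for faster compilation time). Note that CONCURRENCY_LEVEL is the variable read by Debian's make-kpkg, so it may have no effect on 'make rpm':
$ export CONCURRENCY_LEVEL=`getconf _NPROCESSORS_ONLN`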

Compile and build the rpms
$ make rpm

Install the kernel (adjust the file name to the rpm that was actually built):
$ rpm -ivh /root/rpmbuild/RPMS/x86_64/kernel-3.16.1-1.x86_64.rpm


This should set up the initramfs and grub settings. If not, do these manually:
$ mkinitrd /boot/initramfs-3.16.1.img 3.16.1
$ grubby --add-kernel=/boot/vmlinuz-3.16.1  --initrd=/boot/initramfs-3.16.1.img --title="CentOS Linux 7.0 3.16.1" --make-default --copy-default
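
Before running grubby by hand, it is worth confirming that both files it is pointed at exist. A minimal sketch; check_boot_files is a hypothetical helper, and the file names follow the vmlinuz-<version> and initramfs-<version>.img pattern used in the grubby command above.

```shell
#!/bin/sh
# Check that the kernel image and initramfs for a given version both exist.
#   $1 = boot directory (normally /boot), $2 = kernel version
# Prints "ok" and returns 0, or names the first missing file and returns 1.
check_boot_files() {
    bootdir=$1
    kver=$2
    for f in "$bootdir/vmlinuz-$kver" "$bootdir/initramfs-$kver.img"; do
        [ -f "$f" ] || { echo "missing: $f"; return 1; }
    done
    echo "ok"
}

# Usage:
#   check_boot_files /boot 3.16.1
```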